From what I've read, the root of this issue is the incorrect counting of the runnable workload - it may well be that the fix for JENKINS-27034 will help here (or perhaps resolve this problem entirely).
i.e. This issue may just be a symptom of JENKINS-27034.
Also, I would not consider time spent fixing Queue/Cloud/NodeProvisioner as time wasted - that's all core cloud functionality that every cloud plugin relies on to provide executors (e.g. we use Docker, vSphere and OpenStack; there are others).
I appreciate that dockerNode is useful, but pipeline-specified one-shot nodes aren't the answer to everything. When it takes a long time for a node to start up (e.g. fully featured VMs rather than lightweight containers), it's important to have clouds configured to supply nodes (with a retention strategy that is not "one shot") in order to maintain build throughput.
FYI, I didn't encounter this issue via the docker-plugin; I noticed it because Jenkins core was asking the vsphere-plugin for new nodes (where dockerNode isn't a viable replacement) while I was monitoring my vSphere cloud. There may well have been OpenStack and Docker nodes being created as well, but I wasn't monitoring those at the time.
It seems the root cause is that each label's LoadStatistics still report queue lengths above zero even while Jenkins is quieting down, so NodeProvisioner keeps provisioning for builds that will never be started.
So a simple fix would be to have hudson.slaves.NodeProvisioner#update skip provisioning when Jenkins.isQuietingDown() returns true.
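Something along these lines is what I have in mind - a rough sketch only, not the actual core code: in core it would be an early return at the top of NodeProvisioner#update, and the class/method names below are purely illustrative (Jenkins.getInstanceOrNull() and isQuietingDown() are the real core calls):
{code:java}
import jenkins.model.Jenkins;

// Rough sketch of the proposed guard. In core this would be an early return
// at the top of hudson.slaves.NodeProvisioner#update rather than a separate
// class; the class and method names here are illustrative only.
class ProvisioningGuardSketch {

    /** True when the provisioning pass should be skipped entirely. */
    static boolean shouldSkipProvisioning() {
        final Jenkins jenkins = Jenkins.getInstanceOrNull();
        // While the controller is quieting down, queued items will not be
        // started, so spinning up new cloud nodes for them is pointless.
        return jenkins == null || jenkins.isQuietingDown();
    }
}
{code}
i.e. the update() pass would return immediately when that check is true, before it ever looks at each label's LoadStatistics.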