Bug
Resolution: Fixed
Major
Jenkins 2.0-rc-1 running from WAR on Mac
Jenkins 1.651.1 running from WAR on Linux
workflow-cps 2.78
Start a couple of long-running pipelines with:
node {
    sleep 100
}
Queue up a few more jobs. Go to "Manage Jenkins" and click "Prepare for Shutdown."
Now pipeline jobs that should finish and leave the queue never finish and have to be killed manually (which does work). Freestyle jobs complete normally. Queued jobs aren't run, so that part of prepare-for-shutdown works.
Even stranger: upon killing and restarting with Ctrl+C, we get this lovely conundrum:
Those pipeline builds won't show up in the build queue on the main screen.
Checks to do:
- Regression in core?
- Regression in pipeline?
- Does /safeRestart or /restart trigger it?

Issue links:
- duplicates JENKINS-34281 Queue isn't saved on any shutdown (Resolved)
- is duplicated by JENKINS-38316 Entering, then cancelling, quiet mode causes builds to hang (Resolved)
- is related to JENKINS-38316 Entering, then cancelling, quiet mode causes builds to hang (Resolved)
- is related to JENKINS-51215 Timeouts suspending pipeline executions (that are already suspended?) (In Review)
- is related to JENKINS-60434 "Prepare for shutdown" should continue executing already running pipelines to completion (Open)

[JENKINS-34256] Preparing Jenkins For Shutdown Hangs Running Pipelines
So what needs to happen is:
1) Nothing on the queue, one executor available.
2) I click on "build now" for my "sleep job" several times.
3) The executor will be running one of them, and you will see several "part of sleep job #xxx" in the queue.
4) When I mark Jenkins for shutdown, the current job will finish, and the remaining jobs will stay on the queue.
5) If I click on "build now" for the job again, I see "sleep job #xx" in the queue instead of "part of sleep job #xx".
6) I wait until the job is finished, then I send a SIGINT to the Jenkins master, which comes down.
7) When I bring Jenkins up again, the jobs will not be on the executor queue, but if I look in the job history, they show up as if they were running. When I look at the details, they are waiting to be scheduled, but they will be stuck forever.
This is a fairly fresh install, with the "Pipeline" plugin installed and all plugins up-to-date.
I suspect this is related to how the pipeline execution seems to use the master node, but not one of the executor slots, i.e. when I click several times, I see the master node being assigned the outside of the pipeline job, which will then try to allocate a node. I see several of those in parallel, even when the master node has only one executor.
To provide an update: I have recently restarted the deeper investigation of this issue.
ruoso Okay, I must admit to being stumped: no matter what I try, I can't reproduce this. Testing with Jenkins 2.8 and the latest pipeline plugin – whether I start prepare for shutdown before the first node block, during its execution and whether or not I have a second node block on the job. This also applies whether or not I schedule an additional execution during prepare for shutdown mode.
Can you provide an exact job and timing that will trigger this issue consistently? I am wondering if it is related to the queueing issues resolved in
https://issues.jenkins-ci.org/browse/JENKINS-34281 – which are included in Jenkins 2.1.
ruoso Please can you copy the JENKINS_HOME and try with the 2.8 WAR? I suspect this is linked to JENKINS-34281, which is not fixed on the Jenkins 1.651.1 release line.
ruoso It should be compatible except for dropping AJP support. You don't need to do a full upgrade anyway, just start with a fresh instance and provide a testcase that reproduces this under Jenkins 2.1+ (I suggest 2.8 as the latest). If it can't be reproduced, the issue is probably resolved by JENKINS-34281 fix - for OSS Jenkins, that means an upgrade, otherwise it would need a backport to 1.651 line.
ok, since Jenkins2 is now actually released, I'll do the upgrade and see if I still have the problem.
I just tested with latest Jenkins release and I can't reproduce the bug.
When I initially reported this Jenkins2 was not really released, it is now... I'll just move to the new version.
ruoso Excellent! I'm going to go ahead and close this one out as a duplicate of the other one then.
I seem to be having this same issue again with Jenkins 2.18, and the workflow-job plugin 2.5.
Problem still present on Jenkins 2.73.1. Using Prepare For Shutdown breaks all currently running pipelines, which seem to fall into a deadlock. I need to manually kill the jobs through the CLI and restart Jenkins. All plugins are at the latest version.
I had this problem when using docker containers that ran without a PID 1. The fix for this was to add --init as an argument to docker run.
It looks like I can (edit:) reproduce this locally like so:
stage("going to bed") {
    node {
        echo 'running a sleep'
        sh 'for i in `seq 1 70`; do echo "sleep $i" && sleep 1; done'
    }
}
Which means it should be debuggable/fixable now.
Maybe JENKINS-38316 and in particular https://issues.jenkins-ci.org/browse/JENKINS-38316?focusedCommentId=332021&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-332021 explains the current situation (in case it has changed over the past almost 2 years)...
reinholdfuereder I'm 99% sure this is where the hang originates from: https://github.com/jenkinsci/workflow-cps-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java#L235
This behavior seemingly is by design, because we wanted Pipelines to halt where they are (rather than completing fully) before shutdown.
A better design might have been a separate "paused" state.
Unfortunately AFAIK there's not a listener in Core that we can use to notify the Pipeline to wake back up when leaving QuietingDown mode. My best notion has been for halted Pipelines to poll periodically to see if we've left quietDown mode and then resume if so – doable but rather unfortunate.
Worse, I can't seem to actually reproduce this behavior in a unit test for reasons I'm still trying to ascertain, even though it's easy to demonstrate on a normal instance: https://github.com/svanoort/workflow-cps-plugin/blob/60308b567d4bff6904d6fbc3cb57fbda564eaff7/src/test/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecutionTest.java#L249
jglick Do you have any notions here?
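For readers following along, the polling idea mentioned above (halted Pipelines checking periodically whether quiet-down mode has ended) might look roughly like the sketch below. This is a toy model only: `QuietDownPollingSketch`, `quietingDown` and `awaitQuietDownCancelled` are made-up names standing in for the real controller state, and nothing here is workflow-cps or Jenkins core code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model only: none of these names are Jenkins APIs. */
final class QuietDownPollingSketch {
    /** Hypothetical stand-in for the controller's quiet-down flag. */
    static final AtomicBoolean quietingDown = new AtomicBoolean(false);

    /**
     * Polls the flag every pollMillis (must be greater than zero) and returns true
     * once quiet-down has been cancelled, or false if timeoutMillis elapses first
     * or the waiting thread is interrupted.
     */
    static boolean awaitQuietDownCancelled(long pollMillis, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch resumed = new CountDownLatch(1);
        try {
            timer.scheduleWithFixedDelay(() -> {
                if (!quietingDown.get()) {
                    resumed.countDown(); // quiet-down is over: wake the paused program
                }
            }, 0, pollMillis, TimeUnit.MILLISECONDS);
            return resumed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.shutdownNow(); // always stop polling
        }
    }

    public static void main(String[] args) throws Exception {
        quietingDown.set(true); // admin clicks "Prepare for shutdown"
        Thread admin = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            quietingDown.set(false); // admin cancels shutdown a little later
        });
        admin.start();
        System.out.println("resumed=" + awaitQuietDownCancelled(50, 5000));
        admin.join();
    }
}
```

The cost of this approach is the one already noted: every paused build keeps a timer running, and resumption lags by up to one polling interval, which is why a listener-style notification would be preferable if core offered one.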
See JENKINS-38316 for the same issue but with additional comments/info.
Jesse's suggestions given issues with the testcase here:
[11:11 AM] Jesse Glick: very roughly: `semaphore 'wait'` can be the whole program; wait for it to start; `doQuietDown`; succeed step; `doCancelQuietDown`; wait for finish
[11:11 AM] Jesse Glick: (maybe?)
[11:11 AM] Jesse Glick: no need for `node`, `Thread.sleep`, or `waitForSuspension`
[11:12 AM] Jesse Glick: untested, obviously, but I would try something along those lines
[11:18 AM] Jesse Glick: @Sam I suspect you are quieting down in the middle of a `sleep`, then canceling that before anything else happens in the program, so… `CpsFlowExecution` never even notices that the state flipped
[11:19 AM] Jesse Glick: @Sam JENKINS-38316 is about the more likely scenario that the admin goes into quiet down mode, then the CPS VM thread wakes up for whatever reason, sees that it is supposed to be in quiet mode, pauses, and then never receives a notification to do anything else (unless perhaps someone manually pauses and resumes the build)
Jglick: so my suggested test case would first quiet down, then do something to wake up the program, then cancel quiet down
[11:21 AM] Jesse Glick: you might need to do something else there, b/c I suspect there is still a race condition in that test—`SemaphoreStep.succeed` will post a task to the CPS VM thread, but you need to wait for that task to actually be processed.
[11:21 AM] Jesse Glick: That might be a valid use of `waitForSuspension`.
[11:23 AM] Jesse Glick: @sam I would suggest that https://github.com/jenkinsci/workflow-cps-plugin/blob/564a12c05eb54d5a84062cd3bf1d68deb47e1d9f/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java#L236 if the reason for pausing is quiet down mode, you print something to the build log. That is something the test could `waitForMessage` to see.
[11:23 AM] Jesse Glick: (as well as being better UX)
[11:23 AM] Jesse Glick: for the other reason we already print a message to the log: https://github.com/jenkinsci/workflow-cps-plugin/blob/564a12c05eb54d5a84062cd3bf1d68deb47e1d9f/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java#L1491
[11:25 AM] Jesse Glick: thus the test would be: wait for `semaphore` step to start; set Jenkins to quiet mode; permit the step to finish; wait for the message saying that the build is paused due to quiet mode; cancel quiet mode; wait for build to complete on its own
[11:26 AM] Jesse Glick: @Sam ^^^
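To make the suggested test sequence easier to follow, here is a small self-contained model of it. It is not a JenkinsRule test and uses no Jenkins classes: `ToyController`, `ToyBuild` and the `notifyOnCancel` switch are hypothetical names invented for illustration. The model only captures the claim from the chat: a build that wakes up during quiet mode pauses, and unless something notifies it when quiet mode is cancelled, it stays paused.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the hang; these classes are illustrative and are not Jenkins APIs. */
final class QuietModeHangModel {
    enum State { RUNNING, PAUSED, DONE }

    static final class ToyController {
        boolean quietingDown;
        /** Hypothetical switch: tell paused builds when quiet mode ends. */
        final boolean notifyOnCancel;
        final List<ToyBuild> builds = new ArrayList<>();

        ToyController(boolean notifyOnCancel) {
            this.notifyOnCancel = notifyOnCancel;
        }

        void quietDown() {
            quietingDown = true;
        }

        void cancelQuietDown() {
            quietingDown = false;
            if (notifyOnCancel) {
                for (ToyBuild b : builds) {
                    b.resumeIfPaused();
                }
            }
        }
    }

    static final class ToyBuild {
        final ToyController controller;
        final List<String> log = new ArrayList<>();
        State state = State.RUNNING;

        ToyBuild(ToyController controller) {
            this.controller = controller;
            controller.builds.add(this);
        }

        /** Models the program waking up because its only step finished. */
        void stepFinished() {
            if (state != State.RUNNING) {
                return;
            }
            if (controller.quietingDown) {
                state = State.PAUSED;
                log.add("Pausing: controller is in quiet mode");
            } else {
                state = State.DONE;
                log.add("Finished");
            }
        }

        void resumeIfPaused() {
            if (state == State.PAUSED) {
                log.add("Resuming: quiet mode cancelled");
                state = State.DONE; // the only step already finished, so the build completes
            }
        }
    }

    /** Runs the sequence from the chat and returns the final build state. */
    static State runScenario(boolean notifyOnCancel) {
        ToyController controller = new ToyController(notifyOnCancel);
        ToyBuild build = new ToyBuild(controller); // the semaphore step has started
        controller.quietDown();                    // set Jenkins to quiet mode
        build.stepFinished();                      // permit the step to finish: build pauses
        controller.cancelQuietDown();              // cancel quiet mode
        return build.state;                        // did the build complete on its own?
    }

    public static void main(String[] args) {
        System.out.println("without notification: " + runScenario(false)); // PAUSED
        System.out.println("with notification:    " + runScenario(true));  // DONE
    }
}
```

In the model, the "Pausing" log entry plays the role of the build-log message the real test would `waitForMessage` on before cancelling quiet mode.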
More or less accidentally I just successfully resumed 3 pipelines after cancelling the shutdown mode after restarting Jenkins after putting Jenkins in shutdown mode (cf. also JENKINS-38316):
- and I did NOT have to wake them up manually by the "pause"-"resume" workaround
- maybe/presumably because I entered/started the shutdown mode in the middle of 'sh' steps, AND then waited until the end of these steps before restarting Jenkins?
However, the following minor issues popped up – please mind that there were actually two Jenkins restarts, because Jenkins Plugins were updated (just one in fact) after the first Jenkins restart via Jenkins init.d hook scripts, followed by a second restart (after the updates):
- Resuming after Jenkins restart is slow
...
Resuming build at Fri May 04 07:52:09 CEST 2018 after Jenkins restart
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: ???
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Resuming build at Fri May 04 07:55:41 CEST 2018 after Jenkins restart
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: ???
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Waiting to resume part of Unattended-Upgrades » ACME 20180504-074900-revUNKNOWN: Jenkins is about to shut down
Ready to run at Fri May 04 07:57:42 CEST 2018
[Pipeline] sh
07:57:42 [ACME] Running shell script
...
- ... maybe because of running into timeouts when suspending (the more or less still suspended pipelines; because first action in init.d hook scripts is setting Jenkins in shutdown mode)!?
2018-05-04 07:52:09 INFO [hudson.WebAppMain$3 run] Jenkins is fully up and running
2018-05-04 07:52:10 SEVERE [jenkins.model.Jenkins$24 run] Restarting VM as requested by SYSTEM
2018-05-04 07:52:10 INFO [jenkins.model.Jenkins cleanUp] Stopping Jenkins
2018-05-04 07:52:10 INFO [jenkins.model.Jenkins$19 onAttained] Started termination
2018-05-04 07:52:10 WARNING [hudson.util.ExceptionCatchingThreadFactory uncaughtException] Thread Computer.threadPoolForRemoting [#2] terminated unexpectedly
java.nio.channels.ClosedSelectorException
    at sun.nio.ch.SelectorImpl.keys(SelectorImpl.java:68)
    at org.jenkinsci.remoting.protocol.IOHub.getThreadNameBase(IOHub.java:426)
    at org.jenkinsci.remoting.protocol.IOHub.access$200(IOHub.java:69)
    at org.jenkinsci.remoting.protocol.IOHub$IOHubSelectorWatcher.run(IOHub.java:536)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-05-04 07:53:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll] Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME2/273:Unattended-Upgrades/ACME2 #273]]
java.util.concurrent.TimeoutException: Timeout waiting for task.
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:259)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
    at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
    at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
    at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
    at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
2018-05-04 07:54:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll] Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME/182:Unattended-Upgrades/ACME #182]]
java.util.concurrent.TimeoutException: Timeout waiting for task.
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:259)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
    at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
    at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
    at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
    at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
2018-05-04 07:55:10 WARNING [org.jenkinsci.plugins.workflow.cps.CpsFlowExecution suspendAll] Error waiting for Pipeline to suspend: CpsFlowExecution[Owner[Unattended-Upgrades/ACME3/181:Unattended-Upgrades/ACME3 #181]]
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:258)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:91)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1555)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
    at jenkins.model.Jenkins$18.execute(Jenkins.java:3333)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:276)
    at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3330)
    at jenkins.model.Jenkins.cleanUp(Jenkins.java:3251)
    at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:73)
    at jenkins.model.Jenkins$24.run(Jenkins.java:4234)
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins$19 onAttained] Completed termination
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpDisconnectComputers] Starting node disconnection
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpShutdownPluginManager] Stopping plugin manager
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpPersistQueue] Persisting build queue
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins _cleanUpAwaitDisconnects] Waiting for node disconnection completion
2018-05-04 07:55:10 INFO [jenkins.model.Jenkins cleanUp] Jenkins stopped
Listening for transport dt_socket at address: 5005
Running from: /usr/share/jenkins/jenkins.war
2018-05-04 07:55:11 INFO [org.eclipse.jetty.util.log.Log initialized] Logging initialized @525ms to org.eclipse.jetty.util.log.JavaUtilLog
2018-05-04 07:55:11 INFO [winstone.Logger logInternal] Beginning extraction from war file
...
2018-05-04 07:55:46 INFO [jenkins.InitReactorRunner$1 onAttained] Completed initialization
2018-05-04 07:55:46 INFO [hudson.WebAppMain$3 run] Jenkins is fully up and running
2018-05-04 07:58:12 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish] Unattended-Upgrades/ACME2 Worker #273 completed: SUCCESS
2018-05-04 07:58:35 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish] Unattended-Upgrades/ACME #182 completed: SUCCESS
2018-05-04 08:01:11 INFO [org.jenkinsci.plugins.workflow.job.WorkflowRun finish] Unattended-Upgrades/ACME3 #181 completed: SUCCESS
- svanoort Should I file a dedicated issue for that ("Error waiting for Pipeline to suspend: CpsFlowExecution")?
- And the build executor status does not stop showing the pipeline as being in-progress or so:
- Cancelling/aborting it with this 'x' button/link finally removes it (after confirming in the pop-up dialog "Are you sure you want to abort null?")
reinholdfuereder I would open a separate issue for Timeouts suspending executions – especially if you can come up with a consistent way to reproduce it. I saw it from time to time with Pipelines doing very complex processing (where we can't block the shutdown forever and shouldn't).
My suspicion is that there's a subtle bug around the halt-at-shutdown logic, which may have been pre-existing but is visible now because the process is more closely monitored and logged now (also because we actually have some test coverage for it). Unfortunately
By the way, you will sometimes be able to resume Pipelines after going into prepare-for-shutdown if the toggle happens at the right time – but in general there's no wakeup hook to resume execution (see notes above about how we plan to add one).
So with pipelines, what is the recommended way of completely stopping a busy Jenkins instance for maintenance? The maintenance is in part due to a broken pipeline resume a'la JENKINS-50199, so I specifically don't want any additional half-done pipelines waiting to be resumed. I also would prefer to avoid having to abort jobs.
In JENKINS-38316 there's an explicit mention that "prepare for shutdown" is not that:
The whole idea of "Prepare for shutdown" is to [...] allow you to finish currently running freestyle (Maven, matrix, …) builds. So if you /safeRestart Jenkins will restart as soon as any of those are completed, and running Pipeline builds will be left alone.
What should I do then?
We're having similar issues. We use pipeline extensively to build on different platforms and types of slaves and we're also seeing the pipeline jobs finish but not remove from slave.
Restarting jenkins, which is usually the reason for shutdown, gets the jobs even more out of shape as the pipeline job reconnects to the slave, then tries to continue on the slave, but cannot as it's waiting for executor on the slave it's running on.
21:13:21 Running on ella in /home/jenkins/slave/workspace/Security/SAMATE/SAMATE-java
-- stuff happens here --
-- put Jenkins in shutdown mode --
Waiting to resume part of Security » SALADE » SALADE-java #490: Jenkins is about to shut down
-- Restart Jenkins --
Resuming build at Tue Aug 21 07:50:31 BST 2018 after Jenkins restart
Waiting to resume part of Security » SALADE » SALADE-java #490: Waiting for next available executor on ella
My expectation is that the pipeline job on that node would finish and the next pipeline job would be queued without being assigned to a node, to allow a restart and connecting to a new node?
mkozell That specific case sounds a lot like a gremlin we've been chasing on and off for quite a while. I'm assigning this to jtaboada to investigate.
I think what you report may be independent of what was discussed here though which is probably the root cause of the issue: https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=336282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-336282 through https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=332080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-332080
Also experiencing this, new since around 2.140, in all pipeline jobs. These jobs are on the lowest durability setting.
Previously had no issue holding the queue by "preparing for shutdown" and the currently running jobs would finish. Now have to force Jenkins to restart to get rid of the jobs.
We have the same problem on Jenkins 2.138.2.
Is there any time estimation for resolving the issue?
Same issue here with Jenkins LTS 2.150.2
I'm seeing this with pipeline durability set to "PERFORMANCE_OPTIMIZED" in the global configuration.
svanoort re-reading your comment here https://issues.jenkins-ci.org/browse/JENKINS-34256?focusedCommentId=332080&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-332080 I am wondering why the currently executing pipeline should actually halt by design – wouldn't it be more intuitive if any running pipelines just complete (as was the case with freestyle jobs earlier)?
Having the same issue here on Jenkins 2.165 - even with simple 'sh "sleep 60"' test jobs.
Attempting to work around the issue by checking "Do not allow the pipeline to resume after master restarts" and changing the pipeline to PERFORMANCE_OPTIMIZED makes the pipeline attempt to resume after restart (??) and makes me run into JENKINS-50407 instead.
According to my observations, the bug only affects Jenkins pipelines and happens when Jenkins is put into shutdown mode while some pipelines are running in the background; those pipelines will not be able to proceed to the next pipeline stage(s) and will be stuck indefinitely in whatever stage they were in before shutdown mode.
This can be reproduced with the following pseudo-pipeline:
pipeline {
    agent any
    stages {
        stage('build') {
            steps {
                sh('make build')
            }
        }
        stage('prepare') {
            steps {
                // During this stage, Jenkins is put into shutdown mode
                sh('make prepare-for-restart')
            }
        }
        stage('deploy') {
            steps {
                // Pipeline will not reach this stage
                sh('make deploy')
            }
        }
    }
    post {
        always {
            sh('echo Test')
        }
    }
}
This pipeline will never reach either the deploy stage or the post section.
My guess is that shutdown prevents any new build threads from being executed, and since each stage runs in a separate thread (for serialization purposes), pipelines get stuck. This behavior seems to be intended because it allows Jenkins to continue stages after a hard restart.
In my use-case I would like to conduct a safe, controlled Jenkins restart, allowing any existing workloads to finish.
It is distressing that major issues like this sit for 3+ years with no resolution – and worse, not assigned to anyone.
Since pipeline jobs are "the norm" now, the "Prepare For Shutdown" button is a trap for users to get their system into a broken state. If this issue cannot be fixed, at the very least there should be a warning label next to that button.
I am looking into this issue in PR 340. I am a little confused by some of the comments in this thread. As far as I can tell, this has never worked, regardless of what step is executing when quiet mode is enabled, because there is no code to tell Pipeline executions that quiet mode was cancelled and they should try to resume themselves. Maybe I am misunderstanding something or there are multiple distinct issues being discussed in the ticket, so I am going to reread all of the comments in case and do some additional testing with the sh step.
I am experiencing the same problem as described in the title. This is how I reproduce it:
- Run a container from the jenkins/jenkins:lts image (which for me is Jenkins ver. 2.190.2)
- Create a simple pipeline:
pipeline {
    agent any
    stages {
        stage('x') {
            steps {
                sh 'sleep 30'
                sh 'sleep 30'
            }
        }
    }
}
- Run a build of the pipeline above, then go to Manage Jenkins and click on Prepare for shutdown
At this point Jenkins shows the red stripe "Jenkins is going to shut down" but it never does. The pipeline never proceeds (in fact, it doesn't even reach the second sh if the prepare for shutdown happened during the first sleep). The pipeline hangs and never terminates.
I am currently at Jenkins world and showed the behaviour to bitwiseman today.
ferulee46 Yes, it's confusing, but that's the intended behavior. Clicking "Prepare for shutdown" pauses all running Pipelines. My PR prints a message to the build log of Pipelines when this happens to make it clear that the build is paused. Once the build is paused, Jenkins can be restarted, and the Pipeline will resume after the restart.
If you are only using "Prepare for shutdown" to restart Jenkins without breaking in-progress builds of non-Pipeline jobs, you can navigate to the /safeRestart URL, which is like "Prepare for shutdown" except that it automatically restarts after all non-Pipeline jobs complete.
The Pipeline builds should resume after Jenkins restarts with or without my PR. The main change in my PR is that today, if you cancel shutdown after clicking "Prepare for shutdown", the Pipeline builds stay paused. You have to manually pause and unpause the builds to get them to resume or restart Jenkins. After my PR, canceling shutdown will unpause the builds and resume them automatically.
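For anyone who wants to script the /safeRestart route mentioned above instead of navigating to it, a rough sketch follows. As far as I know the endpoint has to be requested with POST by an authenticated administrator; the sketch assumes HTTP basic auth with a user API token is accepted, which may not match every setup. The base URL, user name and token are placeholders, and `SafeRestartRequestSketch` is just an illustrative name. It only builds the request; sending it (for example with `java.net.http.HttpClient`) is left out on purpose.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch only: builds, but does not send, a POST to the /safeRestart URL. */
final class SafeRestartRequestSketch {
    /**
     * baseUrl, user and apiToken are placeholders supplied by the caller.
     * Assumption: the controller accepts basic auth with an API token.
     */
    static HttpRequest buildSafeRestartRequest(String baseUrl, String user, String apiToken) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + apiToken).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(base + "safeRestart"))
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                buildSafeRestartRequest("http://localhost:8080", "admin", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keep in mind what the comment above says about the effect: running Pipelines are paused rather than completed, and the restart happens once the non-Pipeline builds are done.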
Hi Devin, thanks for your prompt reply.
It is very confusing indeed. Shouldn't the message in red say something different than 'Jenkins is going to shut down' if that is not true (because it actually does not shut down at all, ever)?
Also, I haven't tried it right now, but I'm fairly confident this is exactly what happens when you update a plugin that needs Jenkins to be restarted: if you've got pipelines running, it will never restart until you kill those pipelines.
In the modern world of transient agents (e.g. Kubernetes pods) that won't exist after restart, this approach of pausing the pipeline is painful. It would sure be nice if there was a way to allow current jobs to finish while not allowing queued jobs to be started.
ferulee46 I think the intention of the message is something like how you might use wall on a multi-user Unix system, in the sense that the message is a way for admins to signify to anyone that might be using Jenkins that it will be shut down at some point (just guessing, I did not add the feature). The admin still has to actually initiate the shutdown themselves, so for admins, I agree, the message is confusing.
4 years old serious usability issue, hasn't been fixed, won't be fixed because nobody cares. Jenkins is dead, use something else.
As mentioned above, in the modern world of transient agents such as Kubernetes pods, this is quite painful. Transient agents are likely to become more prevalent.
dnusbaum Thanks, I think that ("The main change in my PR is that today, if you cancel shutdown after clicking "Prepare for shutdown", the Pipeline builds stay paused. You have to manually pause and unpause the builds to get them to resume or restart Jenkins. After my PR, canceling shutdown will unpause the builds and resume them automatically.") should actually really address one of my (many months ago) experienced problems in this concern! (Because I have a groovy init hook script that always configures Jenkins to start in so-called quiet mode...)
As other users more or less diplomatically commented, there is still room for related important enhancements: maybe these can be collected and discussed and prioritised and addressed in another future sprint/story? (And I think Jenkins is still in massive use nowadays and hopefully not dead for a long time...)
A fix for this issue was just released in Pipeline: Groovy Plugin version 2.78. I think there is/was some confusion as to the expected behavior (myself included!), so let me try to clarify: When Jenkins prepares for shutdown, all running Pipelines are paused, and this is the intended behavior. The unintended behavior was that if you canceled shutdown, Pipelines remained paused. This has been fixed in 2.78; Pipelines will now resume execution if shutdown is canceled. Before 2.78, you had to manually pause and unpause each Pipeline to get it to resume execution, or restart Jenkins. Additionally, preparing Jenkins for shutdown and canceling shutdown now each cause a message to be printed to Pipeline build logs indicating that the Pipeline is being paused or resumed due to shutdown so that it is easier to understand what is happening.
Based on comments here and elsewhere, I think some users would prefer a variant of "Prepare for shutdown" in which Pipelines continue executing to completion, the same as other types of jobs like Freestyle. If that is something you want, please open a new ticket, describing your use case and the desired behavior.
For anyone curious as to why Pipelines are paused when Jenkins prepares for shutdown, instead of continuing to execute and only saving at the last possible second when Jenkins is stopped, the reasoning is to avoid race conditions saving Pipeline metadata that could prevent Pipelines from resuming correctly.
If there is some other aspect of this issue that you would like to see addressed, or a different behavior you would prefer, please open a new ticket describing your particular use case.
Thanks!
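For anyone who wants to reproduce the prepare/cancel cycle described above without clicking through the UI, here is a minimal sketch in Python. It assumes that "Prepare for shutdown" and "Cancel shutdown" correspond to POST requests to /quietDown and /cancelQuietDown on the Jenkins root URL, and that a user name plus API token is accepted via HTTP basic auth. The base URL, user, and token are placeholders, and the helper names are made up for illustration. The functions only build the requests; nothing is sent unless you uncomment the urlopen call.

```python
import base64
import urllib.request


def jenkins_post(base_url, path, user, token):
    """Build (but do not send) an authenticated POST request to a Jenkins endpoint."""
    url = base_url.rstrip("/") + path
    req = urllib.request.Request(url, data=b"", method="POST")
    cred = base64.b64encode(f"{user}:{token}".encode()).decode()
    req.add_header("Authorization", f"Basic {cred}")
    return req


def prepare_for_shutdown(base_url, user, token):
    # Assumed endpoint behind "Prepare for shutdown" (quiet mode).
    return jenkins_post(base_url, "/quietDown", user, token)


def cancel_shutdown(base_url, user, token):
    # Assumed endpoint behind "Cancel shutdown".
    return jenkins_post(base_url, "/cancelQuietDown", user, token)


if __name__ == "__main__":
    # Placeholder values; replace with your own instance and credentials.
    base, user, token = "http://localhost:8080", "admin", "api-token"
    for req in (prepare_for_shutdown(base, user, token),
                cancel_shutdown(base, user, token)):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req)  # uncomment to actually send the request
```

With Pipeline: Groovy 2.78 or later, sending the second request while Pipelines are paused should make them resume, per the comment above. On older versions they would be expected to stay paused. Depending on your Jenkins version and security settings, a CSRF crumb may also be required.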
Thanks again dnusbaum! And following your advice => JENKINS-60434
dnusbaum could you clarify whether this same logic/behavior also applies to the restart that happens after plugin installation (when checking the "Restart Jenkins when installation is complete and no jobs are running" checkbox) or when clicking the Restart Safely button under Manage Jenkins (i.e., the /safeRestart URL, as enabled by https://plugins.jenkins.io/saferestart)? Do running pipeline jobs get paused in those circumstances too and now (with Pipeline: Groovy 2.78) automatically resumed once Jenkins is back up?
Could you clarify whether this same logic/behavior also applies to the restart that happens after plugin installation (when checking the "Restart Jenkins when installation is complete and no jobs are running" checkbox) or when clicking the Restart Safely button under Manage Jenkins (i.e., the /safeRestart URL
Both of these situations use the /safeRestart URL behind the scenes. It puts Jenkins into the same state as "Prepare for shutdown": new builds are prevented from starting and Pipeline builds are paused. The difference is that /safeRestart also automatically restarts Jenkins once all non-Pipeline jobs have completed and all Pipeline jobs have been paused, whereas "Prepare for shutdown" does not actually restart Jenkins.
Even before Pipeline: Groovy version 2.78, once Jenkins restarted due to /safeRestart, all Pipelines should have resumed automatically, and they should continue to have that behavior in Pipeline: Groovy 2.78. If your Pipelines are not resuming after the restart, please open a new ticket, including steps to reproduce the issue from scratch and any messages from your Jenkins logs or Pipeline build logs that seem relevant.
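If you are putting together reproduction steps for a Pipeline that does not resume, a small script can help make the sequence repeatable. The sketch below assumes that /safeRestart accepts a POST request, and that the root JSON API (/api/json) exposes a boolean quietingDown field showing whether Jenkins is currently in the "Prepare for shutdown" state. Please verify both against your own instance. Authentication is left out for brevity, and the function names are made up for illustration.

```python
import json
import urllib.request


def safe_restart_request(base_url):
    """Build (but do not send) the POST request assumed to trigger a safe restart."""
    url = base_url.rstrip("/") + "/safeRestart"
    return urllib.request.Request(url, data=b"", method="POST")


def quiet_state_url(base_url):
    """URL of the root JSON API, restricted to the assumed quietingDown field."""
    return base_url.rstrip("/") + "/api/json?tree=quietingDown"


def is_quieting_down(api_json_text):
    """Return True if the given root API JSON reports quiet mode."""
    return bool(json.loads(api_json_text).get("quietingDown", False))


if __name__ == "__main__":
    base = "http://localhost:8080"  # placeholder instance URL
    print(safe_restart_request(base).get_method(), safe_restart_request(base).full_url)
    print("GET", quiet_state_url(base))
    # Offline check of the parser with a hand-written sample response:
    print(is_quieting_down('{"quietingDown": true}'))
```

Polling the quiet state before and after the restart, together with the Pipeline build logs, should give enough detail for a new ticket.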
Thanks dnusbaum. So the logic change in 2.78 is only for the specific situation where Jenkins is "put to sleep" (Prepare for Shutdown) and then "woken up" (Cancel Shutdown) without actually being restarted? I.e., the intended/expected behavior even prior to 2.78 is that paused pipeline builds would resume automatically after an actual service restart? I've definitely seen them not resume after a restart, so I'll endeavor to reproduce the problem and then file a new bug with details.
So the logic change in 2.78 is only for the specific situation where Jenkins is "put to sleep" (Prepare for Shutdown) and then "woken up" (Cancel Shutdown) without actually being restarted? I.e., the intended/expected behavior even prior to 2.78 is that paused pipeline builds would resume automatically after an actual service restart?
Yes, although note that you can also cancel /safeRestart before the restart happens, and the logic change fixes that case too.
I've definitely seen them not resume after a restart, so I'll endeavor to reproduce the problem and then file a new bug with details.
Ok, great!
dnusbaum I can confirm that your fix works really well!
Because – now some coughing and red face – I accidentally restarted Jenkins master without waiting for pipelines to complete (of course looking forward to JENKINS-60434): and there were some non-minor real world pipelines running... Just one of them failed due to JENKINS-49365...
ruoso Is there any particular timing dependency? Fresh install?