-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
-
Pipeline - April 2018
-
Pipeline Job 2.25
Setup
Jenkins v2.89.4 LTS
Pipeline API Plugin: 2.26
Pipeline Nodes and Processes Plugin: 2.19
Durable Task Plugin: 1.18
Pipeline Job Plugin: 2.17
Pipeline Shared Groovy Libraries Plugin: 2.9
Pipeline Supporting APIs Plugin: 2.18
Pipeline Groovy Plugin: 2.45
Script Security Plugin: 1.41
Pipeline Default Speed/Durability Level: Performance-Optimized
"Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
Problem
I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.
I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
Research
I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.
Resume disabled by user, switching to high-performance, low-durability mode.
At the end of the of the log I saw the following:
Finished: FAILURE
Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart
Why Resume Build?
The build failed on Mar 12, 2018 6:02:37 PM.
Why did the build try to resume almost a day later? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.
_class "hudson.model.OneOffExecutor"
id "41"
keepLog false
number 41
queueId 7178
result "FAILURE"
timestamp 1520877757466
I checked the Java process on the server and it was last restarted on March 02 2018.
What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?
Why does this get the build stuck in a "running" state when it's not running?
Scope
This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
Logs
I checked the jenkins.log file and saw the following when the build was attempting to be resumed.
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/ Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/ Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/ Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41] java.lang.NullPointerException at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Thread Dump
I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
"WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566 at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566 at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Workaround
Running the Jenkins script below will set the running build to aborted to work around the issue.
Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));
- blocks
-
JENKINS-53358 Pipeline Still Persists Run Regularly Even in Performance-Optimized Mode
-
- In Review
-
- is related to
-
JENKINS-45571 "likely stuck" job is not actually stuck.
-
- Open
-
- relates to
-
JENKINS-50407 NPE from SandboxContinuable.run0 after restart
-
- Closed
-
- links to
mkozell The NPEs you see look like a duplicate of https://issues.jenkins-ci.org/browse/JENKINS-49686 which I am investigating right now.
May I please request the following diagnostic information:
1. Please can you attach both the config.xml for the job and a build.xml for one or more Pipelines that experienced this issue.
2. In the build's 'workflow' directory, there's a flownodeStore.xml file - please can you GZIP it and attach it.
3. Can I request the last part of the build log for these builds (the final parts indicating build completed and anything after that).
4. Are all the Pipelines experiencing issues making use of parallels?
5. Do you have any fairly simple pipelines that will reproduce the issues, which you can provide code for?
There's also fix in workflow-cps plugin 2.46 for some issues with Resume Disabled - without that fix there may be odd behaviors if the Resume Disable flag is modified after the build begins. If this has happened, the fix there will help and after gathering the above information, I would suggest installing this update and seeing if issues recur when Resume Disabled in on.
If that does not resolve the resume issue, it may be necessary to toggle this option OFF for the meantime:
We hope to have a hotfix available soon to at least try out.
Thank you