Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-50199

Failed pipeline jobs stuck running after incorrect resume

    XMLWordPrintable

    Details

    • Similar Issues:
    • Sprint:
      Pipeline - April 2018
    • Released As:
      Pipeline Job 2.25

      Description

      Setup

      Jenkins v2.89.4 LTS
      Pipeline API Plugin: 2.26
      Pipeline Nodes and Processes Plugin: 2.19
      Durable Task Plugin: 1.18
      Pipeline Job Plugin: 2.17
      Pipeline Shared Groovy Libraries Plugin: 2.9
      Pipeline Supporting APIs Plugin: 2.18
      Pipeline Groovy Plugin: 2.45
      Script Security Plugin: 1.41

      Pipeline Default Speed/Durability Level: Performance-Optimized
      "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs

      Problem

      I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

      I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.

      Research

      I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

      Resume disabled by user, switching to high-performance, low-durability mode.

      At the end of the of the log I saw the following:

      Finished: FAILURE
      Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart

      Why Resume Build?

      The build failed on Mar 12, 2018 6:02:37 PM.
      Why did the build try to resume almost a day later? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

      _class "hudson.model.OneOffExecutor"
      id "41"
      keepLog false
      number 41
      queueId 7178
      result "FAILURE"
      timestamp 1520877757466

      I checked the Java process on the server and it was last restarted on March 02 2018.

      What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?

      Why does this get the build stuck in a "running" state when it's not running?

      Scope

      This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.

      Logs

      I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

       

      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
      
       Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
      
       Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
       WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
      
       Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
       WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
       java.lang.NullPointerException
       at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
       at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
       at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
       at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
       at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
       at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      

       

      Thread Dump

      I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".

      "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at sun.misc.Unsafe.park(Native Method)
       - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      
      "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at sun.misc.Unsafe.park(Native Method)
       - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
      

      Workaround

      Running the Jenkins script below will set the running build to aborted to work around the issue.

      Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));

        Attachments

        1. 867.tar.gz
          4 kB
        2. build.log
          0.4 kB
        3. build.xml
          10 kB
        4. build.xml
          1 kB
        5. flownode-84.xml
          21 kB
        6. flowNodeStore.xml
          4 kB
        7. workflow-cps.hpi
          548 kB
        8. workflow-cps.hpi
          540 kB
        9. workflow-cps.hpi
          540 kB
        10. workflow-fallback-flowNodeStore.xml
          7 kB
        11. workflow-job.hpi
          112 kB
        12. workflow-job.hpi
          110 kB
        13. workflow-job.hpi
          110 kB
        14. workflows.xml
          5 kB

          Issue Links

            Activity

            mkozell Mike Kozell created issue -
            mkozell Mike Kozell made changes -
            Field Original Value New Value
            Description h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            mkozell Mike Kozell made changes -
            Description h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/

             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            mkozell Mike Kozell made changes -
            Description h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/

             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Pipeline Groovy Plugin: 2.45
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/

             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            mkozell Mike Kozell made changes -
            Description h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Pipeline Groovy Plugin: 2.45
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/

             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            h2. Setup

            Jenkins v2.89.4 LTS
             Pipeline API Plugin: 2.26
             Pipeline Nodes and Processes Plugin: 2.19
             Durable Task Plugin: 1.18
             Pipeline Job Plugin: 2.17
             Pipeline Shared Groovy Libraries Plugin: 2.9
             Pipeline Supporting APIs Plugin: 2.18
             Pipeline Groovy Plugin: 2.45
             Script Security Plugin: 1.41

            Pipeline Default Speed/Durability Level: Performance-Optimized
             "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
            h2. Problem

            I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds.

            I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running.
            h2. Research

            I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log.

            _*Resume disabled by user, switching to high-performance, low-durability mode.*_

            At the end of the of the log I saw the following:

            *_Finished: FAILURE_*
             *_Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart_*
            h2. Why Resume Build?

            The build failed on *Mar 12, 2018 6:02:37 PM*.
             +Why did the build try to resume almost a day later+? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build.

            _class "hudson.model.OneOffExecutor"
             id "41"
             keepLog false
             number 41
             queueId 7178
             result "FAILURE"
             timestamp 1520877757466

            I checked the Java process on the server and it was last restarted on *March 02 2018*.

            +What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted?+

            +Why does this get the build stuck in a "running" state when it's not running?+
            h2. Scope

            This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins.
            h2. Logs

            I checked the jenkins.log file and saw the following when the build was attempting to be resumed.

             
            {code:java}
            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/

             Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
             WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/

             Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
             WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
             java.lang.NullPointerException
             at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
             at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
             
            h2. Thread Dump

            I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
            {code:java}
            "WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)

            "WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at sun.misc.Unsafe.park(Native Method)
             - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
             at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
             at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
             at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
             at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
            {code}
            h2. Workaround

            Running the Jenkins script below will set the running build to aborted to work around the issue.
            {noformat}
            Jenkins.instance.getItemByFullName("JOBNAME").getBuildByNumber(JOB#).finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));{noformat}
            abayer Andrew Bayer made changes -
            Assignee Sam Van Oort [ svanoort ]
            svanoort Sam Van Oort made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell The NPEs you see look like a duplicate of https://issues.jenkins-ci.org/browse/JENKINS-49686 which I am investigating right now.

            May I please request the following diagnostic information:

            1. Please can you attach both the config.xml for the job and a build.xml for one or more Pipelines that experienced this issue.
            2. In the build's 'workflow' directory, there's a flownodeStore.xml file - please can you GZIP it and attach it.
            3. Can I request the last part of the build log for these builds (the final parts indicating build completed and anything after that).
            4. Are all the Pipelines experiencing issues making use of parallels?
            5. Do you have any fairly simple pipelines that will reproduce the issues, which you can provide code for?

            There's also fix in workflow-cps plugin 2.46 for some issues with Resume Disabled - without that fix there may be odd behaviors if the Resume Disable flag is modified after the build begins. If this has happened, the fix there will help and after gathering the above information, I would suggest installing this update and seeing if issues recur when Resume Disabled in on.

            If that does not resolve the resume issue, it may be necessary to toggle this option OFF for the meantime:

            Do not allow the pipeline to resume if the master restarts": Enabled on all jobs

            We hope to have a hotfix available soon to at least try out.

            Thank you

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell The NPEs you see look like a duplicate of https://issues.jenkins-ci.org/browse/JENKINS-49686 which I am investigating right now. May I please request the following diagnostic information: 1. Please can you attach both the config.xml for the job and a build.xml for one or more Pipelines that experienced this issue. 2. In the build's 'workflow' directory, there's a flownodeStore.xml file - please can you GZIP it and attach it. 3. Can I request the last part of the build log for these builds (the final parts indicating build completed and anything after that). 4. Are all the Pipelines experiencing issues making use of parallels? 5. Do you have any fairly simple pipelines that will reproduce the issues, which you can provide code for? There's also fix in workflow-cps plugin 2.46 for some issues with Resume Disabled - without that fix there may be odd behaviors if the Resume Disable flag is modified after the build begins. If this has happened, the fix there will help and after gathering the above information, I would suggest installing this update and seeing if issues recur when Resume Disabled in on. If that does not resolve the resume issue, it may be necessary to toggle this option OFF for the meantime: Do not allow the pipeline to resume if the master restarts": Enabled on all jobs We hope to have a hotfix available soon to at least try out. Thank you
            mkozell Mike Kozell made changes -
            Attachment build.log [ 41863 ]
            mkozell Mike Kozell made changes -
            Attachment build.xml [ 41864 ]
            mkozell Mike Kozell made changes -
            Attachment flowNodeStore.xml [ 41865 ]
            mkozell Mike Kozell made changes -
            Hide
            svanoort Sam Van Oort added a comment -

            Thanks Mike Kozell that data is indeed very useful and solidly confirms my working theory (which I will explain in a moment). Before I explain (since it'll take a bit to write up and I'm hoping you're still around), please could you do one more analysis:

            1. Do you see any IOExceptions in the jenkins log or build log for these builds
            2. Please can you do a search for "FlowStartNode" and "FlowEndNode" in both flowNodeStore.xml files and report the id and any error shown for that element?

            Thanks!

            Show
            svanoort Sam Van Oort added a comment - Thanks Mike Kozell that data is indeed very useful and solidly confirms my working theory (which I will explain in a moment). Before I explain (since it'll take a bit to write up and I'm hoping you're still around), please could you do one more analysis: 1. Do you see any IOExceptions in the jenkins log or build log for these builds 2. Please can you do a search for "FlowStartNode" and "FlowEndNode" in both flowNodeStore.xml files and report the id and any error shown for that element? Thanks!
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell Also, do you see anything anything in a directory "workflow-fallback" (perhaps with a flowNodeStore.xml)?

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell Also, do you see anything anything in a directory "workflow-fallback" (perhaps with a flowNodeStore.xml)?
            Hide
            svanoort Sam Van Oort added a comment -

            Ahh, nevermind, I missed that in the listing. That confirms another part of the theory.

            Show
            svanoort Sam Van Oort added a comment - Ahh, nevermind, I missed that in the listing. That confirms another part of the theory.
            Hide
            mkozell Mike Kozell added a comment -

            1. I didn't see any IOExceptions in the jenkins.log file. There is one is the workflow-fallback/flowNodeStore.xml file which is included in this ticket. I did get an IOException after I aborted the build with the work around.

            java.io.IOException: Aborting build
                    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
                    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
                    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
                    at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
                    at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
                    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
                    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
                    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
                    at Script1.run(Script1.groovy:1)
                    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
                    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623)
                    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594)
                    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
                    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
                    at hudson.remoting.LocalChannel.call(LocalChannel.java:45)
                    at hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111)
                    at jenkins.model.Jenkins._doScript(Jenkins.java:4360)
                    at jenkins.model.Jenkins.doScript(Jenkins.java:4331)
                    at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
                    at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
                    at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
                    at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
                    at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
                    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
                    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715)
                    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845)
                    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:649)
                    at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
                    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
                    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
                    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
                    at com.smartcodeltd.jenkinsci.plugin.assetbundler.filters.LessCSS.doFilter(LessCSS.java:47)
                    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
                    at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59)
                    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
                    at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:64)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
                    at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
                    at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
                    at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
                    at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
                    at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
                    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
                    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
                    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
                    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
                    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
                    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
                    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
                    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
                    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
                    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
                    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
                    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
                    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
                    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
                    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
                    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
                    at org.eclipse.jetty.server.Server.handle(Server.java:564)
                    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
                    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
                    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
                    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
                    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
                    at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
                    at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
                    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
                    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
                    at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    at java.lang.Thread.run(Thread.java:748)
            Finished: ABORTED
            
            
            Show
            mkozell Mike Kozell added a comment - 1. I didn't see any IOExceptions in the jenkins.log file. There is one is the workflow-fallback/flowNodeStore.xml file which is included in this ticket. I did get an IOException after I aborted the build with the work around. java.io.IOException: Aborting build at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83) at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105) at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247) at Script1.run(Script1.groovy:1) at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585) at groovy.lang.GroovyShell.evaluate(GroovyShell.java:623) at groovy.lang.GroovyShell.evaluate(GroovyShell.java:594) at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142) at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114) at hudson.remoting.LocalChannel.call(LocalChannel.java:45) at hudson.util.RemotingDiagnostics.executeGroovy(RemotingDiagnostics.java:111) at jenkins.model.Jenkins._doScript(Jenkins.java:4360) at jenkins.model.Jenkins.doScript(Jenkins.java:4331) at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627) at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343) at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184) at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117) at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129) at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58) at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715) at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845) at org.kohsuke.stapler.Stapler.invoke(Stapler.java:649) at org.kohsuke.stapler.Stapler.service(Stapler.java:238) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650) at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154) at com.smartcodeltd.jenkinsci.plugin.assetbundler.filters.LessCSS.doFilter(LessCSS.java:47) at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151) at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59) at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151) at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:64) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84) at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249) at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67) at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87) at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90) at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.Server.handle(Server.java:564) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110) at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124) at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128) at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199) at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang. Thread .run( Thread .java:748) Finished: ABORTED
            Hide
            svanoort Sam Van Oort added a comment -

            So, here's the things I can put together:

            1. Flow completed successfully initially. Speculation: somehow there is invalid state in the list of persisted 'heads' (final FlowNodes) for the build with a null head stored, or that FlowNode wasn't persisted into storage for $reasons and cannot be loaded.
            2. Because the flow ended in a way that does not evaluate as 'isComplete' or state was not persisted such that it would show as complete – this would happen if either the number of 'head' (most recent) nodes for the flow is >1 or the final 'head' FlowNode is NOT the FlowEndNode as expected (could be null) – this means the Pipeline will show as incomplete and eligible to try to resume (may be blocked if resume is disabled explicitly or fails).
            3. Eventually the build is cleared from the SoftReference cache of builds... and one of the processes that iterates through builds is triggered a couple days later (there's a variety of things that can do this, but it's normal)
            4. FlowNode Storage is initialized – if it can't load one of the heads OR can't load one of the startNodes (starts of a block), then it will see the null is a problem, and switch to fallback storage (to avoid overwriting original data which might be recoverable), and try to fake up a dummy set of startNodes/heads/etc.
            5. This dummy storage will NOT show as complete, so it'll try to resume, sees that the build was marked "can't resume" and fails with the IOException observed.
            6. This failure will invoke onProgramEnd and somehow onNewHead runs with the residual FlowNode that isn't in this storage, triggering NPE – also this triggers creation of the secondary fallbock storage ("workflow-fallback/flownodeStore.xml"). Conclusion: something must be wrong internally here, but it's not clear where precisely yet.
            7. The Process from 5 triggers creation of a new FlowEndNode written to the FlowNodeStorage (observed in your workflow-fallback flowNodeStore.xml x2). But yet somehow this isn't persisted as the head for the Execution and is once again marked as incomplete and eligible to be reloaded.
            8. Because the previous does not work correctly, it can happen again (seen 2x in this cycle)

            Show
            svanoort Sam Van Oort added a comment - So, here's the things I can put together: 1. Flow completed successfully initially. Speculation: somehow there is invalid state in the list of persisted 'heads' (final FlowNodes) for the build with a null head stored, or that FlowNode wasn't persisted into storage for $reasons and cannot be loaded. 2. Because the flow ended in a way that does not evaluate as 'isComplete' or state was not persisted such that it would show as complete – this would happen if either the number of 'head' (most recent) nodes for the flow is >1 or the final 'head' FlowNode is NOT the FlowEndNode as expected (could be null) – this means the Pipeline will show as incomplete and eligible to try to resume (may be blocked if resume is disabled explicitly or fails). 3. Eventually the build is cleared from the SoftReference cache of builds... and one of the processes that iterates through builds is triggered a couple days later (there's a variety of things that can do this, but it's normal) 4. FlowNode Storage is initialized – if it can't load one of the heads OR can't load one of the startNodes (starts of a block), then it will see the null is a problem, and switch to fallback storage (to avoid overwriting original data which might be recoverable), and try to fake up a dummy set of startNodes/heads/etc. 5. This dummy storage will NOT show as complete, so it'll try to resume, sees that the build was marked "can't resume" and fails with the IOException observed. 6. This failure will invoke onProgramEnd and somehow onNewHead runs with the residual FlowNode that isn't in this storage, triggering NPE – also this triggers creation of the secondary fallbock storage ("workflow-fallback/flownodeStore.xml"). Conclusion: something must be wrong internally here, but it's not clear where precisely yet. 7. The Process from 5 triggers creation of a new FlowEndNode written to the FlowNodeStorage (observed in your workflow-fallback flowNodeStore.xml x2). But yet somehow this isn't persisted as the head for the Execution and is once again marked as incomplete and eligible to be reloaded. 8. Because the previous does not work correctly, it can happen again (seen 2x in this cycle)
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell No, I think we have enough data, thank you – I've got a potential fix that also adds additional diagnostic information almost ready to go. Just working through some test failures and then I can hand over a binary and PR to try out on a test instance.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell No, I think we have enough data, thank you – I've got a potential fix that also adds additional diagnostic information almost ready to go. Just working through some test failures and then I can hand over a binary and PR to try out on a test instance.
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell We now have the root cause for https://issues.jenkins-ci.org/browse/JENKINS-49686 identified, which accounts for the final missing piece, the NPE itself. The root cause is described in the comments there, and the fix should resolve most of the remaining issues with the partial fix preceding it and create a comprehensive solution to both parts of the problem.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell We now have the root cause for https://issues.jenkins-ci.org/browse/JENKINS-49686 identified, which accounts for the final missing piece, the NPE itself. The root cause is described in the comments there, and the fix should resolve most of the remaining issues with the partial fix preceding it and create a comprehensive solution to both parts of the problem.
            Hide
            ppaczyn Piotr Paczyński added a comment -

            We're also affected by this issue and seeing similar NPE errors. I can provide more details if needed.

            Show
            ppaczyn Piotr Paczyński added a comment - We're also affected by this issue and seeing similar NPE errors. I can provide more details if needed.
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-50407 [ JENKINS-50407 ]
            svanoort Sam Van Oort made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            svanoort Sam Van Oort made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Piotr Paczyński Mike Kozell Would you by any chance be set up with a non-critical / test master and interested in trying out some SNAPSHOT builds of the proposed fixes? I think what we have now is more or less ready to go and would love to see how it plays for you guys in the wild.

            Show
            svanoort Sam Van Oort added a comment - Piotr Paczyński Mike Kozell Would you by any chance be set up with a non-critical / test master and interested in trying out some SNAPSHOT builds of the proposed fixes? I think what we have now is more or less ready to go and would love to see how it plays for you guys in the wild.
            Hide
            mkozell Mike Kozell added a comment -

            I have a dedicated test Jenkins master I can use for testing.  I will just need to know where I can get the build and how to enable it.

            Show
            mkozell Mike Kozell added a comment - I have a dedicated test Jenkins master I can use for testing.  I will just need to know where I can get the build and how to enable it.
            svanoort Sam Van Oort made changes -
            Attachment workflow-cps.hpi [ 42074 ]
            svanoort Sam Van Oort made changes -
            Attachment workflow-job.hpi [ 42075 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell I've attached plugin builds for workflow-cps and workflow-job plugins that should resolve the issues for you – and fixes a whole bunch of other related problems around the same logic (plus adds an optimization and some serious extra bulletproofing)-- please could you give them a try and let me know?

            workflow-cps.hpi
            workflow-job.hpi

            Or if you prefer to build them yourself, the PRs are here:
            https://github.com/jenkinsci/workflow-cps-plugin/pull/216
            https://github.com/jenkinsci/workflow-job-plugin/pull/93

            Now, there's one remaining test failure I can sometimes trigger with my homemade durability-fuzzing tool with Maximum Durability mode – I'm not clear if it's an issue with the test hardness or not, but if you do see a case where that results in zero flow nodes in the build directory, please let me know. I'm not comfortable releasing until I have it addressed + reviewed though (hopefully in the next day or two).

            Logs will show something like this (which is OK and sometimes expected in performance-optimized mode but NOT maximum durability):

            0.150 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#initializeStorage: Tried to load head FlowNodes for execution OwnerNestedParallelDurableJob/6:NestedParallelDurableJob #6 but FlowNode was not found in storage for head id:FlowNodeId 1:25
            0.151 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#rebuildEmptyGraph: Failed to load pipeline heads, so faking some up for execution CpsFlowExecutionOwner[NestedParallelDurableJob/6:NestedParallelDurableJob #6]
            0.161 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#onLoad: Completed flow without FlowEndNode: CpsFlowExecutionOwner[NestedParallelDurableJob/6:NestedParallelDurableJob #6] heads:26::26:org.jenkinsci.plugins.workflow.graph.FlowStartNode[id=27]

            Thank you!

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell I've attached plugin builds for workflow-cps and workflow-job plugins that should resolve the issues for you – and fixes a whole bunch of other related problems around the same logic (plus adds an optimization and some serious extra bulletproofing)-- please could you give them a try and let me know? workflow-cps.hpi workflow-job.hpi Or if you prefer to build them yourself, the PRs are here: https://github.com/jenkinsci/workflow-cps-plugin/pull/216 https://github.com/jenkinsci/workflow-job-plugin/pull/93 Now, there's one remaining test failure I can sometimes trigger with my homemade durability-fuzzing tool with Maximum Durability mode – I'm not clear if it's an issue with the test hardness or not, but if you do see a case where that results in zero flow nodes in the build directory, please let me know. I'm not comfortable releasing until I have it addressed + reviewed though (hopefully in the next day or two). Logs will show something like this (which is OK and sometimes expected in performance-optimized mode but NOT maximum durability): 0.150 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#initializeStorage: Tried to load head FlowNodes for execution Owner NestedParallelDurableJob/6:NestedParallelDurableJob #6 but FlowNode was not found in storage for head id:FlowNodeId 1:25 0.151 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#rebuildEmptyGraph: Failed to load pipeline heads, so faking some up for execution CpsFlowExecution Owner[NestedParallelDurableJob/6:NestedParallelDurableJob #6] 0.161 [id=1636] WARNING o.j.p.w.cps.CpsFlowExecution#onLoad: Completed flow without FlowEndNode: CpsFlowExecution Owner[NestedParallelDurableJob/6:NestedParallelDurableJob #6] heads:26::26:org.jenkinsci.plugins.workflow.graph.FlowStartNode [id=27] Thank you!
            svanoort Sam Van Oort made changes -
            Attachment workflow-job.hpi [ 42088 ]
            svanoort Sam Van Oort made changes -
            Attachment workflow-cps.hpi [ 42089 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell I'm attaching updated SNAPSHOTs - these I consider finalized and release-ready, and include a resolution of the issue above.

            workflow-job.hpi
            workflow-cps.hpi

            Please can you give these a try?

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell I'm attaching updated SNAPSHOTs - these I consider finalized and release-ready, and include a resolution of the issue above. workflow-job.hpi workflow-cps.hpi Please can you give these a try?
            Hide
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Sam Van Oort
            Path:
            pom.xml
            src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
            src/test/java/org/jenkinsci/plugins/workflow/job/WorkflowRunRestartTest.java
            src/test/java/org/jenkinsci/plugins/workflow/job/WorkflowRunTest.java
            http://jenkins-ci.org/commit/workflow-job-plugin/f0c26058f31d4f159a82a3cace52935e93f20701
            Log:
            Merge pull request #93 from svanoort/fix-resume-issues

            Fix resume issues JENKINS-49686 and JENKINS-50199 and JENKINS-50407

            Compare: https://github.com/jenkinsci/workflow-job-plugin/compare/e11cea623f61...f0c26058f31d

            Show
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Sam Van Oort Path: pom.xml src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java src/test/java/org/jenkinsci/plugins/workflow/job/WorkflowRunRestartTest.java src/test/java/org/jenkinsci/plugins/workflow/job/WorkflowRunTest.java http://jenkins-ci.org/commit/workflow-job-plugin/f0c26058f31d4f159a82a3cace52935e93f20701 Log: Merge pull request #93 from svanoort/fix-resume-issues Fix resume issues JENKINS-49686 and JENKINS-50199 and JENKINS-50407 Compare: https://github.com/jenkinsci/workflow-job-plugin/compare/e11cea623f61...f0c26058f31d
            svanoort Sam Van Oort made changes -
            Component/s workflow-job-plugin [ 21716 ]
            Component/s pipeline [ 21692 ]
            Component/s workflow-api-plugin [ 21711 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Resolved as of workflow-cps 2.47 and workflow-job 2.18

            Show
            svanoort Sam Van Oort added a comment - Resolved as of workflow-cps 2.47 and workflow-job 2.18
            svanoort Sam Van Oort made changes -
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Closed [ 6 ]
            mkozell Mike Kozell made changes -
            Comment [  

              ]
            svanoort Sam Van Oort made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell Thanks for providing a reproduction. To update, I'm working on this still – a bit of a tricky one, but having a clear case for what triggers this is very helpful.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell Thanks for providing a reproduction. To update, I'm working on this still – a bit of a tricky one, but having a clear case for what triggers this is very helpful.
            Hide
            gattie Matt Gaspar added a comment -

            I'm seeing some similar behavior on a few masters running 2.89.3

             

            workflow-cps: 2.48

            workflow-job: 2.19

             

            It appears to even try and resume on a build that completed successfully previously after any restart of jenkins:

            Job Console snippet

            GitHub has been notified of this commit’s build result
            
            Finished: SUCCESS
            Resuming build at Wed Apr 18 03:45:16 MST 2018 after Jenkins restart
            Resuming build at Fri Apr 20 01:21:23 MST 2018 after Jenkins restart
            Resuming build at Fri Apr 20 01:52:35 MST 2018 after Jenkins restart
            Resuming build at Fri Apr 20 11:53:07 MST 2018 after Jenkins restart
            Resuming build at Fri Apr 20 12:13:33 MST 2018 after Jenkins restart

            jenkins.log

            Apr 20, 2018 12:18:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6 onFailure
            WARNING: Failed to interrupt steps in Owner[JOB_NAME1:JOB_NAME #1]
            java.lang.IllegalStateException: completed or broken execution
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:741)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:821)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:694)
            at hudson.model.RunMap.retrieve(RunMap.java:225)
            at hudson.model.RunMap.retrieve(RunMap.java:57)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:345)
            at jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:275)
            at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.<init>(LazyLoadRunMapEntrySet.java:65)
            at jenkins.model.lazy.LazyLoadRunMapEntrySet.iterator(LazyLoadRunMapEntrySet.java:63)
            at java.util.AbstractMap$2$1.<init>(AbstractMap.java:411)
            at java.util.AbstractMap$2.iterator(AbstractMap.java:410)
            at hudson.util.RunList.iterator(RunList.java:113)
            at com.google.common.collect.Iterables$15.apply(Iterables.java:1128)
            at com.google.common.collect.Iterables$15.apply(Iterables.java:1125)
            at com.google.common.collect.Iterators$8.next(Iterators.java:812)
            at com.google.common.collect.Iterators$MergingIterator.<init>(Iterators.java:1306)
            at com.google.common.collect.Iterators.mergeSorted(Iterators.java:1274)
            at com.google.common.collect.Iterables$14.iterator(Iterables.java:1113)
            at com.google.common.collect.Iterables$UnmodifiableIterable.iterator(Iterables.java:94)
            at hudson.util.RunList.iterator(RunList.java:113)
            at jenkins.widgets.RunListProgressiveRendering.compute(RunListProgressiveRendering.java:60)
            at jenkins.util.ProgressiveRendering$1.run(ProgressiveRendering.java:122)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)

            Hoping this information might help. If you need more examples or anything let me know.

            Show
            gattie Matt Gaspar added a comment - I'm seeing some similar behavior on a few masters running 2.89.3   workflow-cps: 2.48 workflow-job: 2.19   It appears to even try and resume on a build that completed successfully previously after any restart of jenkins: Job Console snippet GitHub has been notified of this commit’s build result Finished: SUCCESS Resuming build at Wed Apr 18 03:45:16 MST 2018 after Jenkins restart Resuming build at Fri Apr 20 01:21:23 MST 2018 after Jenkins restart Resuming build at Fri Apr 20 01:52:35 MST 2018 after Jenkins restart Resuming build at Fri Apr 20 11:53:07 MST 2018 after Jenkins restart Resuming build at Fri Apr 20 12:13:33 MST 2018 after Jenkins restart jenkins.log Apr 20, 2018 12:18:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6 onFailure WARNING: Failed to interrupt steps in Owner[JOB_NAME1:JOB_NAME #1] java.lang.IllegalStateException: completed or broken execution at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:741) at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:821) at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:694) at hudson.model.RunMap.retrieve(RunMap.java:225) at hudson.model.RunMap.retrieve(RunMap.java:57) at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:500) at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:482) at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:380) at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:345) at jenkins.model.lazy.AbstractLazyLoadRunMap.newestBuild(AbstractLazyLoadRunMap.java:275) at jenkins.model.lazy.LazyLoadRunMapEntrySet$1.<init>(LazyLoadRunMapEntrySet.java:65) at jenkins.model.lazy.LazyLoadRunMapEntrySet.iterator(LazyLoadRunMapEntrySet.java:63) at java.util.AbstractMap$2$1.<init>(AbstractMap.java:411) at java.util.AbstractMap$2.iterator(AbstractMap.java:410) at hudson.util.RunList.iterator(RunList.java:113) at com.google.common.collect.Iterables$15.apply(Iterables.java:1128) at com.google.common.collect.Iterables$15.apply(Iterables.java:1125) at com.google.common.collect.Iterators$8.next(Iterators.java:812) at com.google.common.collect.Iterators$MergingIterator.<init>(Iterators.java:1306) at com.google.common.collect.Iterators.mergeSorted(Iterators.java:1274) at com.google.common.collect.Iterables$14.iterator(Iterables.java:1113) at com.google.common.collect.Iterables$UnmodifiableIterable.iterator(Iterables.java:94) at hudson.util.RunList.iterator(RunList.java:113) at jenkins.widgets.RunListProgressiveRendering.compute(RunListProgressiveRendering.java:60) at jenkins.util.ProgressiveRendering$1.run(ProgressiveRendering.java:122) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Hoping this information might help. If you need more examples or anything let me know.
            Hide
            svanoort Sam Van Oort added a comment -

            Matt Gaspar Thanks, do you have the build.xml and the flowNodeStorage.xml or the XML files in the workflow subdirectory of the build?

            I have something that feels like it's 80% of the way there to resolve this cluster of issues – it addresses most of the things noted by Mike Kozell – but it has one known issue still to resolve with shutdown-time persistence. I would like to identify why YOUR build is not showing as already-complete though – all signs point to something going wrong with loading the FlowExecution information, pointing to it looking as if it is incomplete.

            Do you have any other log messages associated with this build, also?

            Show
            svanoort Sam Van Oort added a comment - Matt Gaspar Thanks, do you have the build.xml and the flowNodeStorage.xml or the XML files in the workflow subdirectory of the build? I have something that feels like it's 80% of the way there to resolve this cluster of issues – it addresses most of the things noted by Mike Kozell – but it has one known issue still to resolve with shutdown-time persistence. I would like to identify why YOUR build is not showing as already-complete though – all signs point to something going wrong with loading the FlowExecution information, pointing to it looking as if it is incomplete. Do you have any other log messages associated with this build, also?
            gattie Matt Gaspar made changes -
            Attachment build.xml [ 42237 ]
            gattie Matt Gaspar made changes -
            Attachment workflows.xml [ 42238 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Matt Gaspar Your data shows that we did indeed fail at some point in the loading of the heads/start nodes/flownodes, and switched to the fallback directory. Under your build workflow folder, please could you upload the file 40.xml through 47.xml – or any that are present, in any case? Perhaps as a single matt-flownodes.xml?

            I also see that, paradoxically, we show persistedClean = false, for the execution (it thinks that we did not cleanly save data), and completed is false too (may be a result of preceding issues and showing as the build being incomplete).

            Thanks! I think both of those issues should be covered by fixes I've done now.

            Show
            svanoort Sam Van Oort added a comment - Matt Gaspar Your data shows that we did indeed fail at some point in the loading of the heads/start nodes/flownodes, and switched to the fallback directory. Under your build workflow folder, please could you upload the file 40.xml through 47.xml – or any that are present, in any case? Perhaps as a single matt-flownodes.xml? I also see that, paradoxically, we show persistedClean = false, for the execution (it thinks that we did not cleanly save data), and completed is false too (may be a result of preceding issues and showing as the build being incomplete). Thanks! I think both of those issues should be covered by fixes I've done now.
            Hide
            gattie Matt Gaspar added a comment -

            Sam Van Oort I've attached the build.xml for one of the builds that shows SUCCESS (at least in the console) but appears to want to replay after a Jenkins restart. I also attached workflows.xml which is the concatenated files from the workflow directory of the same build.

            I haven't found any other useful logs associated with this build beyond the exception in my last comment.

            build.xml

            workflows.xml

            Show
            gattie Matt Gaspar added a comment - Sam Van Oort I've attached the build.xml for one of the builds that shows SUCCESS (at least in the console) but appears to want to replay after a Jenkins restart. I also attached workflows.xml which is the concatenated files from the workflow directory of the same build. I haven't found any other useful logs associated with this build beyond the exception in my last comment. build.xml workflows.xml
            Hide
            gattie Matt Gaspar added a comment -

            I didn't see your comment before I posted, but yeah, the workflows.xml are the files from the workflow folder for the build. There was only 45-47 available in there.

            I did see that this particular one failed despite the console output saying SUCCESS near the bottom. Regardless, this was a build that already ran and completed and shouldn't have reran after a restart of Jenkins.

            Show
            gattie Matt Gaspar added a comment - I didn't see your comment before I posted, but yeah, the workflows.xml are the files from the workflow folder for the build. There was only 45-47 available in there. I did see that this particular one failed despite the console output saying SUCCESS near the bottom. Regardless, this was a build that already ran and completed and shouldn't have reran after a restart of Jenkins.
            jbriden Jenn Briden made changes -
            Sprint Pipeline - April 2018 [ 506 ]
            jbriden Jenn Briden made changes -
            Rank Ranked higher
            svanoort Sam Van Oort made changes -
            Attachment workflow-job.hpi [ 42405 ]
            svanoort Sam Van Oort made changes -
            Attachment workflow-cps.hpi [ 42406 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Matt Gaspar Mike Kozell I appreciate your patience and assistance with diagnosing this. Please could you try the attached SNAPSHOTs and see if the issue is fully resolved? workflow-job.hpi workflow-cps.hpi

            I've done several additional rounds of testing (including unit tests based on your scenarios) and fixes to try to ensure this is as robust and the fixes are as comprehensive as possible. These changes have received initial code review, and if all looks good they will be ready for full release soon – but obviously it would be helpful to have you try them out with your situations.

            Show
            svanoort Sam Van Oort added a comment - Matt Gaspar Mike Kozell I appreciate your patience and assistance with diagnosing this. Please could you try the attached SNAPSHOTs and see if the issue is fully resolved? workflow-job.hpi workflow-cps.hpi I've done several additional rounds of testing (including unit tests based on your scenarios) and fixes to try to ensure this is as robust and the fixes are as comprehensive as possible. These changes have received initial code review, and if all looks good they will be ready for full release soon – but obviously it would be helpful to have you try them out with your situations.
            svanoort Sam Van Oort made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]
            svanoort Sam Van Oort made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell Sure! The code is here, modulo an extra line or two of logging and a couple of small tweaks since then (nothing substantial):

            https://github.com/jenkinsci/workflow-job-plugin/pull/96

            https://github.com/jenkinsci/workflow-cps-plugin/pull/223

            Feel free to add code review too if you like, though it's just gotten a final pass by the other two people that do a lot of Pipeline core development.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell Sure! The code is here, modulo an extra line or two of logging and a couple of small tweaks since then (nothing substantial): https://github.com/jenkinsci/workflow-job-plugin/pull/96 https://github.com/jenkinsci/workflow-cps-plugin/pull/223 Feel free to add code review too if you like, though it's just gotten a final pass by the other two people that do a lot of Pipeline core development.
            Hide
            svanoort Sam Van Oort added a comment - - edited

            Mike Kozell Also I think we're clear for release of this now – but I'm aiming to delay until at least tonight EST if that'll permit enough time to land it on some real-world Jenkins masters as a sanity check.

            Show
            svanoort Sam Van Oort added a comment - - edited Mike Kozell Also I think we're clear for release of this now – but I'm aiming to delay until at least tonight EST if that'll permit enough time to land it on some real-world Jenkins masters as a sanity check.
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell Are you building on Windows, by any chance? I'd try to set the platform encoding to UTF-8 – that error is safe to ignore though. You can comment out the whole test by adding "@Ignore" to src/test/java/org/jenkinsci/plugins/workflow/job/CLITest.java on line 85.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell Are you building on Windows, by any chance? I'd try to set the platform encoding to UTF-8 – that error is safe to ignore though. You can comment out the whole test by adding "@Ignore" to src/test/java/org/jenkinsci/plugins/workflow/job/CLITest.java on line 85.
            Hide
            svanoort Sam Van Oort added a comment -

            Fixes released with workflow-cps 2.50 and workflow-job 2.21 – would really appreciate if you guys could install the latest and confirm the issue is resolved.

            CCMike Kozell Matt Gaspar

            Show
            svanoort Sam Van Oort added a comment - Fixes released with workflow-cps 2.50 and workflow-job 2.21 – would really appreciate if you guys could install the latest and confirm the issue is resolved. CC Mike Kozell Matt Gaspar
            svanoort Sam Van Oort made changes -
            Resolution Done [ 10000 ]
            Status In Review [ 10005 ] Closed [ 6 ]
            Hide
            svanoort Sam Van Oort added a comment -

            Mike Kozell That is... the exact opposite of what I would expect. Not sure what's going on there, since the ci.jenkins.io builder does both Windows and Linux builds and they succeed. Weird.

            Show
            svanoort Sam Van Oort added a comment - Mike Kozell That is... the exact opposite of what I would expect. Not sure what's going on there, since the ci.jenkins.io builder does both Windows and Linux builds and they succeed. Weird.
            mkozell Mike Kozell made changes -
            Attachment flownode-84.xml [ 42453 ]
            mkozell Mike Kozell made changes -
            Resolution Done [ 10000 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            Hide
            vqrs Christian V added a comment -

            I'm also seeing this:

            org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@134626b. The executor service is shutting down
            
            at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:108)
            
            at java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
            
            at com.google.common.util.concurrent.ForwardingExecutorService.submit(ForwardingExecutorService.java:110)
            
            at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:49)
            
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:903)
            
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:899)
            
            at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150)
            
            at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
            
            at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
            
            at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
            
            at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155)
            
            at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:160)
            
            at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:90)
            
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:899)
            
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentExecutions(CpsFlowExecution.java:977)
            
            at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:180)
            
            at jenkins.model.Jenkins.<init>(Jenkins.java:972)
            
            at hudson.model.Hudson.<init>(Hudson.java:85)
            
            at hudson.model.Hudson.<init>(Hudson.java:81)
            
            at hudson.WebAppMain$3.run(WebAppMain.java:233)

             

            I have updated to the latest Jenkins version and all my Pipeline plugins are up to date. This happened before the update, after the plugin updates, and after updating Jenkins. It's always the same exception.

            Show
            vqrs Christian V added a comment - I'm also seeing this: org.jenkinsci.remoting.util.ExecutorServiceUtils$FatalRejectedExecutionException: Cannot execute the command java.util.concurrent.FutureTask@134626b. The executor service is shutting down at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:108) at java.util.concurrent.AbstractExecutorService.submit(Unknown Source) at com.google.common.util.concurrent.ForwardingExecutorService.submit(ForwardingExecutorService.java:110) at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:49) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:903) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4.onSuccess(CpsFlowExecution.java:899) at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253) at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149) at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105) at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155) at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:160) at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:90) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:899) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentExecutions(CpsFlowExecution.java:977) at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:180) at jenkins.model.Jenkins.<init>(Jenkins.java:972) at hudson.model.Hudson.<init>(Hudson.java:85) at hudson.model.Hudson.<init>(Hudson.java:81) at hudson.WebAppMain$3.run(WebAppMain.java:233)   I have updated to the latest Jenkins version and all my Pipeline plugins are up to date. This happened before the update, after the plugin updates, and after updating Jenkins. It's always the same exception.
            cortelos Andre Macedo made changes -
            Rank Ranked higher
            cortelos Andre Macedo made changes -
            Rank Ranked lower
            Hide
            svanoort Sam Van Oort added a comment -

            Adding a note before I forget that this likely relates to JENKINS-45571.

            Based on some test scenarios that are generating similar symptoms, I suspect both are potentially related to the the CpsContext.isReady call hanging indefinitely – which seems to be a reproducible but obscure threading bug which I have yet to solve.

            Symptoms if that's the same culprit:

            1. Build never gets marked as complete, even though the execution may - it appears the WorkflowRun never gets its "finish" method called and may not have a result set as a result
            2. Pipeline still shows as having a running OneOffExecutor running its WorkflowJob as a Flyweight task
            3. I generally see the CpsContext.isReady hang running, and this is triggered when something fails in early stages of loading & resuming the Pipeline program for an incomplete Pipeline after a restart
            4. Investigation showed that the AsynchronousExecution created by WorkflowRun#sleep never finished (probably due to the CpsStepContext hang).

            Show
            svanoort Sam Van Oort added a comment - Adding a note before I forget that this likely relates to JENKINS-45571 . Based on some test scenarios that are generating similar symptoms, I suspect both are potentially related to the the CpsContext.isReady call hanging indefinitely – which seems to be a reproducible but obscure threading bug which I have yet to solve. Symptoms if that's the same culprit: 1. Build never gets marked as complete, even though the execution may - it appears the WorkflowRun never gets its "finish" method called and may not have a result set as a result 2. Pipeline still shows as having a running OneOffExecutor running its WorkflowJob as a Flyweight task 3. I generally see the CpsContext.isReady hang running, and this is triggered when something fails in early stages of loading & resuming the Pipeline program for an incomplete Pipeline after a restart 4. Investigation showed that the AsynchronousExecution created by WorkflowRun#sleep never finished (probably due to the CpsStepContext hang).
            svanoort Sam Van Oort made changes -
            Link This issue is related to JENKINS-45571 [ JENKINS-45571 ]
            vivek Vivek Pandey made changes -
            Assignee Sam Van Oort [ svanoort ] Devin Nusbaum [ dnusbaum ]
            Hide
            johnar John Arnold added a comment -

            I'm definitely still seeing this "stuck on resume" problem for completed builds AND aborted builds, after a Jenkins reboot.  I have Jenkins set for max perf, low durability mode, to disable this Resume functionality, and it's still trying to resume.

            I can't Stop the jobs that are stuck on resume, I have to do a hard-kill via the /kill uri.  Even trying to kill .finish() them with Groovy script doesn't seem to work reliably.

            Show
            johnar John Arnold added a comment - I'm definitely still seeing this "stuck on resume" problem for completed builds AND aborted builds, after a Jenkins reboot.  I have Jenkins set for max perf, low durability mode, to disable this Resume functionality, and it's still trying to resume. I can't Stop the jobs that are stuck on resume, I have to do a hard-kill via the /kill uri.  Even trying to kill .finish() them with Groovy script doesn't seem to work reliably.
            johnar John Arnold made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            Hide
            johnar John Arnold added a comment - - edited

            Example job log:
            https://pastebin.com/LbnRRTw4

            Jenkins 2.135

            Show
            johnar John Arnold added a comment - - edited Example job log: https://pastebin.com/LbnRRTw4 Jenkins 2.135
            Hide
            dnusbaum Devin Nusbaum added a comment - - edited

            John Arnold Are you able to upload the build.xml for one of the stuck builds so that we can see all of its persisted state? (Please redact any sensitive information if you do upload it.)

            Also, can you please confirm what versions of the Pipeline Groovy Plugin and the Pipeline Job Plugin plugin you are using? Thanks!

            Show
            dnusbaum Devin Nusbaum added a comment - - edited John Arnold Are you able to upload the build.xml for one of the stuck builds so that we can see all of its persisted state? (Please redact any sensitive information if you do upload it.) Also, can you please confirm what versions of the Pipeline Groovy Plugin and the Pipeline Job Plugin plugin you are using? Thanks!
            Hide
            johnar John Arnold added a comment - - edited

            Devin Nusbaum

            Pipeline Groovy: 2.54
            Pipeline Job: 2.24

            build.xml (redacted internal strings)
            https://pastebin.com/rqZ7EwcG

            Show
            johnar John Arnold added a comment - - edited Devin Nusbaum Pipeline Groovy: 2.54 Pipeline Job: 2.24 build.xml (redacted internal strings) https://pastebin.com/rqZ7EwcG
            Hide
            svanoort Sam Van Oort added a comment -

            Devin Nusbaum I took a quick pass through the build.xml and it fits my comment in https://issues.jenkins-ci.org/browse/JENKINS-50199?focusedCommentId=345572&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-345572 to the T AFAICT.

            Please, would you be able to take a peek? It should tie to the issues with CpsContext.isReady and the reproduction case in https://github.com/svanoort/workflow-cps-plugin/tree/reproduce-hung-pipelines – and at this point a fresh perspective might be helpful.

            Show
            svanoort Sam Van Oort added a comment - Devin Nusbaum I took a quick pass through the build.xml and it fits my comment in https://issues.jenkins-ci.org/browse/JENKINS-50199?focusedCommentId=345572&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-345572 to the T AFAICT. Please, would you be able to take a peek? It should tie to the issues with CpsContext.isReady and the reproduction case in https://github.com/svanoort/workflow-cps-plugin/tree/reproduce-hung-pipelines – and at this point a fresh perspective might be helpful.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            Sam Van Oort I have been looking at the issue. Like I commented here I don't think that case is actually a reproduction, because it does complete if you increase the timeout, although I might be misunderstanding the test case. I have been trying to find a way to reproduce the problem, but have been unsuccessful so far.

            Show
            dnusbaum Devin Nusbaum added a comment - Sam Van Oort I have been looking at the issue. Like I commented here I don't think that case is actually a reproduction, because it does complete if you increase the timeout, although I might be misunderstanding the test case. I have been trying to find a way to reproduce the problem, but have been unsuccessful so far.
            Hide
            johnar John Arnold added a comment -

            Devin Nusbaum Sam Van Oort  Is there any additional data I can gather for you. I have lots of repros of this problem.

            Show
            johnar John Arnold added a comment - Devin Nusbaum Sam Van Oort   Is there any additional data I can gather for you. I have lots of repros of  this problem.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            John Arnold Could you upload the whole build folder for one of the zombie builds? I'm most interested in the workflow directory adjacent to build.xml to see if the serialized flow nodes tell us anything interesting, but any exceptions/warnings in the various log files would be interesting as well.

            Show
            dnusbaum Devin Nusbaum added a comment - John Arnold Could you upload the whole build folder for one of the zombie builds? I'm most interested in the workflow directory adjacent to build.xml to see if the serialized flow nodes tell us anything interesting, but any exceptions/warnings in the various log files would be interesting as well.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            Sam Van Oort Any thoughts on adding something like the following patch as a temporary fix in workflow-job to prevent these pipelines from resuming while we continue investigating?

            diff --git a/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java b/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
            index 38e6ac1..70d9474 100644
            --- a/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
            +++ b/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
            @@ -752,2 +752,8 @@ public final class WorkflowRun extends Run<WorkflowJob,WorkflowRun> implements F
                                     if (!completed == Boolean.TRUE) {
            +                            if (execution.isComplete()) {
            +                                LOGGER.log(Level.WARNING, "JENKINS-50199: Run completion inconsistent with execution, defaulting to failure for {0}", this);
            +                                setResult(Result.FAILURE);
            +                                completed = Boolean.TRUE;
            +                                needsToPersist = true;
            +                            } else {
                                         // we've been restarted while we were running. let's get the execution going again.
            @@ -757,2 +763,3 @@ public final class WorkflowRun extends Run<WorkflowJob,WorkflowRun> implements F
                                         Timer.get().submit(() -> Queue.getInstance().schedule(new AfterRestartTask(WorkflowRun.this), 0)); // JENKINS-31614
            +                            }
                                     }
            
            Show
            dnusbaum Devin Nusbaum added a comment - Sam Van Oort Any thoughts on adding something like the following patch as a temporary fix in workflow-job to prevent these pipelines from resuming while we continue investigating? diff --git a/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java b/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java index 38e6ac1..70d9474 100644 --- a/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java +++ b/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java @@ -752,2 +752,8 @@ public final class WorkflowRun extends Run<WorkflowJob,WorkflowRun> implements F if (!completed == Boolean.TRUE) { + if (execution.isComplete()) { + LOGGER.log(Level.WARNING, "JENKINS-50199: Run completion inconsistent with execution, defaulting to failure for {0}", this); + setResult(Result.FAILURE); + completed = Boolean.TRUE; + needsToPersist = true; + } else { // we've been restarted while we were running. let's get the execution going again. @@ -757,2 +763,3 @@ public final class WorkflowRun extends Run<WorkflowJob,WorkflowRun> implements F Timer.get().submit(() -> Queue.getInstance().schedule(new AfterRestartTask(WorkflowRun.this), 0)); // JENKINS-31614 + } }
            Hide
            svanoort Sam Van Oort added a comment -

            Observations from digging in together:

            1. We can reproduce the restarted builds simply by setting the FlowExecution to completed for an incomplete build and dirty-restarting.
            2. On restart that will trigger the build to be able to resume even when it shouldn't – we can avoid this by marking the build completed upon loading if the execution is done
            3. The only place where the message about the build being done can be printed is WorkflowRun line 815 where it gets the StreamBuildListener and calls #finished on it (that prints the build completed message)
            4. The build record has non-null logsToCopy - since that's nulled a few lines down from where the "build completed" message prints, just before saving the build, something prevented saving the build after that mutation - Devin Nusbaum speculates that it might be due to the BulkChange logic in WorkflowRun.save, which early-exits if there's a BulkChange containing the WorkflowRun – if the BulkChange aborts, the WorkflowRun's save operation will be a no-op (and it may not ever be saved).

            I have a small patch which closes the loophole around the completed execution but incomplete build (and passes a basic testcase), but point 4 still needs some thought.

            Show
            svanoort Sam Van Oort added a comment - Observations from digging in together: 1. We can reproduce the restarted builds simply by setting the FlowExecution to completed for an incomplete build and dirty-restarting. 2. On restart that will trigger the build to be able to resume even when it shouldn't – we can avoid this by marking the build completed upon loading if the execution is done 3. The only place where the message about the build being done can be printed is WorkflowRun line 815 where it gets the StreamBuildListener and calls #finished on it (that prints the build completed message) 4. The build record has non-null logsToCopy - since that's nulled a few lines down from where the "build completed" message prints, just before saving the build, something prevented saving the build after that mutation - Devin Nusbaum speculates that it might be due to the BulkChange logic in WorkflowRun.save, which early-exits if there's a BulkChange containing the WorkflowRun – if the BulkChange aborts, the WorkflowRun's save operation will be a no-op (and it may not ever be saved). I have a small patch which closes the loophole around the completed execution but incomplete build (and passes a basic testcase), but point 4 still needs some thought.
            dnusbaum Devin Nusbaum made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]
            dnusbaum Devin Nusbaum made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            dnusbaum Devin Nusbaum made changes -
            Remote Link This issue links to "jenkinsci/workflow-job-plugin#104 (Web Link)" [ 21423 ]
            Hide
            hellspam Roy Arnon added a comment - - edited

            Sam Van Oort we have also recently encountered this issue, or something similar as in our case the resuming jobs completed with SUCCESS.

            I commented here

            But it seems this issue is more relevant to this ticket.

            In our case the resuming build was marked as SUCCESS which marked at as success in Bitbucket (with bitbucket-branch-source plugin). The build itself never completed and in fact had a failing test. This allowed one of our developers to merge a failing build into release unfortunately, 

            If there is anything you need from our jenkins instance I am happy to provide it.

            Show
            hellspam Roy Arnon added a comment - - edited Sam Van Oort we have also recently encountered this issue, or something similar as in our case the resuming jobs completed with SUCCESS. I commented here But it seems this issue is more relevant to this ticket. In our case the resuming build was marked as SUCCESS which marked at as success in Bitbucket (with bitbucket-branch-source plugin). The build itself never completed and in fact had a failing test. This allowed one of our developers to merge a failing build into release unfortunately,  If there is anything you need from our jenkins instance I am happy to provide it.
            svanoort Sam Van Oort made changes -
            Link This issue blocks JENKINS-53358 [ JENKINS-53358 ]
            johnar John Arnold made changes -
            Attachment 867.tar.gz [ 43981 ]
            Show
            johnar John Arnold added a comment - Devin Nusbaum See  https://issues.jenkins-ci.org/secure/attachment/43981/867.tar.gz   for a build folder.
            Hide
            dnusbaum Devin Nusbaum added a comment -

            I just released version 2.25 of the Pipeline Job plugin which should fix the issue where completed builds are resuming after restarting Jenkins. If anyone is still seeing the issue after upgrading the plugin, please comment with any additional information you can provide, such as the build folder of the build in question.

            Note that there are other issues that may appear similar but seem to have distinct causes. For anyone newly commenting on this issue, if you are seeing stuck executors without ever restarting Jenkins, please check if JENKINS-45571JENKINS-53223, or JENKINS-51568 are closer to your situation and if so, comment there instead. Thanks!

            Show
            dnusbaum Devin Nusbaum added a comment - I just released version 2.25 of the Pipeline Job plugin which should fix the issue where completed builds are resuming after restarting Jenkins. If anyone is still seeing the issue after upgrading the plugin, please comment with any additional information you can provide, such as the build folder of the build in question. Note that there are other issues that may appear similar but seem to have distinct causes. For anyone newly commenting on this issue, if you are seeing stuck executors without ever restarting Jenkins, please check if  JENKINS-45571 ,  JENKINS-53223 , or  JENKINS-51568 are closer to your situation and if so, comment there instead. Thanks!
            dnusbaum Devin Nusbaum made changes -
            Released As Pipeline Job 2.25
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Resolved [ 5 ]
            Hide
            keith_ball Keith Ball added a comment -

            I just tried to update the plugin but received a FileNotFoundException: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi

            Failed to download from http://updates.jenkins-ci.org/download/plugins/workflow-job/2.25/workflow-job.hpi (redirected to: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi)

            Did someone forget to publish the latest version?

            Show
            keith_ball Keith Ball added a comment - I just tried to update the plugin but received a FileNotFoundException: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi Failed to download from http://updates.jenkins-ci.org/download/plugins/workflow-job/2.25/workflow-job.hpi (redirected to: http://archives.jenkins-ci.org/plugins/workflow-job/2.25/workflow-job.hpi) Did someone forget to publish the latest version?
            Hide
            keith_ball Keith Ball added a comment -

            Forget it.  it's there now

            Show
            keith_ball Keith Ball added a comment - Forget it.  it's there now
            Hide
            dnusbaum Devin Nusbaum added a comment -

            Keith Ball There is often a slight time delay between when a new release is visible on the update center and when it is available on all mirrors, my guess is that was what happened for you.

            Show
            dnusbaum Devin Nusbaum added a comment - Keith Ball There is often a slight time delay between when a new release is visible on the update center and when it is available on all mirrors, my guess is that was what happened for you.

              People

              Assignee:
              dnusbaum Devin Nusbaum
              Reporter:
              mkozell Mike Kozell
              Votes:
              5 Vote for this issue
              Watchers:
              16 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: