-
Bug
-
Resolution: Not A Defect
-
Major
-
None
-
Jenkins LTS 2.46.3, lockable-resources 2.0
I have to caveat this upfront that I have two Jenkins masters, both on the same LTS version, with the same version of lockable-resources and the same versions of all the pipeline plugins (latest as available to 2.46.3 today, please ask if you need versions), and this bug only occurs on one of them, but I can't see why.
Script used to repro:
def builders = [:] def stages = [ [name:'first', lockName:'lock-first'], [name:'second', lockName:'lock-second'], [name:'first-two', lockName:'lock-first'], [name:'second-two', lockName:'lock-second'] ] stages.each { stage -> builders[stage.name] = { lock(stage.lockName) { node { commonBuildCommand() } } } } parallel builders def commonBuildCommand() { sleep 10 sleep 10 }
When I run this script (and restart during it, as per the log), I will see the following output:
[Pipeline] parallel [Pipeline] [first] { (Branch: first) [Pipeline] [second] { (Branch: second) [Pipeline] [first-two] { (Branch: first-two) [Pipeline] [second-two] { (Branch: second-two) [Pipeline] [first] lock [first] Trying to acquire lock on [lock-first] [first] Lock acquired on [lock-first] [Pipeline] [first] { [Pipeline] [second] lock [second] Trying to acquire lock on [lock-second] [second] Lock acquired on [lock-second] [Pipeline] [second] { [Pipeline] [first-two] lock [first-two] Trying to acquire lock on [lock-first] [first-two] Found 0 available resource(s). Waiting for correct amount: 1. [first-two] [lock-first] is locked, waiting... [Pipeline] [second-two] lock [second-two] Trying to acquire lock on [lock-second] [second-two] Found 0 available resource(s). Waiting for correct amount: 1. [second-two] [lock-second] is locked, waiting... [Pipeline] [first] node [first] Running on <node> in <workspace>/Test_Restart [Pipeline] [second] node [second] Running on <node> in <workspace>/Test_Restart@2 [Pipeline] [first] { [Pipeline] [second] { [Pipeline] [first] sleep [first] Sleeping for 10 sec [Pipeline] [second] sleep [second] Sleeping for 10 sec Resuming build at Fri Jun 23 15:15:52 BST 2017 after Jenkins restart Waiting to resume part of Experiments ยป Test_Restart #24: ??? Ready to run at Fri Jun 23 15:15:53 BST 2017 [Pipeline] [first] sleep [first] Sleeping for 10 sec [Pipeline] [second] sleep [second] Sleeping for 10 sec [Pipeline] [first] } [Pipeline] [first] // node [Pipeline] [first] } [first] Lock released on resource [lock-first] [Pipeline] [second] } [Pipeline] [first] // lock [Pipeline] [second] // node [Pipeline] [first] } [Pipeline] [second] } [second] Lock released on resource [lock-second] [Pipeline] [second] // lock [Pipeline] [second] }
The script will now stall at this point. The following will be output to the Jenkins log at this time (when FINE logging is available for lockable-resources):
Jun 23, 2017 3:16:03 PM WARNING org.jenkins.plugins.lockableresources.LockableResourcesManager unlockNames Skipping queued context for lock. Can not get the Run object from the context to proceed with lock, this could be a legitimate status if the build waiting for the lock was deleted or hard killed. More information at Level.FINE if debug is needed. Jun 23, 2017 3:16:03 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager Can not get the Run object from the context to proceed with lock java.lang.NullPointerException at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:472) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:243) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294) at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61) at org.jenkins.plugins.lockableresources.LockableResourcesManager.unlockNames(LockableResourcesManager.java:387) at org.jenkins.plugins.lockableresources.LockStepExecution$Callback.finished(LockStepExecution.java:84) at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114) at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362) at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:165) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162) at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122) at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261) at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32) at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32) at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Jun 23, 2017 3:16:03 PM WARNING org.jenkins.plugins.lockableresources.LockableResourcesManager unlockNames Skipping queued context for lock. Can not get the Run object from the context to proceed with lock, this could be a legitimate status if the build waiting for the lock was deleted or hard killed. More information at Level.FINE if debug is needed. Jun 23, 2017 3:16:03 PM FINE org.jenkins.plugins.lockableresources.LockableResourcesManager Can not get the Run object from the context to proceed with lock java.lang.NullPointerException at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:472) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:243) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:294) at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61) at org.jenkins.plugins.lockableresources.LockableResourcesManager.unlockNames(LockableResourcesManager.java:387) at org.jenkins.plugins.lockableresources.LockStepExecution$Callback.finished(LockStepExecution.java:84) at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114) at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362) at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:165) at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:162) at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:122) at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:261) at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:162) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:19) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:35) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:32) at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108) at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:32) at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:174) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:330) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:82) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:242) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:230) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
When I then abort the job I'll get this output in the Jenkins log:
Jun 23, 2017 3:16:25 PM WARNING org.jenkins.plugins.lockableresources.LockStepExecution stop Cannot remove context from lockable resource witing list. The context is not in the waiting list. Jun 23, 2017 3:16:25 PM WARNING org.jenkins.plugins.lockableresources.LockStepExecution stop Cannot remove context from lockable resource witing list. The context is not in the waiting list.
And this in the script:
Aborted by Phil McArdle (phil.mcardle) [Pipeline] [second-two] // lock [Pipeline] [first-two] // lock [Pipeline] [first-two] } [first-two] Failed in branch first-two [Pipeline] [second-two] } [second-two] Failed in branch second-two [Pipeline] // parallel [Pipeline] End of Pipeline Exception: null org.jenkinsci.plugins.workflow.steps.FlowInterruptedException at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.cancel(CpsBodyExecution.java:242) at org.jenkinsci.plugins.workflow.steps.BodyExecution.cancel(BodyExecution.java:76) at org.jenkinsci.plugins.workflow.cps.steps.ParallelStepExecution.stop(ParallelStepExecution.java:66) at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:296) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:966) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:955) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:744) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Finished: ABORTED
There are no other warnings or errors in the Jenkins log throughout this, but immediately after the above, it will echo:
Jun 23, 2017 3:22:41 PM INFO org.jenkinsci.plugins.workflow.job.WorkflowRun finish Experiments/Test_Restart #26 completed: ABORTED Jun 23, 2017 3:22:41 PM WARNING org.jenkinsci.plugins.workflow.flow.FlowExecutionList unregister Owner[Experiments/Test_Restart/26:Experiments/Test_Restart #26] was not in the list to begin with: []
I'm not sure if there's anything relevant there, but including just in case.
I'm aware that the repro in my script might seem a little odd - I'm not sure if anyone else uses this exact form - I'm using this in another script to stack up multiple parallel branches that will all queue on shared lockable resources.
I suspect that I could trigger this issue in other ways, e.g. having the locks reserved by other jobs, but I haven't hit them yet.