[JENKINS-36479] Locked resources not freed up by Pipeline job hard kill

      Because LockStepExecution.Callback.finished(context) is never called when a build is hard-killed, any resources that build had locked can stay locked forever. Those resources can be unlocked manually from the UI, but it would be preferable to detect this scenario automatically and unlock resources held by defunct builds.
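          To make the failure mode concrete, here is a deliberately simplified Java model in which a lock is released only from a completion callback standing in for LockStepExecution.Callback.finished(context). The class and method names (CallbackLockManager, onBodyFinished) are hypothetical and are not the lockable-resources plugin's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the failure mode; NOT the plugin's real classes.
// The completion callback is the only release path, so if it never runs
// (hard kill), the lock is never released.
class CallbackLockManager {
    // resource name -> id of the build currently holding it
    private final Map<String, String> holders = new HashMap<>();

    // Grants the lock only if nobody currently holds the resource.
    synchronized boolean tryLock(String resource, String buildId) {
        return holders.putIfAbsent(resource, buildId) == null;
    }

    // Stand-in for LockStepExecution.Callback.finished(context): the sole release path.
    synchronized void onBodyFinished(String resource, String buildId) {
        holders.remove(resource, buildId);
    }

    synchronized String holderOf(String resource) {
        return holders.get(resource);
    }
}
```

          If build p#1 locks a resource and is then hard-killed, onBodyFinished is never invoked, so every later tryLock for that resource returns false until someone unlocks it by hand.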


          Andrew Bayer created issue -

          Andrew Bayer added a comment -

          This can also happen if the build is deleted while running. That's a non-ideal usage pattern, sure, but since you can do it, people will end up doing it. So we probably also want to check whether the build holding a resource's lock still exists at all, and unlock the resource if it doesn't.
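          One way to express that check is to consult a liveness predicate whenever a lock is requested, and drop the lock if its holder is gone. This is a hypothetical sketch: ReclaimingLockManager and the isAlive predicate are illustrative names. In Jenkins the predicate might plausibly be backed by Run.fromExternalizableId(id) returning null (build deleted) or Run.isBuilding() returning false (build killed), but that wiring is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not the plugin's real LockableResourceManager:
// a stale lock is reclaimed at the moment someone asks for the resource.
class ReclaimingLockManager {
    // resource name -> id of the build currently holding it
    private final Map<String, String> holders = new HashMap<>();
    // Assumption: true only if the build still exists AND is still running.
    private final Predicate<String> isAlive;

    ReclaimingLockManager(Predicate<String> isAlive) {
        this.isAlive = isAlive;
    }

    synchronized boolean tryLock(String resource, String buildId) {
        String holder = holders.get(resource);
        if (holder != null && !isAlive.test(holder)) {
            // Holder was deleted or hard-killed: drop its defunct lock.
            holders.remove(resource);
            holder = null;
        }
        if (holder != null) {
            return false; // a live build still holds the resource
        }
        holders.put(resource, buildId);
        return true;
    }

    synchronized void unlock(String resource, String buildId) {
        holders.remove(resource, buildId);
    }

    synchronized String holderOf(String resource) {
        return holders.get(resource);
    }
}
```

          Checking liveness lazily, on the next request, keeps the manager simple, but it only helps when a new request actually arrives.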

          Andrew Bayer made changes -
          Status changed from Open to In Progress

          Andrew Bayer added a comment -

          Very first thoughts on this are up at https://github.com/abayer/lockable-resources-plugin/commit/9a0ef2cae5176cef4d5f8439c53b2aad4b6facc0

          Andrew Bayer added a comment -

          Continued further with https://github.com/jenkinsci/lockable-resources-plugin/compare/master...abayer:jenkins-36479

          Andrew Bayer added a comment -

          Note that this approach doesn't handle the scenario where builds are already queued for a resource when the build holding its lock gets deleted or hard-killed. It only clears the defunct lock when something requests a lock on the resource. So with X builds queued, nothing happens until an (X+1)th build tries to get a lock; at that point the first build in the queue ends up getting a new lock. I'm still thinking about how to deal with the existing queue without a new lock request...
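          The limitation can be reproduced in a small model with a per-resource waiting queue. In this hypothetical sketch (not the branch's code), request reclaims a stale lock lazily, which matches the behavior described: the queue stays stuck until another build asks for the resource, and then the head of the queue gets the lock. The extra sweep method shows one possible way to drain the existing queue without a new lock request, for example by calling it periodically or when a build is deleted; that trigger is an assumption, not something settled in this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model with a FIFO waiting queue per resource; not the plugin's code.
class QueuedLockManager {
    private final Map<String, String> holders = new HashMap<>();
    private final Map<String, Deque<String>> waiting = new HashMap<>();
    // Assumption: true only if the build still exists and is still running.
    private final Predicate<String> isAlive;

    QueuedLockManager(Predicate<String> isAlive) {
        this.isAlive = isAlive;
    }

    // Lazy cleanup: stale locks are only noticed when a new request arrives.
    // Returns true if the lock is granted now, false if the build was queued.
    synchronized boolean request(String resource, String buildId) {
        reclaimIfStale(resource);
        if (!holders.containsKey(resource)) {
            holders.put(resource, buildId);
            return true;
        }
        waiting.computeIfAbsent(resource, r -> new ArrayDeque<>()).addLast(buildId);
        return false;
    }

    // Proactive cleanup: drains stale locks without waiting for a new request.
    synchronized void sweep() {
        for (String resource : new ArrayList<>(holders.keySet())) {
            reclaimIfStale(resource);
        }
    }

    // Passes a defunct holder's lock to the next live build in the queue, if any.
    // (A real implementation would also have to resume that build's waiting step.)
    private void reclaimIfStale(String resource) {
        Deque<String> queue = waiting.getOrDefault(resource, new ArrayDeque<>());
        String holder = holders.get(resource);
        while (holder != null && !isAlive.test(holder)) {
            holder = queue.pollFirst(); // null once nobody is waiting
            if (holder == null) {
                holders.remove(resource);
            } else {
                holders.put(resource, holder);
            }
        }
    }

    synchronized String holderOf(String resource) {
        return holders.get(resource);
    }
}
```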

          Andrew Bayer added a comment -

          Interesting: it turns out LockStepExecution.stop still gets called on a hard kill, and it's not pleased about it:

          INFO: p #1 completed: ABORTED
          [p #1] Hard kill!
          Jul 06, 2016 11:27:29 AM org.jenkins.plugins.lockableresources.LockStepExecution stop
          WARNING: Cannot remove context from lockable resource witing list. The context is not in the waiting list.
          Jul 06, 2016 11:27:29 AM org.jenkinsci.plugins.workflow.cps.CpsStepContext onFailure
          WARNING: already completed CpsStepContext[3]:Owner[p/1:p #1]
          java.lang.IllegalStateException: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
          	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onFailure(CpsStepContext.java:320)
          	at org.jenkins.plugins.lockableresources.LockStepExecution.stop(LockStepExecution.java:90)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:760)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:755)
          	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:150)
          	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
          	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
          	at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:134)
          	at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:170)
          	at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:53)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:644)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:631)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:568)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:32)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:744)
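          The "already completed" warning in the log above shows LockStepExecution.stop calling onFailure on a step context that had already been completed. A common way to make such a stop harmless is a one-shot guard, so that only the first completion reaches the context. The classes below (OneShotContext, GuardedLockStep) are hypothetical stand-ins, not the Pipeline step API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a step context that may be completed only once,
// mirroring the "already completed" IllegalStateException in the log.
class OneShotContext {
    private final AtomicBoolean completed = new AtomicBoolean(false);

    void onSuccess() {
        complete(null);
    }

    void onFailure(Throwable cause) {
        complete(cause);
    }

    private void complete(Throwable cause) {
        if (!completed.compareAndSet(false, true)) {
            throw new IllegalStateException("already completed", cause);
        }
    }

    boolean isCompleted() {
        return completed.get();
    }
}

// Hypothetical lock step: whichever of bodyFinished()/stop() runs first wins,
// and the other becomes a no-op instead of completing the context again.
class GuardedLockStep {
    private final OneShotContext context;
    private final AtomicBoolean done = new AtomicBoolean(false);

    GuardedLockStep(OneShotContext context) {
        this.context = context;
    }

    void bodyFinished() {
        if (done.compareAndSet(false, true)) {
            context.onSuccess();
        }
    }

    void stop(Throwable cause) {
        if (done.compareAndSet(false, true)) {
            context.onFailure(cause);
        }
    }
}
```

          Using compareAndSet rather than a separate "check, then complete" avoids a race if the body finishes while stop is running.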
          

          Andrew Bayer added a comment -

          OK, that's just noise: not ideal noise, but just noise. I've pushed a commit with a test that now passes, but that hangs forever without my changes to LockableResourceManager. The test needs to be fixed so that it actually fails, rather than hangs, in that scenario.

          More tests are still needed for the doDelete scenario and for the builds-already-in-queue scenario, but I wanted to make sure this actually worked in the base case first. =)
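          A test that hangs forever without the fix can be turned into one that fails by bounding the wait. This is a minimal sketch using only java.util.concurrent; BoundedWait and awaitOrFail are hypothetical helpers, and in a JUnit 4 test a similar effect could also come from @Test(timeout = ...).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical test helper: converts "waits forever" into a test failure.
final class BoundedWait {
    private BoundedWait() {}

    static <T> T awaitOrFail(CompletableFuture<T> future, long timeout, TimeUnit unit, String what) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // The case a hanging test would otherwise never report.
            throw new AssertionError("Timed out waiting for " + what, e);
        } catch (ExecutionException e) {
            throw new AssertionError(what + " failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new AssertionError("Interrupted while waiting for " + what, e);
        }
    }
}
```

          For example, a test could call awaitOrFail(secondBuildGotLock, 30, TimeUnit.SECONDS, "second build to acquire the lock"), where secondBuildGotLock is a hypothetical future the test completes once the second build enters its lock block.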

          Jesse Glick made changes -
          Link: This issue is related to JENKINS-28183
          Jesse Glick made changes -
          Labels: robustness, workflow

            Assignee: Andrew Bayer (abayer)
            Reporter: Andrew Bayer (abayer)
            Votes: 1
            Watchers: 6