
WorkspaceListLeasePickle should help diagnose locked workspaces

      "Waiting to acquire /.../workspace/... : jenkins.util.Timer [#...]" id=... (0x...) state=WAITING cpu=75%
          - waiting on <0x...> (a hudson.slaves.WorkspaceList)
          - locked <0x...> (a hudson.slaves.WorkspaceList)
          at java.lang.Object.wait(Native Method)
          at java.lang.Object.wait(Object.java:502)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:255)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:234)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:223)
          at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:67)
          at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:52)
          at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:62)
          at ...
      

      Looks like a workspace did not get relocked fast enough to avoid getting grabbed by some other job?

      At a minimum, WorkspaceListLeasePickle should call printWaitingMessage when acquire blocks, so it is clearer from the build log why the build is still stuck.

      Possibly it should fail if it cannot acquire the workspace immediately, since in this case the workspace can be assumed to have already been clobbered by something else. Currently there is no such core API.
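
      For illustration, a rough Java sketch (a hypothetical helper, not the plugin's actual code) of how a fail-fast check could be approximated with the API core does have: WorkspaceList.allocate returns a lease on the requested path when it is free, and on a sibling such as path@2 when it is not, so being handed back a different path can be treated as "already locked".

          import hudson.FilePath;
          import hudson.model.Computer;
          import hudson.remoting.VirtualChannel;
          import hudson.slaves.WorkspaceList;
          import jenkins.model.Jenkins;

          // Hypothetical helper: fail-fast resolution for a pickle that stored the
          // node name and the absolute workspace path when the build was dehydrated.
          class FailFastWorkspaceResolver {
              /** Returns a lease, null to retry later, or throws if something else holds the lock. */
              static WorkspaceList.Lease tryResolve(String node, String path) throws InterruptedException {
                  Jenkins j = Jenkins.getInstanceOrNull();
                  if (j == null) {
                      return null; // Jenkins not up (or shutting down); retry later
                  }
                  Computer c = j.getComputer(node);
                  if (c == null) {
                      return null; // agent not defined (yet); retry later
                  }
                  VirtualChannel ch = c.getChannel();
                  if (ch == null) {
                      return null; // agent offline; retry later
                  }
                  FilePath ws = new FilePath(ch, path);
                  // Core has no tryAcquire, but allocate() returns a lease on the requested
                  // path when it is free, and on a sibling such as path@2 when it is not.
                  WorkspaceList.Lease lease = c.getWorkspaceList().allocate(ws);
                  if (lease.path.equals(ws)) {
                      return lease; // we got the workspace we asked for; resume normally
                  }
                  lease.release(); // handed a sibling, so something else owns our workspace
                  throw new IllegalStateException("something already locked " + ws);
              }
          }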

          [JENKINS-37121] WorkspaceListLeasePickle should help diagnose locked workspaces

          Jesse Glick added a comment -

          JENKINS-22767 is one conceivable cause of the situation reported here: one copy of a build locked the workspace upon resumption, and soon after another copy was somehow produced and tried to run, but got stuck attempting to relock the same workspace.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          support/src/main/java/org/jenkinsci/plugins/workflow/support/pickles/WorkspaceListLeasePickle.java
          http://jenkins-ci.org/commit/pipeline-plugin/bed2b2e58b181f7b67df14229410199f307eedce
          Log:
          JENKINS-37121 Use WorkspaceList.record, not .acquire, when dehydrating a pickle
          Backports https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/14.
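
          For context, a hedged sketch of the distinction that commit relies on (the class and method names below are invented for illustration; only the WorkspaceList calls are real core API): acquire blocks until no other lease holds the path, which is the hang shown in the description above, while record merely registers the path as in use and hands back a Lease without synchronizing.

          import hudson.FilePath;
          import hudson.model.Computer;
          import hudson.slaves.WorkspaceList;

          // Illustration only; names invented for this sketch.
          class LeaseExamples {
              // acquire(ws) blocks until no other lease holds the path, so a stale lease
              // left over from before the restart parks the pickle-resolution thread.
              static WorkspaceList.Lease blockingLease(Computer c, FilePath ws) throws InterruptedException {
                  return c.getWorkspaceList().acquire(ws);
              }

              // record(ws) just notes that the path is in use and returns a Lease without
              // any synchronization, which fits when the build already owns the workspace.
              static WorkspaceList.Lease recordOnlyLease(Computer c, FilePath ws) {
                  return c.getWorkspaceList().record(ws);
              }
          }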


          Basil Crow added a comment -

          I just hit this in the context of a plugin upgrade to pull in Durability Settings. Since I believe I hit this bug as a result of a cascading failure caused by a regression introduced by JENKINS-47173, I described my findings there.


          Basil Crow added a comment -

          Got this error a few times today after a Jenkins restart:

          Resuming build at Thu Mar 14 20:01:13 PDT 2019 after Jenkins restart
          Waiting to resume part of devops-gate » master » git-blackbox
          [Pipeline] cleanWs
          20:01:14 [WS-CLEANUP] Deleting project workspace...
          20:01:14 [WS-CLEANUP] Deferred wipeout is used...
          20:01:14 [WS-CLEANUP] done
          [Pipeline] }
          [Pipeline] // timeout
          [Pipeline] End of Pipeline
          [Pipeline] }
          java.lang.IllegalStateException: JENKINS-37121: something already locked /var/tmp/jenkins_slaves/jenkins-selfservice/workspace/devops-gate/master/git-blackbox
          	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:75)
          	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:51)
          	at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
          	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          Caused: java.io.IOException: Failed to load build state
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:854)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:852)
          	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	at java.lang.Thread.run(Thread.java:748)
          Finished: FAILURE
          


          Swapnil Patne added a comment - edited

          Got this error today, and strangely this build is not even listed in the build history pane on the left.

          Jenkins ver. 2.107.2

          java.lang.IllegalStateException: JENKINS-37121: something already locked /var/lib/jenkins/workspace/AutomationPipeline@12
           at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:75)
           at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:51)
           at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
           at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
           Caused: java.io.IOException: Failed to load build state
           at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:842)
           at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:840)
           at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:894)
           at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
           at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
           Finished: FAILURE

           


          ASHOK MOHANTY added a comment -

          We are still getting this error with Jenkins 2.190.3 and the Kubernetes plugin v1.18.3:

          java.lang.IllegalStateException: JENKINS-37121: something already locked 

          I have updated the details in JENKINS-38994. Please let me know if there are any other ticket(s) I should follow.


          Basil Crow added a comment -

          For what it's worth, I get this every month or two:

          14:08:01  java.lang.IllegalStateException: JENKINS-37121: something already locked /path/to/job1
          14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:73)
          14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:50)
          14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
          14:08:01  	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
          14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          14:08:01  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          14:08:01  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          14:08:01  Caused: java.io.IOException: Failed to load build state
          14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:865)
          14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:863)
          14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:917)
          14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:38)
          14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          14:08:01  	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
          14:08:01  	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          14:08:01  	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
          14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          14:08:01  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          14:08:01  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          14:08:01  	at java.lang.Thread.run(Thread.java:748)
          14:08:01  Finished: FAILURE 
          

          The conditions are always the same. About half an hour prior, a regularly scheduled Job DSL "Generate Jobs" step starts:

          13:37:17  Processing DSL script jenkins/jobs/job1.groovy
          14:08:22  Processing DSL script jenkins/jobs/job2.groovy
          

          Note the timestamps: this "Generate Jobs" step is taking a very long time! job1 has thousands of builds in the history, and the storage is a slow NFS server. So this just takes forever. One time I caught a jstack of it and it was stuck in the I/O path in AbstractLazyLoadRunMap or similar.

          Once the DSL script for job1 is processed, we move on to job2 at 14:08, and that's when a few (though not all) runs of job1 blow up with the above stack trace.

          If there's anything else I can do to gather debug information, I'd be happy to.

          Yes I know the NFS server is really the issue, but fixing that is out of my control at the present time for organizational reasons.


          Jesse Glick added a comment -

          No hypothesis offhand. Possibly needs a core patch to record the stack trace and other metadata of the original locker.
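
          Purely as a sketch of what such a core patch might look like (nothing like this exists today; every name below is invented), the bookkeeping could be as simple as remembering a stack trace per locked path:

          import hudson.FilePath;
          import hudson.slaves.WorkspaceList;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // Hypothetical: remember who acquired each workspace so that a later
          // "something already locked" failure can report the original locker.
          class DiagnosticWorkspaceLocks {
              private final Map<String, Throwable> lockers = new ConcurrentHashMap<>();

              WorkspaceList.Lease acquireWithDiagnostics(WorkspaceList list, FilePath ws)
                      throws InterruptedException {
                  WorkspaceList.Lease lease = list.acquire(ws);
                  // A real patch would also record the build, thread name, and timestamp,
                  // and would clear the entry when the lease is released.
                  lockers.put(ws.getRemote(), new Throwable("workspace locked here"));
                  return lease;
              }

              /** Stack trace of whoever locked this workspace, for inclusion in error messages. */
              Throwable whoLocked(FilePath ws) {
                  return lockers.get(ws.getRemote());
              }
          }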


          Evgeny added a comment -

          We hit this issue after updating to Jenkins 2.289.1, right after generating a multibranch pipeline via the REST API.

          Workaround that fixed it for me:

          1. Edit /var/lib/jenkins/org.jenkins.plugins.lockableresources.LockableResourcesManager.xml
          2. Add <queuingStarted>0</queuingStarted>
          3. Restart Jenkins

