JENKINS-37121

WorkspaceListLeasePickle should help diagnose locked workspaces

      Description

      "Waiting to acquire /.../workspace/... : jenkins.util.Timer [#...]" id=... (0x...) state=WAITING cpu=75%
          - waiting on <0x...> (a hudson.slaves.WorkspaceList)
          - locked <0x...> (a hudson.slaves.WorkspaceList)
          at java.lang.Object.wait(Native Method)
          at java.lang.Object.wait(Object.java:502)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:255)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:234)
          at hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:223)
          at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:67)
          at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:52)
          at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:62)
          at ...
      

      It looks like the workspace was not relocked quickly enough to avoid being grabbed by some other job?

      At a minimum, WorkspaceListLeasePickle should call printWaitingMessage when acquire blocks, so the build log makes clear why the build is still stuck.

      Possibly it should instead fail if it cannot acquire the workspace immediately, since in that case the workspace can be assumed to have already been clobbered by something else. Currently core offers no API for a non-blocking acquire.
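
      Roughly, the first suggestion might look like the following (a sketch only; the class is illustrative and not part of the plugin, and the second suggestion would need the non-blocking acquire that core does not offer today):

          import hudson.FilePath;
          import hudson.model.TaskListener;

          /** Illustrative only, not the plugin's actual code. */
          class RelockDiagnostics {
              /** Surface the stall in the build log instead of only in thread dumps. */
              static void printWaitingMessage(TaskListener listener, FilePath ws) {
                  listener.getLogger().println("Waiting to reacquire workspace " + ws.getRemote());
              }
          }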

            Activity

            Jesse Glick added a comment -

            JENKINS-22767 is one conceivable cause of the situation reported here: one copy of a build locked the workspace upon resumption, and soon after another copy was somehow produced and tried to run, but got stuck attempting to relock the same workspace.

            SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            support/src/main/java/org/jenkinsci/plugins/workflow/support/pickles/WorkspaceListLeasePickle.java
            http://jenkins-ci.org/commit/pipeline-plugin/bed2b2e58b181f7b67df14229410199f307eedce
            Log:
            JENKINS-37121 Use WorkspaceList.record, not .acquire, when dehydrating a pickle
            Backports https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/14.
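
            For context, a rough sketch of the difference between the two core calls (paraphrased; WorkspaceList.acquire and WorkspaceList.record are real core APIs, but the helper class is illustrative):

                import hudson.FilePath;
                import hudson.slaves.WorkspaceList;

                class LeaseSketch {
                    // acquire() blocks until no other lease holds the path -- the
                    // WAITING threads shown in the description. record() registers
                    // the lease without waiting, on the theory that a resumed build
                    // already owns its workspace.
                    static WorkspaceList.Lease relock(WorkspaceList list, FilePath ws)
                            throws InterruptedException { // needed only for acquire()
                        // return list.acquire(ws); // old: may block indefinitely
                        return list.record(ws);     // new: no waiting
                    }
                }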

            Basil Crow added a comment -

            I just hit this in the context of a plugin upgrade to pull in Durability Settings. Since I believe I hit this bug as a result of a cascading failure caused by a regression introduced by JENKINS-47173, I described my findings there.

            Basil Crow added a comment -

            Got this error a few times today after a Jenkins restart:

            Resuming build at Thu Mar 14 20:01:13 PDT 2019 after Jenkins restart
            Waiting to resume part of devops-gate » master » git-blackbox
            [Pipeline] cleanWs
            20:01:14 [WS-CLEANUP] Deleting project workspace...
            20:01:14 [WS-CLEANUP] Deferred wipeout is used...
            20:01:14 [WS-CLEANUP] done
            [Pipeline] }
            [Pipeline] // timeout
            [Pipeline] End of Pipeline
            [Pipeline] }
            java.lang.IllegalStateException: JENKINS-37121: something already locked /var/tmp/jenkins_slaves/jenkins-selfservice/workspace/devops-gate/master/git-blackbox
            	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:75)
            	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:51)
            	at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
            	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            Caused: java.io.IOException: Failed to load build state
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:854)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:852)
            	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
            	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
            	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            	at java.lang.Thread.run(Thread.java:748)
            Finished: FAILURE
            
            Swapnil Patne added a comment (edited) -

            Getting this error today, and strangely the build is not even listed in the build history in the left pane.

            Jenkins ver. 2.107.2

            java.lang.IllegalStateException: JENKINS-37121: something already locked /var/lib/jenkins/workspace/AutomationPipeline@12
             at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:75)
             at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:51)
             at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
             at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
             at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
             Caused: java.io.IOException: Failed to load build state
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:842)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:840)
             at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:894)
             at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
             at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
             at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
             at java.util.concurrent.FutureTask.run(FutureTask.java:266)
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
             at java.lang.Thread.run(Thread.java:748)
             Finished: FAILURE
            ASHOK MOHANTY added a comment -

            We are still getting this error with Jenkins 2.190.3 and Kubernetes plugin 1.18.3:

            java.lang.IllegalStateException: JENKINS-37121: something already locked 

            I have updated the details in JENKINS-38994. Please let me know if I need to follow any other ticket(s).

            Basil Crow added a comment -

            For what it's worth, I get this every month or two:

            14:08:01  java.lang.IllegalStateException: JENKINS-37121: something already locked /path/to/job1
            14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:73)
            14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.WorkspaceListLeasePickle$1.tryResolve(WorkspaceListLeasePickle.java:50)
            14:08:01  	at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:92)
            14:08:01  	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
            14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            14:08:01  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            14:08:01  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            14:08:01  Caused: java.io.IOException: Failed to load build state
            14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:865)
            14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$3.onSuccess(CpsFlowExecution.java:863)
            14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:917)
            14:08:01  	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:38)
            14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            14:08:01  	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
            14:08:01  	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            14:08:01  	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
            14:08:01  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            14:08:01  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            14:08:01  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            14:08:01  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            14:08:01  	at java.lang.Thread.run(Thread.java:748)
            14:08:01  Finished: FAILURE 
            

            The conditions are always the same. About half an hour prior, a regularly scheduled Job DSL "Generate Jobs" step starts:

            13:37:17  Processing DSL script jenkins/jobs/job1.groovy
            14:08:22  Processing DSL script jenkins/jobs/job2.groovy
            

            Note the timestamps: this "Generate Jobs" step takes a very long time. job1 has thousands of builds in its history, and the storage is a slow NFS server, so this just takes forever. I once caught a jstack of it, and it was stuck in the I/O path in AbstractLazyLoadRunMap or similar.

            Once the DSL script for job1 is processed, we move on to job2 at 14:08, and that's when a few (though not all) runs of job1 blow up with the above stack trace.

            If there's anything else I can do to gather debug information, I'd be happy to.

            Yes I know the NFS server is really the issue, but fixing that is out of my control at the present time for organizational reasons.
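
            If it helps to model the failure: a lease left behind by an earlier in-memory copy of the build makes a second registration of the same path fail. A toy demo against the core class (the "stale copy" framing is my assumption about what the job reload is doing here):

                import hudson.FilePath;
                import hudson.slaves.WorkspaceList;
                import java.io.File;

                public class StaleLeaseDemo {
                    public static void main(String[] args) {
                        WorkspaceList list = new WorkspaceList();
                        FilePath ws = new FilePath(new File("/tmp/ws"));
                        WorkspaceList.Lease stale = list.record(ws); // earlier copy of the build
                        try {
                            list.record(ws); // resumed copy: path already taken, fails
                        } catch (Throwable t) {
                            // The pickle surfaces this situation as
                            // "JENKINS-37121: something already locked <path>".
                            System.err.println("second registration failed: " + t);
                        } finally {
                            stale.release();
                        }
                    }
                }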

            Jesse Glick added a comment -

            No hypothesis offhand. Possibly needs a core patch to record the stack trace and other metadata of the original locker.
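
            A rough shape for such a patch (entirely hypothetical; nothing like this exists in core today):

                import hudson.FilePath;
                import java.util.Map;
                import java.util.concurrent.ConcurrentHashMap;

                class LockerDiagnostics {
                    private final Map<String, Exception> lockers = new ConcurrentHashMap<>();

                    /** Capture the locking thread's stack trace at lock time. */
                    void recordLocker(FilePath ws) {
                        lockers.put(ws.getRemote(),
                                new Exception("locked by " + Thread.currentThread().getName()));
                    }

                    /** Attach the original locker's trace as the cause, so the
                     *  "something already locked" error explains itself. */
                    IllegalStateException alreadyLocked(FilePath ws) {
                        return new IllegalStateException(
                                "JENKINS-37121: something already locked " + ws,
                                lockers.get(ws.getRemote()));
                    }
                }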


  People

  Assignee: Jesse Glick
  Reporter: Jesse Glick
  Votes: 0
  Watchers: 5