
Deadlock between RunMap and Queue after restart; StepContext.isReady impl acquires lock

      Found one Java-level deadlock:
      =============================
      "Thread-5":
        waiting to lock monitor 0x00007f0984170b38 (object 0x0000000706fe3aa8, a hudson.model.RunMap),
        which is held by "Jenkins initialization thread"
      "Jenkins initialization thread":
        waiting to lock monitor 0x00007f0988015128 (object 0x00000007066b46c0, a hudson.model.Queue),
        which is held by "Thread-5"
      
      Java stack information for the threads listed above:
      ===================================================
      "Thread-5":
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:688)
      	- waiting to lock <0x0000000706fe3aa8> (a hudson.model.RunMap)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:671)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getById(AbstractLazyLoadRunMap.java:543)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:523)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:533)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:386)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getProgramPromise(CpsStepContext.java:230)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.isReady(CpsStepContext.java:236)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.run(ExecutorStepExecution.java:262)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getDisplayName(ExecutorStepExecution.java:281)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getFullDisplayName(ExecutorStepExecution.java:290)
      	at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:107)
      	at hudson.model.LoadBalancer$1.map(LoadBalancer.java:97)
      	at hudson.model.LoadBalancer$2.map(LoadBalancer.java:148)
      	at hudson.model.Queue.maintain(Queue.java:1053)
      	- locked <0x00000007066b46c0> (a hudson.model.Queue)
      	at hudson.model.Queue$1.call(Queue.java:316)
      	at hudson.model.Queue$1.call(Queue.java:313)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:94)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:84)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:104)
      	at java.lang.Thread.run(Thread.java:745)
      "Jenkins initialization thread":
      	at hudson.model.Queue.schedule2(Queue.java:639)
      	- waiting to lock <0x00000007066b46c0> (a hudson.model.Queue)
      	at org.jenkinsci.plugins.workflow.support.pickles.ExecutorPickle.rehydrate(ExecutorPickle.java:67)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.PickleResolver.rehydrate(PickleResolver.java:68)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverReader.restorePickles(RiverReader.java:128)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.loadProgramAsync(CpsFlowExecution.java:401)
      	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:379)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:300)
      	at hudson.model.RunMap.retrieve(RunMap.java:219)
      	at hudson.model.RunMap.retrieve(RunMap.java:56)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:688)
      	- locked <0x0000000706fe3aa8> (a hudson.model.RunMap)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:671)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getById(AbstractLazyLoadRunMap.java:543)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:523)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:533)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:59)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:51)
      	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
      	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:165)
      	at jenkins.model.Jenkins.<init>(Jenkins.java:845)
      	at hudson.model.Hudson.<init>(Hudson.java:82)
      	at hudson.model.Hudson.<init>(Hudson.java:78)
      	at hudson.WebAppMain$3.run(WebAppMain.java:222)
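
The cycle in the trace above is a lock-order inversion: one thread holds the Queue lock and wants the RunMap lock, while the other holds the RunMap lock and wants the Queue lock. A minimal sketch of that shape follows. It assumes `ReentrantLock` in place of the `synchronized` monitors Jenkins actually uses, so that `tryLock()` can report the failure instead of hanging. The class and method names (`LockInversionSketch`, `holdThenProbe`, `run`) are hypothetical and not part of Jenkins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration only; these are not Jenkins classes. */
final class LockInversionSketch {

    /** Takes first, waits until the peer holds its own first lock, then probes second. */
    static boolean holdThenProbe(ReentrantLock first, ReentrantLock second,
                                 CountDownLatch bothHolding, CountDownLatch bothProbed)
            throws InterruptedException {
        first.lock();
        try {
            bothHolding.countDown();
            bothHolding.await();                  // both threads now hold one lock each
            boolean acquired = second.tryLock();  // a plain lock() here would never return
            if (acquired) {
                second.unlock();
            }
            bothProbed.countDown();
            bothProbed.await();                   // keep holding until the peer has probed too
            return acquired;
        } finally {
            first.unlock();
        }
    }

    /** Returns, for each thread, whether it managed to take its second lock. */
    static boolean[] run() {
        ReentrantLock queue = new ReentrantLock();   // stands in for the hudson.model.Queue monitor
        ReentrantLock runMap = new ReentrantLock();  // stands in for the hudson.model.RunMap monitor
        CountDownLatch bothHolding = new CountDownLatch(2);
        CountDownLatch bothProbed = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Like "Thread-5": Queue first, then RunMap.
            Future<Boolean> maintainer =
                    pool.submit(() -> holdThenProbe(queue, runMap, bothHolding, bothProbed));
            // Like "Jenkins initialization thread": RunMap first, then Queue.
            Future<Boolean> initializer =
                    pool.submit(() -> holdThenProbe(runMap, queue, bothHolding, bothProbed));
            return new boolean[] { maintainer.get(), initializer.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] acquired = run();
        System.out.println("maintainer got RunMap lock: " + acquired[0]);
        System.out.println("initializer got Queue lock: " + acquired[1]);
    }
}
```

Neither thread can take its second lock while the other still holds it, which is the situation the JVM reports as a Java-level deadlock when the acquisitions are blocking.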
      

      Ironically, StepContext.isReady is supposed to help break deadlocks, yet here it acquires a lock itself.

      Since getFlowExecution may block, I think getProgramPromise should return a future that encompasses both getting the execution and obtaining its programPromise.
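
The proposal can be sketched as a single composed future that a readiness check only polls. This is a sketch under assumptions, not the plugin's code: it uses the JDK's `CompletableFuture` rather than whatever future type the plugin uses, and `Program`, `Execution`, `Context`, and `executionPromise` are hypothetical stand-ins for the real classes and fields.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-ins; these are not the workflow plugin's real classes. */
final class ProgramPromiseSketch {

    /** Stand-in for the loaded program a step eventually needs. */
    static final class Program {
        final String name;
        Program(String name) { this.name = name; }
    }

    /** Stand-in for a flow execution whose program loads asynchronously. */
    static final class Execution {
        final CompletableFuture<Program> programPromise = new CompletableFuture<>();
    }

    /** Stand-in for the step context. */
    static final class Context {
        // Completed by whichever thread finishes loading the run (assumption).
        private final CompletableFuture<Execution> executionPromise;

        Context(CompletableFuture<Execution> executionPromise) {
            this.executionPromise = executionPromise;
        }

        /** One future covering both stages: obtain the execution, then its program. */
        CompletableFuture<Program> getProgramPromise() {
            return executionPromise.thenCompose(execution -> execution.programPromise);
        }

        /** Only polls the composed future; it never waits for the execution to load. */
        boolean isReady() {
            return getProgramPromise().isDone();
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Execution> loading = new CompletableFuture<>();
        Context context = new Context(loading);
        System.out.println(context.isReady()); // execution not loaded yet

        Execution execution = new Execution();
        loading.complete(execution);
        System.out.println(context.isReady()); // execution loaded, program still pending

        execution.programPromise.complete(new Program("demo"));
        System.out.println(context.isReady()); // both stages complete
    }
}
```

With this shape, a caller that holds the Queue lock can ask whether the step is ready without entering the code path that loads the run.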

          [JENKINS-25890] Deadlock between RunMap and Queue after restart; StepContext.isReady impl acquires lock

          Jesse Glick created issue -

          Jesse Glick added a comment -

          Another case I believe is related:

          "Jenkins initialization thread" #22 prio=5 os_prio=0 tid=0x00007fd2c22ec000 nid=0x28b3 waiting for monitor entry [0x00007fd2a1ce0000]
             java.lang.Thread.State: BLOCKED (on object monitor)
          	at hudson.model.Queue.schedule(Queue.java:618)
          	- waiting to lock <0x0000000707eb1070> (a hudson.model.Queue)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:313)
          	at hudson.model.RunMap.retrieve(RunMap.java:221)
          	at hudson.model.RunMap.retrieve(RunMap.java:57)
          	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:465)
          	- locked <0x000000070b279140> (a hudson.model.RunMap)
          	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:448)
          	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:356)
          	at hudson.model.RunMap.getById(RunMap.java:201)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:521)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:531)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:59)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:51)
          	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
          	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
          	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:165)
          	at jenkins.model.Jenkins.<init>(Jenkins.java:862)
          	at hudson.model.Hudson.<init>(Hudson.java:83)
          	at hudson.model.Hudson.<init>(Hudson.java:79)
          	at hudson.WebAppMain$3.run(WebAppMain.java:225)
          "Thread-13" #59 daemon prio=5 os_prio=0 tid=0x00007fd23813e800 nid=0x2916 in Object.wait() [0x00007fd2a0fc3000]
             java.lang.Thread.State: WAITING (on object monitor)
          	at java.lang.Object.wait(Native Method)
          	at java.lang.Object.wait(Object.java:502)
          	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:535)
          	- locked <0x0000000707a5f1c8> (a java.util.HashMap)
          	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getFlowExecution(CpsStepContext.java:387)
          	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getProgramPromise(CpsStepContext.java:228)
          	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.isReady(CpsStepContext.java:234)
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.run(ExecutorStepExecution.java:262)
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getDisplayName(ExecutorStepExecution.java:281)
          	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getFullDisplayName(ExecutorStepExecution.java:290)
          	at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:107)
          	at hudson.model.LoadBalancer$1.map(LoadBalancer.java:97)
          	at hudson.model.LoadBalancer$2.map(LoadBalancer.java:148)
          	at hudson.model.Queue.maintain(Queue.java:1153)
          	- locked <0x0000000707eb1070> (a hudson.model.Queue)
          	at hudson.model.Queue$1.call(Queue.java:316)
          	at hudson.model.Queue$1.call(Queue.java:313)
          	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:94)
          	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:84)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:104)
          	at java.lang.Thread.run(Thread.java:745)
          

          Jesse Glick made changes -
          Labels Original: deadlock threads New: deadlock testing threads

          Jesse Glick added a comment -

          Observed to cause a timeout in WorkflowTest.executorStepRestart (JENKINS-26513).

          Jesse Glick made changes -
          Link New: This issue is blocking JENKINS-26513 [ JENKINS-26513 ]

          Code changed in jenkins
          User: Jesse Glick
          Path:
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/WorkflowTest.java
          http://jenkins-ci.org/commit/workflow-plugin/f94559d5f48c794c836ee2d80e93a13fbb6c6ed6
          Log:
          JENKINS-25890 causing problems for this test.

          Jesse Glick made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Jesse Glick made changes -
          Remote Link New: This issue links to "PR 65 (Web Link)" [ 12133 ]

          Code changed in jenkins
          User: Jesse Glick
          Path:
          CHANGES.md
          aggregator/src/test/java/org/jenkinsci/plugins/workflow/WorkflowTest.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodySubContext.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/StepContext.java
          http://jenkins-ci.org/commit/workflow-plugin/790a5453ef094da16e906acc5b5cc512ac1bde60
          Log:
          [FIXED JENKINS-25890] isReady should not block, so getProgramPromise may not block on getFlowExecution.

          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Resolved [ 5 ]

            jglick Jesse Glick