• workflow-api 1108.v57edf648f5d4 and workflow-durable-task-step 1107.v5dab75aaccbd

      Following the update to the latest LTS version, my Jenkins instance would hang during startup and the process would become unresponsive, so that neither systemctl stop nor a plain kill would remove it. The logs would contain an error message about a thread deadlock (see below). If it's relevant, there was a job in progress which got suspended when the controller was stopped for the upgrade.

      I tried restarting several times, but the same thing happened each time. I then tried downgrading the jenkins package to the previous version but that hit the same error. Restoring from a snapshot allowed me to return to the previous version.

       

      The following error would appear in the logs:

      WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26] locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
      	at jenkins.model.PeepholePermalink.resolve(PeepholePermalink.java:103)
      	at hudson.model.Job.getLastSuccessfulBuild(Job.java:947)
      	at hudson.model.Job.getEstimatedDurationCandidates(Job.java:1019)
      	at hudson.model.Job.getEstimatedDuration(Job.java:1053)
      	at hudson.model.Run.getEstimatedDuration(Run.java:2496)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getEstimatedDuration(ExecutorStepExecution.java:696)
      	at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:327)
      	at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:312)
      	at hudson.model.Queue.maintain(Queue.java:1645)
      	at hudson.model.Queue$1.call(Queue.java:325)
      	at hudson.model.Queue$1.call(Queue.java:322)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:107)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:97)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
      	at java.lang.Thread.run(Thread.java:748)
      , CpsStepContext.isReady [#2] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@18965682 (owned by AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26]):
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at hudson.model.Queue.schedule2(Queue.java:567)
      	at hudson.model.Queue.schedule2(Queue.java:693)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.start(ExecutorStepExecution.java:104)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.onResume(ExecutorStepExecution.java:210)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:265)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:243)
      	at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
      	at org.jenkinsci.plugins.workflow.flow.DirectExecutor.execute(DirectExecutor.java:33)
      	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
      	at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
      	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155)
      	at com.google.common.util.concurrent.Futures.addCallback(Futures.java:985)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:243)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:84)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:567)
      	at hudson.model.RunMap.retrieve(RunMap.java:231)
      	at hudson.model.RunMap.retrieve(RunMap.java:58)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:506)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:488)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:386)
      	at hudson.model.RunMap.getById(RunMap.java:211)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:948)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:959)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:217)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:242)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.access$000(CpsStepContext.java:97)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:263)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:261)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      , Splunk data monitor thread locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
      	at hudson.model.Run.fromExternalizableId(Run.java:2483)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.runForDisplay(ExecutorStepExecution.java:527)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getUrl(ExecutorStepExecution.java:536)
      	at com.splunk.splunkjenkins.HealthMonitor.sendPendingQueue(HealthMonitor.java:110)
      	at com.splunk.splunkjenkins.HealthMonitor.execute(HealthMonitor.java:44)
      	at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:101)
      	at hudson.model.AsyncPeriodicWork$$Lambda$545/292627145.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:748)
      

          [JENKINS-67351] thread deadlock after update to 2.319.1

          James Robson created issue -
          Mark Waite made changes -
          Summary Original: thread deadlock after update to 2.391.1 New: thread deadlock after update to 2.319.1
          Mark Waite made changes -
          Environment Original: jenkins: 2.391.1
          OS: ubuntu 20.04
          Java: 1.8.0_292
          New: jenkins: 2.319.1
          OS: ubuntu 20.04
          Java: 1.8.0_292

          Devin Nusbaum added a comment - - edited

          Appears to be caused by JENKINS-67164 in workflow-api 1105.v3de5e2efac97 (you may be able to downgrade workflow-api to 2.47 to avoid the issue).

          The "AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26]" thread currently holds the queue lock and is trying to access a build via RunMap, which requires the RunMap lock.

          The "CpsStepContext.isReady [#2]" thread is loading a build and holds that RunMap lock, and is stuck trying to schedule a placeholder task to resume a node step, which requires the queue lock.

          I think the "Splunk data monitor" thread is irrelevant.

          I will look into patching workflow-api to fix this.
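
For readers less familiar with this failure mode, the lock-order inversion described above can be reproduced in isolation. The following is only a minimal sketch using plain JDK classes: `queueLock` and `runMapLock` are hypothetical stand-ins for the Jenkins queue lock and the RunMap monitor, the thread names are illustrative, and none of this is Jenkins code. It assumes a JVM that supports `ThreadMXBean.findDeadlockedThreads()` (HotSpot does), which can report cycles that mix monitors and `ReentrantLock`s, as in the log above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two locks in opposite order and
     * returns how many of them the JVM reports as deadlocked (0 on timeout).
     */
    static int deadlockedThreadCount() throws InterruptedException {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the queue lock
        Object runMapLock = new Object();              // stand-in for the RunMap monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Holds the queue lock, then wants the RunMap monitor.
        Thread queueMaintenance = new Thread(() -> {
            queueLock.lock();
            try {
                arriveAndWait(bothHoldFirstLock);
                synchronized (runMapLock) { /* body is never reached once deadlocked */ }
            } finally {
                queueLock.unlock();
            }
        }, "queue-maintenance");

        // Holds the RunMap monitor, then wants the queue lock.
        Thread buildLoader = new Thread(() -> {
            synchronized (runMapLock) {
                arriveAndWait(bothHoldFirstLock);
                queueLock.lock();
                queueLock.unlock();
            }
        }, "build-loader");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        queueMaintenance.setDaemon(true);
        buildLoader.setDaemon(true);
        queueMaintenance.start();
        buildLoader.start();

        // Poll the JVM's deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                int count = 0;
                for (long id : ids) {
                    if (id == queueMaintenance.getId() || id == buildLoader.getId()) {
                        count++;
                    }
                }
                if (count > 0) {
                    return count;
                }
            }
            Thread.sleep(50);
        }
        return 0;
    }

    // Signals that the caller holds its first lock, then waits for the other thread.
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

The latch only makes the interleaving deterministic for the demo; in the reported case the same interleaving happened by timing during startup. Either acquiring both locks in one agreed order or not taking the second lock while the first is held would break the cycle.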

          Devin Nusbaum made changes -
          Link New: This issue is caused by JENKINS-67164 [ JENKINS-67164 ]
          Devin Nusbaum made changes -
          Component/s New: workflow-api-plugin [ 21711 ]
          Component/s Original: core [ 15593 ]
          Devin Nusbaum made changes -
          Assignee New: Devin Nusbaum [ dnusbaum ]
          Jesse Glick made changes -
          Labels New: deadlock regression
          Jesse Glick made changes -
          Description: the log excerpt was re-wrapped from per-line {{...}} monospace markup into a single {code:none} block; the report prose itself was not changed.

          Devin Nusbaum added a comment -

          One interesting thing about the thread dump is that the Pipeline is resuming because something called CpsStepContext.isReady (some method in ExecutorStepExecution.PlaceholderTask?) rather than because FlowExecutionList$ItemListenerImpl.onLoaded ran, which is what we would normally expect. organised_chaos, do you have the full Jenkins log from when the error happened? I am curious to understand the timing of the deadlock in relation to Jenkins starting up.

          I was not able to reproduce the deadlock myself in a test, but from what I can tell switching from MoreExecutors.directExecutor to Timer.get in ResumeStepExecutionListener seems to work.
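
To illustrate why the choice of executor matters here, the sketch below compares a direct executor (the callback runs inline on the calling thread) with a separate executor thread. It is a simplified model under stated assumptions, not the actual workflow-api change: `loadLock` is a hypothetical stand-in for the lock held while a build is being loaded, `Runnable::run` stands in for `MoreExecutors.directExecutor`, and a single-threaded `ScheduledExecutorService` stands in for `Timer.get`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class CallbackExecutorDemo {

    /**
     * Registers a callback through the given executor while holding a lock and
     * reports whether the callback ran on a thread that was holding that lock.
     */
    static boolean callbackRanUnderLock(Executor executor) throws Exception {
        ReentrantLock loadLock = new ReentrantLock(); // hypothetical stand-in for the build-loading lock
        CompletableFuture<Boolean> observed = new CompletableFuture<>();

        loadLock.lock();
        try {
            // With a direct executor this lambda runs right here, inside the locked region.
            executor.execute(() -> observed.complete(loadLock.isHeldByCurrentThread()));
        } finally {
            loadLock.unlock();
        }
        return observed.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        Executor direct = Runnable::run; // same behavior as a direct executor
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            System.out.println("direct executor, callback under lock: " + callbackRanUnderLock(direct));
            System.out.println("separate executor, callback under lock: " + callbackRanUnderLock(timer));
        } finally {
            timer.shutdown();
        }
    }
}
```

In this model, a callback that runs inline inherits whatever locks its caller holds, so if it then needs a second lock it can complete a cycle like the one in the thread dump. Handing the callback to another thread means it runs without the caller's locks held and can wait for the second lock without blocking the loader.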


            dnusbaum Devin Nusbaum
            organised_chaos James Robson