JENKINS-67351

thread deadlock after update to 2.319.1

    • workflow-api 1108.v57edf648f5d4 and workflow-durable-task-step 1107.v5dab75aaccbd

      Following the update to the latest LTS version (2.319.1), my Jenkins instance hung during startup. The process became unresponsive: neither systemctl stop nor a plain kill would terminate it. The logs contained a warning about a thread deadlock (see below). In case it is relevant, a job was in progress and was suspended when the controller was stopped for the upgrade.

      I tried restarting several times, but the same thing happened each time. I then downgraded the jenkins package to the previous version, but that hit the same error. Only restoring from a snapshot allowed me to return to the previous version.

       

      The following warning appeared in the logs:

      WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26] locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
      	at jenkins.model.PeepholePermalink.resolve(PeepholePermalink.java:103)
      	at hudson.model.Job.getLastSuccessfulBuild(Job.java:947)
      	at hudson.model.Job.getEstimatedDurationCandidates(Job.java:1019)
      	at hudson.model.Job.getEstimatedDuration(Job.java:1053)
      	at hudson.model.Run.getEstimatedDuration(Run.java:2496)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getEstimatedDuration(ExecutorStepExecution.java:696)
      	at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:327)
      	at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:312)
      	at hudson.model.Queue.maintain(Queue.java:1645)
      	at hudson.model.Queue$1.call(Queue.java:325)
      	at hudson.model.Queue$1.call(Queue.java:322)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:107)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:97)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
      	at java.lang.Thread.run(Thread.java:748)
      , CpsStepContext.isReady [#2] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@18965682 (owned by AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26]):
      	at sun.misc.Unsafe.park(Native Method)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      	at hudson.model.Queue.schedule2(Queue.java:567)
      	at hudson.model.Queue.schedule2(Queue.java:693)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.start(ExecutorStepExecution.java:104)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.onResume(ExecutorStepExecution.java:210)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:265)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:243)
      	at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
      	at org.jenkinsci.plugins.workflow.flow.DirectExecutor.execute(DirectExecutor.java:33)
      	at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
      	at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
      	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155)
      	at com.google.common.util.concurrent.Futures.addCallback(Futures.java:985)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:243)
      	at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:84)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:567)
      	at hudson.model.RunMap.retrieve(RunMap.java:231)
      	at hudson.model.RunMap.retrieve(RunMap.java:58)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:506)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:488)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:386)
      	at hudson.model.RunMap.getById(RunMap.java:211)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:948)
      	at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:959)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:217)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:242)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.access$000(CpsStepContext.java:97)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:263)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:261)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      , Splunk data monitor thread locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
      	at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
      	at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
      	at hudson.model.Run.fromExternalizableId(Run.java:2483)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.runForDisplay(ExecutorStepExecution.java:527)
      	at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getUrl(ExecutorStepExecution.java:536)
      	at com.splunk.splunkjenkins.HealthMonitor.sendPendingQueue(HealthMonitor.java:110)
      	at com.splunk.splunkjenkins.HealthMonitor.execute(HealthMonitor.java:44)
      	at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:101)
      	at hudson.model.AsyncPeriodicWork$$Lambda$545/292627145.run(Unknown Source)
      	at java.lang.Thread.run(Thread.java:748)
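
Reading the trace above, this looks like a lock-order inversion. The queue maintenance thread appears to hold the Queue's ReentrantLock while waiting for the RunMap monitor, and the CpsStepContext.isReady thread holds the RunMap monitor (taken while loading the build) while waiting for the Queue's lock in schedule2. The Splunk thread is simply stuck behind the RunMap monitor. The following is a minimal standalone sketch of that pattern, not Jenkins code: the class name, thread names, and the two lock objects are hypothetical stand-ins, and detection uses the standard ThreadMXBean.findDeadlockedThreads() call from the JDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from Jenkins itself.
final class LockOrderInversionDemo {

    // Signals that the caller holds its first lock, then waits for the other thread.
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Provokes the inversion and returns the names of the threads the JVM reports as deadlocked.
    public static Set<String> reproduce() throws InterruptedException {
        final Object runMap = new Object();                   // stand-in for the RunMap monitor
        final ReentrantLock queueLock = new ReentrantLock();  // stand-in for the Queue lock
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Order 1: queue lock first, then the RunMap monitor.
        Thread queueMaintenance = new Thread(() -> {
            queueLock.lock();
            try {
                arriveAndWait(bothHoldFirstLock);
                synchronized (runMap) {
                    // never reached: the other thread owns the monitor
                }
            } finally {
                queueLock.unlock();
            }
        }, "queue-maintenance");

        // Order 2: RunMap monitor first, then the queue lock.
        Thread stepContext = new Thread(() -> {
            synchronized (runMap) {
                arriveAndWait(bothHoldFirstLock);
                queueLock.lock();   // blocks forever: the other thread owns the lock
                queueLock.unlock();
            }
        }, "step-context");

        // Daemon threads so the JVM can still exit although they never finish.
        queueMaintenance.setDaemon(true);
        stepContext.setDaemon(true);
        queueMaintenance.start();
        stepContext.start();

        // Poll the JVM's deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Set<String> names = new TreeSet<>();
        for (int i = 0; i < 100 && names.isEmpty(); i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids == null) {
                Thread.sleep(50);
                continue;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + reproduce());
    }
}
```

In the sketch, the cycle disappears if both threads take the two locks in the same order. Whether an equivalent reordering is practical in the affected plugins is not something this report can confirm.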
      

            dnusbaum Devin Nusbaum
            organised_chaos James Robson