Bug
Resolution: Fixed
Blocker
jenkins: 2.319.1
OS: ubuntu 20.04
Java: 1.8.0_292
workflow-api 1108.v57edf648f5d4 and workflow-durable-task-step 1107.v5dab75aaccbd
Following the update to the latest LTS version, my Jenkins instance would hang during startup and the process became unresponsive: neither systemctl stop nor a plain kill would terminate it. The logs contained an error message about a thread deadlock (see below). In case it is relevant, a job was in progress and was suspended when the controller was stopped for the upgrade.
I tried restarting several times, but the same thing happened each time. I then tried downgrading the jenkins package to the previous version, but that hit the same error. Restoring from a snapshot allowed me to return to the previous version.
The following error would appear in the logs:
WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26] locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
    at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
    at jenkins.model.PeepholePermalink.resolve(PeepholePermalink.java:103)
    at hudson.model.Job.getLastSuccessfulBuild(Job.java:947)
    at hudson.model.Job.getEstimatedDurationCandidates(Job.java:1019)
    at hudson.model.Job.getEstimatedDuration(Job.java:1053)
    at hudson.model.Run.getEstimatedDuration(Run.java:2496)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getEstimatedDuration(ExecutorStepExecution.java:696)
    at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:327)
    at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:312)
    at hudson.model.Queue.maintain(Queue.java:1645)
    at hudson.model.Queue$1.call(Queue.java:325)
    at hudson.model.Queue$1.call(Queue.java:322)
    at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:107)
    at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:97)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:121)
    at java.lang.Thread.run(Thread.java:748)
, CpsStepContext.isReady [#2] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@18965682 (owned by AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#26]):
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at hudson.model.Queue.schedule2(Queue.java:567)
    at hudson.model.Queue.schedule2(Queue.java:693)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.start(ExecutorStepExecution.java:104)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution.onResume(ExecutorStepExecution.java:210)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:265)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener$1.onSuccess(FlowExecutionList.java:243)
    at com.google.common.util.concurrent.Futures$6.run(Futures.java:975)
    at org.jenkinsci.plugins.workflow.flow.DirectExecutor.execute(DirectExecutor.java:33)
    at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
    at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
    at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:155)
    at com.google.common.util.concurrent.Futures.addCallback(Futures.java:985)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:243)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:84)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:567)
    at hudson.model.RunMap.retrieve(RunMap.java:231)
    at hudson.model.RunMap.retrieve(RunMap.java:58)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:506)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:488)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:386)
    at hudson.model.RunMap.getById(RunMap.java:211)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:948)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:959)
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getExecution(CpsStepContext.java:217)
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:242)
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext.access$000(CpsStepContext.java:97)
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:263)
    at org.jenkinsci.plugins.workflow.cps.CpsStepContext$1.call(CpsStepContext.java:261)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
, Splunk data monitor thread locked on hudson.model.RunMap@166af3a7 (owned by CpsStepContext.isReady [#2]):
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:376)
    at jenkins.model.lazy.LazyBuildMixIn.getBuildByNumber(LazyBuildMixIn.java:228)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:233)
    at org.jenkinsci.plugins.workflow.job.WorkflowJob.getBuildByNumber(WorkflowJob.java:104)
    at hudson.model.Run.fromExternalizableId(Run.java:2483)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.runForDisplay(ExecutorStepExecution.java:527)
    at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getUrl(ExecutorStepExecution.java:536)
    at com.splunk.splunkjenkins.HealthMonitor.sendPendingQueue(HealthMonitor.java:110)
    at com.splunk.splunkjenkins.HealthMonitor.execute(HealthMonitor.java:44)
    at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:101)
    at hudson.model.AsyncPeriodicWork$$Lambda$545/292627145.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
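Going by the "locked on ... (owned by ...)" headers in the trace, this looks like a lock-order inversion: the queue maintenance thread holds the Queue's ReentrantLock and waits for the RunMap monitor, while the CpsStepContext.isReady thread holds the RunMap monitor (it is in the middle of loading a build) and waits for the Queue lock in schedule2. The Splunk thread is not part of the cycle; it is simply stuck behind the same RunMap monitor. The following is a minimal standalone sketch of that pattern, not Jenkins code: the class name, thread names, and the two lock objects are hypothetical stand-ins, and the detection uses the JDK's ThreadMXBean, which I am assuming is roughly what the thread-deadlock health check relies on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone sketch; none of these names exist in Jenkins.
class LockOrderDeadlockSketch {
    // Stand-in for the hudson.model.RunMap monitor seen in the trace.
    static final Object runMapMonitor = new Object();
    // Stand-in for the ReentrantLock taken in hudson.model.Queue.
    static final ReentrantLock queueLock = new ReentrantLock();

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts two daemon threads that take the locks in opposite order and
    // returns how many threads the JVM reports as deadlocked (0 if none seen).
    static int reproduceAndDetect() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the queue maintenance thread: Queue lock first, then RunMap.
        Thread queueMaintenance = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (runMapMonitor) {
                    // never reached once the other thread holds the monitor
                }
            } finally {
                queueLock.unlock();
            }
        }, "queue-maintenance-stand-in");

        // Mirrors the resuming build: RunMap monitor first, then Queue lock.
        Thread resumeOnLoad = new Thread(() -> {
            synchronized (runMapMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                queueLock.lock(); // parks forever, like Queue.schedule2 in the trace
                queueLock.unlock();
            }
        }, "resume-on-load-stand-in");

        // Daemon threads so the JVM can still exit despite the deadlock.
        queueMaintenance.setDaemon(true);
        resumeOnLoad.setDaemon(true);
        queueMaintenance.start();
        resumeOnLoad.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            // Covers both monitors and ownable synchronizers such as ReentrantLock.
            long[] deadlocked = threads.findDeadlockedThreads();
            if (deadlocked != null) {
                return deadlocked.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + reproduceAndDetect());
    }
}
```

In the sketch, the cycle disappears if both threads take the locks in the same order, or if the second lock is requested only after the first has been released. Whether the actual fix for this issue took either approach is not something the report above states.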
- is caused by
JENKINS-67164 Pipelines missing from FlowExecutionList hang forever after resuming
- Resolved
- links to