Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Environment: Jenkins 1.598-SNAPSHOT (compiled from the latest source code on 6 January 2015), Docker plugin 0.8, Durable Task Plugin 1.1
Description
When a node is removed, Jenkins.removeNode() (a synchronized method, so the calling thread holds the lock on the Jenkins object) eventually leads to a call to Computer.setNumExecutors(), which attempts to lock the Queue. If CloudRetentionStrategy.check() runs at the same time and determines that a node should be terminated, it locks the Queue first, then calls Jenkins.removeNode(), which attempts to lock the Jenkins object. Each thread then holds the lock the other one is waiting for, and neither can proceed.
The attached thread dump (deadlock.tdump) shows the deadlock.
We're using DockerComputers that use the OnceRetentionStrategy, which means that nodes are removed every time a task completes, so the potential for this deadlock occurring is quite high (we experience it multiple times per day).
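The lock ordering described above can be reproduced in isolation. The following is only a minimal sketch, not Jenkins code: the two ReentrantLock fields are hypothetical stand-ins for the Jenkins object monitor and the Queue lock, and the class and method names are invented for illustration. It uses tryLock with a timeout instead of a blocking lock so that the demonstration terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins; these are NOT the real Jenkins or Queue locks.
    static final ReentrantLock jenkinsLock = new ReentrantLock();
    static final ReentrantLock queueLock = new ReentrantLock();

    // Takes `first`, waits until both threads hold their first lock,
    // then tries to take `second` with a timeout.
    static void attempt(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger stalled) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // both threads now hold their first lock
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                // With a plain blocking lock() this thread would wait forever.
                stalled.incrementAndGet();
            }
            bothTried.countDown();
            bothTried.await(); // keep holding `first` until both attempts are done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    // Runs both orderings concurrently and returns how many threads
    // failed to obtain their second lock.
    static int countStalledThreads() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger stalled = new AtomicInteger();

        // Mirrors the removeNode() path: Jenkins lock first, then Queue.
        Thread removeNode = new Thread(() ->
                attempt(jenkinsLock, queueLock, bothHoldFirst, bothTried, stalled),
                "removeNode");
        // Mirrors the retention check path: Queue first, then Jenkins lock.
        Thread retentionCheck = new Thread(() ->
                attempt(queueLock, jenkinsLock, bothHoldFirst, bothTried, stalled),
                "retentionCheck");

        removeNode.start();
        retentionCheck.start();
        try {
            removeNode.join();
            retentionCheck.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stalled.get();
    }

    public static void main(String[] args) {
        System.out.println("threads stalled: " + countStalledThreads());
    }
}
```

In this sketch both threads time out on their second lock, which corresponds to the two threads blocking permanently when the locks are taken without a timeout. A common general remedy for this pattern is to acquire the locks in the same order on every code path; whether that is what the eventual fix does is not stated in this ticket.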
Code changed in jenkins
User: Stephen Connolly
Path:
src/main/java/org/jenkinsci/plugins/durabletask/executors/OnceRetentionStrategy.java
http://jenkins-ci.org/commit/durable-task-plugin/18733f566e3ddb1dafe32dfb16025586cb76306f
Log:
JENKINS-26380 Terminate nodes correctly