Bug
Resolution: Fixed
Critical
Jenkins 1.598-SNAPSHOT (compiled from latest source code 6th Jan 2015), Docker plugin 0.8, Durable Task Plugin 1.1
Jenkins.removeNode() is synchronized, so a thread removing a node holds the lock on the Jenkins object. Further down that call path, Computer.setNumExecutors() is called, which attempts to lock the Queue. If, at the same time, CloudRetentionStrategy.check() runs and determines that a node should be terminated, it locks the Queue first and then calls Jenkins.removeNode(), which needs the lock on the Jenkins object. The two threads take the same two locks in opposite order, so each ends up waiting for the lock the other holds.
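The lock-order inversion described above can be reproduced outside Jenkins with a small standalone sketch. Everything in it is an assumption made for illustration: `jenkinsLock` and `queueLock` are hypothetical stand-ins for the Jenkins object monitor and the Queue lock, and `LockOrderDemo`, `runInversion` and `lockInOrder` are invented names, not Jenkins code. It uses `tryLock` with a timeout so the demo terminates; with the real locks, both threads would block indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    // Hypothetical stand-ins for the Jenkins object monitor and the Queue lock.
    static final ReentrantLock jenkinsLock = new ReentrantLock();
    static final ReentrantLock queueLock = new ReentrantLock();

    /**
     * Runs two threads that take the two locks in opposite order.
     * Returns {removeNodeGotQueueLock, retentionCheckGotJenkinsLock}.
     */
    static boolean[] runInversion() {
        boolean[] acquired = new boolean[2];
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);

        // Mirrors removeNode(): Jenkins lock first, then the Queue lock.
        Thread removeNode = new Thread(() ->
            acquired[0] = lockInOrder(jenkinsLock, queueLock, bothHoldFirst, bothTried));
        // Mirrors the retention check: Queue lock first, then the Jenkins lock.
        Thread retentionCheck = new Thread(() ->
            acquired[1] = lockInOrder(queueLock, jenkinsLock, bothHoldFirst, bothTried));

        removeNode.start();
        retentionCheck.start();
        try {
            removeNode.join();
            retentionCheck.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    static boolean lockInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A plain lock() here would never return; the timeout lets the demo finish.
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep the first lock until the other thread has also finished trying.
            bothTried.countDown();
            bothTried.await();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] result = runInversion();
        System.out.println("removeNode thread got Queue lock: " + result[0]);
        System.out.println("retention check thread got Jenkins lock: " + result[1]);
    }
}
```

Neither thread obtains its second lock in this sketch, which is the condition that becomes a permanent deadlock when the locks have no timeout. A common general remedy for this pattern is to have every code path acquire the two locks in the same order; whether that matches the fix applied for this issue is not stated here.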
A thread dump (deadlock.tdump) is attached which shows the deadlock.
We're using DockerComputers that use the OnceRetentionStrategy, which means that nodes are removed every time a task completes, so the potential for this deadlock occurring is quite high (we experience it multiple times per day).