Bug
Resolution: Fixed
Major
2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021
The support-core-plugin detected a deadlock:
============== Deadlock Found ==============
"Executor #-1 for master : executing xxxx.xxxx@57a2067d" id=6472210 (0x62c212) state=WAITING cpu=0%
    - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      owned by "Executor #-1 for master" id=6472207 (0x62c20f)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at hudson.model.Queue._withLock(Queue.java:1437)
    at hudson.model.ResourceController.execute(ResourceController.java:81)
    at hudson.model.Executor.run(Executor.java:428)

"Executor #-1 for master" id=6472207 (0x62c20f) state=BLOCKED cpu=0%
    - waiting to lock <0x195cc02c> (a hudson.model.queue.FutureImpl)
      owned by "Executor #-1 for master : executing xxxxxxxx #183" id=6218806 (0x5ee436)
    at hudson.model.queue.FutureImpl.addExecutor(FutureImpl.java:96)
    at hudson.model.queue.WorkUnit.setExecutor(WorkUnit.java:73)
    at hudson.model.Executor$1.call(Executor.java:359)
    at hudson.model.Executor$1.call(Executor.java:346)
    at hudson.model.Queue._withLock(Queue.java:1458)
    at hudson.model.Queue.withLock(Queue.java:1319)
    at hudson.model.Executor.run(Executor.java:346)

"Executor #-1 for master : executing xxxxxxxxx #183" id=6218806 (0x5ee436) state=WAITING cpu=76%
    - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      owned by "Executor #-1 for master" id=6472207 (0x62c20f)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at hudson.model.Queue.cancel(Queue.java:732)
    at hudson.model.queue.FutureImpl.cancel(FutureImpl.java:82)
My theory (although race conditions like this are hard to reproduce in a unit test):
Thread A calls Queue._withLock and so acquires the lock stored in the Queue instance field ReentrantLock lock (https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/Queue.java#L1381).
Thread B calls FutureImpl.cancel. This method has a synchronized block on the Queue instance (the same instance as above, since Jenkins has a single Queue): https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74
Thread B, while holding the monitor on the Queue instance, calls the cancel method of Queue. That method tries to acquire the lock from the instance field, which is already held by Thread A.
Thread A cannot proceed and release the lock, because Thread B holds the monitor from its synchronized block on the Queue instance.
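The sequence above is a lock-ordering inversion between a ReentrantLock and an object monitor. Below is a minimal sketch of that pattern, not Jenkins code: the class name LockInversionDemo, the local variables lock and monitor, and the helper methods are all hypothetical stand-ins. A latch forces the interleaving so that the inversion happens every time, and the JVM's own deadlock detection (ThreadMXBean.findDeadlockedThreads) is used to confirm it. The two threads are daemons so the stuck threads do not keep the JVM alive.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockInversionDemo {

    /** Forces the inversion and returns true once the JVM reports both threads as deadlocked. */
    static boolean detectDeadlock() throws InterruptedException {
        // Hypothetical stand-ins: 'lock' plays the role of the queue's ReentrantLock,
        // 'monitor' the object used by the synchronized block.
        ReentrantLock lock = new ReentrantLock();
        Object monitor = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread a = new Thread(() -> {
            lock.lock();                          // Thread A: takes the ReentrantLock first
            try {
                arriveAndAwait(bothHoldFirstLock);
                synchronized (monitor) {          // ...then needs the monitor held by B
                    // never reached while B holds the monitor
                }
            } finally {
                lock.unlock();
            }
        }, "thread-A");

        Thread b = new Thread(() -> {
            synchronized (monitor) {              // Thread B: takes the monitor first
                arriveAndAwait(bothHoldFirstLock);
                lock.lock();                      // ...then needs the lock held by A
                lock.unlock();
            }
        }, "thread-B");

        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        // Poll the JVM's deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    // Signals that this thread holds its first lock, then waits for the other thread.
    private static void arriveAndAwait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + detectDeadlock());
    }
}
```

In real code the interleaving is not forced by a latch, which is why the problem only shows up occasionally and is hard to capture in a unit test.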
The solution seems to be to remove the synchronized block on the Queue instance at https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74, since Queue already uses a lock.
This looks like a safe change (though, again, it is not easy to write a unit test that proves it).
The other solution is for callers to use queue.cancel instead of FutureImpl.cancel.
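A minimal sketch of the first option, assuming the queue's ReentrantLock is enough to guard the cancellation. This is not the Jenkins implementation: SketchQueue, SketchFuture, and runConcurrently are hypothetical names used only for illustration. Because cancel() no longer holds a monitor of its own while calling into the queue, only one lock is ever taken, so there is no second lock to acquire in the opposite order.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class SingleLockSketch {

    /** Hypothetical stand-in for the queue: one ReentrantLock guards all of its state. */
    static class SketchQueue {
        private final ReentrantLock lock = new ReentrantLock();
        private final Set<String> items = new HashSet<>();

        void add(String item) {
            withLock(() -> items.add(item));
        }

        boolean cancel(String item) {
            lock.lock();
            try {
                return items.remove(item);
            } finally {
                lock.unlock();
            }
        }

        void withLock(Runnable action) {
            lock.lock();
            try {
                action.run();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Hypothetical stand-in for the future: cancel() takes no monitor of its own. */
    static class SketchFuture {
        private final SketchQueue queue;
        private final String item;

        SketchFuture(SketchQueue queue, String item) {
            this.queue = queue;
            this.item = item;
        }

        boolean cancel() {
            // No synchronized block: the queue's lock is the only lock involved.
            return queue.cancel(item);
        }
    }

    /** Runs withLock and cancel concurrently; returns the cancel result if both finish in time. */
    static boolean runConcurrently() throws Exception {
        SketchQueue queue = new SketchQueue();
        queue.add("job");
        SketchFuture future = new SketchFuture(queue, "job");
        Callable<Boolean> cancelTask = future::cancel;

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // One task holds the queue lock for a while; the other cancels concurrently.
            Future<?> holder = pool.submit(() -> queue.withLock(() -> sleepQuietly(100)));
            Future<Boolean> canceller = pool.submit(cancelTask);
            holder.get(5, TimeUnit.SECONDS);   // a TimeoutException here would indicate a hang
            return canceller.get(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancelled: " + runConcurrently());
    }
}
```

The second option, having callers use queue.cancel directly, has the same effect in this sketch: the caller goes straight to the method that takes the single lock.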