Jenkins / JENKINS-64931

Deadlock with FutureImpl and Queue


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • core
    • 2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021

      support-core-plugin has detected a Deadlock

       

      ============== Deadlock Found ==============

      "Executor #-1 for master : executing xxxx.xxxx@57a2067d" id=6472210 (0x62c212) state=WAITING cpu=0%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue._withLock(Queue.java:1437)
          at hudson.model.ResourceController.execute(ResourceController.java:81)
          at hudson.model.Executor.run(Executor.java:428)

      "Executor #-1 for master" id=6472207 (0x62c20f) state=BLOCKED cpu=0%
          - waiting to lock <0x195cc02c> (a hudson.model.queue.FutureImpl)
            owned by "Executor #-1 for master : executing xxxxxxxx #183" id=6218806 (0x5ee436)
          at hudson.model.queue.FutureImpl.addExecutor(FutureImpl.java:96)
          at hudson.model.queue.WorkUnit.setExecutor(WorkUnit.java:73)
          at hudson.model.Executor$1.call(Executor.java:359)
          at hudson.model.Executor$1.call(Executor.java:346)
          at hudson.model.Queue._withLock(Queue.java:1458)
          at hudson.model.Queue.withLock(Queue.java:1319)
          at hudson.model.Executor.run(Executor.java:346)

      "Executor #-1 for master : executing xxxxxxxxx #183" id=6218806 (0x5ee436) state=WAITING cpu=76%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue.cancel(Queue.java:732)
          at hudson.model.queue.FutureImpl.cancel(FutureImpl.java:82)

       

      My idea (though these race conditions are hard to reproduce in a unit test):

      Thread A calls Queue._withLock and so acquires the lock stored in the instance field ReentrantLock lock (https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/Queue.java#L1381).

      Thread B calls FutureImpl.cancel. This method has a synchronized block on the Queue instance (the same instance as above, since Jenkins has a single Queue): https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74

      Thread B, while holding the Queue instance monitor, calls Queue.cancel. That method tries to acquire the lock from the instance field, but the lock is already held by Thread A.

      Thread A, in turn, cannot release the lock, because Thread B is inside the synchronized block on the Queue instance.
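      As far as the thread dump above shows, the cycle is a lock-order inversion: one executor thread holds the Queue ReentrantLock and waits for the FutureImpl monitor (in addExecutor), while another holds the FutureImpl monitor (in cancel) and waits for the ReentrantLock. The following is a minimal, self-contained sketch of that pattern, not Jenkins code. The names LockOrderSketch, queueLock and future are hypothetical stand-ins, and the "cancel" side uses tryLock with a timeout (an assumption made only so the demo terminates instead of hanging).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {
    // Hypothetical stand-in for the ReentrantLock field inside Queue.
    static final ReentrantLock queueLock = new ReentrantLock();
    // Hypothetical stand-in for the FutureImpl instance whose monitor is contended.
    static final Object future = new Object();

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the "cancel" thread managed to take the queue lock. */
    static boolean cancelAcquiredQueueLock() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        boolean[] acquired = {false};

        // Mirrors the executor side: queue lock first, then the future monitor.
        Thread executor = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                synchronized (future) {
                    // Reached only after the canceller has left its synchronized block.
                }
            } finally {
                queueLock.unlock();
            }
        });

        // Mirrors the cancel side: future monitor first, then the queue lock.
        Thread canceller = new Thread(() -> {
            synchronized (future) {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                try {
                    // A plain lock() here would wait forever; the timeout lets the demo end.
                    if (queueLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        queueLock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        executor.start();
        canceller.start();
        join(executor);
        join(canceller);
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("canceller acquired queue lock: " + cancelAcquiredQueueLock());
    }
}
```

      In this sketch the canceller never gets the queue lock while the executor is parked on the monitor. With an untimed lock() call, as in the dump, neither thread could make progress.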

      The solution seems to be to remove the synchronized block on the Queue instance here https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74, since Queue already uses its own lock.

      This looks like a safe change (again, it is not easy to write a unit test that proves it).
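      The general idea of not holding a monitor while calling into code that takes the queue lock can be sketched in isolation. This is an illustration under stated assumptions, not the Jenkins patch: SketchQueue and SketchFuture are hypothetical classes, tasks are plain strings, and executors are modelled as plain Thread objects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

final class CancelOutsideMonitorSketch {

    /** Hypothetical queue: models only a ReentrantLock guarding a set of tasks. */
    static final class SketchQueue {
        private final ReentrantLock lock = new ReentrantLock();
        private final Set<String> items = new HashSet<>();

        void add(String task) {
            lock.lock();
            try {
                items.add(task);
            } finally {
                lock.unlock();
            }
        }

        /** Removes the task; returns true if it was still queued. */
        boolean cancel(String task) {
            lock.lock();
            try {
                return items.remove(task);
            } finally {
                lock.unlock();
            }
        }
    }

    /** Hypothetical future: its own monitor guards only the executor list. */
    static final class SketchFuture {
        private final SketchQueue queue;
        private final String task;
        private final List<Thread> executors = new ArrayList<>();

        SketchFuture(SketchQueue queue, String task) {
            this.queue = queue;
            this.task = task;
        }

        synchronized void addExecutor(Thread executor) {
            executors.add(executor);
        }

        boolean cancel(boolean mayInterruptIfRunning) {
            List<Thread> running;
            synchronized (this) {
                // Copy the state under the monitor, then leave the monitor.
                running = new ArrayList<>(executors);
            }
            if (running.isEmpty()) {
                // No monitor is held here, so taking the queue lock cannot
                // complete a cycle with a thread blocked in addExecutor.
                return queue.cancel(task);
            }
            if (mayInterruptIfRunning) {
                for (Thread executor : running) {
                    executor.interrupt();
                }
            }
            return mayInterruptIfRunning;
        }
    }

    public static void main(String[] args) {
        SketchQueue queue = new SketchQueue();
        queue.add("build-1");
        SketchFuture future = new SketchFuture(queue, "build-1");
        System.out.println("cancelled while queued: " + future.cancel(false));
    }
}
```

      The trade-off in this sketch is that an executor may be added between the snapshot and the call to queue.cancel. Whether that window is acceptable depends on the real semantics of the classes involved.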

      The other solution is to have the caller use queue.cancel instead of FutureImpl.cancel.

            olamy Olivier Lamy