Jenkins / JENKINS-64931

Deadlock with FutureImpl and Queue

    Details

    • Released As:
      2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021

      Description

      support-core-plugin has detected a Deadlock

       

      ============== Deadlock Found ==============

      "Executor #-1 for master : executing xxxx.xxxx@57a2067d" id=6472210 (0x62c212) state=WAITING cpu=0%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue._withLock(Queue.java:1437)
          at hudson.model.ResourceController.execute(ResourceController.java:81)
          at hudson.model.Executor.run(Executor.java:428)

      "Executor #-1 for master" id=6472207 (0x62c20f) state=BLOCKED cpu=0%
          - waiting to lock <0x195cc02c> (a hudson.model.queue.FutureImpl)
            owned by "Executor #-1 for master : executing xxxxxxxx #183" id=6218806 (0x5ee436)
          at hudson.model.queue.FutureImpl.addExecutor(FutureImpl.java:96)
          at hudson.model.queue.WorkUnit.setExecutor(WorkUnit.java:73)
          at hudson.model.Executor$1.call(Executor.java:359)
          at hudson.model.Executor$1.call(Executor.java:346)
          at hudson.model.Queue._withLock(Queue.java:1458)
          at hudson.model.Queue.withLock(Queue.java:1319)
          at hudson.model.Executor.run(Executor.java:346)

      "Executor #-1 for master : executing xxxxxxxxx #183" id=6218806 (0x5ee436) state=WAITING cpu=76%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue.cancel(Queue.java:732)
          at hudson.model.queue.FutureImpl.cancel(FutureImpl.java:82)

       

      My analysis (it is hard to reproduce this kind of race condition in a unit test):

      Thread A calls Queue._withLock and so acquires the Queue instance field ReentrantLock lock (https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/Queue.java#L1381).

      Thread B calls FutureImpl.cancel, which has a synchronized block on the Queue instance (the same instance as above, since there is only one Queue in Jenkins): https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74

      While holding that monitor, Thread B calls Queue.cancel, which tries to acquire the ReentrantLock lock field, but that lock is already held by Thread A.

      Thread A cannot release the lock, because it is itself blocked waiting for a monitor held by Thread B (the thread dump above reports that monitor as a hudson.model.queue.FutureImpl, entered via FutureImpl.addExecutor).
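
The lock-order inversion described above can be sketched in isolation. This is a minimal illustration, not Jenkins code: the class name, `lock` and `monitor` are hypothetical stand-ins for the Queue lock and the synchronized object. Thread B uses a timed `tryLock` instead of `lock()` only so that the sketch terminates instead of hanging like the real threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {

    /** Returns true if thread B obtained the lock while still holding the monitor. */
    static boolean threadBGotLock() {
        ReentrantLock lock = new ReentrantLock(); // hypothetical stand-in for the Queue lock
        Object monitor = new Object();            // hypothetical stand-in for the synchronized object
        CountDownLatch aHoldsLock = new CountDownLatch(1);
        CountDownLatch bHoldsMonitor = new CountDownLatch(1);
        AtomicBoolean gotLock = new AtomicBoolean(true);

        // Thread A: takes the lock first, then wants the monitor.
        Thread a = new Thread(() -> {
            lock.lock();
            try {
                aHoldsLock.countDown();
                bHoldsMonitor.await();
                synchronized (monitor) {
                    // Reached only after B has given up and left its synchronized block.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });

        // Thread B: takes the monitor first, then wants the lock.
        Thread b = new Thread(() -> {
            synchronized (monitor) {
                bHoldsMonitor.countDown();
                try {
                    aHoldsLock.await();
                    // A plain lock() here would never return; the timeout lets the sketch finish.
                    boolean acquired = lock.tryLock(200, TimeUnit.MILLISECONDS);
                    gotLock.set(acquired);
                    if (acquired) {
                        lock.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gotLock.get();
    }

    public static void main(String[] args) {
        System.out.println("B acquired the lock: " + threadBGotLock());
    }
}
```

The two latches force the interleaving: each thread holds its first resource before either attempts the second, so B's `tryLock` always times out. With an untimed `lock()` both threads would wait forever, which is the state the thread dump shows.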

      The solution seems to be to remove the synchronized block on the Queue instance here https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74, since Queue already guards its state with its own lock.

      This looks like a safe change (although, again, it is hard to write a unit test that proves it).

      The other solution is for callers to use queue.cancel instead of FutureImpl.cancel.
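
One way to picture the first proposal is a cancel path that holds no monitor while it calls into the lock-protected queue. The sketch below rests on assumptions: `SketchQueue`, `SketchFuture` and their methods are hypothetical and are not the Jenkins classes or the change that was actually merged. Releasing the monitor before `queue.cancel` leaves a window in which an executor can be added between the check and the cancel; whether that is acceptable would have to be judged against the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class CancelOutsideMonitorSketch {

    /** Hypothetical stand-in for the queue: all state is guarded by one ReentrantLock. */
    static class SketchQueue {
        private final ReentrantLock lock = new ReentrantLock();
        private final List<String> items = new ArrayList<>();

        void withLock(Runnable body) {
            lock.lock();
            try { body.run(); } finally { lock.unlock(); }
        }

        void add(String task) {
            withLock(() -> items.add(task));
        }

        boolean cancel(String task) {
            lock.lock();
            try { return items.remove(task); } finally { lock.unlock(); }
        }
    }

    /** Hypothetical stand-in for the future handed out when a task is scheduled. */
    static class SketchFuture {
        private final SketchQueue queue;
        private final String task;
        private final List<String> executors = new ArrayList<>();

        SketchFuture(SketchQueue queue, String task) {
            this.queue = queue;
            this.task = task;
        }

        synchronized void addExecutor(String name) {
            executors.add(name);
        }

        boolean cancel() {
            synchronized (this) {
                if (!executors.isEmpty()) {
                    return false; // already picked up by an executor
                }
            }
            // The monitor is released here, so waiting for the queue lock cannot block addExecutor.
            return queue.cancel(task);
        }
    }

    /** Runs the interleaving from the issue and reports whether both threads terminated. */
    static boolean bothThreadsFinish() {
        SketchQueue queue = new SketchQueue();
        queue.add("build");
        SketchFuture future = new SketchFuture(queue, "build");
        CountDownLatch aHoldsLock = new CountDownLatch(1);

        // Thread A: holds the queue lock, then enters the future's monitor.
        Thread a = new Thread(() -> queue.withLock(() -> {
            aHoldsLock.countDown();
            future.addExecutor("executor-1");
        }));
        // Thread B: cancels once A is known to have taken the queue lock.
        Thread b = new Thread(() -> {
            try {
                aHoldsLock.await();
                future.cancel();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(2000);
            b.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + bothThreadsFinish());
    }
}
```

Because thread B never waits for the queue lock while holding the monitor, thread A can always enter `addExecutor` and release the lock, so neither thread can block the other indefinitely.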

          Activity

          olamy Olivier Lamy created issue -
          olamy Olivier Lamy made changes -
          Field Original Value New Value
          Description: the class and job names in the thread dump were replaced with xxxx placeholders; the text is otherwise the same as the Description above.
          oleg_nenashev Oleg Nenashev made changes -
          Labels: (none) -> lts-candidate
          markewaite Mark Waite made changes -
          Released As: (none) -> 2.283 released Mar 9, 2021
          Resolution: (none) -> Fixed [ 1 ]
          Status: Open [ 1 ] -> Closed [ 6 ]
          timja Tim Jacomb made changes -
          Issue Type: Task [ 3 ] -> Bug [ 1 ]
          markewaite Mark Waite made changes -
          Labels: lts-candidate -> 2.277.2-fixed lts-candidate
          markewaite Mark Waite made changes -
          Released As: 2.283 released Mar 9, 2021 -> 2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021
          markewaite Mark Waite made changes -
          Labels: 2.277.2-fixed lts-candidate -> 2.277.2-fixed

            People

            Assignee:
            olamy Olivier Lamy
            Reporter:
            olamy Olivier Lamy
            Votes:
            0
            Watchers:
            1
