• Bug
    • Resolution: Fixed
    • Major
    • core
    • 2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021

      support-core-plugin has detected a Deadlock

       

      ============== Deadlock Found ==============

      "Executor #-1 for master : executing xxxx.xxxx@57a2067d" id=6472210 (0x62c212) state=WAITING cpu=0%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue._withLock(Queue.java:1437)
          at hudson.model.ResourceController.execute(ResourceController.java:81)
          at hudson.model.Executor.run(Executor.java:428)

      "Executor #-1 for master" id=6472207 (0x62c20f) state=BLOCKED cpu=0%
          - waiting to lock <0x195cc02c> (a hudson.model.queue.FutureImpl)
            owned by "Executor #-1 for master : executing xxxxxxxx #183" id=6218806 (0x5ee436)
          at hudson.model.queue.FutureImpl.addExecutor(FutureImpl.java:96)
          at hudson.model.queue.WorkUnit.setExecutor(WorkUnit.java:73)
          at hudson.model.Executor$1.call(Executor.java:359)
          at hudson.model.Executor$1.call(Executor.java:346)
          at hudson.model.Queue._withLock(Queue.java:1458)
          at hudson.model.Queue.withLock(Queue.java:1319)
          at hudson.model.Executor.run(Executor.java:346)

      "Executor #-1 for master : executing xxxxxxxxx #183" id=6218806 (0x5ee436) state=WAITING cpu=76%
          - waiting on <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
          - locked <0x270b04ac> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
            owned by "Executor #-1 for master" id=6472207 (0x62c20f)
          at sun.misc.Unsafe.park(Native Method)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          at hudson.model.Queue.cancel(Queue.java:732)
          at hudson.model.queue.FutureImpl.cancel(FutureImpl.java:82)

       

      My analysis (although it is complicated to reproduce this kind of race condition in a unit test):

      Thread A calls Queue._withLock and so acquires the ReentrantLock stored in the Queue's lock instance field (https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/Queue.java#L1381).

      Thread B calls FutureImpl.cancel, which has a synchronized block on the Queue instance (the same instance as above, since Jenkins has a single Queue): https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74

      While holding that monitor, Thread B calls Queue.cancel, which tries to acquire the lock instance field, but that lock is already held by Thread A.

      Thread A, in turn, cannot release the lock because it is blocked on a monitor that Thread B holds.

      The solution seems to be to remove the synchronized block on the Queue instance here https://github.com/jenkinsci/jenkins/blob/e065e79d9b19822593260f9db27d4e5b16939ef3/core/src/main/java/hudson/model/queue/FutureImpl.java#L74 since Queue already uses a lock.

      This looks like a safe change (though, again, it is not easy to write a unit test that proves it).

      The other solution is for callers to use queue.cancel instead of FutureImpl.cancel.
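
      The lock inversion described above can be sketched in isolation. This is not the Jenkins code: LockOrderSketch, queueLock, futureMonitor and cancelPathGetsLock are hypothetical names, and the cancel path uses a timed tryLock (instead of lock()) only so that the demo ends rather than hanging forever. Thread A takes the ReentrantLock and then needs the monitor; thread B either holds the monitor while asking for the lock (the problematic order) or asks for the lock alone (roughly the shape of the proposed fix).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Stand-ins only: queueLock plays the role of the ReentrantLock in Queue,
    // futureMonitor the role of the monitor taken by a synchronized block.
    private final ReentrantLock queueLock = new ReentrantLock();
    private final Object futureMonitor = new Object();

    /** Returns true if the cancel path obtained queueLock before its timeout. */
    public boolean cancelPathGetsLock(boolean holdMonitorWhileLocking)
            throws InterruptedException {
        // Both threads wait here until each holds its first lock.
        CountDownLatch bothReady = new CountDownLatch(2);
        AtomicBoolean acquired = new AtomicBoolean(false);

        // Thread A: lock first, monitor second (like _withLock -> addExecutor).
        Thread a = new Thread(() -> {
            queueLock.lock();
            try {
                arrive(bothReady);
                synchronized (futureMonitor) {
                    // work that needs the monitor
                }
            } finally {
                queueLock.unlock();
            }
        });

        // Thread B: the cancel path.
        Thread b = new Thread(() -> {
            if (holdMonitorWhileLocking) {
                synchronized (futureMonitor) { // monitor first, lock second
                    arrive(bothReady);
                    acquired.set(tryQueueLock());
                }
            } else {
                arrive(bothReady);             // lock only, no monitor held
                acquired.set(tryQueueLock());
            }
        });

        a.start();
        b.start();
        a.join();
        b.join();
        return acquired.get();
    }

    private static void arrive(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Timed attempt so the sketch terminates; a plain lock() would block forever
    // in the inverted case.
    private boolean tryQueueLock() {
        try {
            if (queueLock.tryLock(500, TimeUnit.MILLISECONDS)) {
                queueLock.unlock();
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch sketch = new LockOrderSketch();
        System.out.println("monitor held while locking: " + sketch.cancelPathGetsLock(true));
        System.out.println("queue lock only: " + sketch.cancelPathGetsLock(false));
    }
}
```

      With the monitor held, the timed attempt is expected to give up, because thread A cannot release the lock until thread B leaves its synchronized block. Without the monitor, thread A finishes and the cancel path should get the lock.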

          [JENKINS-64931] Deadlock with FutureImpl and Queue

          Olivier Lamy created issue -

          Olivier Lamy added a comment -

          PR https://github.com/jenkinsci/jenkins/pull/5305 

          This commit introduced a new locking strategy based on a Lock: https://github.com/jenkinsci/jenkins/commit/92147c3597308bc05e6448ccc41409fcc7c05fd7. However, it did not update the FutureImpl class to stop using synchronized on the Queue instance.

          A possible workaround is to call queue.cancel(FutureImpl.task), which uses the Lock from Queue.

          Olivier Lamy made changes -
          Description: the executor and job names in the deadlock report were replaced with placeholders; the rest of the description is unchanged.
          Oleg Nenashev made changes -
          Labels New: lts-candidate
          Mark Waite made changes -
          Released As New: 2.283 released Mar 9, 2021
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Closed [ 6 ]
          Tim Jacomb made changes -
          Issue Type Original: Task [ 3 ] New: Bug [ 1 ]
          Mark Waite made changes -
          Labels Original: lts-candidate New: 2.277.2-fixed lts-candidate
          Mark Waite made changes -
          Released As Original: 2.283 released Mar 9, 2021 New: 2.283 released Mar 9, 2021, 2.277.2 released Apr 7, 2021
          Mark Waite made changes -
          Labels Original: 2.277.2-fixed lts-candidate New: 2.277.2-fixed

            olamy Olivier Lamy