Deadlock between Queue.maintain and Executor.interrupt

      I reproduced that a couple of times running matrix-project-plugin tests, though it is not a reliable reproducer.

      Java stack information for the threads listed above:
      ===================================================
      "AtmostOneTaskExecutor[hudson.model.Queue$1@6a0918b1] [#13]":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007dccc7200> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at hudson.model.Executor.isParking(Executor.java:609)
        at hudson.model.Queue.maintain(Queue.java:1277)
        at hudson.model.Queue$1.call(Queue.java:334)
        at hudson.model.Queue$1.call(Queue.java:331)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:101)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:91)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
        at java.lang.Thread.run(Thread.java:745)
      "Executing testConcurrentBuild(hudson.matrix.MatrixProjectTest)":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007d7bdd660> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at hudson.model.Queue._withLock(Queue.java:1205)
        at hudson.model.Queue.withLock(Queue.java:1143)
        at hudson.model.Computer.removeExecutor(Computer.java:977)
        at hudson.model.Executor.interrupt(Executor.java:187)
        at hudson.model.Executor.interrupt(Executor.java:164)
        at hudson.model.Executor.interruptForShutdown(Executor.java:149)
        at hudson.model.Computer.interrupt(Computer.java:1014)
        at jenkins.model.Jenkins.cleanUp(Jenkins.java:2769)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:460)
        at org.jvnet.hudson.test.JenkinsRule$2.evaluate(JenkinsRule.java:526)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      
      Found 1 deadlock.
      

          [JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

          Oliver Gondža created issue -
          Stephen Connolly made changes -
          Labels New: lts-candidate
          Stephen Connolly made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]
          Stephen Connolly made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

          Stephen Connolly added a comment - What version of Jenkins was this stack trace from?

          Oliver Gondža added a comment - This was observed while running tests based on [1], so it is 1.609. [1] https://github.com/jenkinsci/matrix-project-plugin/blob/f314a83be60cc1e1d430d6d86659299f366e5e09/pom.xml
          Jesse Glick made changes -
          Labels Original: lts-candidate New: deadlock lts-candidate regression
          Jesse Glick made changes -
          Labels Original: deadlock lts-candidate regression New: deadlock lts-candidate queue regression
          Jesse Glick made changes -
          Remote Link New: This issue links to "PR 1738 (Web Link)" [ 12943 ]

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          core/src/main/java/hudson/model/Computer.java
          core/src/main/java/hudson/model/Queue.java
          core/src/main/java/jenkins/model/Jenkins.java
          http://jenkins-ci.org/commit/jenkins/6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba
          Log:
          [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

          More fun here:

          • All this originates from Executor extending Thread.
          • There is funky logic in the lock handling code of the JVM that makes assumptions
            about how it might proceed with the lock when the thread holding the lock has its
            interrupt flag set.
          • Really it would be better if Executor did not extend Thread as that way we wouldn't
            have to deal with some of that complexity. But OTOH we are where we are and backwards
            compatibility may make such a change not possible without a lot of breakage.
          • Fixing the issue at hand firstly requires that interrupting a Computer happens with the
            Queue lock held (to speed up tests we have Jenkins.cleanUp get the lock for all Computers).
            That prevents the Queue maintain thread from getting caught.
          • Secondly, when removing an executor from a computer we process the removal while
            holding the Queue lock, but we move the removal itself to a separate thread if we cannot
            get the Queue lock in order to avoid deadlock.
          • Also add helper methods to wrap tasks to be performed while holding the lock
            and a helper method for Runnables that exposes the tryLock functionality
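
          The "try the lock, otherwise hand the work to another thread" idea described in the
          last two points can be sketched roughly as follows. This is a minimal standalone
          illustration, not the code from the commit: the class QueueLockSketch and the methods
          withLock, tryWithLock, wrapWithLock, removeWithoutBlocking and shutdown are
          hypothetical names chosen for this example, and a plain ReentrantLock stands in for
          the Queue lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the Queue lock and its helpers; not the Jenkins API.
final class QueueLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Assumption: a single background thread is enough to run deferred removals.
    private final ExecutorService fallback = Executors.newSingleThreadExecutor();

    /** Runs the task while holding the lock, blocking until the lock is available. */
    void withLock(Runnable task) {
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }

    /** Runs the task only if the lock can be taken right now; returns whether it ran. */
    boolean tryWithLock(Runnable task) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Wraps a task so that it always runs with the lock held. */
    Runnable wrapWithLock(Runnable task) {
        return () -> withLock(task);
    }

    /**
     * Runs the removal inline when the lock is free. Otherwise the removal is handed to
     * another thread, so the caller never blocks waiting for the lock.
     */
    void removeWithoutBlocking(Runnable removal) {
        if (!tryWithLock(removal)) {
            fallback.execute(wrapWithLock(removal));
        }
    }

    /** Stops the fallback thread after giving pending removals a chance to finish. */
    void shutdown() {
        fallback.shutdown();
        try {
            fallback.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

          In this sketch the caller of removeWithoutBlocking never parks on the lock, which is
          the property that breaks the cycle shown in the thread dump: a thread that is being
          interrupted does not wait for the lock held by the maintenance thread.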

            stephenconnolly Stephen Connolly
            olivergondza Oliver Gondža