Jenkins / JENKINS-28840

Deadlock between Queue.maintain and Executor.interrupt

      I reproduced this a couple of times while running the matrix-project-plugin tests, though not reliably.

      Java stack information for the threads listed above:
      ===================================================
      "AtmostOneTaskExecutor[hudson.model.Queue$1@6a0918b1] [#13]":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007dccc7200> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at hudson.model.Executor.isParking(Executor.java:609)
        at hudson.model.Queue.maintain(Queue.java:1277)
        at hudson.model.Queue$1.call(Queue.java:334)
        at hudson.model.Queue$1.call(Queue.java:331)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:101)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:91)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
        at java.lang.Thread.run(Thread.java:745)
      "Executing testConcurrentBuild(hudson.matrix.MatrixProjectTest)":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007d7bdd660> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at hudson.model.Queue._withLock(Queue.java:1205)
        at hudson.model.Queue.withLock(Queue.java:1143)
        at hudson.model.Computer.removeExecutor(Computer.java:977)
        at hudson.model.Executor.interrupt(Executor.java:187)
        at hudson.model.Executor.interrupt(Executor.java:164)
        at hudson.model.Executor.interruptForShutdown(Executor.java:149)
        at hudson.model.Computer.interrupt(Computer.java:1014)
        at jenkins.model.Jenkins.cleanUp(Jenkins.java:2769)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:460)
        at org.jvnet.hudson.test.JenkinsRule$2.evaluate(JenkinsRule.java:526)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      
      Found 1 deadlock.
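The dump reads as a lock-ordering cycle. This is an inference from the frames, not something the report states. The maintain thread presumably already holds the Queue's ReentrantLock and is parked waiting for an Executor's ReentrantReadWriteLock read lock in Executor.isParking. Meanwhile, the test thread presumably holds that Executor's write lock inside Executor.interrupt and is parked waiting for the Queue lock in Queue._withLock.

The sketch below is a minimal reproduction of that shape using two plain threads. It detects the cycle with ThreadMXBean.findDeadlockedThreads(), which is a programmatic counterpart of the "Found 1 deadlock." report. It also shows why interrupting the stuck thread does not help: ReentrantLock.lock() does not abort when the interrupt flag is set, whereas lockInterruptibly() does. The class, method, and thread names are hypothetical; none of this is Jenkins code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlockDemo {

    /** Blocks on the latch, restoring the interrupt flag if interrupted. */
    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Two daemon threads take a ReentrantLock (standing in for the Queue lock) and a
     * ReentrantReadWriteLock (standing in for an Executor's lock) in opposite orders.
     * Returns how many of those two threads the JVM reports as deadlocked (2), or 0 if
     * no cycle appears within about 5 seconds.
     */
    static int detectLockOrderDeadlock() throws InterruptedException {
        ReentrantLock queueLock = new ReentrantLock();
        ReentrantReadWriteLock executorLock = new ReentrantReadWriteLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread maintainer = new Thread(() -> {
            queueLock.lock();                 // first: the "Queue" lock
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            executorLock.readLock().lock();   // then: the "Executor" read lock (parks forever)
        }, "maintainer");
        Thread interrupter = new Thread(() -> {
            executorLock.writeLock().lock();  // first: the "Executor" write lock
            bothHoldFirstLock.countDown();
            awaitQuietly(bothHoldFirstLock);
            queueLock.lock();                 // then: the "Queue" lock (parks forever)
        }, "interrupter");

        // Daemon threads, so the JVM can still exit while these two stay stuck.
        maintainer.setDaemon(true);
        interrupter.setDaemon(true);
        maintainer.start();
        interrupter.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads();  // null when no cycle exists
            if (ids != null) {
                int ours = 0;                              // ignore cycles left by earlier calls
                for (long id : ids) {
                    if (id == maintainer.getId() || id == interrupter.getId()) {
                        ours++;
                    }
                }
                if (ours == 2) {
                    return ours;
                }
            }
            Thread.sleep(50);
        }
        return 0;
    }

    /**
     * Sets this thread's interrupt flag and shows that lock() still acquires a free lock
     * (leaving the flag set), while lockInterruptibly() refuses and throws.
     * Returns {lock() acquired, flag still set afterwards, lockInterruptibly() threw}.
     */
    static boolean[] interruptBehaviour() {
        ReentrantLock lock = new ReentrantLock();

        Thread.currentThread().interrupt();
        lock.lock();                                  // does not abort because of the flag
        boolean acquired = lock.isHeldByCurrentThread();
        lock.unlock();
        boolean flagStillSet = Thread.interrupted();  // reads and clears the flag

        Thread.currentThread().interrupt();
        boolean threw;
        try {
            lock.lockInterruptibly();                 // checks the flag first and throws
            lock.unlock();
            threw = false;
        } catch (InterruptedException e) {
            threw = true;                             // the flag is cleared when this is thrown
        }
        return new boolean[] {acquired, flagStillSet, threw};
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked demo threads: " + detectLockOrderDeadlock());
        boolean[] r = interruptBehaviour();
        System.out.println("lock() acquired: " + r[0] + ", flag kept: " + r[1]
                + ", lockInterruptibly() threw: " + r[2]);
    }
}
```

The lock() half is shown on a free lock so the result is deterministic. According to the Lock documentation, a thread already parked in lock(), like the test thread in the dump, likewise keeps waiting when it is interrupted.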
      

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          core/src/main/java/hudson/model/Computer.java
          core/src/main/java/hudson/model/Queue.java
          core/src/main/java/jenkins/model/Jenkins.java
          http://jenkins-ci.org/commit/jenkins/119fcbbf98c27f0257ac1be02104e0d87acc8728
          Log:
          [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

          More fun here:

          • All this originates from Executor extending Thread.
          • There is funky logic in the lock handling code of the JVM that makes assumptions
            about how it might proceed with the lock when the thread holding the lock has its
            interrupt flag set.
          • Really it would be better if Executor did not extend Thread, since then we would not
            have to deal with some of that complexity. On the other hand, we are where we are, and
            backwards compatibility may make such a change impossible without a lot of breakage.
          • Fixing the issue at hand first requires that interrupting a Computer happen with the
            Queue lock held (to speed up tests, Jenkins.cleanUp acquires the lock for all Computers
            at once). That prevents the Queue maintain thread from getting caught in the deadlock.
          • Secondly, when removing an executor from a computer, we perform the removal while
            holding the Queue lock; if the Queue lock cannot be acquired immediately, the removal
            is moved to a separate thread to avoid deadlock.
          • Also add helper methods to wrap tasks that must run while holding the lock, and a
            helper method for Runnables that exposes the tryLock functionality.

          (cherry picked from commit 6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba)
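The "run under the lock if tryLock succeeds, otherwise hand the work to another thread" idea from the last two points can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from commit 119fcbbf. The class QueueLockHelpers, its methods, and the single daemon hand-off thread are all hypothetical; the names withLock, tryWithLock, and wrapWithLock merely echo the commit description and are not claimed to match the real Queue API.

The method runNowOrLater stands in for the executor-removal path. It never blocks on the lock: it does the work inline when tryLock() succeeds, and otherwise queues a lock-wrapped copy on another thread. As a result, the caller cannot close a cycle like the one in the dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock helpers modeled on the commit description; not Jenkins' Queue API. */
public class QueueLockHelpers {
    private final ReentrantLock lock = new ReentrantLock();
    // One daemon thread for work that could not get the lock immediately (an assumption).
    private final ExecutorService handOff = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lock-hand-off");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task while holding the lock, waiting for the lock if necessary. */
    public void withLock(Runnable task) {
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }

    /** Runs the task only if the lock can be taken right now; reports whether it ran. */
    public boolean tryWithLock(Runnable task) {
        if (!lock.tryLock()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns a Runnable that takes the lock around the given task. */
    public Runnable wrapWithLock(Runnable task) {
        return () -> withLock(task);
    }

    /** Runs the task inline if the lock is free, otherwise defers it to the hand-off thread. */
    public void runNowOrLater(Runnable task) {
        if (!tryWithLock(task)) {
            handOff.execute(wrapWithLock(task));
        }
    }

    /** While another thread holds the lock, runNowOrLater returns at once and the task runs later. */
    static String demo() throws InterruptedException {
        QueueLockHelpers helpers = new QueueLockHelpers();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch ran = new CountDownLatch(1);
        AtomicReference<String> ranOn = new AtomicReference<>();

        Thread holder = new Thread(() -> helpers.withLock(() -> {
            held.countDown();
            try {
                release.await();          // keep the lock until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        holder.start();
        held.await();                     // the lock is now busy

        helpers.runNowOrLater(() -> {     // does not block the calling thread
            ranOn.set(Thread.currentThread().getName());
            ran.countDown();
        });
        release.countDown();              // holder releases; the deferred task can now run
        ran.await(5, TimeUnit.SECONDS);
        holder.join();
        return ranOn.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deferred task ran on: " + demo());
    }
}
```

The trade-off is that deferred work runs asynchronously, so a caller can no longer assume the removal has finished when the call returns. The same withLock helper could also wrap a loop over all Computers, which is roughly what the first point describes for Jenkins.cleanUp.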

          Jesse Glick added a comment -

          Another case in mock-slave-plugin in 1.609.1: thread dump.

          Jesse Glick added a comment -

          Filed a PR to try to help tests time out even when there is a hang during shutdown.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          http://jenkins-ci.org/commit/workflow-plugin/b9c94e110085b41ebea5b587ee6d8fc7a48a5dec
          Log:
          Updating baseline to 1.609.2 to pick up JENKINS-28840 fix.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          http://jenkins-ci.org/commit/workflow-plugin/5618084d59d6de7bec6a8b862801a7349dbc8964
          Log:
          Merge pull request #184 from jglick/1.609.2

          Updating baseline to 1.609.2 to pick up JENKINS-28840 fix

          Compare: https://github.com/jenkinsci/workflow-plugin/compare/da97432f26ad...5618084d59d6

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
          test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
          http://jenkins-ci.org/commit/jenkins/b204bd0719c75201a454a2b6c6883d8acc7d0712
          Log:
          JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung.
          Merges #1809.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #4267
          JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung. (Revision b204bd0719c75201a454a2b6c6883d8acc7d0712)

          Result = SUCCESS
          jesse glick : b204bd0719c75201a454a2b6c6883d8acc7d0712
          Files :

          • test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
          • test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java

          dogfood added a comment -

          Integrated in jenkins_main_trunk #4292
          [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt (Revision 119fcbbf98c27f0257ac1be02104e0d87acc8728)

          Result = UNSTABLE
          ogondza : 119fcbbf98c27f0257ac1be02104e0d87acc8728
          Files :

          • core/src/main/java/jenkins/model/Jenkins.java
          • core/src/main/java/hudson/model/Computer.java
          • core/src/main/java/hudson/model/Queue.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
          test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
          http://jenkins-ci.org/commit/jenkins-test-harness/7e4859965f367be1818a4ffb408bb45e4288c6c3
          Log:
          JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung.
          Merges #1809.

          Originally-Committed-As: b204bd0719c75201a454a2b6c6883d8acc7d0712

          Code changed in jenkins
          User: Andres Rodriguez
          Path:
          pom.xml
          src/test/java/hudson/plugins/git/AbstractGitProject.java
          src/test/java/hudson/plugins/git/AbstractGitTestCase.java
          src/test/java/hudson/plugins/git/RevisionParameterActionTest.java
          http://jenkins-ci.org/commit/git-plugin/daf453dfc43db81ede5cde60d0469fda0b3321ab
          Log:
          JENKINS-33874 Move to 1.609.3 (because of JENKINS-28840)

            People: Stephen Connolly (stephenconnolly), Oliver Gondža (olivergondza)
            Votes: 0
            Watchers: 6