JENKINS-28840: Deadlock between Queue.maintain and Executor.interrupt

Details

    Description

      I reproduced this a couple of times while running the matrix-project-plugin tests, though it does not reproduce reliably.

      Java stack information for the threads listed above:
      ===================================================
      "AtmostOneTaskExecutor[hudson.model.Queue$1@6a0918b1] [#13]":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007dccc7200> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at hudson.model.Executor.isParking(Executor.java:609)
        at hudson.model.Queue.maintain(Queue.java:1277)
        at hudson.model.Queue$1.call(Queue.java:334)
        at hudson.model.Queue$1.call(Queue.java:331)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:101)
        at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:91)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
        at java.lang.Thread.run(Thread.java:745)
      "Executing testConcurrentBuild(hudson.matrix.MatrixProjectTest)":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007d7bdd660> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at hudson.model.Queue._withLock(Queue.java:1205)
        at hudson.model.Queue.withLock(Queue.java:1143)
        at hudson.model.Computer.removeExecutor(Computer.java:977)
        at hudson.model.Executor.interrupt(Executor.java:187)
        at hudson.model.Executor.interrupt(Executor.java:164)
        at hudson.model.Executor.interruptForShutdown(Executor.java:149)
        at hudson.model.Computer.interrupt(Computer.java:1014)
        at jenkins.model.Jenkins.cleanUp(Jenkins.java:2769)
        at org.jvnet.hudson.test.JenkinsRule.after(JenkinsRule.java:460)
        at org.jvnet.hudson.test.JenkinsRule$2.evaluate(JenkinsRule.java:526)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      
      Found 1 deadlock.
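
The dump shows a lock-ordering cycle. The Queue maintenance thread parks waiting for an executor's ReentrantReadWriteLock read lock in Executor.isParking, and the test-cleanup thread parks waiting for the Queue's ReentrantLock in Computer.removeExecutor, called from Executor.interrupt. Because jstack reports a deadlock, each thread must own the lock the other is waiting for: the maintenance thread owns the Queue lock, and the cleanup thread owns the executor lock, presumably in write mode (an inference; the dump lists only the waits). The following is a minimal, self-contained model of that cycle, not Jenkins code. The lock and thread names are hypothetical. It uses ThreadMXBean.findDeadlockedThreads(), which also detects cycles through java.util.concurrent ownable synchronizers, and it uses lockInterruptibly() only so the demo can break the cycle afterwards.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderDeadlockDemo {
    /** Returns how many deadlocked threads the JVM reports (0 if none within about 5 s). */
    static int reproduce() {
        // Hypothetical stand-ins: one lock for the Queue, one for a single Executor.
        ReentrantLock queueLock = new ReentrantLock();
        ReentrantReadWriteLock executorLock = new ReentrantReadWriteLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Models Queue.maintain: owns the queue lock, then asks for the executor read lock.
        Thread maintain = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                executorLock.readLock().lockInterruptibly();
                executorLock.readLock().unlock();
            } catch (InterruptedException e) {
                // Expected: reproduce() interrupts this thread to break the cycle.
            } finally {
                queueLock.unlock();
            }
        }, "maintain");

        // Models Executor.interrupt: owns the executor write lock (an assumption),
        // then asks for the queue lock.
        Thread interrupter = new Thread(() -> {
            executorLock.writeLock().lock();
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                queueLock.lockInterruptibly();
                queueLock.unlock();
            } catch (InterruptedException e) {
                // Expected: reproduce() interrupts this thread to break the cycle.
            } finally {
                executorLock.writeLock().unlock();
            }
        }, "interrupter");

        maintain.setDaemon(true);
        interrupter.setDaemon(true);
        maintain.start();
        interrupter.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] deadlocked = null;
        try {
            // Poll until the JVM sees the cycle (each thread waits on a lock the other owns).
            for (int i = 0; i < 100 && deadlocked == null; i++) {
                Thread.sleep(50);
                deadlocked = mx.findDeadlockedThreads();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // lockInterruptibly() lets both threads give up and release their first lock.
            maintain.interrupt();
            interrupter.interrupt();
        }
        return deadlocked == null ? 0 : deadlocked.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

On a HotSpot JVM, reproduce() should return 2, and running jstack against the process during the polling window should print a deadlock report much like the one above.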
      

          Activity

            stephenconnolly Stephen Connolly added a comment - What version of Jenkins was this stack trace from?
            olivergondza Oliver Gondža added a comment - This was observed while running tests based on [1], so the version is 1.609. [1] https://github.com/jenkinsci/matrix-project-plugin/blob/f314a83be60cc1e1d430d6d86659299f366e5e09/pom.xml

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Computer.java
            core/src/main/java/hudson/model/Queue.java
            core/src/main/java/jenkins/model/Jenkins.java
            http://jenkins-ci.org/commit/jenkins/6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba
            Log:
            [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

            More fun here:

            • All this originates from Executor extending Thread.
            • There is funky logic in the lock handling code of the JVM that makes assumptions
              about how it might proceed with the lock when the thread holding the lock has its
              interrupt flag set.
            • Really it would be better if Executor did not extend Thread as that way we wouldn't
              have to deal with some of that complexity. But OTOH we are where we are and backwards
              compatibility may make such a change not possible without a lot of breakage.
            • Fixing the issue at hand, firstly requires that interrupting a Computer happens with the
              Queue lock held (to speed up tests we have Jenkins.cleanup get the lock for all Computers)
              That prevents the Queue maintain thread from getting caught
            • Secondly, when removing an executor from a computer we process the removal while
              holding the Queue lock, but we move the removal itself to a separate thread if we cannot
              get the Queue lock in order to avoid deadlock.
            • Also add helper methods to wrap tasks to be performed while holding the lock
              and a helper method for Runnables that exposes the tryLock functionality
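
The second bullet of the fix (perform the removal under the Queue lock, but move it to a separate thread when the lock cannot be obtained) is a try-lock-or-defer pattern. The sketch below shows that pattern with plain java.util.concurrent primitives, under stated assumptions: QueueLockSketch, withLock, tryWithLock and removeExecutorSafely are hypothetical names chosen for illustration, not necessarily the helpers the commit added to Queue.java, and the single daemon ExecutorService is an assumed stand-in for however the real code runs the deferred removal.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

class QueueLockSketch {
    // Hypothetical stand-in for the Queue lock.
    static final ReentrantLock LOCK = new ReentrantLock();

    // Hypothetical background thread for deferred work (an assumption, not Jenkins' mechanism).
    static final ExecutorService DEFERRED = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "deferred-queue-work");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task while holding the lock, blocking until the lock is available. */
    static void withLock(Runnable task) {
        LOCK.lock();
        try {
            task.run();
        } finally {
            LOCK.unlock();
        }
    }

    /** Runs the task under the lock only if the lock is free right now; returns whether it ran. */
    static boolean tryWithLock(Runnable task) {
        if (!LOCK.tryLock()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            LOCK.unlock();
        }
    }

    /**
     * Performs the removal inline if the lock is free; otherwise hands it to another
     * thread that waits for the lock, so the caller itself never blocks on it.
     * Returns null if the work ran inline, else a Future for the deferred work.
     */
    static Future<?> removeExecutorSafely(Runnable removal) {
        if (tryWithLock(removal)) {
            return null;
        }
        return DEFERRED.submit(() -> withLock(removal));
    }
}
```

The caller never blocks on the lock, so even if it already owns another lock (as the interrupting thread apparently did in the dump), it cannot close a cycle. The deferred thread does block, but it owns nothing else while it waits.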

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            changelog.html
            core/src/main/java/hudson/model/Computer.java
            core/src/main/java/hudson/model/Queue.java
            core/src/main/java/jenkins/model/Jenkins.java
            test/src/test/java/hudson/slaves/CommandLauncherTest.java
            http://jenkins-ci.org/commit/jenkins/71e684ad900363c48d845f73c1993f90de4417ad
            Log:
            Merge pull request #1738 from stephenc/jenkins-28840

            [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

            Compare: https://github.com/jenkinsci/jenkins/compare/fe839630847b...71e684ad9003

            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4179
            [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt (Revision 6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba)

            Result = UNSTABLE
            stephen connolly : 6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba
            Files :

            • core/src/main/java/jenkins/model/Jenkins.java
            • core/src/main/java/hudson/model/Queue.java
            • core/src/main/java/hudson/model/Computer.java

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Computer.java
            core/src/main/java/hudson/model/Queue.java
            core/src/main/java/jenkins/model/Jenkins.java
            http://jenkins-ci.org/commit/jenkins/119fcbbf98c27f0257ac1be02104e0d87acc8728
            Log:
            [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt

            More fun here:

            • All this originates from Executor extending Thread.
            • There is funky logic in the lock handling code of the JVM that makes assumptions
              about how it might proceed with the lock when the thread holding the lock has its
              interrupt flag set.
            • Really it would be better if Executor did not extend Thread as that way we wouldn't
              have to deal with some of that complexity. But OTOH we are where we are and backwards
              compatibility may make such a change not possible without a lot of breakage.
            • Fixing the issue at hand, firstly requires that interrupting a Computer happens with the
              Queue lock held (to speed up tests we have Jenkins.cleanup get the lock for all Computers)
              That prevents the Queue maintain thread from getting caught
            • Secondly, when removing an executor from a computer we process the removal while
              holding the Queue lock, but we move the removal itself to a separate thread if we cannot
              get the Queue lock in order to avoid deadlock.
            • Also add helper methods to wrap tasks to be performed while holding the lock
              and a helper method for Runnables that exposes the tryLock functionality

            (cherry picked from commit 6f343dc7c2f0c32e9eb1a0b5d588a2e7ad6f62ba)

            jglick Jesse Glick added a comment -

            Another case in mock-slave-plugin in 1.609.1: thread dump.

            jglick Jesse Glick added a comment -

            Filed a PR to try to help tests time out even when there is a hang during shutdown.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            http://jenkins-ci.org/commit/workflow-plugin/b9c94e110085b41ebea5b587ee6d8fc7a48a5dec
            Log:
            Updating baseline to 1.609.2 to pick up JENKINS-28840 fix.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            pom.xml
            http://jenkins-ci.org/commit/workflow-plugin/5618084d59d6de7bec6a8b862801a7349dbc8964
            Log:
            Merge pull request #184 from jglick/1.609.2

            Updating baseline to 1.609.2 to pick up JENKINS-28840 fix

            Compare: https://github.com/jenkinsci/workflow-plugin/compare/da97432f26ad...5618084d59d6

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
            test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
            http://jenkins-ci.org/commit/jenkins/b204bd0719c75201a454a2b6c6883d8acc7d0712
            Log:
            JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung.
            Merges #1809.

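
The harness change is about ordering: the per-test timeout stays armed until Jenkins.cleanUp has returned, so a hang during cleanup (such as this deadlock) still trips the timeout. A minimal sketch of that ordering, assuming a ScheduledExecutorService watchdog; runWithTimeout and hangingCleanUp are hypothetical and do not reproduce JenkinsRule's actual code. On timeout, this watchdog only records the fact and interrupts the test thread, which unblocks interruptible waits only. A thread stuck in ReentrantLock.lock(), as in the dump, would stay stuck, so a real harness needs some other way to report the failure.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class TimeoutOrderingSketch {
    /**
     * Runs the test body and then cleanup on the calling thread, with a watchdog
     * armed for the whole sequence. Returns true if the watchdog fired.
     */
    static boolean runWithTimeout(Runnable body, Runnable cleanUp, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean timedOut = new AtomicBoolean();
        Thread testThread = Thread.currentThread();
        ScheduledFuture<?> watchdog = timer.schedule(() -> {
            timedOut.set(true);
            testThread.interrupt(); // only helps if the hung code is interruptible
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            try {
                body.run();
            } finally {
                cleanUp.run(); // the watchdog is still armed while cleanup runs
            }
        } finally {
            // Cancel only after cleanUp has returned, which is the ordering the commit describes.
            watchdog.cancel(false);
            timer.shutdownNow();
            Thread.interrupted(); // clear any interrupt the watchdog delivered
        }
        return timedOut.get();
    }

    /** Hypothetical cleanup that hangs until it is interrupted. */
    static void hangingCleanUp() {
        try {
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A call such as runWithTimeout(() -> {}, TimeoutOrderingSketch::hangingCleanUp, 200) returns true after roughly 200 ms. If the watchdog were cancelled before cleanUp ran, the same call would block for the full 60 seconds of the simulated hang.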
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4267
            JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung. (Revision b204bd0719c75201a454a2b6c6883d8acc7d0712)

            Result = SUCCESS
            jesse glick : b204bd0719c75201a454a2b6c6883d8acc7d0712
            Files :

            • test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
            • test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #4292
            [FIXED JENKINS-28840] Deadlock between Queue.maintain and Executor.interrupt (Revision 119fcbbf98c27f0257ac1be02104e0d87acc8728)

            Result = UNSTABLE
            ogondza : 119fcbbf98c27f0257ac1be02104e0d87acc8728
            Files :

            • core/src/main/java/jenkins/model/Jenkins.java
            • core/src/main/java/hudson/model/Computer.java
            • core/src/main/java/hudson/model/Queue.java

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/main/java/org/jvnet/hudson/test/JenkinsRule.java
            test/src/test/java/org/jvnet/hudson/main/JenkinsRuleTimeoutTest.java
            http://jenkins-ci.org/commit/jenkins-test-harness/7e4859965f367be1818a4ffb408bb45e4288c6c3
            Log:
            JENKINS-28840 Cancel the test timer only after we call Jenkins.cleanUp, in case that hung.
            Merges #1809.

            Originally-Committed-As: b204bd0719c75201a454a2b6c6883d8acc7d0712

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Andres Rodriguez
            Path:
            pom.xml
            src/test/java/hudson/plugins/git/AbstractGitProject.java
            src/test/java/hudson/plugins/git/AbstractGitTestCase.java
            src/test/java/hudson/plugins/git/RevisionParameterActionTest.java
            http://jenkins-ci.org/commit/git-plugin/daf453dfc43db81ede5cde60d0469fda0b3321ab
            Log:
            JENKINS-33874 Move to 1.609.3 (because of JENKINS-28840)

            People

              stephenconnolly Stephen Connolly
              olivergondza Oliver Gondža
              Votes: 0
              Watchers: 6
