JENKINS-28690

Deadlock in hudson.model.Executor

    Description

      In a very specific scenario, when a build is running on a slave and PingThread detects the slave as unavailable, a deadlock occurs in the Executor thread of that slave.

      Stack trace:

      "Executor #0 for xxxx : executing xxxx #9" daemon prio=10 tid=0x00007f444248b800 nid=0x66e0 waiting on condition [0x00007f448a92f000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000045e3eea00> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
      	at hudson.model.Executor.interrupt(Executor.java:183)
      	at hudson.model.Executor.interrupt(Executor.java:164)
      	at hudson.model.Executor.interrupt(Executor.java:158)
      	at hudson.model.Executor.interrupt(Executor.java:145)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.selfInterrupt(AbstractQueuedSynchronizer.java:825)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:959)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
      	at hudson.model.Executor.abortResult(Executor.java:208)
      	at hudson.model.Build$BuildExecution.doRun(Build.java:165)
      	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
      	at hudson.model.Run.execute(Run.java:1744)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      	at hudson.model.ResourceController.execute(ResourceController.java:98)
      	at hudson.model.Executor.run(Executor.java:374)
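
Read bottom-up, the trace above shows `abortResult` going through `ReadLock.lock`, after which `AbstractQueuedSynchronizer.selfInterrupt` calls the overridden `Executor.interrupt`, which in turn requests the write lock of the same `ReentrantReadWriteLock` on the same thread. `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so that request can never be granted. The following is a minimal standalone sketch of just that property, not Jenkins code; the class and method names are hypothetical, and it uses a timed `tryLock` so that it returns instead of parking forever the way the Executor thread does.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadToWriteUpgradeDemo {

    /**
     * Takes the read lock, then tries to take the write lock of the same
     * ReentrantReadWriteLock on the same thread. Returns true only if the
     * write lock was acquired within the timeout.
     */
    static boolean tryUpgrade(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // An untimed writeLock().lock() here would park forever,
            // which is the situation the stack trace shows.
            boolean acquired = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("upgrade succeeded: " + tryUpgrade(200));
    }
}
```

Running `main` prints `upgrade succeeded: false`: the write lock is refused for as long as the calling thread still holds the read lock.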
      

      On its own this is not too bad, but then the Queue's maintain task kicks in, blocks on the Executor's lock, and that leads to a deadlock on the Queue lock.
      Stack trace:

      "AtmostOneTaskExecutor[hudson.model.Queue$1@6a9812a3] [#6684]" daemon prio=10 tid=0x00007f44bf7af000 nid=0x74ec waiting on condition [0x00007f44c827b000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000045e3eea00> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
      	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
      	at hudson.model.Executor.isParking(Executor.java:609)
      	at hudson.model.Queue.maintain(Queue.java:1282)
      	at hudson.model.Queue$1.call(Queue.java:334)
      	at hudson.model.Queue$1.call(Queue.java:331)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:101)
      	at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:91)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
      	at java.lang.Thread.run(Thread.java:745)
      

      This blocks all actions on Jenkins: no new builds can be scheduled and the Jenkins main page cannot be accessed.

      After downgrading to version 1.606, from before https://issues.jenkins-ci.org/browse/JENKINS-27565, everything works fine.

          Activity

            Code changed in jenkins
            User: Ivan Meredith
            Path:
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionComputer.java
            src/main/java/com/cloudbees/jenkins/plugins/mtslavescloud/MansionRetentionStrategy.java
            http://jenkins-ci.org/commit/mansion-cloud-plugin/cf798b87dc339c91da4b5fb26ceb4ab1bcae4259
            Log:
            Merge pull request #2 from jenkinsci/jenkins-28690-related

            Need to hold a lock when disconnecting or you can trigger JENKINS-28690 style deadlocks

            Compare: https://github.com/jenkinsci/mansion-cloud-plugin/compare/e265cac825ba...cf798b87dc33

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/0ba505b60ca86d6b103b070a690a98ae6fef8c5d
            Log:
            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks

            • Without this, then it becomes a question of find catch and release for each potential code path that
              might end up restoring the interrupt flag on the current thread.
            • Since standard Lock support is kind enough to restore the interrupt flag on the current thread
              when blocked waiting for the lock, that would be a hiding to nothing
            • I welcome others to review my logic detailed in the code comment
            • I am leaving the code comment as this is IMHO too important to assume that somebody will
              check the git commit history
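
The commit log above describes the idea only in words. One way to picture it, as a hedged sketch and not the actual `Executor` code: an overridden `interrupt()` can tell a thread restoring its own interrupt flag apart from another thread requesting an abort, and skip the abort bookkeeping (which, per the stack trace, involves the write lock in `Executor`) in the first case. `AbortableWorker`, `abortRequests` and the two helper methods are hypothetical names introduced for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class SelfInterruptSketch {

    /** Hypothetical stand-in for a thread class that overrides interrupt(). */
    static class AbortableWorker extends Thread {
        final AtomicInteger abortRequests = new AtomicInteger();

        AbortableWorker(Runnable body) {
            super(body);
        }

        @Override
        public void interrupt() {
            if (Thread.currentThread() != this) {
                // Another thread asks us to stop: do the abort bookkeeping.
                // (Assumption: in the real class this is where a lock would be needed.)
                abortRequests.incrementAndGet();
            }
            // In both cases set the plain JVM interrupt flag. For a
            // self-interrupt this is all that happens, so no lock is touched.
            super.interrupt();
        }
    }

    /** The worker restores its own flag; that must not count as an abort request. */
    static boolean selfInterruptIsPlain() throws InterruptedException {
        AtomicBoolean flagSeen = new AtomicBoolean();
        AbortableWorker worker = new AbortableWorker(() -> {
            Thread.currentThread().interrupt();   // what lock code does to restore the flag
            flagSeen.set(Thread.interrupted());   // read and clear the flag
        });
        worker.start();
        worker.join();
        return flagSeen.get() && worker.abortRequests.get() == 0;
    }

    /** An interrupt from another thread is still recorded as an abort request. */
    static int externalInterruptCount() throws InterruptedException {
        CountDownLatch never = new CountDownLatch(1);
        AbortableWorker worker = new AbortableWorker(() -> {
            try {
                never.await();                    // blocks until interrupted
            } catch (InterruptedException expected) {
                // fall through and let the worker finish
            }
        });
        worker.start();
        worker.interrupt();
        worker.join();
        return worker.abortRequests.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("self-interrupt is plain: " + selfInterruptIsPlain());
        System.out.println("external abort requests: " + externalInterruptCount());
    }
}
```

The design point is that code which correctly restores the interrupt flag, including the standard lock classes, only ever reaches the plain `super.interrupt()` path, so it cannot re-enter lock-taking logic from inside a lock acquisition.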

            Code changed in jenkins
            User: Stephen Connolly
            Path:
            core/src/main/java/hudson/model/Executor.java
            http://jenkins-ci.org/commit/jenkins/a80972307e03a7f67e97dee700720cb80f7f65d8
            Log:
            Merge pull request #1786 from stephenc/jenkins-28690-correct-interrupt-override

            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks

            Compare: https://github.com/jenkinsci/jenkins/compare/7ab816878b16...a80972307e03

            dogfood added a comment:

            Integrated in jenkins_main_trunk #4246
            JENKINS-28690 Aha! So I believe this will fully resolve any of these kinds of deadlocks (Revision 0ba505b60ca86d6b103b070a690a98ae6fef8c5d)

            Result = SUCCESS
            stephen connolly : 0ba505b60ca86d6b103b070a690a98ae6fef8c5d
            Files :

            • core/src/main/java/hudson/model/Executor.java

            dogfood added a comment:

            Integrated in jenkins_main_trunk #4292
            [FIXED JENKINS-28690] Deadlock in hudson.model.Executor (Revision c24c3236917cfac2ae7c536b5fd6ad737fa2253c)

            Result = UNSTABLE
            ogondza : c24c3236917cfac2ae7c536b5fd6ad737fa2253c
            Files :

            • core/src/main/java/hudson/model/Executor.java

            People

              stephenconnolly Stephen Connolly
              szubster Tomasz Szuba