JENKINS-37034

Deadlock in hudson.model.Executor

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Component/s: core
    • Labels: None
    • Environment: Jenkins 1.609

    Description

      We caught a deadlock in hudson.model.Executor with this stack trace (XXX replaces sensitive data):

      "Executor #-1 for XXX : executing XXX #160" daemon prio=10 tid=2249607168 nid=6359
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x5ba959038> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
              at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
              at hudson.model.Executor.interrupt(Executor.java:183)
              at hudson.model.Executor.interrupt(Executor.java:164)
              at hudson.model.Executor.interrupt(Executor.java:158)
              at hudson.model.Executor.interrupt(Executor.java:145)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.selfInterrupt(AbstractQueuedSynchronizer.java:802)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:937)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
              at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
              at hudson.model.Executor.getCurrentExecutable(Executor.java:475)
              at hudson.model.Executor.of(Executor.java:931)
              at hudson.model.Run.getExecutor(Run.java:517)
              at hudson.matrix.MatrixBuild$MatrixBuildExecution.doRun(MatrixBuild.java:376)
              - locked <0x402e23f78> (a hudson.model.Queue)
              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:536)
              at hudson.model.Run.execute(Run.java:1738)
              at hudson.matrix.MatrixBuild.run(MatrixBuild.java:301)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:381)
      

      Because the thread first acquires the lock on the Queue and its WriteLock request then waits indefinitely for the ReadLock to be released, Jenkins stops responding to anything (the Queue stays locked).

      I found a similar issue, JENKINS-28690, for Executor.abortResult(). If I understand Stephen's fix correctly, it can't be applied here because we don't know where the ReadLock was taken from. Upgrading the ReadLock to a WriteLock is not possible either.
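      As a minimal sketch of why the upgrade is not an option: a thread that holds the read lock of a java.util.concurrent.locks.ReentrantReadWriteLock cannot obtain the write lock of the same instance. The class and method names below are hypothetical and are not Jenkins code. The sketch uses tryLock with a timeout so that it returns instead of parking forever the way WriteLock.lock() does in the trace.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Jenkins.
class ReadToWriteUpgradeDemo {

    /** Returns true if the write lock could be taken while this thread holds the read lock. */
    static boolean tryUpgrade() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // WriteLock.lock() would park here indefinitely; the timeout lets the demo return.
            boolean upgraded = lock.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("upgrade succeeded: " + tryUpgrade()); // prints "upgrade succeeded: false"
    }
}
```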

      Attachments

        Issue Links

          Activity

            pajasoft Pavel Janoušek added a comment -

            I believe Stephen's second fix (merged in Jenkins 1.625) can help in this situation. I'm not sure that fix covers all instances of this deadlock, so I'm leaving this issue open.

            I was thinking about a reproducer, but got stuck on how to simulate the necessary situation, which is handled by the JVM, or rather by java.util.concurrent.locks.AbstractQueuedSynchronizer.

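            One possible way to simulate the situation outside Jenkins, offered only as a sketch: the trace shows AbstractQueuedSynchronizer re-asserting a pending interrupt (selfInterrupt) on a thread whose overridden interrupt() then asks for the write lock. The example assumes that this re-assertion happens after the read lock has been granted, which is what the trace suggests. LockingThread is a hypothetical stand-in for hudson.model.Executor, and the timeout on tryLock is an addition so the demo terminates instead of deadlocking.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical reproducer sketch, not Jenkins code.
class SelfInterruptReproducer {
    static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
    // Set when interrupt() runs on the thread itself while it already holds the read lock.
    static final AtomicBoolean STUCK = new AtomicBoolean(false);

    /** Stand-in for Executor: interrupt() wants the write lock before interrupting. */
    static class LockingThread extends Thread {
        @Override
        public void interrupt() {
            boolean acquired = false;
            try {
                // The real trace calls WriteLock.lock(); the timeout keeps this demo from hanging.
                acquired = LOCK.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) {
                // Treated the same as a timeout.
            }
            if (acquired) {
                LOCK.writeLock().unlock();
            } else if (Thread.currentThread() == this && LOCK.getReadHoldCount() > 0) {
                STUCK.set(true); // write lock requested while holding the read lock
            }
            super.interrupt();
        }

        @Override
        public void run() {
            LOCK.readLock().lock(); // parks while the main thread holds the write lock
            LOCK.readLock().unlock();
        }
    }

    static boolean reproduce() throws InterruptedException {
        STUCK.set(false);
        LockingThread t = new LockingThread();
        LOCK.writeLock().lock();
        try {
            t.start();
            while (!LOCK.hasQueuedThread(t)) {
                Thread.sleep(10); // wait until t is queued for the read lock
            }
            t.interrupt(); // runs on the main thread; the write lock is reentrant here
        } finally {
            LOCK.writeLock().unlock(); // t now gets the read lock and re-asserts the interrupt
        }
        t.join();
        return STUCK.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("write lock requested under read lock: " + reproduce());
    }
}
```

            If the synchronizer behaves as in the trace, reproduce() returns true. With lock() in place of tryLock the thread would park forever at that point.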
            oleg_nenashev Oleg Nenashev added a comment -

            pajasoft please provide a full thread dump. It's not possible to analyze deadlocks from a single thread's trace. It would also be useful to know your Jenkins version. If it was reported against something below 1.625, it may be worth trying to reproduce the issue on a higher version.
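            For reference, a full thread dump can be taken with the JDK's jstack tool, or from inside the JVM through java.lang.management.ThreadMXBean. The following is a minimal sketch of the latter. The class name is hypothetical, and the output format is only an approximation of what jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Jenkins.
class FullThreadDump {

    /** Builds a text dump of all live threads, including the lock each one waits on. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Only request lock details that this JVM reports as supported.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                out.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```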

            pajasoft Pavel Janoušek added a comment -

            oleg_nenashev This issue occurred on a Jenkins instance based on 1.609, which we are still using in our production environment. Fortunately we haven't been hit by this issue since, so it doesn't seem to be a common race condition. If it occurs again, I'll post the full stack trace here.

            oleg_nenashev Oleg Nenashev added a comment -

            Closing as Cannot Reproduce since the executor and queue logic has changed significantly. Please reopen if the issue happens again


            People

              Assignee: Unassigned
              Reporter: pajasoft Pavel Janoušek
              Votes: 0
              Watchers: 2