• Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • core
    • None
    • Jenkins 1.609

      We caught a deadlock in hudson.model.Executor with this stack trace (XXX replaces sensitive data):

      "Executor #-1 for XXX : executing XXX #160" daemon prio=10 tid=2249607168 nid=6359
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x5ba959038> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
              at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
              at hudson.model.Executor.interrupt(Executor.java:183)
              at hudson.model.Executor.interrupt(Executor.java:164)
              at hudson.model.Executor.interrupt(Executor.java:158)
              at hudson.model.Executor.interrupt(Executor.java:145)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.selfInterrupt(AbstractQueuedSynchronizer.java:802)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:937)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
              at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
              at hudson.model.Executor.getCurrentExecutable(Executor.java:475)
              at hudson.model.Executor.of(Executor.java:931)
              at hudson.model.Run.getExecutor(Run.java:517)
              at hudson.matrix.MatrixBuild$MatrixBuildExecution.doRun(MatrixBuild.java:376)
              - locked <0x402e23f78> (a hudson.model.Queue)
              at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:536)
              at hudson.model.Run.execute(Run.java:1738)
              at hudson.matrix.MatrixBuild.run(MatrixBuild.java:301)
              at hudson.model.ResourceController.execute(ResourceController.java:98)
              at hudson.model.Executor.run(Executor.java:381)
      

      Because the lock on Queue is acquired first and the WriteLock then waits indefinitely for the ReadLock to be released, Jenkins doesn't respond to anything (the Queue stays locked)...

      I've found a similar issue, JENKINS-28690, for Executor.abortResult(). If I understand Stephen's fix correctly, it can't be applied here because we don't know where the ReadLock was acquired from. Upgrading the ReadLock to a WriteLock is not possible either.
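
      To illustrate the last point, here is a minimal standalone sketch (not Jenkins code; the class and method names are made up for this example) of why a ReentrantReadWriteLock cannot be upgraded. A thread that already holds the read lock can never obtain the write lock on the same instance, which appears to be what happens in the trace above when Executor.interrupt() is reached from inside ReadLock.lock(). The sketch uses tryLock() so that it returns instead of hanging; a plain lock() call at that point would park forever.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Jenkins.
public class ReadToWriteUpgradeDemo {

    /** Tries to take the write lock while the current thread holds the read lock. */
    static boolean tryUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // Fails: the write lock is unavailable while any read lock is held,
            // even when the only reader is the current thread.
            boolean gotWrite = lock.writeLock().tryLock();
            if (gotWrite) {
                lock.writeLock().unlock();
            }
            return gotWrite;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The opposite direction (write, then read) is allowed for the owning thread. */
    static boolean tryDowngrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        try {
            boolean gotRead = lock.readLock().tryLock();
            if (gotRead) {
                lock.readLock().unlock();
            }
            return gotRead;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("read -> write upgrade possible: " + tryUpgrade());     // false
        System.out.println("write -> read downgrade possible: " + tryDowngrade()); // true
    }
}
```

      With a blocking writeLock().lock() in place of tryLock(), the first method would never return, and any monitor held further up the stack (here, the Queue) would stay locked with it.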

          [JENKINS-37034] Deadlock in hudson.model.Executor

          Pavel Janoušek added a comment -

          I believe the second fix from Stephen (merged into Jenkins 1.625) can help in this situation. I'm not sure whether that fix covers all instances of this deadlock, so I'm leaving this issue open.

          I was thinking about a reproducer, but got stuck on how to simulate the necessary situation, which is handled by the JVM, or rather by java.util.concurrent.locks.AbstractQueuedSynchronizer.

          Oleg Nenashev added a comment -

          pajasoft, please provide a full thread dump. It's not possible to analyze deadlocks from a single thread trace. It would also be useful to know your Jenkins version. If the issue was seen on something below 1.625, it may be worth trying to reproduce it on a newer version.
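
          For reference, a full dump is normally taken with a tool such as jstack against the running process. As a rough sketch of the same idea from inside a JVM, assuming nothing beyond the standard java.lang.management API (the class and method names below are invented for the example), every thread can be listed together with the lock it is waiting on:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Jenkins.
public class ThreadDumpSketch {

    /** Builds a text dump of all live threads in the current JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details if this JVM supports them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean syncs = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, syncs)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The object this thread is blocked or parked on, if any.
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // Print the whole stack; ThreadInfo.toString() may truncate it.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

          Note that read locks of a ReentrantReadWriteLock have no recorded owner, so a dump shows who is waiting for the write lock but not necessarily which thread still holds a read lock.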


          Pavel Janoušek added a comment -

          oleg_nenashev This issue occurred on a Jenkins instance based on 1.609, which we are still using in production. Fortunately, we haven't been hit by this issue since, so it seems it isn't a common race condition. If it occurs again, I'll post the full stack trace here.

          Oleg Nenashev added a comment -

          Closing as Cannot Reproduce since the executor and queue logic has changed significantly. Please reopen if the issue happens again.


            Assignee: Unassigned
            Reporter: Pavel Janoušek (pajasoft)
            Votes: 0
            Watchers: 2
