• Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component: core
    • None
    • Platform: All, OS: All

      I've got four different builds of the same job that have all been hung about an
      hour too long on "recording test results".

      I can't find any other relevant information beyond the stuff that's been
      spewing to the System Log for the past hour (below).

      I get the feeling all the SCMTrigger issues are a different error, and only
      tangentially related.

          [JENKINS-4060] Jobs hanging on "Recording test results"

          R. Tyler Croy added a comment -

          Adding admc@ just so he has a heads-up on the issue.

          Another relevant note, all four of the jobs were started fairly close to one
          another, so they all completed around the same time.

          My current thinking is that there's a deadlock in the JUnit code that two of the
          builds that literally started within a minute or two of each other entered into,
          and then trapped subsequent builds in (the reason there were only four builds is
          because there weren't available executors for more)

          Kohsuke Kawaguchi added a comment -

          This stack trace indicates another problem that I'll fix, but to understand the
          root cause of your hang, please obtain the thread dump and attach it.
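A thread dump of the Hudson process is usually obtained with `jstack <pid>` or `kill -QUIT <pid>`. As a minimal, hypothetical sketch (not part of Hudson), the same information can also be captured programmatically from inside the JVM:

```java
import java.util.Map;

/**
 * Minimal sketch: capture a thread dump from inside the running JVM,
 * as an alternative to jstack / SIGQUIT. Class name is invented.
 */
public class ThreadDumpSketch {
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // Thread.getAllStackTraces() returns a snapshot of every live
        // thread and its current stack frames.
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

The output mirrors the stack-trace format that jstack produces, so it can be attached to an issue the same way.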

          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          branches/concurrent-build/core/src/main/java/hudson/triggers/SCMTrigger.java
          http://fisheye4.cenqua.com/changelog/hudson/?cs=19979
          Log:
          JENKINS-4060 fixed a reported ClassCastException, but I don't think that's related to the hang problem.

          R. Tyler Croy added a comment (edited) -

          HOLY THREAD DUMP BATMAN!

          <removed dump as too long for GitHub>

          R. Tyler Croy added a comment -

          Worth noting that there are two blocked builds in this instance, and one is on
          "Flexo Loopback #1" and the other is on "Fry Loopback #1"

          The majority of the rest of the thread dump is crap.

          Kohsuke Kawaguchi added a comment -

          I'm still not quite sure why the hang occurs. I simplified the code a bit and
          added a bit more diagnostics in build #185, so can you try this build, and if
          there's a problem, get another thread dump?

          I recommend putting the dump as an attachment.

          R. Tyler Croy added a comment -

          Created an attachment (id=796)
          Thread dump with a couple concurrent jobs all locked on recording test results

          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          branches/concurrent-build/core/src/main/java/hudson/model/AbstractBuild.java
          branches/concurrent-build/core/src/main/java/hudson/model/Run.java
          http://fisheye4.cenqua.com/changelog/hudson/?cs=20207
          Log:
          JENKINS-4060 this class-level synchronization has a devastating effect, as it effectively creates a single giant lock for the entire Hudson and can cause a deadlock.
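As a hypothetical illustration of the pattern the commit message describes (class and method names are invented, not Hudson's), a `static synchronized` method locks on the Class object itself, so every caller in the JVM serializes through one giant lock; moving to a per-instance lock lets unrelated builds record results concurrently:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the locking pattern described in the commit
// message above; this is not the actual Hudson source.
public class TestResultRecorder {
    // BAD: "class-level" synchronization. The monitor here is
    // TestResultRecorder.class, shared by every instance, so all builds
    // in the JVM serialize through this one lock. If the holder then
    // blocks on something else (a child process, another build), every
    // build queued behind it hangs too.
    static synchronized void recordGlobally(List<String> sharedLog, String build) {
        sharedLog.add(build);
    }

    // BETTER: one lock per recorder instance. Builds of unrelated jobs
    // use different instances and never contend with each other here.
    private final List<String> results = new ArrayList<String>();

    void record(String build) {
        synchronized (this) {
            results.add(build);
        }
    }

    List<String> results() {
        synchronized (this) {
            return Collections.unmodifiableList(new ArrayList<String>(results));
        }
    }
}
```

With per-instance locking, a hang in one job's result recording can no longer stall every other job on the master, which matches the symptom reported here of independent builds all stuck on "recording test results".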

          R. Tyler Croy added a comment -

          Created an attachment (id=807)
          Another day, another deadlock

          Kohsuke Kawaguchi added a comment -

          The last attachment shows no deadlock. "Executor #0 for VBox A : executing master-release #2477" is waiting for a child process to complete, and #2478 in turn is waiting for #2477 to complete.

          Since the time of this bug, the concurrent build feature was integrated into the main line, and a number of people are using it extensively, so I assume rev.20207 fixed this problem.

            Assignee: Kohsuke Kawaguchi (kohsuke)
            Reporter: R. Tyler Croy (rtyler)
            Votes: 0
            Watchers: 2
