Junit result archiver getting stuck for a long time in concurrent builds

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component: junit-plugin
    • Environment: Jenkins 1.415, Hudson 1.393. Both on Fedora, Tomcat 6. x86_64.

      When a build with JUnit results reaches its end, possibly only when the job is allowed to run concurrently, our system frequently gets stuck on "Recording test results".

      Looking at the thread list, I see the following:
      "Executor #9 for master : executing Run_Manual_SOAK #242 : waiting for Check point JUnit result archiving on Run_Manual_SOAK #241
      java.lang.Object.wait(Native Method)
      java.lang.Object.wait(Object.java:502)
      hudson.model.Run$Runner$CheckpointSet.waitForCheckPoint(Run.java:1266)
      hudson.model.Run.waitForCheckpoint(Run.java:1234)
      hudson.model.CheckPoint.block(CheckPoint.java:144)
      hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:159)
      hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:663)
      hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:638)
      hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:616)
      hudson.model.Build$RunnerImpl.post2(Build.java:161)
      hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:585)
      hudson.model.Run.run(Run.java:1399)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      hudson.model.ResourceController.execute(ResourceController.java:88)
      hudson.model.Executor.run(Executor.java:145)
      Executor #9 for master : executing Run_Manual_SOAK #242 : waiting for Check point JUnit result archiving on Run_Manual_SOAK #241"

      All the stuck jobs are in the same place. They do eventually come unstuck, but can spend a long time (hours and sometimes a day or so) in this state.
      Machine load average is at 0.23 0.22 0.21.
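
The thread dump above shows build #242 waiting for build #241 to reach the "JUnit result archiving" checkpoint. The following is a minimal sketch of that kind of wait, not the Jenkins implementation: `CheckpointSketch`, `ToyBuild`, `report` and `blockOnPrevious` are hypothetical names, and the timeout exists only so the demo terminates (the trace shows a plain `Object.wait` with no timeout).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy model of a per-build checkpoint; NOT the Jenkins CheckPoint class. */
class CheckpointSketch {

    /** Hypothetical stand-in for a build that can report one checkpoint. */
    static final class ToyBuild {
        final int number;
        final ToyBuild previous; // null for the first build
        private final CountDownLatch reported = new CountDownLatch(1);

        ToyBuild(int number, ToyBuild previous) {
            this.number = number;
            this.previous = previous;
        }

        /** Marks this build's checkpoint as reached, releasing any waiters. */
        void report() {
            reported.countDown();
        }

        /**
         * Waits until the previous build has reported its checkpoint or the
         * timeout elapses. Returns true only if the checkpoint was reported.
         */
        boolean blockOnPrevious(long timeout, TimeUnit unit) {
            if (previous == null) {
                return true; // nothing to wait for
            }
            try {
                return previous.reported.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ToyBuild b241 = new ToyBuild(241, null);
        ToyBuild b242 = new ToyBuild(242, b241);

        // #242 is ready to archive results, but #241 is still running.
        System.out.println("released while #241 runs: "
                + b242.blockOnPrevious(100, TimeUnit.MILLISECONDS));

        // Once #241 reports the checkpoint, #242 can proceed.
        b241.report();
        System.out.println("released after #241 reports: "
                + b242.blockOnPrevious(100, TimeUnit.MILLISECONDS));
    }
}
```

In this toy model, a long-running earlier build holds up every later build at the same step, which matches the symptom of all stuck jobs sitting in the same place while the machine is idle.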

          [JENKINS-10234] Junit result archiver getting stuck for a long time in concurrent builds

          Jesse Glick added a comment -

          Since JENKINS-9913 is covering only the reporting of checkpoints, this should be reopened: JUnitResultArchiver.CHECKPOINT still exists, and probably should not.

          It needs to be determined whether anything must replace it, in case a build with a higher number in fact finishes before one with a lower number, so that test regressions cannot be calculated accurately when the result is published (assuming anyone even cares about build-to-build diffs for a concurrent-capable job). Until the earlier build finishes, will the later build’s test result display show any “regressions” (against the last completed build), show no regressions ever, or throw exceptions? After the earlier build finishes, will the later build’s result display show regressions against the earlier build, against the last completed build at the time of this build’s completion, or do something else? In other words, are calls to getPreviousResult made on demand whenever a build-to-build diff is requested (great)? Or made once when the build completes (not great but adequate)? Or does something really break? My casual inspection of the code suggests that there is some improper caching (CaseResult.failedSince) but that the code generally defends against a prior build having no test result action, meaning that simply deleting CHECKPOINT would cause little harm.
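
To make the "on demand" option concrete, here is a rough sketch of a baseline lookup that is recomputed on every request and tolerates earlier builds with no published results. It is a toy model under stated assumptions, not the Jenkins code: `PreviousResultSketch`, `ToyBuild`, `publish`, `previousWithResults` and `regressions` are hypothetical names, and treating every failure as a regression when no baseline exists is a simplification chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of on-demand baseline lookup; NOT the Jenkins test result classes. */
class PreviousResultSketch {

    static final class ToyBuild {
        final int number;
        final ToyBuild previous;          // null for the oldest build
        private Set<String> failedTests;  // null until results are published

        ToyBuild(int number, ToyBuild previous) {
            this.number = number;
            this.previous = previous;
        }

        /** Publishes this build's failing test names. */
        void publish(String... failed) {
            failedTests = new HashSet<>(Arrays.asList(failed));
        }

        /** Walks back to the nearest earlier build with results, or null. */
        ToyBuild previousWithResults() {
            for (ToyBuild b = previous; b != null; b = b.previous) {
                if (b.failedTests != null) {
                    return b;
                }
            }
            return null;
        }

        /** Failures here that did not fail in the current baseline build. */
        Set<String> regressions() {
            Set<String> result = new TreeSet<>();
            if (failedTests == null) {
                return result; // nothing published yet
            }
            result.addAll(failedTests);
            ToyBuild baseline = previousWithResults();
            if (baseline != null) {
                result.removeAll(baseline.failedTests);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyBuild b240 = new ToyBuild(240, null);
        ToyBuild b241 = new ToyBuild(241, b240);
        ToyBuild b242 = new ToyBuild(242, b241);

        b240.publish("testA");
        b242.publish("testA", "testB");          // #242 finishes before #241
        System.out.println(b242.regressions());  // baseline is #240

        b241.publish("testB");                   // #241 finishes later
        System.out.println(b242.regressions());  // baseline is now #241
    }
}
```

Because nothing is cached, the diff shown for the later build shifts once the earlier build publishes its results. A cached value would instead freeze whichever baseline happened to be available first.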

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/tasks/junit/JUnitResultArchiver.java
          http://jenkins-ci.org/commit/jenkins/90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a
          Log:
          [FIXED JENKINS-10234] Removed checkpoint from JUnitResultArchiver.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3535
          [FIXED JENKINS-10234] Removed checkpoint from JUnitResultArchiver. (Revision 90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a)

          Result = SUCCESS
          Jesse Glick : 90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a
          Files :

          • changelog.html
          • core/src/main/java/hudson/tasks/junit/JUnitResultArchiver.java

          Henrik Skupin added a comment -

          Any chance that we could get this backported to the 1.565.x LTS branch? It's one of the most annoying problems for our Jenkins production systems.

          Jesse Glick added a comment -

          Probably too late for 1.565.x and probably already in the next LTS, but marking it as a candidate just in case.

          Henrik Skupin added a comment -

          Version 1.565.3 LTS is scheduled for Oct 1st. I'm not sure when fixes are taken into consideration for LTS releases. Is there anything else besides the keyword we could do to try to get it in? The next major version bump for LTS will happen at the end of Oct, where this might be fixed.

          Daniel Beck added a comment -

          Next LTS will be based on 1.580, and the 1.565.x line is done.

          Lan Wu added a comment - edited

          Sorry, I'm new to the Jenkins Jira. We are hitting this bug, but I can't see from this listing which version this was fixed in. On this page, http://jenkins-ci.org/changelog-stable, I couldn't find issue 10234. Does that mean it's not in one of the LTS builds? Thanks!

          Daniel Beck added a comment -

          "Does that mean it's not in one of the LTS builds?"

          No, it just means it has not specifically been fixed/backported for one of the LTS releases. It was fixed for 1.575 and is therefore in 1.580 and the LTS releases based on that.

          Code changed in jenkins
          User: Martin Bektchiev
          Path:
          src/main/java/hudson/plugins/nunit/NUnitPublisher.java
          http://jenkins-ci.org/commit/nunit-plugin/1e89e0267814d280b7051d7d70d5d2939326b182
          Log:
          Do not wait for checkpoint from previous build

          Fixes an issue with result archiver getting stuck for a long time in concurrent builds.
          A similar issue has been fixed in the JUnit Jenkins plugin:

          https://issues.jenkins-ci.org/browse/JENKINS-10234
          https://github.com/jenkinsci/jenkins/commit/90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a

            jglick Jesse Glick
            dannystaple Danny Staple
            Votes: 15
            Watchers: 30