
Junit result archiver getting stuck for a long time in concurrent builds

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Component: junit-plugin
    • Environment: Jenkins 1.415, Hudson 1.393. Both on Fedora, Tomcat 6. x86_64.

      When reaching the end of a build with jUnit results, possibly when the job is allowed to run concurrently, we are frequently seeing our system get stuck on "Recording test results".

      Looking at the thread list, I see the following:
      "Executor #9 for master : executing Run_Manual_SOAK #242 : waiting for Check point JUnit result archiving on Run_Manual_SOAK #241
      java.lang.Object.wait(Native Method)
      java.lang.Object.wait(Object.java:502)
      hudson.model.Run$Runner$CheckpointSet.waitForCheckPoint(Run.java:1266)
      hudson.model.Run.waitForCheckpoint(Run.java:1234)
      hudson.model.CheckPoint.block(CheckPoint.java:144)
      hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:159)
      hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:663)
      hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:638)
      hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:616)
      hudson.model.Build$RunnerImpl.post2(Build.java:161)
      hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:585)
      hudson.model.Run.run(Run.java:1399)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      hudson.model.ResourceController.execute(ResourceController.java:88)
      hudson.model.Executor.run(Executor.java:145)
      Executor #9 for master : executing Run_Manual_SOAK #242 : waiting for Check point JUnit result archiving on Run_Manual_SOAK #241"

      All the stuck jobs are in the same place. They do eventually come unstuck, but can spend a long time (hours and sometimes a day or so) in this state.
      Machine load average is at 0.23 0.22 0.21.

          [JENKINS-10234] Junit result archiver getting stuck for a long time in concurrent builds

          Danny Staple added a comment -

          We now understand exactly what happens here. It may not be a bug but a "feature".
          When concurrent jobs are started, a later build can finish before an earlier one, especially where parameterization affects the run time of a job. The later build then reaches the archive stage and starts the JUnit analysis first.
          Some of our tests take 20 minutes, and some 15 hours.

          JUnit then tries to work out regressions against the previous build. The oldest running build holds up the archiving of any newer ones until its own results are available.
          We use JUnit as a convenient way to display results, but hadn't anticipated this behaviour.
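The blocking described above can be illustrated with a toy model. This is only a sketch, not the Jenkins code: the class `CheckpointSketch` and its methods are hypothetical names, and the timeout is there only so that the demo terminates (the stack trace in the description suggests the real wait has no timeout).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy model of a per-build checkpoint. Hypothetical names; not the Jenkins implementation. */
class CheckpointSketch {
    // One latch per build number, released once that build reports the checkpoint.
    private final Map<Integer, CountDownLatch> latches = new ConcurrentHashMap<>();

    private CountDownLatch latchFor(int buildNumber) {
        return latches.computeIfAbsent(buildNumber, n -> new CountDownLatch(1));
    }

    /** Called by a build when it reaches the checkpoint. */
    public void report(int buildNumber) {
        latchFor(buildNumber).countDown();
    }

    /**
     * Blocks the given build until the build with the previous number has reported.
     * The timeout is an assumption made for this demo only.
     *
     * @return true if the previous build reported in time, false otherwise
     */
    public boolean blockOnPrevious(int buildNumber, long timeoutMillis) {
        if (buildNumber <= 1) {
            return true; // no earlier build to wait for
        }
        try {
            return latchFor(buildNumber - 1).await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        CheckpointSketch checkpoint = new CheckpointSketch();

        // Build #242 finishes its tests while the slower #241 is still running.
        System.out.println("#242 released while #241 runs: "
                + checkpoint.blockOnPrevious(242, 100)); // prints false

        // Only once #241 reports the checkpoint can #242 proceed.
        checkpoint.report(241);
        System.out.println("#242 released after #241 reports: "
                + checkpoint.blockOnPrevious(242, 100)); // prints true
    }
}
```

With a 15-hour build #241 and a 20-minute build #242, the second build would sit in the equivalent of `blockOnPrevious` for most of those 15 hours, which matches the "Recording test results" hang in the description.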

          Danny Staple added a comment -

          The answer we are now considering is to find (or make) a way to disable the regression checking behaviour of junit - preferably as an option we can set per job, so that other jobs that are sequential and not concurrent, or that should consistently take the same time, can have it enabled.

          Jonas Eriksson added a comment -

          We're also having this problem of different running times, as we sometimes skip a job in a chain of jobs.

          Danny, have you found any workaround for this?

          Temporarily disabling the checkPoint waiting on certain jobs would help us. If the JUnitResultArchiver were a plugin it would simplify forking the feature.

          Danny Staple added a comment -

          We are in the process of removing things using the concurrent build flag from our setup. It means more duplication and a violation of SPOT (Single point of truth), but concurrent builds have led us to too many problems, including the far more serious JENKINS-10615. We are looking into the viability of the templatised job plugin to prevent us duplicating stuff, and have most of our control logic in an SCM run within shell build steps.

          Jonas Eriksson added a comment -

          I've been playing around with Jenkins and found out that even if I don't have JUnit Reports enabled, I have another plugin (email ext) that will wait for the earlier job to finish by using the checkPoint.

          I guess running concurrent parameterized builds in Jenkins does not fit how the model is implemented in the first place.

          Jonas Eriksson added a comment -

          If JUnit Reports is the only task holding you back from finishing a job when running concurrent builds, I've found that the xunit extension configured with Custom Tool is a solution to the checkPoint waiting problem.

          Inbar Rose added a comment -

          Same problem here. Total blocker. Task A starts, then task B starts. Task B reaches the 'Recording test results' stage and hangs until task A finishes. After testing with simple timed builds with many plugins/options enabled/disabled, I concluded that junit is the problem.

          kutzi added a comment -

          IMO this is definitely a feature and not a bug. If you don't like this behaviour, then use e.g. the xunit plugin which doesn't seem to behave in this way.

          Marc Seeger added a comment (edited) -

          How is this supposed to be a feature? A testrun that doesn't have anything to do with another testrun being blocked?
          This isn't even for parallel runs, this is for completely unrelated jobs.
          This just got closed as a duplicate. Is this in relation to JENKINS-9913?

          Andy Chen added a comment -

          I was redirected from another issue to this one. My problem is the result recording takes forever for some of my builds. The job in question is a parametrized concurrent job.

          clint axeda added a comment -

          I hit this issue tonight with Jenkins ver. 1.538. I captured a thread dump, if anyone is interested.

          Klaus Azesberger added a comment -

          @clint axeda:
          Can you take a CPU sample using VisualVM to check whether you see high CPU time consumption in CipherInputStream.fill_buffer() (sort by column "Self Time (CPU)")?

          Maybe we share the same root cause: JENKINS-22297

          Jesse Glick added a comment -

          Since JENKINS-9913 is covering only the reporting of checkpoints, this should be reopened: JUnitResultArchiver.CHECKPOINT still exists, and probably should not.

          Needs to be determined if anything needs to be done to replace it, in case a build with a higher number in fact finishes before one with a lower number, so calculation of test regressions cannot be done accurately when the result is published (in case anyone even cares about build-to-build diffs for a concurrent-capable job). Until the earlier build finishes, will the later build’s test result display show any “regressions” (against the last completed build), or show no regressions ever, or throw exceptions? After the earlier build finishes, will the later’s result display show regressions against the earlier build, or against the last completed build at the time of this build’s completion, or do something else? In other words, are calls to getPreviousResult made on demand whenever a build-to-build diff is requested (great)? Or made once when the build completes (not great but adequate)? Or does something really break? My casual inspection of the code suggests that there is some improper caching (CaseResult.failedSince) but that code generally defends against a prior build having no test result action, meaning that simply deleting CHECKPOINT would cause little harm.
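The "on demand" option raised above can be sketched as follows. This is an illustrative model only: the class `OnDemandRegressions` and its methods are hypothetical and do not reflect how `getPreviousResult` or `CaseResult.failedSince` are actually implemented. It assumes that a regression is a test failing in this build but not in the nearest lower-numbered build that has a recorded result, and that a missing prior result is tolerated rather than waited for.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: regressions are computed on each request, never cached. */
class OnDemandRegressions {
    // Failed test names per build number; builds still running have no entry.
    private final NavigableMap<Integer, Set<String>> failedByBuild = new TreeMap<>();

    /** Records the failed tests of a build once its results are published. */
    public synchronized void record(int buildNumber, Collection<String> failedTests) {
        failedByBuild.put(buildNumber, new TreeSet<>(failedTests));
    }

    /**
     * Returns tests failing in this build that did not fail in the nearest
     * earlier build with a recorded result. If there is no such earlier build,
     * every failure is reported; if this build has no result, nothing is.
     */
    public synchronized Set<String> regressions(int buildNumber) {
        Set<String> current = failedByBuild.get(buildNumber);
        if (current == null) {
            return Collections.emptySet();
        }
        Set<String> result = new TreeSet<>(current);
        Map.Entry<Integer, Set<String>> previous = failedByBuild.lowerEntry(buildNumber);
        if (previous != null) {
            result.removeAll(previous.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        OnDemandRegressions results = new OnDemandRegressions();
        results.record(240, Collections.<String>emptyList());
        // #242 publishes before the slower #241 and is compared against #240.
        results.record(242, Arrays.asList("testA", "testB"));
        System.out.println(results.regressions(242)); // prints [testA, testB]

        // Once #241 publishes, the same query is answered against #241.
        results.record(241, Arrays.asList("testA"));
        System.out.println(results.regressions(242)); // prints [testB]
    }
}
```

Under these assumptions the later build never blocks; its regression list simply changes once the earlier build publishes its result.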

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/tasks/junit/JUnitResultArchiver.java
          http://jenkins-ci.org/commit/jenkins/90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a
          Log:
          [FIXED JENKINS-10234] Removed checkpoint from JUnitResultArchiver.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3535
          [FIXED JENKINS-10234] Removed checkpoint from JUnitResultArchiver. (Revision 90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a)

          Result = SUCCESS
          Jesse Glick : 90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a
          Files :

          • changelog.html
          • core/src/main/java/hudson/tasks/junit/JUnitResultArchiver.java

          Henrik Skupin added a comment -

          Any chance that we could get this backported to the 1.565.x LTS branch? It's one of the most annoying problems for our Jenkins production systems.

          Jesse Glick added a comment -

          Probably too late for 1.565.x and probably already in the next LTS, but marking it as a candidate just in case.

          Henrik Skupin added a comment -

          Version 1.565.3 LTS is scheduled for Oct 1st. I'm not sure when fixes are taken into consideration for LTS releases. Is there anything else besides the keyword we could try to get it in? The next major version bump for LTS will happen at the end of Oct, where this might be fixed.

          Daniel Beck added a comment -

          Next LTS will be based on 1.580, and the 1.565.x line is done.

          Lan Wu added a comment (edited) -

          Sorry new to Jenkins jira. We are hitting this bug, but I can't see from this listing which version this was fixed in. On this page, http://jenkins-ci.org/changelog-stable, I couldn't find issue 10234. Does that mean it's not in one of the LTS builds? Thanks!

          Daniel Beck added a comment -

          "Does that mean it's not in one of the LTS builds?"

          No, it just means it has not specifically been fixed/backported for one of the LTS releases. It was fixed for 1.575 and is therefore in 1.580 and the LTS releases based on that.
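As a rough illustration of that reasoning, the check "is my version at or after 1.575?" can be written as a numeric comparison of dotted version strings. The class `FixVersionCheck` is a hypothetical helper, and it assumes (as stated in this thread) that the fix was not backported to any earlier LTS line.

```java
/** Hypothetical helper: does a Jenkins version include the fix first released in 1.575? */
class FixVersionCheck {
    // Taken from the comment above; assumes no backport to an older LTS line.
    static final String FIXED_IN = "1.575";

    /** Compares dotted numeric versions component by component; missing parts count as 0. */
    static int compareVersions(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    static boolean containsFix(String version) {
        return compareVersions(version, FIXED_IN) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("1.565.3 -> " + containsFix("1.565.3")); // prints 1.565.3 -> false
        System.out.println("1.575   -> " + containsFix("1.575"));   // prints 1.575   -> true
        System.out.println("1.580.1 -> " + containsFix("1.580.1")); // prints 1.580.1 -> true
    }
}
```

The comparison only handles purely numeric versions such as those mentioned in this issue; suffixes like `-SNAPSHOT` would need extra handling.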

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Martin Bektchiev
          Path:
          src/main/java/hudson/plugins/nunit/NUnitPublisher.java
          http://jenkins-ci.org/commit/nunit-plugin/1e89e0267814d280b7051d7d70d5d2939326b182
          Log:
          Do not wait for checkpoint from previous build

          Fixes an issue with result archiver getting stuck for a long time in concurrent builds.
          A similar issue has been fixed in the JUnit Jenkins plugin:

          https://issues.jenkins-ci.org/browse/JENKINS-10234
          https://github.com/jenkinsci/jenkins/commit/90ff9f806fcac1a58f4bd40bfcc4ed5273ff116a

            Assignee: Jesse Glick (jglick)
            Reporter: Danny Staple (dannystaple)
            Votes: 15
            Watchers: 30