
Slave hangs for a long time when a job completes

      I noticed that on several jobs running on different Windows slaves, it takes a long time for the job to end, as seen in the following:

      13:53:35 D:\Jenkins\workspace\CES7.0R_InstallKit>exit 0
      14:09:36 [WARNINGS] Parsing warnings in console log with parsers [MSBuild]
      14:09:36 [WARNINGS] MSBuild : Found 24 warnings.

      I saw another issue (#7641) regarding the archiving of artifacts, but in our case this specific job does not have artifact archiving enabled.

      Also, I noticed this on different jobs; some are builds and some are tests.

      Anything I can do to help debug this?

      ---------------------------------

      I stopped a build that was hanging and got this call stack:

      10:48:26 f:\Jenkins\workspace\CES7.0Dx86_WIN>exit 9009
      10:48:27 Build step 'Execute Windows batch command' marked build as failure
      11:07:55 ERROR: Publisher hudson.plugins.warnings.WarningsPublisher aborted due to exception
      11:07:55 java.lang.InterruptedException
      11:07:55 at java.lang.Object.wait(Native Method)
      11:07:55 at java.lang.Object.wait(Object.java:485)
      11:07:55 at hudson.model.Run$Runner$CheckpointSet.waitForCheckPoint(Run.java:1295)
      11:07:55 at hudson.model.Run.waitForCheckpoint(Run.java:1263)
      11:07:55 at hudson.model.CheckPoint.block(CheckPoint.java:144)
      11:07:55 at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:25)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:692)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:667)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:645)
      11:07:55 at hudson.model.Build$RunnerImpl.post2(Build.java:162)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:614)
      11:07:55 at hudson.model.Run.run(Run.java:1429)
      11:07:55 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      11:07:55 at hudson.model.ResourceController.execute(ResourceController.java:88)
      11:07:55 at hudson.model.Executor.run(Executor.java:238)

      Looks like the culprit is the WarningsPublisher plugin. I'll disable it for now.
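
      The trace shows the publisher parked in Object.wait() inside CheckPoint.block(): a publisher registered with BuildStepMonitor.STEP must wait until the previous build has passed the same checkpoint before it may run. A minimal sketch of that wait-for-checkpoint pattern (illustrative names only, not the actual Jenkins sources):

      class CheckpointSketch {
          private final Object lock = new Object();
          private boolean passed;

          // Build N-1 calls this once it passes the checkpoint.
          void report() {
              synchronized (lock) {
                  passed = true;
                  lock.notifyAll();
              }
          }

          // Build N's publisher blocks here (the Object.wait() at the
          // top of the stack trace) until the earlier build reports.
          void block() throws InterruptedException {
              synchronized (lock) {
                  while (!passed) {
                      lock.wait();
                  }
              }
          }
      }

      If the earlier build never reports the checkpoint, block() hangs until the executor is interrupted, which is exactly the InterruptedException shown above.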

          [JENKINS-11936] Slave hangs for a long time when a job completes

          evernat added a comment -

          Is it reproduced with a recent Jenkins version?

          Tom Clift added a comment -

          Reproduced on Jenkins 1.541.

          Occurs for two separate jobs, both of which run on the same Windows build slave. E.g.:

          13:05:02 E:\jenkins\workspace\Project>exit 0 
          13:10:03 Notifying upstream projects of job completion
          13:10:03 Finished: SUCCESS
          

          These jobs are started via the "Trigger/call builds on other projects" option from the Parameterized Trigger plugin. One has "Block until the triggered projects finish their builds" enabled, and the other has it disabled.

          Not using the Warnings plugin.

          Reproducible always.

          Tom Clift added a comment - - edited

          We tracked this down to the Disk Usage plugin (i.e. disabling this plugin removed the delay at job completion for us).

          It also turns out that this was not specific to Windows slaves for us.

          Sounds like the bug is common to at least two plugins, or perhaps a combination of plugins. I guess they are doing something blocking with a timeout? It's suspicious that, in our case, the delay was very often almost exactly 5 minutes.
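
          A blocking call with a timeout would produce exactly that signature; a minimal sketch of the pattern (purely illustrative, not taken from any plugin's source):

          import java.util.concurrent.TimeUnit;

          class TimeoutSketch {
              private final Object lock = new Object();
              private boolean done;

              // Waits at most 5 minutes; if nothing ever calls notifyAll(),
              // the caller resumes after almost exactly 5 minutes, matching
              // the delays described above.
              void awaitCompletion() throws InterruptedException {
                  long deadline = System.nanoTime() + TimeUnit.MINUTES.toNanos(5);
                  synchronized (lock) {
                      while (!done) {
                          long remainingMillis =
                              TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                          if (remainingMillis <= 0) {
                              return; // give up silently; the build then proceeds
                          }
                          lock.wait(remainingMillis);
                      }
                  }
              }
          }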

          evernat added a comment -

          @Tom
          OK, you have said Jenkins 1.541 with slaves.
          It would also be good to know the version of the Disk Usage plugin that was causing the issue.

          Tom Clift added a comment -

          Of course, sorry, we were using "Jenkins disk-usage plugin" version 0.23, released 2013-11-12. It's the latest release on the plugin's wiki page, but there is no changelog on that page for 0.22 or 0.23.

          Alex Rodrigues added a comment - - edited

          I am using Jenkins v1.509.3 (with Disk Usage v0.23) and see a similar issue. I have two remote RHEL Linux nodes and see this issue only when I run jobs on one node but not on the other. Uninstalling the disk-usage plugin fixed my issue; I then installed v0.17 and the issue has not returned.

          Daniel Beck added a comment -

          What's mentioned in the recent comments is completely unrelated to the reported issue: the Disk Usage plugin started to compute the workspace size at the end of a build in recent versions. There's an option to turn that off, but according to Andrew Bayer in his talk at JUC Berlin, it doesn't work.

          Resolving this issue, as there has actually been no confirmation that it still exists since evernat asked over a year ago.


          If you experience a problem that looks like this and are using a recent version of the Disk Usage plugin, it's a different issue. Disable or downgrade that plugin.
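
          For context, "getting the workspace size" amounts to a full recursive walk of the workspace on the slave; a minimal sketch of such a walk (an illustration, not the Disk Usage plugin's actual code):

          import java.io.File;

          final class WorkspaceSizeSketch {
              // Recursively sums file sizes under a directory.
              static long sizeOf(File dir) {
                  long total = 0;
                  File[] children = dir.listFiles();
                  if (children == null) {
                      return 0; // not a directory, or unreadable
                  }
                  for (File child : children) {
                      total += child.isDirectory() ? sizeOf(child) : child.length();
                  }
                  return total;
              }

              public static void main(String[] args) {
                  System.out.println(sizeOf(new File(args[0])) + " bytes");
              }
          }

          On a large workspace, or on a slow or nearly full disk, this walk alone can take minutes at the end of every build, which matches the delays reported above.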

          Alex Rodrigues added a comment - - edited

          I'm sorry, Daniel, but I do not understand your last statement.

          Why do you feel that the later versions of the Disk Usage plugin are not the issue, despite seeing that uninstalling or downgrading it removes the hang/delay? Isn't the fact that I am seeing the same issue evernat asked about a year ago enough confirmation that this issue still exists?

          Anyway, I had earlier reported that this issue occurs on one remote node but not the other. The remote node where this occurs has much less free space (< 40%) than the other node (> 70%), where things work fine. It appears that the plugin's performance may be affected by the disk usage on the remote node, so one might see differing results depending on which node the job runs on.

          Daniel Beck added a comment -

          Alex: read the original issue report: it indicates the Warnings plugin in both the build log and the stack trace. Tom Clift experienced a different problem (with a Disk Usage plugin version that didn't yet exist when this issue was filed!) that is actually tracked as JENKINS-23347.
