[JENKINS-11936] Slave hangs for a long time when a job completes

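      I noticed that on several jobs running on different Windows slaves, the job takes a long time to end after its last build step finishes. In the following console output, 16 minutes pass between the batch step exiting and the first publisher output: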

      13:53:35 D:\Jenkins\workspace\CES7.0R_InstallKit>exit 0
      14:09:36 [WARNINGS] Parsing warnings in console log with parsers [MSBuild]
      14:09:36 [WARNINGS] MSBuild : Found 24 warnings.
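
To help locate this kind of stall in a longer console log, the pause between consecutive timestamped lines can be measured mechanically. The following is a minimal sketch, not part of Jenkins: the class `ConsoleGap` and the method `largestGap` are hypothetical names, and it assumes every relevant line starts with an `HH:mm:ss` timestamp and that the log does not cross midnight.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Hypothetical helper: finds the longest pause between timestamped console lines. */
class ConsoleGap {

    /**
     * Returns the longest gap between consecutive lines that start with an
     * HH:mm:ss timestamp. Lines without a leading timestamp are skipped.
     * Assumes all lines belong to the same day (no midnight rollover).
     */
    static Duration largestGap(List<String> lines) {
        Duration largest = Duration.ZERO;
        LocalTime previous = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.length() < 8) {
                continue; // too short to hold a timestamp
            }
            LocalTime current;
            try {
                current = LocalTime.parse(trimmed.substring(0, 8));
            } catch (DateTimeParseException e) {
                continue; // not a timestamped line
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(largest) > 0) {
                    largest = gap;
                }
            }
            previous = current;
        }
        return largest;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "13:53:35 D:\\Jenkins\\workspace\\CES7.0R_InstallKit>exit 0",
            "14:09:36 [WARNINGS] Parsing warnings in console log with parsers [MSBuild]",
            "14:09:36 [WARNINGS] MSBuild : Found 24 warnings.");
        System.out.println(largestGap(log).getSeconds() + " s");
    }
}
```

Run against the three lines quoted above, this prints `961 s`, i.e. 16 minutes and 1 second between the batch step exiting and the first publisher line.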

      I saw another issue regarding the archiving of artifacts #7641, but in our case, this specific job does not have artifacts enabled.

      Also, I noticed this on different jobs, some are builds and some are tests.

      Anything I can do to help debug this?

      ---------------------------------

      I stopped a build that was hung and I got this call stack:

      10:48:26 f:\Jenkins\workspace\CES7.0Dx86_WIN>exit 9009
      10:48:27 Build step 'Execute Windows batch command' marked build as failure
      11:07:55 ERROR: Publisher hudson.plugins.warnings.WarningsPublisher aborted due to exception
      11:07:55 java.lang.InterruptedException
      11:07:55 at java.lang.Object.wait(Native Method)
      11:07:55 at java.lang.Object.wait(Object.java:485)
      11:07:55 at hudson.model.Run$Runner$CheckpointSet.waitForCheckPoint(Run.java:1295)
      11:07:55 at hudson.model.Run.waitForCheckpoint(Run.java:1263)
      11:07:55 at hudson.model.CheckPoint.block(CheckPoint.java:144)
      11:07:55 at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:25)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:692)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:667)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:645)
      11:07:55 at hudson.model.Build$RunnerImpl.post2(Build.java:162)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:614)
      11:07:55 at hudson.model.Run.run(Run.java:1429)
      11:07:55 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      11:07:55 at hudson.model.ResourceController.execute(ResourceController.java:88)
      11:07:55 at hudson.model.Executor.run(Executor.java:238)

      Looks like the culprit is the WarningsPublisher plugin. I'll disable it for now.
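
For readers unfamiliar with this kind of trace: the top frames show the publisher's thread parked in `Object.wait()` underneath `CheckPoint.block`, and the `InterruptedException` is logged at the moment the build was stopped. The sketch below reproduces only that general pattern (a thread waiting on a condition that is never signalled, then interrupted). It is not Jenkins' code: `CheckpointSketch`, `block`, `report`, and `abortWaiter` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a checkpoint that another thread waits on. */
class CheckpointSketch {
    private boolean reported = false;

    /** Blocks until report() is called; throws if the waiting thread is interrupted. */
    synchronized void block() throws InterruptedException {
        while (!reported) {
            wait();
        }
    }

    /** Marks the checkpoint as reached and wakes up all waiters. */
    synchronized void report() {
        reported = true;
        notifyAll();
    }

    /** Starts a waiter on a checkpoint nobody reports, interrupts it, and returns what it observed. */
    static String abortWaiter() {
        CheckpointSketch checkpoint = new CheckpointSketch();
        String[] outcome = {"still waiting"};
        CountDownLatch started = new CountDownLatch(1);
        Thread waiter = new Thread(() -> {
            started.countDown();
            try {
                checkpoint.block();
                outcome[0] = "checkpoint reached";
            } catch (InterruptedException e) {
                outcome[0] = "interrupted";
            }
        });
        waiter.start();
        try {
            started.await();
            waiter.interrupt(); // comparable to stopping the hung build
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo thread was interrupted", e);
        }
        return outcome[0];
    }

    public static void main(String[] args) {
        System.out.println(abortWaiter());
    }
}
```

In this sketch the waiter never returns on its own because `report()` is never called. If the real hang follows the same pattern, the useful question is what the publisher was waiting for, rather than what the publisher itself was doing.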


          Marc Sanfacon created issue -
          Marc Sanfacon made changes -
          Labels Original: hang slave windows New: hang slave warnings windows

          evernat added a comment -

          Can this be reproduced with a recent Jenkins version?


          Tom Clift added a comment -

          Reproduced on Jenkins 1.541.

          Occurs for two separate jobs, both of which run on the same Windows build slave. E.g.:

          13:05:02 E:\jenkins\workspace\Project>exit 0 
          13:10:03 Notifying upstream projects of job completion
          13:10:03 Finished: SUCCESS
          

          These jobs are started via the "Trigger/call builds on other projects" option of the Parameterized Trigger Plugin. One has "Block until the triggered projects finish their builds" enabled, and the other has it disabled.

          We are not using the Warnings Plugin.

          Reproducible always.


          Tom Clift added a comment - - edited

          We tracked this down to the Disk Usage Plugin (i.e. disabling this plugin removed the delay at job completion for us).

          It also turns out that this was not specific to Windows slaves for us.

          Sounds like the bug is common to at least two plugins, or perhaps a combination of plugins. I guess they are doing something blocking with a timeout? It's suspicious in our case that the delay time was very often almost exactly 5 minutes.


          evernat added a comment -

          @Tom
          Ok. You have said Jenkins 1.541 with slaves.
          It would also be good to know the version of the Disk Usage Plugin which was causing the issue.


          Tom Clift added a comment -

          Of course, sorry, we were using "Jenkins disk-usage plugin" version 0.23, released 2013-11-12. It's the latest release on the plugin's wiki page, but there is no changelog on that page for 0.22 or 0.23.


          Alex Rodrigues added a comment - - edited

          I am using Jenkins v1.509.3 (with Disk Usage v0.23) and see a similar issue. I have two remote RHEL Linux nodes and see the issue only when running jobs on one of them, not on the other. Uninstalling the disk-usage plugin fixed my issue. I then installed v0.17, and the issue has not returned.


          Daniel Beck added a comment -

          What's mentioned in recent comments is completely unrelated to the reported issue: Disk Usage plugin started to get the workspace size at the end of a build in recent versions. There's an option to turn that off, but according to Andrew Bayer in his talk at JUC Berlin, it doesn't work.

          Resolving this issue because nobody has confirmed that the originally reported problem still exists, as evernat asked over a year ago.


          If you experience a problem that looks like this and are using a recent version of Disk Usage Plugin, it's a different issue. Disable or downgrade that plugin.
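
To illustrate why measuring a workspace at the end of a build can add a noticeable delay, here is a minimal sketch of a recursive size calculation. It is not the Disk Usage Plugin's implementation: `WorkspaceSize`, `sizeOf`, and `demo` are hypothetical names, and the sketch assumes a local file system. The point is only that the walk has to visit every file, so its cost grows with the number of files in the workspace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: sums the sizes of all regular files under a directory. */
class WorkspaceSize {

    /** Walks the whole tree under root and returns the total size in bytes. */
    static long sizeOf(Path root) throws IOException {
        long[] total = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    /** Builds a tiny throwaway workspace (100 + 250 bytes) and measures it. */
    static long demo() {
        try {
            Path root = Files.createTempDirectory("workspace");
            Files.write(root.resolve("a.txt"), new byte[100]);
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(sub.resolve("b.bin"), new byte[250]);
            return sizeOf(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = demo();
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(bytes + " bytes measured in " + millis + " ms");
    }
}
```

Timing `sizeOf` against a real workspace directory on the affected node is one way to check whether a size calculation alone could account for a delay of several minutes.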

          Daniel Beck made changes -
          Resolution New: Cannot Reproduce [ 5 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

            kdsweeney kdsweeney
            marcsanfacon Marc Sanfacon