  1. Jenkins
  2. JENKINS-11936

Slave hangs for a long time when a job completes

      Description

      I noticed that on several jobs running on different Windows slaves, it takes a long time for the job to end, as seen in the following:

      13:53:35 D:\Jenkins\workspace\CES7.0R_InstallKit>exit 0
      14:09:36 [WARNINGS] Parsing warnings in console log with parsers [MSBuild]
      14:09:36 [WARNINGS] MSBuild : Found 24 warnings.
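Stalls like the one in this excerpt can be located mechanically by comparing the timestamps of consecutive console lines. The following is a minimal sketch, assuming every relevant line starts with an `HH:MM:SS` prefix as in the log above; the function name `find_gaps` and the one-minute threshold are illustrative choices, not part of Jenkins.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=1)):
    """Return (gap, previous_line, line) for consecutive timestamped lines
    whose HH:MM:SS prefixes are more than `threshold` apart."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        stamp = line.strip()[:8]
        try:
            current = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # line has no timestamp prefix, skip it
        if prev_time is not None:
            gap = current - prev_time
            if gap < timedelta(0):
                gap += timedelta(days=1)  # assumption: the log wrapped past midnight
            if gap > threshold:
                gaps.append((gap, prev_line, line))
        prev_time, prev_line = current, line
    return gaps


# The console excerpt from the description above.
log = r"""
13:53:35 D:\Jenkins\workspace\CES7.0R_InstallKit>exit 0
14:09:36 [WARNINGS] Parsing warnings in console log with parsers [MSBuild]
14:09:36 [WARNINGS] MSBuild : Found 24 warnings.
""".splitlines()

for gap, before, after in find_gaps(log):
    print(f"{gap} idle between:\n  {before}\n  {after}")
```

Run against a full console log, this points at the step that was active when the build stalled, which is the first thing to include in a report like this one.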

      I saw another issue regarding the archiving of artifacts #7641, but in our case, this specific job does not have artifacts enabled.

      Also, I noticed this on different jobs, some are builds and some are tests.

      Anything I can do to help debug this?

      ---------------------------------

      I stopped a build that was hung and I got this call stack:

      10:48:26 f:\Jenkins\workspace\CES7.0Dx86_WIN>exit 9009
      10:48:27 Build step 'Execute Windows batch command' marked build as failure
      11:07:55 ERROR: Publisher hudson.plugins.warnings.WarningsPublisher aborted due to exception
      11:07:55 java.lang.InterruptedException
      11:07:55 at java.lang.Object.wait(Native Method)
      11:07:55 at java.lang.Object.wait(Object.java:485)
      11:07:55 at hudson.model.Run$Runner$CheckpointSet.waitForCheckPoint(Run.java:1295)
      11:07:55 at hudson.model.Run.waitForCheckpoint(Run.java:1263)
      11:07:55 at hudson.model.CheckPoint.block(CheckPoint.java:144)
      11:07:55 at hudson.tasks.BuildStepMonitor$2.perform(BuildStepMonitor.java:25)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:692)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:667)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:645)
      11:07:55 at hudson.model.Build$RunnerImpl.post2(Build.java:162)
      11:07:55 at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:614)
      11:07:55 at hudson.model.Run.run(Run.java:1429)
      11:07:55 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      11:07:55 at hudson.model.ResourceController.execute(ResourceController.java:88)
      11:07:55 at hudson.model.Executor.run(Executor.java:238)

      Looks like the culprit is the WarningsPublisher plugin. I'll disable it for now.
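The trace shows the publisher thread parked in `Object.wait` beneath `CheckPoint.block` and `Run.waitForCheckpoint`. That suggests the build was waiting for another build to reach a checkpoint rather than doing work of its own. Below is a minimal sketch of that kind of wait, written in Python rather than Jenkins's Java. The `CheckPoint` class and the two-build scenario are illustrative assumptions, not Jenkins's actual implementation.

```python
import threading


class CheckPoint:
    """Toy checkpoint: block() waits until report() has been called."""

    def __init__(self):
        self._reached = threading.Event()

    def report(self):
        self._reached.set()

    def block(self, timeout=None):
        # Returns True once the checkpoint is reached, False on timeout.
        return self._reached.wait(timeout)


def run_demo():
    events = []
    publisher_done_in_build_1 = CheckPoint()
    build_2_is_waiting = threading.Event()

    def build_2():
        events.append("build #2: build steps finished, publisher waits for build #1")
        build_2_is_waiting.set()
        publisher_done_in_build_1.block()  # the "hang" happens here
        events.append("build #2: publisher runs")

    worker = threading.Thread(target=build_2)
    worker.start()
    build_2_is_waiting.wait()  # make sure build #2 is already blocked
    events.append("build #1: publisher finished, checkpoint reported")
    publisher_done_in_build_1.report()
    worker.join()
    return events


for event in run_demo():
    print(event)
```

If this is what happened here, the delay would last as long as the earlier build takes to get past the same step, so it is worth checking whether an earlier build of the same job was still running during the hang.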

          Activity

          marcsanfacon Marc Sanfacon created issue -
          marcsanfacon Marc Sanfacon made changes -
          Labels: "hang slave windows" changed to "hang slave warnings windows"
          evernat evernat added a comment -

          Is it reproduced with a recent Jenkins version?

          tomclift Tom Clift added a comment -

          Reproduced on Jenkins 1.541.

          Occurs for two separate jobs, both of which run on the same Windows build slave. E.g.:

          13:05:02 E:\jenkins\workspace\Project>exit 0 
          13:10:03 Notifying upstream projects of job completion
          13:10:03 Finished: SUCCESS
          

          These jobs are started via the "Trigger/call builds on other projects" option from the Parameterized Trigger Plugin. One has "Block until the triggered projects finish their builds" enabled, and the other has it disabled.

          We are not using the Warnings Plugin.

          Reproducible always.

          tomclift Tom Clift added a comment - - edited

          We tracked this down to the Disk Usage Plugin (i.e. disabling this plugin removed the delay at job completion for us).

          It also turns out that this was not specific to Windows slaves for us.

          Sounds like the bug is common to at least two plugins, or perhaps a combination of plugins. I guess they are doing something blocking with a timeout? It's suspicious in our case that the delay time was very often almost exactly 5 minutes.

          evernat evernat added a comment -

          @Tom
          Ok. You have said Jenkins 1.541 with slaves.
          It would also be good to know the version of the Disk Usage Plugin which was causing the issue.

          tomclift Tom Clift added a comment -

          Of course, sorry, we were using "Jenkins disk-usage plugin" version 0.23, released 2013-11-12. It's the latest release on the plugin's wiki page, but there is no changelog on that page for 0.22 or 0.23.

          rodriguesalex Alex Rodrigues added a comment - - edited

          I am using Jenkins v1.509.3 (with Disk Usage v0.23) and see a similar issue. I have two remote RHEL Linux nodes and see the issue only when jobs run on one of them, not the other. Uninstalling the disk-usage plugin fixed my issue; I then installed v0.17 and the issue has not returned.

          danielbeck Daniel Beck added a comment -

          What's mentioned in recent comments is completely unrelated to the reported issue: Disk Usage plugin started to get the workspace size at the end of a build in recent versions. There's an option to turn that off, but according to Andrew Bayer in his talk at JUC Berlin, it doesn't work.

          Resolving this issue as there's actually not been confirmation that this issue still exists as asked by evernat over a year ago.


          If you experience a problem that looks like this and are using a recent version of Disk Usage Plugin, it's a different issue. Disable or downgrade that plugin.
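To judge whether a workspace is large enough for an end-of-build size scan to cause a noticeable delay, one can time a plain recursive size walk on the affected node. This is only a rough stand-in: how the plugin actually measures the workspace is not stated in this thread, and the function names below are hypothetical.

```python
import os
import time


def directory_size(root):
    """Sum the sizes of all files under `root` without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                pass  # file vanished or is unreadable, ignore it
    return total


def timed_directory_size(root):
    """Return (size_in_bytes, elapsed_seconds) for a full walk of `root`."""
    start = time.perf_counter()
    size = directory_size(root)
    return size, time.perf_counter() - start


if __name__ == "__main__":
    # Replace "." with the job's workspace directory on the node.
    size, seconds = timed_directory_size(".")
    print(f"{size} bytes in {seconds:.2f}s")
```

If the walk itself takes minutes on the slow node and seconds on the fast one, that supports the explanation that the delay comes from measuring the workspace and not from the build.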

          danielbeck Daniel Beck made changes -
          Resolution Cannot Reproduce [ 5 ]
          Status Open [ 1 ] Resolved [ 5 ]
          rodriguesalex Alex Rodrigues added a comment - - edited

          I'm sorry, Daniel, but I do not understand your last statement.

          Why do you feel that the later versions of the Disk Usage Plugin are not an issue, given that uninstalling or downgrading it removes the hang/delay? Isn't the fact that I am seeing the same issue evernat saw a year ago enough confirmation that this issue still exists?

          Anyway, I had earlier reported that this issue occurs on one remote node but not the other. The remote node where this occurs has much less free space (i.e. <40%) than the other node (>70%), where this works fine. It appears that the plugin's performance may be affected by the disk usage on the remote node, so one might see differing results depending on the node the job runs on.

          danielbeck Daniel Beck added a comment -

          Alex: Read the original issue report: That indicated it's Warnings Plugin both in the build log and the stack trace. Tom Clift experienced a different problem (with Disk Usage Plugin of a version that didn't yet exist when this issue was filed!) that is actually tracked as JENKINS-23347.


            People

            Assignee:
            kdsweeney kdsweeney
            Reporter:
            marcsanfacon Marc Sanfacon
            Votes:
            2
            Watchers:
            6
