
"Take node offline on failure is waiting for a checkpoint on ..." - the same build, on a different node

      I'm using Jenkins for continuous testing, where each build is a test. I'm running build 100 (test 100) on node A, and it runs for 3 hours (or 2 days). In parallel, I run build 101 (test 100) on node B, and when it fails, it gets 'stuck' with:
      Take node offline on failure is waiting for a checkpoint on test #100

      I'm not sure what checkpoint it's waiting for, but the failed job never finishes.
      There are no dependencies I'm aware of between them.

          [JENKINS-19518] "Take node offline on failure is waiting for a checkpoint on ..." - the same build, on a different node

          Aaron Kushner added a comment -

          I've seen a similar issue where one stuck slave causes all other jobs of the same build to back up. For example, I had a hundred builds that looked finished, but were still on the slaves and had this message in the console output:

          00:32:03.591 Editable Email Notification is waiting for a checkpoint on build_hhvm_fbcode #13105

          Build #13105 had the following message:

          01:26:19.644 Looks like the node went offline during the build. Check the slave log for the details.

          That machine had in fact been rebooted, but all of the other builds shouldn't have been waiting for it. Once I cancelled build 13105, all the other jobs completed.


          Ami Castonguay added a comment -

          This seems to be identical to JENKINS-17507 in the Text Finder plugin. In Text Finder plugin version 1.10, it was resolved by having getRequiredMonitorService() return BuildStepMonitor.NONE instead of BuildStepMonitor.BUILD.

          From my understanding of the getRequiredMonitorService() documentation, returning BuildStepMonitor.NONE (the default value) would also be the correct behavior for this plugin.
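To illustrate why switching from BuildStepMonitor.BUILD to BuildStepMonitor.NONE would unblock the failed build, here is a toy model in plain Java. It is not Jenkins code: `MonitorSketch`, `Monitor`, `Build`, and `runStep` are hypothetical stand-ins, and the sketch assumes (as the console messages above suggest) that BUILD makes a step wait until the earlier build of the same job has completed, while NONE runs the step immediately. In a real plugin the change would be made in the publisher's getRequiredMonitorService() override.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Toy model only; every name here is a hypothetical stand-in, not the Jenkins API. */
public class MonitorSketch {

    /** Stand-in for the two BuildStepMonitor values discussed in this issue. */
    public enum Monitor { NONE, BUILD }

    /** Stand-in for a running build; the latch opens once the build completes. */
    public static final class Build {
        final int number;
        final CountDownLatch completed = new CountDownLatch(1);

        public Build(int number) { this.number = number; }

        public void finish() { completed.countDown(); }
    }

    /**
     * Runs a post-build step of `current` under the given monitor.
     * Returns true if the step ran, false if it was still waiting
     * on `previous` when the timeout expired.
     */
    public static boolean runStep(Monitor monitor, Build previous, Build current, long timeoutMillis) {
        if (monitor == Monitor.BUILD && previous != null) {
            // BUILD: wait until the earlier build of the same job has completed.
            System.out.println("Step is waiting for a checkpoint on build #" + previous.number);
            try {
                if (!previous.completed.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false; // still stuck behind the earlier build
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // NONE, or the checkpoint was reached: the step runs.
        System.out.println("Step ran for build #" + current.number);
        return true;
    }

    public static void main(String[] args) {
        Build b100 = new Build(100); // long-running build on node A, not finished
        Build b101 = new Build(101); // failed build on node B

        System.out.println(runStep(Monitor.BUILD, b100, b101, 100));
        System.out.println(runStep(Monitor.NONE, b100, b101, 100));
    }
}
```

In this sketch the BUILD call gives up after the timeout because build 100 never finishes, whereas the NONE call does not look at build 100 at all. A real Jenkins build has no such timeout, which matches the "stuck" behavior reported here.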

          Ami Castonguay added a comment - Pull of fix to main repo requested

          Leah Zagreus added a comment -

          Any word on the pull request?


          Ami Castonguay added a comment - edited

          Nope, the plugin maintainer is no longer active. You can either use the build from the pull request or become the plugin maintainer and merge the pull request yourself.


            Assignee: Unassigned
            Reporter: Yaniv Kaul (ykaul)
            Votes: 5
            Watchers: 8
