
Builds fail because of "slave went offline during the build"

      Several times now I have had builds fail with the message "Looks like the node went offline during the build. Check the slave log for the details."

      That slave (a swarm slave) is still connected, but we want to reboot its host. Because it still had builds running, we switched it to Offline so that it would not accept new builds while we waited for the running ones to finish (that's the purpose of Offline, yes?).

      Of course that whole purpose is defeated if switching the slave to Offline also causes running builds to fail.

      Expected:

      • Have a way to cleanly shut down and disconnect an existing slave that has builds running, without disturbing its running builds in any way. Currently that is not possible.

          [JENKINS-17590] Builds fail because of "slave went offline during the build"

          Marc Günther added a comment -

          Example output from a job with a Maven build step followed by the post-build actions Archive Artifact, Publish HTML, and Publish JUnit:

          ...
          [INFO] BUILD SUCCESSFUL
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 27 minutes 17 seconds
          [INFO] Finished at: Fri Apr 12 15:20:59 CEST 2013
          [INFO] Final Memory: 50M/496M
          [INFO] ------------------------------------------------------------------------
          Looks like the node went offline during the build. Check the slave log for the details.FATAL: null
          java.lang.NullPointerException
          


          Andrew Erickson added a comment -

          I'm also seeing this on 1.515 and on the previous two or three releases. The console log output is below; the slave log didn't have any other details.

          ////////

          Looks like the node went offline during the build. Check the slave log for the details.FATAL: null
          java.lang.NullPointerException
          at hudson.plugins.timestamper.annotator.TimestampAnnotatorFactory.getOffset(TimestampAnnotatorFactory.java:65)
          at hudson.plugins.timestamper.annotator.TimestampAnnotatorFactory.newInstance(TimestampAnnotatorFactory.java:52)
          at hudson.console.ConsoleAnnotator._for(ConsoleAnnotator.java:143)
          at hudson.console.ConsoleAnnotator.initial(ConsoleAnnotator.java:133)
          at hudson.console.AnnotatedLargeText.createAnnotator(AnnotatedLargeText.java:140)
          at hudson.console.AnnotatedLargeText.writeHtmlTo(AnnotatedLargeText.java:157)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:599)
          at hudson.model.Run.execute(Run.java:1575)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:241)

          Andrew Erickson added a comment -

          I'm raising the priority of this to 'Major'. It affects whether jobs pass or fail, which seems like core functionality.

          This used to work fine: a job could still pass even if its node was taken offline.

          With long-running jobs, we frequently take nodes offline before the job finishes so that we can inspect the system state. As a result we see this more often, and it's somewhat painful to have to rerun a job to get a blue/green ball even though it succeeded.

          Andrew Erickson added a comment -

          I think I've tracked this down to the "Timestamper" plugin (https://wiki.jenkins-ci.org/display/JENKINS/Timestamper).

          Test setup:

          • create a new job that just has one shell command 'sleep 30'

          Testing procedure:

          • start job
          • before job finishes, take node offline
          • see if job fails (it should succeed)

          This test procedure also fails on our production server (Jenkins 1.515 with Timestamper 1.5.3).

          A build of git head (1.518) on a fresh local installation does not have this bug. Installing the latest version of Timestamper causes the test to fail (the job throws NPEs just like on our production server).
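For anyone who wants to script the test procedure above rather than click through the UI, here is a minimal sketch in Python. It only prepares the two POST requests (trigger the job, then toggle the node offline) against Jenkins' remote-access URLs. The base URL, job name, node name, user and API token are placeholders, the helper names are hypothetical, and depending on the Jenkins version and security settings a CSRF crumb may also be required, which this sketch does not handle.

```python
import base64
import urllib.parse
import urllib.request


def build_request(base_url, path, user, token, params=None):
    """Create (but do not send) an authenticated POST request for a Jenkins URL."""
    url = base_url.rstrip("/") + path
    data = urllib.parse.urlencode(params or {}).encode("ascii")
    req = urllib.request.Request(url, data=data, method="POST")
    # HTTP Basic auth with a user name and API token (placeholder credentials).
    cred = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + cred)
    return req


def reproduction_requests(base_url, job, node, user, token, message="repro"):
    """Return the requests for the two active steps of the test procedure, in order."""
    job_path = "/job/" + urllib.parse.quote(job) + "/build"
    node_path = "/computer/" + urllib.parse.quote(node) + "/toggleOffline"
    return [
        # Step 1: start the job (assumed to run the 'sleep 30' shell command).
        build_request(base_url, job_path, user, token),
        # Step 2: while the job is still running, mark the node offline.
        build_request(base_url, node_path, user, token, {"offlineMessage": message}),
    ]
```

To run it against a test instance, send each request with `urllib.request.urlopen(req)`, pausing a few seconds between the two so the build has actually started on the node. The last step of the procedure (checking whether the job failed) can then be done by reading the `result` field from `/job/<job>/lastBuild/api/json` once the build has finished.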

          Andrew Erickson added a comment -

          Steven,

          Do you have any ideas on this?

          I used Timestamper for a while before this started happening, so something seems to have changed in core that brought this about.

          Thanks,
          Andy

          Marc Günther added a comment - edited

          I don't know if this is related to my original report, but I just tried Andrew's example on Jenkins 1.501, and it also happened there, with a different stack trace:

          + sleep 15
          Looks like the node went offline during the build. Check the slave log for the details.FATAL: null
          java.lang.NullPointerException
          	at hudson.plugins.timestamper.TimestampFormatter.<init>(TimestampFormatter.java:71)
          	at hudson.plugins.timestamper.TimestamperConfig$1.get(TimestamperConfig.java:59)
          	at hudson.plugins.timestamper.TimestamperConfig$1.get(TimestamperConfig.java:54)
          	at hudson.plugins.timestamper.TimestamperConfig.formatter(TimestamperConfig.java:148)
          	at hudson.plugins.timestamper.annotator.TimestampAnnotatorFactory.newInstance(TimestampAnnotatorFactory.java:51)
          	at hudson.console.ConsoleAnnotator._for(ConsoleAnnotator.java:143)
          	at hudson.console.ConsoleAnnotator.initial(ConsoleAnnotator.java:133)
          	at hudson.console.AnnotatedLargeText.createAnnotator(AnnotatedLargeText.java:140)
          	at hudson.console.AnnotatedLargeText.writeHtmlTo(AnnotatedLargeText.java:157)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:605)
          	at hudson.model.Run.execute(Run.java:1568)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          

          In that test job, the "Add timestamps to the Console Output" option is disabled.


          Marc Günther added a comment -

          Ah, my last comment was using plugin version 1.5.2. I just upgraded to 1.5.3, and now my stack trace looks identical to Andrew's:

          Looks like the node went offline during the build. Check the slave log for the details.FATAL: null
          java.lang.NullPointerException
          	at hudson.plugins.timestamper.annotator.TimestampAnnotatorFactory.getOffset(TimestampAnnotatorFactory.java:65)
          	at hudson.plugins.timestamper.annotator.TimestampAnnotatorFactory.newInstance(TimestampAnnotatorFactory.java:52)
          	at hudson.console.ConsoleAnnotator._for(ConsoleAnnotator.java:143)
          	at hudson.console.ConsoleAnnotator.initial(ConsoleAnnotator.java:133)
          	at hudson.console.AnnotatedLargeText.createAnnotator(AnnotatedLargeText.java:140)
          	at hudson.console.AnnotatedLargeText.writeHtmlTo(AnnotatedLargeText.java:157)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:605)
          	at hudson.model.Run.execute(Run.java:1568)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:236)
          


          Andrew Erickson added a comment -

          This appears to be a dupe of https://issues.jenkins-ci.org/browse/JENKINS-16778

          Timestamper v1.5.3 doesn't include the fix, even though the changelog makes it look like it does. I guess it will be in 1.5.4.

          Andrew Erickson added a comment -

          Duplicate; moving activity to JENKINS-16778.

            stevengbrown Steven G Brown
            marc_guenther Marc Günther