  1. Jenkins
  2. JENKINS-17976

Jenkins executors die



    • Bug
    • Resolution: Duplicate
    • Major
    • core, fingerprint-plugin
    • None
    • OS: Linux unknown, amd64/64 (2 cores)
      Java: Java(TM) SE Runtime Environment, 1.7.0_03-b04
      JVM: Java HotSpot(TM) 64-Bit Server VM, 22.1-b02, mixed mode
      Server: Tomcat
      Jenkins: 1.512


      The problem is that all executors on all nodes die with this stack trace:

      at hudson.tasks.Fingerprinter$FingerprintAction.onLoad(Fingerprinter.java:346)
      at hudson.model.Run.onLoad(Run.java:319)
      at hudson.model.RunMap.retrieve(RunMap.java:226)
      at hudson.model.RunMap.retrieve(RunMap.java:59)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:667)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:629)
      at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:368)
      at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:220)
      at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:103)
      at hudson.model.Job.getLastBuildsOverThreshold(Job.java:887)
      at hudson.model.Job.getEstimatedDuration(Job.java:894)
      at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:320)
      at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:303)
      at hudson.model.Queue.maintain(Queue.java:1035)
      at hudson.model.Queue.pop(Queue.java:863)
      at hudson.model.Executor.grabJob(Executor.java:285)
      at hudson.model.Executor.run(Executor.java:206)

      Restarting the executor does not work, and restarting Jenkins does not help either. The whole Tomcat server has to be restarted instead.

      In the FingerprintAction where the NPE originates you can see these lines:

      345: public void onLoad() {
      346:     if (build.getParent() == null) {
      347:         logger.warning("JENKINS-16845: broken FingerprintAction record");
      348:         build = null;
      349:         return;
      350:     }

      When looking in the Tomcat logs:

      > grep JENKINS-16845 /path/to/catalina.out
      WARNING: JENKINS-16845: broken FingerprintAction record
      WARNING: JENKINS-16845: broken FingerprintAction record
      . . .

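      So a possible cause is that onLoad has been called twice on the same record: the first call logs the warning and sets build to null, and a second call then fails at line 346, where build.getParent() is invoked on the null build.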
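
      To illustrate the suspected double call, here is a minimal, self-contained sketch. It is not the Jenkins source: the classes Build and Action and the method onLoadGuarded are hypothetical stand-ins that only mirror the quoted lines, and whether Jenkins really calls onLoad twice on one record is an unconfirmed assumption.

```java
// Hypothetical reproduction of the suspected failure mode; not Jenkins code.
class DoubleOnLoadSketch {

    // Stand-in for a build record whose parent job may be missing.
    static class Build {
        private final Object parent;
        Build(Object parent) { this.parent = parent; }
        Object getParent() { return parent; }
    }

    // Stand-in for the action; onLoad() mirrors the quoted lines 345-350.
    static class Action {
        Build build;
        Action(Build build) { this.build = build; }

        void onLoad() {
            if (build.getParent() == null) { // NPE here if build is already null
                build = null;
                return;
            }
        }

        // Assumed defensive variant: tolerate a record that was already nulled.
        void onLoadGuarded() {
            if (build == null || build.getParent() == null) {
                build = null;
                return;
            }
        }
    }

    // Returns true if the second onLoad() call throws a NullPointerException.
    static boolean secondCallThrows() {
        Action a = new Action(new Build(null)); // broken record: no parent
        a.onLoad();                             // first call sets build = null
        try {
            a.onLoad();                         // second call dereferences null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Same scenario using the guarded variant.
    static boolean guardedSecondCallThrows() {
        Action a = new Action(new Build(null));
        a.onLoadGuarded();
        try {
            a.onLoadGuarded();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded second call throws: " + secondCallThrows());
        System.out.println("guarded second call throws: " + guardedSecondCallThrows());
    }
}
```

      In this sketch a null check on build before the getParent() call is enough to make a repeated call harmless. Whether that is the right fix for the real FingerprintAction is not something this report can confirm.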

      I do not know whether it is relevant to this bug, but just before all executors died we had started a build of a particular job. The build could never start because no executors were available. After Tomcat had been restarted we saw that this job had lost all of its build history. The history was still on disk but was not shown on the job page. It is unclear whether the history was already missing before the restart. We had learned that a trick for getting the history back is to rename the job and then rename it back to its original name. After doing that we tried to build again, and this time it worked.


              marcsanfacon Marc Sanfacon
              jan_fagerstrom Jan Fagerström