FingerprintAction has the following field:

      private final AbstractBuild build;

      This field links to an AbstractBuild, which in turn has the following field:

      protected transient final JobT project;

      When Jenkins persists a run to a file, the build referenced by the FingerprintAction is saved, but the project itself is not, because the field is transient. An example fragment from one of the build files looks like this:

          <hudson.tasks.Fingerprinter_-FingerprintAction>
            <build class="build">
              <actions>
                <hudson.model.ParametersAction reference="../../../../hudson.model.ParametersAction"/>
                <hudson.model.CauseAction reference="../../../../hudson.model.CauseAction"/>
                <hudson.plugins.copyartifact.CopyArtifact_-EnvAction reference="../../../../hudson.plugins.copyartifact.CopyArtifact_-EnvAction"/>
                <hudson.tasks.Fingerprinter_-FingerprintAction reference="../../.."/>
              </actions>
              <number>17</number>
              <startTime>1358250655935</startTime>
              <result>SUCCESS</result>
              <duration>4488</duration>
              <charset>UTF-8</charset>
              <keepLog>false</keepLog>
              <builtOn></builtOn>
              <workspace>/opt/hudson/files/jobs/Deploy/workspace</workspace>
              <hudsonVersion>1.494</hudsonVersion>
              <scm class="hudson.scm.NullChangeLogParser"/>
              <culprits class="com.google.common.collect.EmptyImmutableSortedSet"/>
            </build>
            ...
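
The effect of the transient modifier can be reproduced outside Jenkins. The following is only a minimal sketch: it uses Java's built-in serialization rather than XStream (which Jenkins uses for build.xml), and DemoProject, DemoBuild and TransientDemo are hypothetical stand-ins, not Jenkins classes. Both mechanisms skip transient fields, so the restored object comes back without its project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a job; deliberately not Serializable.
class DemoProject {
    final String name;

    DemoProject(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a build with a transient back-reference to its job.
class DemoBuild implements Serializable {
    private static final long serialVersionUID = 1L;

    final int number;
    // Skipped by the serializer, in the same spirit as AbstractBuild.project.
    final transient DemoProject project;

    DemoBuild(DemoProject project, int number) {
        this.project = project;
        this.number = number;
    }
}

class TransientDemo {
    // Writes the build to a byte array and reads it back again.
    static DemoBuild roundTrip(DemoBuild b) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(b);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoBuild) in.readObject();
        }
    }
}
```

After the round trip the build number is still there, but the project reference is null, which is the state the inline build element above ends up in once it is loaded.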
      

      When the files are read and the objects are created, the project field remains empty. During initialization, RunMap.retrieve() calls onLoad() on each build it loads:

          @Override
          protected R retrieve(File d) throws IOException {
              if(new File(d,"build.xml").exists()) {
                  // if the build result file isn't in the directory, ignore it.
                  try {
                      R b = cons.create(d);
                      b.onLoad();
                      if (LOGGER.isLoggable(FINE))
                          LOGGER.log(FINE,"Loaded " + b.getFullDisplayName(),new ThisIsHowItsLoaded());
                      return b;
                  } catch (IOException e) {
                      LOGGER.log(Level.WARNING, "could not load " + d, e);
                  } catch (InstantiationError e) {
                      LOGGER.log(Level.WARNING, "could not load " + d, e);
                  }
              }
              return null;
          }
      

      Run.onLoad() in turn invokes onLoad() on all actions that implement RunAction:

              for (Action a : getActions())
                  if (a instanceof RunAction)
                      ((RunAction) a).onLoad();
      

      The FingerprintAction does implement the RunAction interface, and the method is implemented as follows:

              public void onLoad() {
                  // share data structure with nearby builds, but to keep lazy loading efficient,
                  // don't go back the history forever.
                  if (rand.nextInt(2)!=0) {
                      Run pb = build.getPreviousBuild();
                      if (pb!=null) {
                          FingerprintAction a = pb.getAction(FingerprintAction.class);
                          if (a!=null)
                              compact(a);
                      }
                  }
              }
      

      The build field is set, so getPreviousBuild() can be called. The call fails, however, because project is null (the field is transient and was never restored), and a NullPointerException is thrown.

      This causes erratic behavior: pages sometimes work and sometimes do not. Many stack traces reported elsewhere look as if they were caused by the same problem, for example: https://issues.jenkins-ci.org/browse/JENKINS-16845

      The stack trace looks like this:

      Caused by: java.lang.NullPointerException
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:207)
      	at hudson.tasks.Fingerprinter$FingerprintAction.onLoad(Fingerprinter.java:349)
      	at hudson.model.Run.onLoad(Run.java:315)
      	at hudson.model.RunMap.retrieve(RunMap.java:221)
      	at hudson.model.RunMap.retrieve(RunMap.java:59)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:638)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:601)
      	at jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:344)
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:207)
      	at hudson.model.AbstractBuild.getPreviousBuild(AbstractBuild.java:100)
      	at hudson.model.RunMap$1.next(RunMap.java:107)
      	at hudson.model.RunMap$1.next(RunMap.java:96)
      	at hudson.widgets.HistoryWidget.getRenderList(HistoryWidget.java:133)
      	... 122 more
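
The failure mode described above can be modelled in a few lines. This is a sketch under simplifying assumptions, not the Jenkins implementation: SketchJob, SketchBuild and OrphanBuildDemo are hypothetical names, and the lookup through a TreeMap only stands in for the real run map. It shows that a build without its owning job cannot locate its neighbour.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a job that owns its builds by number.
class SketchJob {
    final TreeMap<Integer, SketchBuild> builds = new TreeMap<>();

    SketchBuild addBuild(int number) {
        SketchBuild b = new SketchBuild(this, number);
        builds.put(number, b);
        return b;
    }
}

// Hypothetical stand-in for a build that needs its job to find neighbours.
class SketchBuild {
    final SketchJob project; // null for a build restored without its owner
    final int number;

    SketchBuild(SketchJob project, int number) {
        this.project = project;
        this.number = number;
    }

    // Dereferences project, so a null project fails right here.
    SketchBuild getPreviousBuild() {
        Map.Entry<Integer, SketchBuild> e = project.builds.lowerEntry(number);
        return e == null ? null : e.getValue();
    }
}

class OrphanBuildDemo {
    // Returns true if looking up the previous build throws a NullPointerException.
    static boolean failsWithNpe(SketchBuild b) {
        try {
            b.getPreviousBuild();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

A build created through its job resolves its predecessor normally, while new SketchBuild(null, 17) fails in getPreviousBuild(), mirroring the top frames of the trace.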
      

          [JENKINS-17125] FingerprintAction deserialization leads to NPE

          Dominik Bieringer added a comment -

          Most of our build.xml files look like the one you showed; however, a small number of them look like the one I described. Isn't that related to how the whole in-memory object tree gets serialized to the file, and doesn't it also depend on whether the same objects are used to represent the builds?

          However, after we fixed the code the way I described, we never faced the issue again and Jenkins is working like a charm now. We had the following issues before:

          • The build list on the left side of a project page was not shown correctly (sometimes it was empty, other times it was full, but after clicking "more builds" it became empty and did not load correctly).
          • The mails (success mails, failure mails, etc. for builds) were not sent correctly. We got the mentioned exception during the build.
          • The build pipeline plugin was not displayed correctly and we saw the same exception in the log files.


          Jesse Glick added a comment -

          Right, I think the busted serialization format is probably the result of a BuildReference being cleared and a different AbstractBuild being created than the FingerprintAction was originally produced with. So there are two problems:

          1. Making sure that does not happen.
          2. Recovering gracefully from the broken format.

          For the second part, your proposed patch is unacceptable (I think) because it breaks the general RunAction contract. I am working on a different patch that acknowledges that the FingerprintAction is incomplete and just limits the damage (no exceptions printed, most things with the fingerprints work).

          Not sure what to do about the first part. Seems like a fundamental flaw. (Though I am unable to reproduce it even by making AbstractLazyLoadRunMap.unwrap usually return null.) The best I can think of is to cache all the RunAction instances in a build using strong references so it is guaranteed they will not be collected.

          Jesse Glick added a comment -

          Another fix in FingerprintAction would be to make the current build field transient and introduce another field based on Run.getExternalizableId and .fromExternalizableId, which would avoid both XStream object graph trickery and problems with collection of build records. But this would be a change in the documented recommendations for persistence of a RunAction so I think Kohsuke should think about this.
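
A rough sketch of that idea, assuming nothing about how it would actually be wired into Jenkins: the action persists only a string ID and resolves the build lazily. IdBuild, its static registry and IdBackedAction are hypothetical stand-ins; the two method names are modelled on Run.getExternalizableId and Run.fromExternalizableId, and the "job#number" format is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a build record; not hudson.model.Run.
class IdBuild {
    // Stand-in for the lookup Jenkins would perform through its job tree.
    static final Map<String, IdBuild> REGISTRY = new HashMap<>();

    final String jobName;
    final int number;

    IdBuild(String jobName, int number) {
        this.jobName = jobName;
        this.number = number;
        REGISTRY.put(getExternalizableId(), this);
    }

    // Assumed ID format for this sketch: "<job name>#<build number>".
    String getExternalizableId() {
        return jobName + "#" + number;
    }

    static IdBuild fromExternalizableId(String id) {
        return REGISTRY.get(id);
    }
}

class IdBackedAction {
    private final String buildId;    // the only state that would be persisted
    private transient IdBuild build; // resolved on demand, never persisted

    IdBackedAction(IdBuild owner) {
        this.buildId = owner.getExternalizableId();
        this.build = owner;
    }

    private IdBackedAction(String buildId) {
        this.buildId = buildId;
    }

    // Simulates an action read back from disk: only the ID survives.
    static IdBackedAction restored(String buildId) {
        return new IdBackedAction(buildId);
    }

    IdBuild getBuild() {
        if (build == null) {
            build = IdBuild.fromExternalizableId(buildId);
        }
        return build;
    }
}
```

Because the action never serializes a build object, there is no second, incomplete copy of the build for XStream to write inline; the action always resolves to whatever instance the registry currently holds.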

          Jesse Glick added a comment -

          Better description in this issue than in JENKINS-16845.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/test/java/hudson/tasks/FingerprinterTest.java
          test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip
          http://jenkins-ci.org/commit/jenkins/bd709b0631329f1abaf05de4a3499562a1606691
          Log:
          JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/model/CauseAction.java
          core/src/main/java/hudson/model/Run.java
          core/src/main/java/hudson/model/RunAction.java
          core/src/main/java/hudson/tasks/Fingerprinter.java
          core/src/main/java/jenkins/model/RunAction2.java
          core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly
          http://jenkins-ci.org/commit/jenkins/a614cd5b05b3c8cbcb8970ea439b2a1315252f58
          Log:
          [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2.
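
The idea named in this log message can be sketched as follows: the owning run hands itself to each action when the action is attached and again after loading, so the action does not need to persist a build reference. This is a simplified model, not the committed code; OwnerAwareAction, OwnerRun and SketchFingerprintAction are hypothetical stand-ins, with callback names modelled on RunAction2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the callback shape; not jenkins.model.RunAction2 itself.
interface OwnerAwareAction {
    void onAttached(OwnerRun r);

    void onLoad(OwnerRun r);
}

// Hypothetical stand-in for a run that owns a list of actions.
class OwnerRun {
    final int number;
    private final List<OwnerAwareAction> actions = new ArrayList<>();

    OwnerRun(int number) {
        this.number = number;
    }

    void addAction(OwnerAwareAction a) {
        actions.add(a);
        a.onAttached(this);
    }

    // Called after the run and its actions have been read back from disk.
    void onLoad() {
        for (OwnerAwareAction a : actions) {
            a.onLoad(this);
        }
    }

    // Simulates deserialization: actions arrive without any owner reference.
    static OwnerRun restored(int number, List<? extends OwnerAwareAction> loaded) {
        OwnerRun r = new OwnerRun(number);
        r.actions.addAll(loaded);
        return r;
    }
}

class SketchFingerprintAction implements OwnerAwareAction {
    private transient OwnerRun build; // never persisted; supplied by the owner

    @Override
    public void onAttached(OwnerRun r) {
        build = r;
    }

    @Override
    public void onLoad(OwnerRun r) {
        build = r;
    }

    OwnerRun getBuild() {
        return build;
    }
}
```

In this model the action always points at the run instance that actually loaded it, so a stale or incomplete inline copy of the build cannot arise from the action's own serialized state.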

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2575
          JENKINS-17125 Establishing baseline FingerprintAction behavior that should not be regressed. (Revision bd709b0631329f1abaf05de4a3499562a1606691)
          [FIXED JENKINS-17125] FingerprintAction no longer need persist the build field thanks to new RunAction2. (Revision a614cd5b05b3c8cbcb8970ea439b2a1315252f58)

          Result = SUCCESS
          Jesse Glick : bd709b0631329f1abaf05de4a3499562a1606691
          Files :

          • test/src/test/java/hudson/tasks/FingerprinterTest.java
          • test/src/test/resources/hudson/tasks/FingerprinterTest/actionSerialization.zip

          Jesse Glick : a614cd5b05b3c8cbcb8970ea439b2a1315252f58
          Files :

          • core/src/main/java/hudson/model/RunAction.java
          • core/src/main/resources/hudson/tasks/Fingerprinter/FingerprintAction/index.jelly
          • core/src/main/java/hudson/model/Run.java
          • changelog.html
          • core/src/main/java/hudson/tasks/Fingerprinter.java
          • core/src/main/java/jenkins/model/RunAction2.java
          • core/src/main/java/hudson/model/CauseAction.java

          Linards L added a comment -

          Fix Targeting 1.519 ?

          Jesse Glick added a comment -

          Yes, should be in 1.519 I think. Note that this “true” fix changes the serial format of builds going forward: builds created in 1.519+ will not load well in 1.518-, but builds created in any version (even those suffering from the rare condition that this issue reports) should be loaded cleanly in 1.519+ (and their format upgraded if you somehow edit the build, e.g. setting a description).

          Jesse Glick added a comment -

          Compare https://github.com/jenkinsci/jobConfigHistory-plugin/pull/17; others probably need to be fixed too: hudson.tasks.junit.TestResultAction, org.jenkinsci.plugins.envinject.EnvInjectPluginAction, etc.

            jglick Jesse Glick
            homes2001 Dominik Bieringer