Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-22395

Run.delete (from LogRotator) failing with "...looks to have already been deleted"

      Sometimes "log rotation" fails with an exception like

      SEVERE  hudson.model.Run#execute: Failed to rotate log
      java.io.IOException: .../jobs/.../modules/...$.../builds/2013-... looks to have already been deleted
          at hudson.model.Run.delete(Run.java:1432)
          at hudson.maven.MavenModuleSetBuild.delete(MavenModuleSetBuild.java:420)
          at hudson.tasks.LogRotator.perform(LogRotator.java:136)
          at hudson.model.Job.logRotate(Job.java:437)
          at hudson.maven.MavenModuleSet.logRotate(MavenModuleSet.java:851)
          at hudson.model.Run.execute(Run.java:1728)
          at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:509)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:246)
      

      Usually MavenModuleSetBuild is involved, perhaps suggesting a problem with deleted or skipped module builds (I cannot reproduce in either scenario), though I think I have also seen this happen for freestyle builds. Unclear whether the directory actually exists but File.isDirectory thinks it does not (the code originally tried to delete the dir without checking this and failed with "... is in use"); or whether the directory was actually deleted earlier but this Run was not cleaned up properly for some reason.

      Hypothesis: AbstractLazyLoadRunMap.idOnDisk is not removed by removeValue, called from AbstractProject.removeRun (from Run.delete). Perhaps something is later resurrecting the AbstractBuild from idOnDisk, making it again available for another round of log rotation? But if the actual directory was deleted, it is hard to see how: load would just fail.

      Or perhaps Run.delete is the race condition: the run is removed from the parent after its directory has been deleted. The method is synchronized, but that does not help if two copies of the Run exist, which might happen due to other lazy-loading bugs.

      Diagnostics added to date:

          [JENKINS-22395] Run.delete (from LogRotator) failing with "...looks to have already been deleted"

          Jesse Glick added a comment -

          Not to be confused with JENKINS-19377, which had a similar stack trace but was easily reproducible and limited to external monitor jobs.

          Jesse Glick added a comment - Not to be confused with JENKINS-19377 , which had a similar stack trace but was easily reproducible and limited to external monitor jobs.

          Jesse Glick added a comment - Diagnostic build: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.2.JENKINS-22395-diag/jenkins-war-1.532.2.JENKINS-22395-diag.war

          Jesse Glick added a comment -

          Enhanced logging suggests that when MavenModuleSetBuild.delete leads to this error, it is deleting a MavenModuleBuild with the same number (successfully), and then (via getModuleBuilds) one with a later number that had already been deleted during an earlier MavenModuleSetBuild.delete. In other words, AbstractBuild.getNextBuild is returning a build which was already (recently) deleted from its parent! That suggests that dropLinks failed to actually drop the link from the older build—perhaps because the referent of previousBuildR in the newer build was null, perhaps because the older build was collected and then recreated.

          Jesse Glick added a comment - Enhanced logging suggests that when MavenModuleSetBuild.delete leads to this error, it is deleting a MavenModuleBuild with the same number (successfully), and then (via getModuleBuilds ) one with a later number that had already been deleted during an earlier MavenModuleSetBuild.delete . In other words, AbstractBuild.getNextBuild is returning a build which was already (recently) deleted from its parent! That suggests that dropLinks failed to actually drop the link from the older build—perhaps because the referent of previousBuildR in the newer build was null, perhaps because the older build was collected and then recreated.

          Jesse Glick added a comment -

          Jesse Glick added a comment - Attaching patch of new backport of purported fix (as well as diagnostics): http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.3.JENKINS-22395/jenkins-war-1.532.3.JENKINS-22395.war

          Jesse Glick added a comment - Build of https://github.com/jglick/jenkins/compare/jenkinsci:jenkins-1.532.3...JENKINS-22395-diag available: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.3.JENKINS-22395-2/jenkins-war-1.532.3.JENKINS-22395-2.war

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/Run.java
          http://jenkins-ci.org/commit/jenkins/c1fcb08e08c3ccae1451602af8ea20788dc9950e
          Log:
          JENKINS-17236 Run.deleteArtifacts needs to go through ArtifactManager.
          (Also relevant to JENKINS-22395 diagnostics.)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Run.java http://jenkins-ci.org/commit/jenkins/c1fcb08e08c3ccae1451602af8ea20788dc9950e Log: JENKINS-17236 Run.deleteArtifacts needs to go through ArtifactManager. (Also relevant to JENKINS-22395 diagnostics.)

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/Run.java
          core/src/main/java/hudson/tasks/LogRotator.java
          core/src/main/java/jenkins/model/StandardArtifactManager.java
          http://jenkins-ci.org/commit/jenkins/3ef8063e43640952442de90e034dadba2ada337c
          Log:
          JENKINS-22395 Yet more diagnostics.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Run.java core/src/main/java/hudson/tasks/LogRotator.java core/src/main/java/jenkins/model/StandardArtifactManager.java http://jenkins-ci.org/commit/jenkins/3ef8063e43640952442de90e034dadba2ada337c Log: JENKINS-22395 Yet more diagnostics.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3347
          JENKINS-22395 Yet more diagnostics. (Revision 3ef8063e43640952442de90e034dadba2ada337c)

          Result = SUCCESS
          Jesse Glick : 3ef8063e43640952442de90e034dadba2ada337c
          Files :

          • core/src/main/java/jenkins/model/StandardArtifactManager.java
          • core/src/main/java/hudson/model/Run.java
          • core/src/main/java/hudson/tasks/LogRotator.java

          dogfood added a comment - Integrated in jenkins_main_trunk #3347 JENKINS-22395 Yet more diagnostics. (Revision 3ef8063e43640952442de90e034dadba2ada337c) Result = SUCCESS Jesse Glick : 3ef8063e43640952442de90e034dadba2ada337c Files : core/src/main/java/jenkins/model/StandardArtifactManager.java core/src/main/java/hudson/model/Run.java core/src/main/java/hudson/tasks/LogRotator.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/f1430a25ec5481b3635d6cf4fb7ea57a890e50f0
          Log:
          [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/f1430a25ec5481b3635d6cf4fb7ea57a890e50f0 Log: [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: changelog.html http://jenkins-ci.org/commit/jenkins/ce9d47774e0eb67129099268f047f2d64f4667a7 Log: JENKINS-22395 Noting merge of #1190. Compare: https://github.com/jenkinsci/jenkins/compare/0761c7795d16...ce9d47774e0e

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3360
          [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted. (Revision f1430a25ec5481b3635d6cf4fb7ea57a890e50f0)
          JENKINS-22395 Noting merge of #1190. (Revision ce9d47774e0eb67129099268f047f2d64f4667a7)

          Result = SUCCESS
          Jesse Glick : f1430a25ec5481b3635d6cf4fb7ea57a890e50f0
          Files :

          • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java

          Jesse Glick : ce9d47774e0eb67129099268f047f2d64f4667a7
          Files :

          • changelog.html

          dogfood added a comment - Integrated in jenkins_main_trunk #3360 [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted. (Revision f1430a25ec5481b3635d6cf4fb7ea57a890e50f0) JENKINS-22395 Noting merge of #1190. (Revision ce9d47774e0eb67129099268f047f2d64f4667a7) Result = SUCCESS Jesse Glick : f1430a25ec5481b3635d6cf4fb7ea57a890e50f0 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java Jesse Glick : ce9d47774e0eb67129099268f047f2d64f4667a7 Files : changelog.html

          Jesse Glick added a comment -

          Attempted fix did not work.

          Jesse Glick added a comment - Attempted fix did not work.

          Daniel Beck added a comment -

          A comment posted in (otherwise unrelated) JENKINS-16805 indicates this can happen after Jenkins fails to delete a renamed build directory once.

          Hypothesis: Run.delete() renames the build dir from DATE to .DATE first, then deletes that dir. If deleting it fails twice (e.g. due to file locking) Jenkins does not remove the build from the project in memory. Any subsequent deletion attempts would then fail to find the DATE build dir and print that message.

          In this case, reloading from disk resolved the issue (as the renamed dir no longer matches the required pattern).

          Daniel Beck added a comment - A comment posted in (otherwise unrelated) JENKINS-16805 indicates this can happen after Jenkins fails to delete a renamed build directory once. Hypothesis: Run.delete() renames the build dir from DATE to .DATE first, then deletes that dir. If deleting it fails twice (e.g. due to file locking) Jenkins does not remove the build from the project in memory. Any subsequent deletion attempts would then fail to find the DATE build dir and print that message. In this case, reloading from disk resolved the issue (as the renamed dir no longer matches the required pattern).

          Jesse Glick added a comment -

          Maybe such a problem also exists, but that is not the bug I am tracking. In the case reported to me, Run.delete evidently succeeds, but the Run is still returned from getNextBuild of its immediate predecessor a moment later, which should never happen. (getNextBuild should only return a nondeleted build, or null if this was the most recent nondeleted build.) The reported exception then follows because in the case of a Maven module set, there can be multiple builds of a given module associated with one build of the module set—the build of matching number, and then any succeeding builds up until some cutoff—and all are deleted when the module set build is deleted in the course of routine LogRotator cleanup.

          Jesse Glick added a comment - Maybe such a problem also exists, but that is not the bug I am tracking. In the case reported to me, Run.delete evidently succeeds, but the Run is still returned from getNextBuild of its immediate predecessor a moment later, which should never happen. ( getNextBuild should only return a nondeleted build, or null if this was the most recent nondeleted build.) The reported exception then follows because in the case of a Maven module set, there can be multiple builds of a given module associated with one build of the module set—the build of matching number, and then any succeeding builds up until some cutoff—and all are deleted when the module set build is deleted in the course of routine LogRotator cleanup.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/53e95ee517259e1e10d7209bcbaa4314397c1f40
          Log:
          JENKINS-22395

          I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

          My understanding is of the problem is as follows:

          • someone deletes 'b2' but holds on to its reference (could be different threads)
          • someone calls b2.getPreviousBuild()
          • if b2.previousBuildR is null, then this triggers the loading of b1, and
            that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
          • this makes b1.getNextBuild() incorrectly return b2

          Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
          which is what dropLinksAfterGC2() does.

          In this test,

          initial state:

          b1 <> b2 <> b3

          we load everyone and connect them all up

          after deleting b2:

          b1 <--------> b3
          ^ ^
          ---- b2 ----

          b1 and b3 points each other, and b2 still refers to each side

          after dropping b1:
          b2 --> b3

          now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

          b1a <-> b2 --> b3

          So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

          I don't think that the problem is for the dropLinks call to fail to patch up references.
          The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
          invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/53e95ee517259e1e10d7209bcbaa4314397c1f40 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82
          Log:
          JENKINS-22395 correcting the control test

          As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
          between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
          because Run.delete() calls RunListener.fireDeleted(this).

          Thus in effect the sequence of the call order was as follows:
          assertEquals(1, BRHF.drop(b1));
          b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
          b2.delete();
          FreeStyleBuild b1a = b2.getPreviousBuild();

          This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
          as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

          For the test to work as intended, it is important that b2.previousBuildR.get()==null
          during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
          load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

          Added a code to remove all RunListeners. With this change, the test now fails without
          the fix in f1430a2.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82 Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/BuildReference.java
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/b6226ad2d1a332cb661ceb5c5f5b673771118e14
          Log:
          [FIXED JENKINS-22395] redoing the fix in f1430a2

          Based on the last few commits, I proved that the original fix in f1430a2
          doesn't really address the problem.

          That is, once b2 is deleted, and after sufficient garbage collection,
          we can make b2.previousBuild.get() be null, and then
          b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

          In this commit, I roll back that part of f1430a2, and then fix the
          problem differently.

          I started thinking that the main problem we are trying to fix here
          is that the deleted build object should be unreferenceable. That is,
          it should behave almost as if the object has already been GCed.
          The easiest way to do this is to clear a BuildReference object,
          since we always use the same BuildReference object for all inbound
          references.

          This change allows us to clear BuildReference. Code like
          b2.getPreviousBuild() will continue to try to update
          b1.nextBuildR to b2, but it will only end up wiping out the field,
          only to have b1.getNextBuild() recompute the correct value.

          This fix makes both test cases pass in LazyBuildMixInTest.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/b6226ad2d1a332cb661ceb5c5f5b673771118e14 Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece
          Log:
          JENKINS-22395 more diagnostics

          Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3507
          JENKINS-22395 (Revision 53e95ee517259e1e10d7209bcbaa4314397c1f40)
          JENKINS-22395 correcting the control test (Revision 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)
          [FIXED JENKINS-22395] redoing the fix in f1430a2 (Revision b6226ad2d1a332cb661ceb5c5f5b673771118e14)

          Result = SUCCESS
          kohsuke : 53e95ee517259e1e10d7209bcbaa4314397c1f40
          Files :

          • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

          kohsuke : 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82
          Files :

          • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

          kohsuke : b6226ad2d1a332cb661ceb5c5f5b673771118e14
          Files :

          • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          • core/src/main/java/jenkins/model/lazy/BuildReference.java

          dogfood added a comment - Integrated in jenkins_main_trunk #3507 JENKINS-22395 (Revision 53e95ee517259e1e10d7209bcbaa4314397c1f40) JENKINS-22395 correcting the control test (Revision 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82) [FIXED JENKINS-22395] redoing the fix in f1430a2 (Revision b6226ad2d1a332cb661ceb5c5f5b673771118e14) Result = SUCCESS kohsuke : 53e95ee517259e1e10d7209bcbaa4314397c1f40 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java kohsuke : 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java kohsuke : b6226ad2d1a332cb661ceb5c5f5b673771118e14 Files : core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java core/src/main/java/jenkins/model/lazy/BuildReference.java

          Daniel Beck added a comment -

          Is this a suitable lts-candidate, or too dangerous?

          Daniel Beck added a comment - Is this a suitable lts-candidate , or too dangerous?

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3508
          JENKINS-22395 more diagnostics (Revision aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

          Result = SUCCESS
          kohsuke : aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece
          Files :

          • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java

          dogfood added a comment - Integrated in jenkins_main_trunk #3508 JENKINS-22395 more diagnostics (Revision aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece) Result = SUCCESS kohsuke : aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece Files : core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/b7ec8578f938864af9c8dbfb73a54d67f972d694
          Log:
          JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
          Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
          (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/b7ec8578f938864af9c8dbfb73a54d67f972d694 Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3509
          JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. (Revision b7ec8578f938864af9c8dbfb73a54d67f972d694)

          Result = SUCCESS
          Jesse Glick : b7ec8578f938864af9c8dbfb73a54d67f972d694
          Files :

          • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

          dogfood added a comment - Integrated in jenkins_main_trunk #3509 JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. (Revision b7ec8578f938864af9c8dbfb73a54d67f972d694) Result = SUCCESS Jesse Glick : b7ec8578f938864af9c8dbfb73a54d67f972d694 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          cli/pom.xml
          core/pom.xml
          plugins/pom.xml
          pom.xml
          test/pom.xml
          war/pom.xml
          http://jenkins-ci.org/commit/jenkins/87df95af49f290cd16f6179d70bc05df411c91ac
          Log:
          Setting version: 1.565.1.JENKINS-22395-dropLinks

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: cli/pom.xml core/pom.xml plugins/pom.xml pom.xml test/pom.xml war/pom.xml http://jenkins-ci.org/commit/jenkins/87df95af49f290cd16f6179d70bc05df411c91ac Log: Setting version: 1.565.1. JENKINS-22395 -dropLinks

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/4910863dbdb944f249fa86c4ca583e53c736ed69
          Log:
          JENKINS-22395

          I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

          My understanding is of the problem is as follows:

          • someone deletes 'b2' but holds on to its reference (could be different threads)
          • someone calls b2.getPreviousBuild()
          • if b2.previousBuildR is null, then this triggers the loading of b1, and
            that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
          • this makes b1.getNextBuild() incorrectly return b2

          Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
          which is what dropLinksAfterGC2() does.

          In this test,

          initial state:

          b1 <> b2 <> b3

          we load everyone and connect them all up

          after deleting b2:

          b1 <--------> b3
          ^ ^
          ---- b2 ----

          b1 and b3 points each other, and b2 still refers to each side

          after dropping b1:
          b2 --> b3

          now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

          b1a <-> b2 --> b3

          So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

          I don't think that the problem is for the dropLinks call to fail to patch up references.
          The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
          invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

          (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/4910863dbdb944f249fa86c4ca583e53c736ed69 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2. (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/02e9f36dac05e533f7dc28325def85d4d937fb9e
          Log:
          JENKINS-22395 correcting the control test

          As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
          between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
          because Run.delete() calls RunListener.fireDeleted(this).

          Thus in effect the sequence of the call order was as follows:
          assertEquals(1, BRHF.drop(b1));
          b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
          b2.delete();
          FreeStyleBuild b1a = b2.getPreviousBuild();

          This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
          as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

          For the test to work as intended, it is important that b2.previousBuildR.get()==null
          during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
          load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

          Added a code to remove all RunListeners. With this change, the test now fails without
          the fix in f1430a2.

          (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/02e9f36dac05e533f7dc28325def85d4d937fb9e Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2. (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/BuildReference.java
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/ad65cecc7825693af7378d1d223966bac40ef37a
          Log:
          [FIXED JENKINS-22395] redoing the fix in f1430a2

          Based on the last few commits, I proved that the original fix in f1430a2
          doesn't really address the problem.

          That is, once b2 is deleted, and after sufficient garbage collection,
          we can make b2.previousBuild.get() be null, and then
          b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

          In this commit, I roll back that part of f1430a2, and then fix the
          problem differently.

          I started thinking that the main problem we are trying to fix here
          is that the deleted build object should be unreferenceable. That is,
          it should behave almost as if the object has already been GCed.
          The easiest way to do this is to clear a BuildReference object,
          since we always use the same BuildReference object for all inbound
          references.

          This change allows us to clear BuildReference. Code like
          b2.getPreviousBuild() will continue to try to update
          b1.nextBuildR to b2, but it will only end up wiping out the field,
          only to have b1.getNextBuild() recompute the correct value.

          This fix makes both test cases pass in LazyBuildMixInTest.

          (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/ad65cecc7825693af7378d1d223966bac40ef37a Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest. (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/921001d5245c4f441023e8ca16351e7ca84dabb2
          Log:
          JENKINS-22395 more diagnostics

          Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

          (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/921001d5245c4f441023e8ca16351e7ca84dabb2 Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/63e806efb6c6ba38c5043e1916ff4f19d425ea0e
          Log:
          JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
          Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
          (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

          (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)

          Compare: https://github.com/jenkinsci/jenkins/compare/87df95af49f2^...63e806efb6c6

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/63e806efb6c6ba38c5043e1916ff4f19d425ea0e Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.) (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694) Compare: https://github.com/jenkinsci/jenkins/compare/87df95af49f2 ^...63e806efb6c6

          Jesse Glick added a comment - Testing build of https://github.com/jenkinsci/jenkins/compare/1.565.1-rc1...JENKINS-22395-dropLinks-backport uploaded: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.565.1.JENKINS-22395-dropLinks/jenkins-war-1.565.1.JENKINS-22395-dropLinks.war

          Code changed in jenkins
          User: Daniel Beck
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/fdc0b5c5650b3ee849f02e3c5d94d23c12886adc
          Log:
          Noting #1314, #1316, #1308, JENKINS-17667, JENKINS-22395, JENKINS-18065

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Daniel Beck Path: changelog.html http://jenkins-ci.org/commit/jenkins/fdc0b5c5650b3ee849f02e3c5d94d23c12886adc Log: Noting #1314, #1316, #1308, JENKINS-17667 , JENKINS-22395 , JENKINS-18065

          Code changed in jenkins
          User: Daniel Beck
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/72f37e414533c02624f9deffc85a2b50f03905e6
          Log:
          Remove JENKINS-22395, JENKINS-18065 from 1.572 section

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Daniel Beck Path: changelog.html http://jenkins-ci.org/commit/jenkins/72f37e414533c02624f9deffc85a2b50f03905e6 Log: Remove JENKINS-22395 , JENKINS-18065 from 1.572 section

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/2baa7caade6923b9a2023e64b60bb537fcc65077
          Log:
          JENKINS-22395

          I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

          My understanding is of the problem is as follows:

          • someone deletes 'b2' but holds on to its reference (could be different threads)
          • someone calls b2.getPreviousBuild()
          • if b2.previousBuildR is null, then this triggers the loading of b1, and
            that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
          • this makes b1.getNextBuild() incorrectly return b2

          Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
          which is what dropLinksAfterGC2() does.

          In this test,

          initial state:

          b1 <> b2 <> b3

          we load everyone and connect them all up

          after deleting b2:

          b1 <--------> b3
          ^ ^
          ---- b2 ----

          b1 and b3 points each other, and b2 still refers to each side

          after dropping b1:
          b2 --> b3

          now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

          b1a <-> b2 --> b3

          So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

          I don't think that the problem is for the dropLinks call to fail to patch up references.
          The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
          invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

          (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/2baa7caade6923b9a2023e64b60bb537fcc65077 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2. (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/3f0388379b06b31ebec1bf17cc976001fcd80b7b
          Log:
          JENKINS-22395 correcting the control test

          As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
          between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
          because Run.delete() calls RunListener.fireDeleted(this).

          Thus in effect the sequence of the call order was as follows:
          assertEquals(1, BRHF.drop(b1));
          b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
          b2.delete();
          FreeStyleBuild b1a = b2.getPreviousBuild();

          This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
          as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

          For the test to work as intended, it is important that b2.previousBuildR.get()==null
          during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
          load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

          Added a code to remove all RunListeners. With this change, the test now fails without
          the fix in f1430a2.

          (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/3f0388379b06b31ebec1bf17cc976001fcd80b7b Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2. (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/BuildReference.java
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/4bfa16143705e219705ff667c6917a2d6a6a8939
          Log:
          [FIXED JENKINS-22395] redoing the fix in f1430a2

          Based on the last few commits, I proved that the original fix in f1430a2
          doesn't really address the problem.

          That is, once b2 is deleted, and after sufficient garbage collection,
          we can make b2.previousBuild.get() be null, and then
          b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

          In this commit, I roll back that part of f1430a2, and then fix the
          problem differently.

          I started thinking that the main problem we are trying to fix here
          is that the deleted build object should be unreferenceable. That is,
          it should behave almost as if the object has already been GCed.
          The easiest way to do this is to clear a BuildReference object,
          since we always use the same BuildReference object for all inbound
          references.

          This change allows us to clear BuildReference. Code like
          b2.getPreviousBuild() will continue to try to update
          b1.nextBuildR to b2, but it will only end up wiping out the field,
          only to have b1.getNextBuild() recompute the correct value.

          This fix makes both test cases pass in LazyBuildMixInTest.

          (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/4bfa16143705e219705ff667c6917a2d6a6a8939 Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest. (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/ca156f730b9aedac9072f7243e3f1528330ca68b
          Log:
          JENKINS-22395 more diagnostics

          Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

          (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/ca156f730b9aedac9072f7243e3f1528330ca68b Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
          http://jenkins-ci.org/commit/jenkins/579753403c63886cf0feff177602d79254829780
          Log:
          JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
          Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
          (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

          (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/579753403c63886cf0feff177602d79254829780 Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.) (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)

          Sam Van Oort added a comment - - edited

          Reproducible again with a high enough build rate and pipeline (lots of concurrent or fast builds):

          Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
          INFO: sample #8540 completed: SUCCESS
          Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
          WARNING: failed to save sample #8541 or perform log rotation
          java.io.IOException: sample #8527: /Users/svanoort/Documents/jenkins-plugins-oss/random-job-builder-plugin/work/jobs/sample/builds/8527 looks to have already been deleted; siblings: [.8527, 4568, 4569, 4570, 4571, 8528, 8529, 8530, 8531, 8532, 8533, 8534, 8535, 8536, 8537, 8538, 8539, 8540, 8541, 8542, 8543, 8544, 8545, 8546, 8547, lastFailedBuild, lastStableBuild, lastSuccessfulBuild, lastUnstableBuild, lastUnsuccessfulBuild, legacyIds]
          at hudson.model.Run.delete(Run.java:1480)
          at hudson.tasks.LogRotator.perform(LogRotator.java:130)
          at hudson.model.Job.logRotate(Job.java:478)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun.finish(WorkflowRun.java:597)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1200(WorkflowRun.java:120)
          at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:865)
          at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1081)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:400)
          at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)

          Sam Van Oort added a comment - - edited Reproducible again with a high enough build rate and pipeline (lots of concurrent or fast builds): Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish INFO: sample #8540 completed: SUCCESS Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish WARNING: failed to save sample #8541 or perform log rotation java.io.IOException: sample #8527: /Users/svanoort/Documents/jenkins-plugins-oss/random-job-builder-plugin/work/jobs/sample/builds/8527 looks to have already been deleted; siblings: [.8527, 4568, 4569, 4570, 4571, 8528, 8529, 8530, 8531, 8532, 8533, 8534, 8535, 8536, 8537, 8538, 8539, 8540, 8541, 8542, 8543, 8544, 8545, 8546, 8547, lastFailedBuild, lastStableBuild, lastSuccessfulBuild, lastUnstableBuild, lastUnsuccessfulBuild, legacyIds] at hudson.model.Run.delete(Run.java:1480) at hudson.tasks.LogRotator.perform(LogRotator.java:130) at hudson.model.Job.logRotate(Job.java:478) at org.jenkinsci.plugins.workflow.job.WorkflowRun.finish(WorkflowRun.java:597) at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1200(WorkflowRun.java:120) at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:865) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1081) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:400) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

          Jesse Glick added a comment -

          Original bug was indeed fixed, backported, etc. so what you are seeing svanoort is something else with similar symptoms. Better to file a fresh issue and link it to this one.

          Jesse Glick added a comment - Original bug was indeed fixed, backported, etc. so what you are seeing svanoort is something else with similar symptoms. Better to file a fresh issue and link it to this one.

            jglick Jesse Glick
            jglick Jesse Glick
            Votes:
            2 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: