Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-22395

Run.delete (from LogRotator) failing with "...looks to have already been deleted"

    XMLWordPrintable

Details

    Description

      Sometimes "log rotation" fails with an exception like

      SEVERE  hudson.model.Run#execute: Failed to rotate log
      java.io.IOException: .../jobs/.../modules/...$.../builds/2013-... looks to have already been deleted
          at hudson.model.Run.delete(Run.java:1432)
          at hudson.maven.MavenModuleSetBuild.delete(MavenModuleSetBuild.java:420)
          at hudson.tasks.LogRotator.perform(LogRotator.java:136)
          at hudson.model.Job.logRotate(Job.java:437)
          at hudson.maven.MavenModuleSet.logRotate(MavenModuleSet.java:851)
          at hudson.model.Run.execute(Run.java:1728)
          at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:509)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:246)
      

      Usually MavenModuleSetBuild is involved, perhaps suggesting a problem with deleted or skipped module builds (I cannot reproduce in either scenario), though I think I have also seen this happen for freestyle builds. Unclear whether the directory actually exists but File.isDirectory thinks it does not (the code originally tried to delete the dir without checking this and failed with "... is in use"); or whether the directory was actually deleted earlier but this Run was not cleaned up properly for some reason.

      Hypothesis: AbstractLazyLoadRunMap.idOnDisk is not removed by removeValue, called from AbstractProject.removeRun (from Run.delete). Perhaps something is later resurrecting the AbstractBuild from idOnDisk, making it again available for another round of log rotation? But if the actual directory was deleted, it is hard to see how: load would just fail.

      Or perhaps Run.delete is the race condition: the run is removed from the parent after its directory has been deleted. The method is synchronized, but that does not help if two copies of the Run exist, which might happen due to other lazy-loading bugs.

      Diagnostics added to date:

      Attachments

        Issue Links

          Activity

            jglick Jesse Glick added a comment -

            Not to be confused with JENKINS-19377, which had a similar stack trace but was easily reproducible and limited to external monitor jobs.

            jglick Jesse Glick added a comment - Not to be confused with JENKINS-19377 , which had a similar stack trace but was easily reproducible and limited to external monitor jobs.
            jglick Jesse Glick added a comment - Diagnostic build: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.2.JENKINS-22395-diag/jenkins-war-1.532.2.JENKINS-22395-diag.war
            jglick Jesse Glick added a comment -

            Enhanced logging suggests that when MavenModuleSetBuild.delete leads to this error, it is deleting a MavenModuleBuild with the same number (successfully), and then (via getModuleBuilds) one with a later number that had already been deleted during an earlier MavenModuleSetBuild.delete. In other words, AbstractBuild.getNextBuild is returning a build which was already (recently) deleted from its parent! That suggests that dropLinks failed to actually drop the link from the older build—perhaps because the referent of previousBuildR in the newer build was null, perhaps because the older build was collected and then recreated.

            jglick Jesse Glick added a comment - Enhanced logging suggests that when MavenModuleSetBuild.delete leads to this error, it is deleting a MavenModuleBuild with the same number (successfully), and then (via getModuleBuilds ) one with a later number that had already been deleted during an earlier MavenModuleSetBuild.delete . In other words, AbstractBuild.getNextBuild is returning a build which was already (recently) deleted from its parent! That suggests that dropLinks failed to actually drop the link from the older build—perhaps because the referent of previousBuildR in the newer build was null, perhaps because the older build was collected and then recreated.
            jglick Jesse Glick added a comment - Attaching patch of new backport of purported fix (as well as diagnostics): http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.3.JENKINS-22395/jenkins-war-1.532.3.JENKINS-22395.war
            jglick Jesse Glick added a comment - Build of https://github.com/jglick/jenkins/compare/jenkinsci:jenkins-1.532.3...JENKINS-22395-diag available: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.532.3.JENKINS-22395-2/jenkins-war-1.532.3.JENKINS-22395-2.war

            Code changed in jenkins
            User: Jesse Glick
            Path:
            core/src/main/java/hudson/model/Run.java
            http://jenkins-ci.org/commit/jenkins/c1fcb08e08c3ccae1451602af8ea20788dc9950e
            Log:
            JENKINS-17236 Run.deleteArtifacts needs to go through ArtifactManager.
            (Also relevant to JENKINS-22395 diagnostics.)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Run.java http://jenkins-ci.org/commit/jenkins/c1fcb08e08c3ccae1451602af8ea20788dc9950e Log: JENKINS-17236 Run.deleteArtifacts needs to go through ArtifactManager. (Also relevant to JENKINS-22395 diagnostics.)

            Code changed in jenkins
            User: Jesse Glick
            Path:
            core/src/main/java/hudson/model/Run.java
            core/src/main/java/hudson/tasks/LogRotator.java
            core/src/main/java/jenkins/model/StandardArtifactManager.java
            http://jenkins-ci.org/commit/jenkins/3ef8063e43640952442de90e034dadba2ada337c
            Log:
            JENKINS-22395 Yet more diagnostics.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Run.java core/src/main/java/hudson/tasks/LogRotator.java core/src/main/java/jenkins/model/StandardArtifactManager.java http://jenkins-ci.org/commit/jenkins/3ef8063e43640952442de90e034dadba2ada337c Log: JENKINS-22395 Yet more diagnostics.
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #3347
            JENKINS-22395 Yet more diagnostics. (Revision 3ef8063e43640952442de90e034dadba2ada337c)

            Result = SUCCESS
            Jesse Glick : 3ef8063e43640952442de90e034dadba2ada337c
            Files :

            • core/src/main/java/jenkins/model/StandardArtifactManager.java
            • core/src/main/java/hudson/model/Run.java
            • core/src/main/java/hudson/tasks/LogRotator.java
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #3347 JENKINS-22395 Yet more diagnostics. (Revision 3ef8063e43640952442de90e034dadba2ada337c) Result = SUCCESS Jesse Glick : 3ef8063e43640952442de90e034dadba2ada337c Files : core/src/main/java/jenkins/model/StandardArtifactManager.java core/src/main/java/hudson/model/Run.java core/src/main/java/hudson/tasks/LogRotator.java

            Code changed in jenkins
            User: Jesse Glick
            Path:
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/f1430a25ec5481b3635d6cf4fb7ea57a890e50f0
            Log:
            [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/f1430a25ec5481b3635d6cf4fb7ea57a890e50f0 Log: [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted.
            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: changelog.html http://jenkins-ci.org/commit/jenkins/ce9d47774e0eb67129099268f047f2d64f4667a7 Log: JENKINS-22395 Noting merge of #1190. Compare: https://github.com/jenkinsci/jenkins/compare/0761c7795d16...ce9d47774e0e
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #3360
            [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted. (Revision f1430a25ec5481b3635d6cf4fb7ea57a890e50f0)
            JENKINS-22395 Noting merge of #1190. (Revision ce9d47774e0eb67129099268f047f2d64f4667a7)

            Result = SUCCESS
            Jesse Glick : f1430a25ec5481b3635d6cf4fb7ea57a890e50f0
            Files :

            • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java

            Jesse Glick : ce9d47774e0eb67129099268f047f2d64f4667a7
            Files :

            • changelog.html
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #3360 [FIXED JENKINS-22395] Attempt to double-delete a Run probably caused by getNextBuild sometimes returning a build which was already deleted. (Revision f1430a25ec5481b3635d6cf4fb7ea57a890e50f0) JENKINS-22395 Noting merge of #1190. (Revision ce9d47774e0eb67129099268f047f2d64f4667a7) Result = SUCCESS Jesse Glick : f1430a25ec5481b3635d6cf4fb7ea57a890e50f0 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java Jesse Glick : ce9d47774e0eb67129099268f047f2d64f4667a7 Files : changelog.html
            jglick Jesse Glick added a comment -

            Attempted fix did not work.

            jglick Jesse Glick added a comment - Attempted fix did not work.
            danielbeck Daniel Beck added a comment -

            A comment posted in (otherwise unrelated) JENKINS-16805 indicates this can happen after Jenkins fails to delete a renamed build directory once.

            Hypothesis: Run.delete() renames the build dir from DATE to .DATE first, then deletes that dir. If deleting it fails twice (e.g. due to file locking) Jenkins does not remove the build from the project in memory. Any subsequent deletion attempts would then fail to find the DATE build dir and print that message.

            In this case, reloading from disk resolved the issue (as the renamed dir no longer matches the required pattern).

            danielbeck Daniel Beck added a comment - A comment posted in (otherwise unrelated) JENKINS-16805 indicates this can happen after Jenkins fails to delete a renamed build directory once. Hypothesis: Run.delete() renames the build dir from DATE to .DATE first, then deletes that dir. If deleting it fails twice (e.g. due to file locking) Jenkins does not remove the build from the project in memory. Any subsequent deletion attempts would then fail to find the DATE build dir and print that message. In this case, reloading from disk resolved the issue (as the renamed dir no longer matches the required pattern).
            jglick Jesse Glick added a comment -

            Maybe such a problem also exists, but that is not the bug I am tracking. In the case reported to me, Run.delete evidently succeeds, but the Run is still returned from getNextBuild of its immediate predecessor a moment later, which should never happen. (getNextBuild should only return a nondeleted build, or null if this was the most recent nondeleted build.) The reported exception then follows because in the case of a Maven module set, there can be multiple builds of a given module associated with one build of the module set—the build of matching number, and then any succeeding builds up until some cutoff—and all are deleted when the module set build is deleted in the course of routine LogRotator cleanup.

            jglick Jesse Glick added a comment - Maybe such a problem also exists, but that is not the bug I am tracking. In the case reported to me, Run.delete evidently succeeds, but the Run is still returned from getNextBuild of its immediate predecessor a moment later, which should never happen. ( getNextBuild should only return a nondeleted build, or null if this was the most recent nondeleted build.) The reported exception then follows because in the case of a Maven module set, there can be multiple builds of a given module associated with one build of the module set—the build of matching number, and then any succeeding builds up until some cutoff—and all are deleted when the module set build is deleted in the course of routine LogRotator cleanup.

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/53e95ee517259e1e10d7209bcbaa4314397c1f40
            Log:
            JENKINS-22395

            I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

            My understanding is of the problem is as follows:

            • someone deletes 'b2' but holds on to its reference (could be different threads)
            • someone calls b2.getPreviousBuild()
            • if b2.previousBuildR is null, then this triggers the loading of b1, and
              that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
            • this makes b1.getNextBuild() incorrectly return b2

            Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
            which is what dropLinksAfterGC2() does.

            In this test,

            initial state:

            b1 <> b2 <> b3

            we load everyone and connect them all up

            after deleting b2:

            b1 <--------> b3
            ^ ^
            ---- b2 ----

            b1 and b3 points each other, and b2 still refers to each side

            after dropping b1:
            b2 --> b3

            now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

            b1a <-> b2 --> b3

            So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

            I don't think that the problem is for the dropLinks call to fail to patch up references.
            The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
            invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/53e95ee517259e1e10d7209bcbaa4314397c1f40 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82
            Log:
            JENKINS-22395 correcting the control test

            As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
            between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
            because Run.delete() calls RunListener.fireDeleted(this).

            Thus in effect the sequence of the call order was as follows:
            assertEquals(1, BRHF.drop(b1));
            b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
            b2.delete();
            FreeStyleBuild b1a = b2.getPreviousBuild();

            This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
            as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

            For the test to work as intended, it is important that b2.previousBuildR.get()==null
            during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
            load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

            Added a code to remove all RunListeners. With this change, the test now fails without
            the fix in f1430a2.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82 Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2.

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/BuildReference.java
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/b6226ad2d1a332cb661ceb5c5f5b673771118e14
            Log:
            [FIXED JENKINS-22395] redoing the fix in f1430a2

            Based on the last few commits, I proved that the original fix in f1430a2
            doesn't really address the problem.

            That is, once b2 is deleted, and after sufficient garbage collection,
            we can make b2.previousBuild.get() be null, and then
            b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

            In this commit, I roll back that part of f1430a2, and then fix the
            problem differently.

            I started thinking that the main problem we are trying to fix here
            is that the deleted build object should be unreferenceable. That is,
            it should behave almost as if the object has already been GCed.
            The easiest way to do this is to clear a BuildReference object,
            since we always use the same BuildReference object for all inbound
            references.

            This change allows us to clear BuildReference. Code like
            b2.getPreviousBuild() will continue to try to update
            b1.nextBuildR to b2, but it will only end up wiping out the field,
            only to have b1.getNextBuild() recompute the correct value.

            This fix makes both test cases pass in LazyBuildMixInTest.

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/b6226ad2d1a332cb661ceb5c5f5b673771118e14 Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest.

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece
            Log:
            JENKINS-22395 more diagnostics

            Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #3507
            JENKINS-22395 (Revision 53e95ee517259e1e10d7209bcbaa4314397c1f40)
            JENKINS-22395 correcting the control test (Revision 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)
            [FIXED JENKINS-22395] redoing the fix in f1430a2 (Revision b6226ad2d1a332cb661ceb5c5f5b673771118e14)

            Result = SUCCESS
            kohsuke : 53e95ee517259e1e10d7209bcbaa4314397c1f40
            Files :

            • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

            kohsuke : 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82
            Files :

            • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

            kohsuke : b6226ad2d1a332cb661ceb5c5f5b673771118e14
            Files :

            • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            • core/src/main/java/jenkins/model/lazy/BuildReference.java
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #3507 JENKINS-22395 (Revision 53e95ee517259e1e10d7209bcbaa4314397c1f40) JENKINS-22395 correcting the control test (Revision 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82) [FIXED JENKINS-22395] redoing the fix in f1430a2 (Revision b6226ad2d1a332cb661ceb5c5f5b673771118e14) Result = SUCCESS kohsuke : 53e95ee517259e1e10d7209bcbaa4314397c1f40 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java kohsuke : 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java kohsuke : b6226ad2d1a332cb661ceb5c5f5b673771118e14 Files : core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java core/src/main/java/jenkins/model/lazy/BuildReference.java
            danielbeck Daniel Beck added a comment -

            Is this a suitable lts-candidate, or too dangerous?

            danielbeck Daniel Beck added a comment - Is this a suitable lts-candidate , or too dangerous?
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #3508
            JENKINS-22395 more diagnostics (Revision aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

            Result = SUCCESS
            kohsuke : aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece
            Files :

            • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #3508 JENKINS-22395 more diagnostics (Revision aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece) Result = SUCCESS kohsuke : aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece Files : core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/b7ec8578f938864af9c8dbfb73a54d67f972d694
            Log:
            JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
            Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
            (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/b7ec8578f938864af9c8dbfb73a54d67f972d694 Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #3509
            JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. (Revision b7ec8578f938864af9c8dbfb73a54d67f972d694)

            Result = SUCCESS
            Jesse Glick : b7ec8578f938864af9c8dbfb73a54d67f972d694
            Files :

            • test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            dogfood dogfood added a comment - Integrated in jenkins_main_trunk #3509 JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. (Revision b7ec8578f938864af9c8dbfb73a54d67f972d694) Result = SUCCESS Jesse Glick : b7ec8578f938864af9c8dbfb73a54d67f972d694 Files : test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java

            Code changed in jenkins
            User: Jesse Glick
            Path:
            cli/pom.xml
            core/pom.xml
            plugins/pom.xml
            pom.xml
            test/pom.xml
            war/pom.xml
            http://jenkins-ci.org/commit/jenkins/87df95af49f290cd16f6179d70bc05df411c91ac
            Log:
            Setting version: 1.565.1.JENKINS-22395-dropLinks

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: cli/pom.xml core/pom.xml plugins/pom.xml pom.xml test/pom.xml war/pom.xml http://jenkins-ci.org/commit/jenkins/87df95af49f290cd16f6179d70bc05df411c91ac Log: Setting version: 1.565.1. JENKINS-22395 -dropLinks

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/4910863dbdb944f249fa86c4ca583e53c736ed69
            Log:
            JENKINS-22395

            I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

            My understanding is of the problem is as follows:

            • someone deletes 'b2' but holds on to its reference (could be different threads)
            • someone calls b2.getPreviousBuild()
            • if b2.previousBuildR is null, then this triggers the loading of b1, and
              that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
            • this makes b1.getNextBuild() incorrectly return b2

            Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
            which is what dropLinksAfterGC2() does.

            In this test,

            initial state:

            b1 <> b2 <> b3

            we load everyone and connect them all up

            after deleting b2:

            b1 <--------> b3
            ^ ^
            ---- b2 ----

            b1 and b3 points each other, and b2 still refers to each side

            after dropping b1:
            b2 --> b3

            now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

            b1a <-> b2 --> b3

            So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

            I don't think that the problem is for the dropLinks call to fail to patch up references.
            The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
            invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

            (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/4910863dbdb944f249fa86c4ca583e53c736ed69 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2. (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/02e9f36dac05e533f7dc28325def85d4d937fb9e
            Log:
            JENKINS-22395 correcting the control test

            As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
            between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
            because Run.delete() calls RunListener.fireDeleted(this).

            Thus in effect the sequence of the call order was as follows:
            assertEquals(1, BRHF.drop(b1));
            b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
            b2.delete();
            FreeStyleBuild b1a = b2.getPreviousBuild();

            This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
            as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

            For the test to work as intended, it is important that b2.previousBuildR.get()==null
            during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
            load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

            Added a code to remove all RunListeners. With this change, the test now fails without
            the fix in f1430a2.

            (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/02e9f36dac05e533f7dc28325def85d4d937fb9e Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2. (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/BuildReference.java
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/ad65cecc7825693af7378d1d223966bac40ef37a
            Log:
            [FIXED JENKINS-22395] redoing the fix in f1430a2

            Based on the last few commits, I proved that the original fix in f1430a2
            doesn't really address the problem.

            That is, once b2 is deleted, and after sufficient garbage collection,
            we can make b2.previousBuild.get() be null, and then
            b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

            In this commit, I roll back that part of f1430a2, and then fix the
            problem differently.

            I started thinking that the main problem we are trying to fix here
            is that the deleted build object should be unreferenceable. That is,
            it should behave almost as if the object has already been GCed.
            The easiest way to do this is to clear a BuildReference object,
            since we always use the same BuildReference object for all inbound
            references.

            This change allows us to clear BuildReference. Code like
            b2.getPreviousBuild() will continue to try to update
            b1.nextBuildR to b2, but it will only end up wiping out the field,
            only to have b1.getNextBuild() recompute the correct value.

            This fix makes both test cases pass in LazyBuildMixInTest.

            (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/ad65cecc7825693af7378d1d223966bac40ef37a Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest. (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/921001d5245c4f441023e8ca16351e7ca84dabb2
            Log:
            JENKINS-22395 more diagnostics

            Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

            (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/921001d5245c4f441023e8ca16351e7ca84dabb2 Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/63e806efb6c6ba38c5043e1916ff4f19d425ea0e
            Log:
            JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
            Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
            (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

            (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)

            Compare: https://github.com/jenkinsci/jenkins/compare/87df95af49f2^...63e806efb6c6

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/63e806efb6c6ba38c5043e1916ff4f19d425ea0e Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.) (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694) Compare: https://github.com/jenkinsci/jenkins/compare/87df95af49f2 ^...63e806efb6c6
            jglick Jesse Glick added a comment - Testing build of https://github.com/jenkinsci/jenkins/compare/1.565.1-rc1...JENKINS-22395-dropLinks-backport uploaded: http://repo.jenkins-ci.org/public/org/jenkins-ci/main/jenkins-war/1.565.1.JENKINS-22395-dropLinks/jenkins-war-1.565.1.JENKINS-22395-dropLinks.war

            Code changed in jenkins
            User: Daniel Beck
            Path:
            changelog.html
            http://jenkins-ci.org/commit/jenkins/fdc0b5c5650b3ee849f02e3c5d94d23c12886adc
            Log:
            Noting #1314, #1316, #1308, JENKINS-17667, JENKINS-22395, JENKINS-18065

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Daniel Beck Path: changelog.html http://jenkins-ci.org/commit/jenkins/fdc0b5c5650b3ee849f02e3c5d94d23c12886adc Log: Noting #1314, #1316, #1308, JENKINS-17667 , JENKINS-22395 , JENKINS-18065

            Code changed in jenkins
            User: Daniel Beck
            Path:
            changelog.html
            http://jenkins-ci.org/commit/jenkins/72f37e414533c02624f9deffc85a2b50f03905e6
            Log:
            Remove JENKINS-22395, JENKINS-18065 from 1.572 section

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Daniel Beck Path: changelog.html http://jenkins-ci.org/commit/jenkins/72f37e414533c02624f9deffc85a2b50f03905e6 Log: Remove JENKINS-22395 , JENKINS-18065 from 1.572 section

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/2baa7caade6923b9a2023e64b60bb537fcc65077
            Log:
            JENKINS-22395

            I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem.

            My understanding is of the problem is as follows:

            • someone deletes 'b2' but holds on to its reference (could be different threads)
            • someone calls b2.getPreviousBuild()
            • if b2.previousBuildR is null, then this triggers the loading of b1, and
              that process sets up a bi-di link via b1<>b2 via b1.nextBuildR <> b2.previousBuildR
            • this makes b1.getNextBuild() incorrectly return b2

            Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways,
            which is what dropLinksAfterGC2() does.

            In this test,

            initial state:

            b1 <> b2 <> b3

            we load everyone and connect them all up

            after deleting b2:

            b1 <--------> b3
            ^ ^
            ---- b2 ----

            b1 and b3 points each other, and b2 still refers to each side

            after dropping b1:
            b2 --> b3

            now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2.

            b1a <-> b2 --> b3

            So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered.

            I don't think that the problem is for the dropLinks call to fail to patch up references.
            The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be
            invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2.

            (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/2baa7caade6923b9a2023e64b60bb537fcc65077 Log: JENKINS-22395 I'm not sure if I understand how the original fix in PR #1190 (f1430a2) addresses the problem. My understanding is of the problem is as follows: someone deletes 'b2' but holds on to its reference (could be different threads) someone calls b2.getPreviousBuild() if b2.previousBuildR is null, then this triggers the loading of b1, and that process sets up a bi-di link via b1< >b2 via b1.nextBuildR < > b2.previousBuildR this makes b1.getNextBuild() incorrectly return b2 Presumably f1430a2 addresses this somehow, but I think I can induce this situation in other ways, which is what dropLinksAfterGC2() does. In this test, initial state: b1 < > b2 < > b3 we load everyone and connect them all up after deleting b2: b1 <--------> b3 ^ ^ ---- b2 ---- b1 and b3 points each other, and b2 still refers to each side after dropping b1: b2 --> b3 now, here, when I do b2.getPreviousBuild(), it loads b1a and it sets b1a.nextBuildR to b2. b1a <-> b2 --> b3 So I claim this is a proof that the fix is incomplete, even for the problem JENKINS-22395 has discovered. I don't think that the problem is for the dropLinks call to fail to patch up references. The problem is that b2.getPreviousBuild() forcing b1 to point to b2, because if b2 is deleted and assumed to be invalid, then no matter what bX this method will find you never want bX.nextBuildR to point to b2. (cherry picked from commit 53e95ee517259e1e10d7209bcbaa4314397c1f40)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/3f0388379b06b31ebec1bf17cc976001fcd80b7b
            Log:
            JENKINS-22395 correcting the control test

            As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked
            between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted()
            because Run.delete() calls RunListener.fireDeleted(this).

            Thus in effect the sequence of the call order was as follows:
            assertEquals(1, BRHF.drop(b1));
            b2.getPreviousBuild(); // happens indirectly in PeepholePermalink
            b2.delete();
            FreeStyleBuild b1a = b2.getPreviousBuild();

            This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks()
            as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3.

            For the test to work as intended, it is important that b2.previousBuildR.get()==null
            during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go
            load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error.

            Added a code to remove all RunListeners. With this change, the test now fails without
            the fix in f1430a2.

            (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/3f0388379b06b31ebec1bf17cc976001fcd80b7b Log: JENKINS-22395 correcting the control test As I step-executed the code, I discovered b2.getPreviousBuild() was getting invoked between BRHF.drop(b1) and b2.dropLinks() call, in PeepholePermalink.RunListenerImpl.onDeleted() because Run.delete() calls RunListener.fireDeleted(this). Thus in effect the sequence of the call order was as follows: assertEquals(1, BRHF.drop(b1)); b2.getPreviousBuild(); // happens indirectly in PeepholePermalink b2.delete(); FreeStyleBuild b1a = b2.getPreviousBuild(); This defeats the purpose of the fix, because in this call sequence, by b2.dropLinks() as implemented before f1430a2 will correctly fix up b1a.nextBuildR to b3. For the test to work as intended, it is important that b2.previousBuildR.get()==null during dropLinks(). That is what causes b2.getPreviousBuild() in the next line to go load b1a, and sets up b1a.nextBuildR to b2, and trigger the assertion error. Added a code to remove all RunListeners. With this change, the test now fails without the fix in f1430a2. (cherry picked from commit 7b1b50c85a6a4bdb75f1ae5b40885afa45f96b82)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/BuildReference.java
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/4bfa16143705e219705ff667c6917a2d6a6a8939
            Log:
            [FIXED JENKINS-22395] redoing the fix in f1430a2

            Based on the last few commits, I proved that the original fix in f1430a2
            doesn't really address the problem.

            That is, once b2 is deleted, and after sufficient garbage collection,
            we can make b2.previousBuild.get() be null, and then
            b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

            In this commit, I roll back that part of f1430a2, and then fix the
            problem differently.

            I started thinking that the main problem we are trying to fix here
            is that the deleted build object should be unreferenceable. That is,
            it should behave almost as if the object has already been GCed.
            The easiest way to do this is to clear a BuildReference object,
            since we always use the same BuildReference object for all inbound
            references.

            This change allows us to clear BuildReference. Code like
            b2.getPreviousBuild() will continue to try to update
            b1.nextBuildR to b2, but it will only end up wiping out the field,
            only to have b1.getNextBuild() recompute the correct value.

            This fix makes both test cases pass in LazyBuildMixInTest.

            (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/BuildReference.java core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/4bfa16143705e219705ff667c6917a2d6a6a8939 Log: [FIXED JENKINS-22395] redoing the fix in f1430a2 Based on the last few commits, I proved that the original fix in f1430a2 doesn't really address the problem. That is, once b2 is deleted, and after sufficient garbage collection, we can make b2.previousBuild.get() be null, and then b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2. In this commit, I roll back that part of f1430a2, and then fix the problem differently. I started thinking that the main problem we are trying to fix here is that the deleted build object should be unreferenceable. That is, it should behave almost as if the object has already been GCed. The easiest way to do this is to clear a BuildReference object, since we always use the same BuildReference object for all inbound references. This change allows us to clear BuildReference. Code like b2.getPreviousBuild() will continue to try to update b1.nextBuildR to b2, but it will only end up wiping out the field, only to have b1.getNextBuild() recompute the correct value. This fix makes both test cases pass in LazyBuildMixInTest. (cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)

            Code changed in jenkins
            User: Kohsuke Kawaguchi
            Path:
            core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
            http://jenkins-ci.org/commit/jenkins/ca156f730b9aedac9072f7243e3f1528330ca68b
            Log:
            JENKINS-22395 more diagnostics

            Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered

            (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java http://jenkins-ci.org/commit/jenkins/ca156f730b9aedac9072f7243e3f1528330ca68b Log: JENKINS-22395 more diagnostics Just in case the previous fix didn't address the root cause of ZD-11985 (and given the time it takes for changes like this to land in LTS), I'm adding additional diagnostics that let us track how previous/next builds are getting discovered (cherry picked from commit aa8e0b4f0e312ba4ef9c647ccf2c093793a9bece)

            Code changed in jenkins
            User: Jesse Glick
            Path:
            test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java
            http://jenkins-ci.org/commit/jenkins/579753403c63886cf0feff177602d79254829780
            Log:
            JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC.
            Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks.
            (Also that they fail as expected in assertNotSame if the reference is not cleared at all.)

            (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/jenkins/model/lazy/LazyBuildMixInTest.java http://jenkins-ci.org/commit/jenkins/579753403c63886cf0feff177602d79254829780 Log: JENKINS-22395 Taking advantage of BuildReference.clear (just introduced in b6226ad) to simplify test by not requiring a custom build reference holder just to simulate GC. Confirmed that dropLinksAfterGC and dropLinksAfterGC2 both fail in the expected way (b1a.nextBuild == b2) after commenting out the call to createReference().clear() in dropLinks. (Also that they fail as expected in assertNotSame if the reference is not cleared at all.) (cherry picked from commit b7ec8578f938864af9c8dbfb73a54d67f972d694)
            svanoort Sam Van Oort added a comment - - edited

            Reproducible again with a high enough build rate and pipeline (lots of concurrent or fast builds):

            Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
            INFO: sample #8540 completed: SUCCESS
            Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
            WARNING: failed to save sample #8541 or perform log rotation
            java.io.IOException: sample #8527: /Users/svanoort/Documents/jenkins-plugins-oss/random-job-builder-plugin/work/jobs/sample/builds/8527 looks to have already been deleted; siblings: [.8527, 4568, 4569, 4570, 4571, 8528, 8529, 8530, 8531, 8532, 8533, 8534, 8535, 8536, 8537, 8538, 8539, 8540, 8541, 8542, 8543, 8544, 8545, 8546, 8547, lastFailedBuild, lastStableBuild, lastSuccessfulBuild, lastUnstableBuild, lastUnsuccessfulBuild, legacyIds]
            at hudson.model.Run.delete(Run.java:1480)
            at hudson.tasks.LogRotator.perform(LogRotator.java:130)
            at hudson.model.Job.logRotate(Job.java:478)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.finish(WorkflowRun.java:597)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1200(WorkflowRun.java:120)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:865)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1081)
            at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:400)
            at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
            at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
            at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)

            svanoort Sam Van Oort added a comment - - edited Reproducible again with a high enough build rate and pipeline (lots of concurrent or fast builds): Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish INFO: sample #8540 completed: SUCCESS Aug 08, 2017 5:53:08 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish WARNING: failed to save sample #8541 or perform log rotation java.io.IOException: sample #8527: /Users/svanoort/Documents/jenkins-plugins-oss/random-job-builder-plugin/work/jobs/sample/builds/8527 looks to have already been deleted; siblings: [.8527, 4568, 4569, 4570, 4571, 8528, 8529, 8530, 8531, 8532, 8533, 8534, 8535, 8536, 8537, 8538, 8539, 8540, 8541, 8542, 8543, 8544, 8545, 8546, 8547, lastFailedBuild, lastStableBuild, lastSuccessfulBuild, lastUnstableBuild, lastUnsuccessfulBuild, legacyIds] at hudson.model.Run.delete(Run.java:1480) at hudson.tasks.LogRotator.perform(LogRotator.java:130) at hudson.model.Job.logRotate(Job.java:478) at org.jenkinsci.plugins.workflow.job.WorkflowRun.finish(WorkflowRun.java:597) at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$1200(WorkflowRun.java:120) at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:865) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1081) at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:400) at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35) at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112) at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
            jglick Jesse Glick added a comment -

            Original bug was indeed fixed, backported, etc. so what you are seeing svanoort is something else with similar symptoms. Better to file a fresh issue and link it to this one.

            jglick Jesse Glick added a comment - Original bug was indeed fixed, backported, etc. so what you are seeing svanoort is something else with similar symptoms. Better to file a fresh issue and link it to this one.

            People

              jglick Jesse Glick
              jglick Jesse Glick
              Votes:
              2 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: