Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Sometimes "log rotation" fails with an exception like the following:
SEVERE hudson.model.Run#execute: Failed to rotate log
java.io.IOException: .../jobs/.../modules/...$.../builds/2013-... looks to have already been deleted
	at hudson.model.Run.delete(Run.java:1432)
	at hudson.maven.MavenModuleSetBuild.delete(MavenModuleSetBuild.java:420)
	at hudson.tasks.LogRotator.perform(LogRotator.java:136)
	at hudson.model.Job.logRotate(Job.java:437)
	at hudson.maven.MavenModuleSet.logRotate(MavenModuleSet.java:851)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:509)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:246)
Usually MavenModuleSetBuild is involved, perhaps suggesting a problem with deleted or skipped module builds (I cannot reproduce it in either scenario), though I think I have also seen this happen with freestyle builds. It is unclear whether the directory actually exists but File.isDirectory reports that it does not (the code originally tried to delete the directory without this check and failed with "... is in use"), or whether the directory really was deleted earlier but this Run was, for some reason, not cleaned up properly.
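The guarded delete described above can be sketched as follows. This is a hypothetical illustration, not the actual Run.delete code: BuildDirDeleter, deleteBuildDir and describe are made-up names. It shows why the exception alone cannot separate the two hypotheses, since File.isDirectory() returns false without saying why, and adds a possible extra diagnostic using java.nio.file.Files.readAttributes, which does distinguish a missing path (NoSuchFileException) from other I/O errors.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch (not the actual Run.delete code) of a guarded build-dir delete.
final class BuildDirDeleter {
    private BuildDirDeleter() {}

    static void deleteBuildDir(File rootDir) throws IOException {
        // File.isDirectory() returns false for a missing path, for a path that is
        // not a directory, and when the check itself fails, so this message alone
        // cannot say which of those cases occurred.
        if (!rootDir.isDirectory()) {
            throw new IOException(rootDir + " looks to have already been deleted; state: "
                    + describe(rootDir.toPath()));
        }
        deleteRecursively(rootDir);
    }

    // Extra diagnostic: readAttributes reports "missing" separately from other I/O errors.
    static String describe(Path p) {
        try {
            BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
            return a.isDirectory() ? "directory" : "not a directory";
        } catch (NoSuchFileException e) {
            return "missing";
        } catch (IOException e) {
            return "unreadable (" + e + ")";
        }
    }

    private static void deleteRecursively(File f) throws IOException {
        // Do not descend into symbolic links; just remove the link itself.
        if (!Files.isSymbolicLink(f.toPath())) {
            File[] children = f.listFiles(); // null if f is not a directory or on I/O error
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        if (!f.delete()) {
            throw new IOException("Unable to delete " + f);
        }
    }
}
```

Calling deleteBuildDir twice on the same directory succeeds the first time and then fails with "... looks to have already been deleted; state: missing", which is the kind of extra detail that would tell the two hypotheses apart.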
Hypothesis: the entry in AbstractLazyLoadRunMap.idOnDisk is not removed by removeValue, which is called from AbstractProject.removeRun (itself called from Run.delete). Perhaps something later resurrects the AbstractBuild from idOnDisk, making it available again for another round of log rotation? But if the actual directory was deleted, it is hard to see how: loading it would simply fail.
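The hypothesis can be checked against a toy model. This is a hypothetical sketch, not AbstractLazyLoadRunMap itself: LazyRunMapModel, its String "builds" and the dirExists predicate are invented for illustration, and only the names idOnDisk and removeValue come from the text above. It assumes the map reloads any id that is listed in idOnDisk but not held in memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

// Toy model (hypothetical) of a lazy-loading build map whose removeValue
// forgets to drop the id from idOnDisk.
final class LazyRunMapModel {
    final Set<Integer> idOnDisk = new TreeSet<>();               // ids believed to exist on disk
    private final Map<Integer, String> loaded = new HashMap<>(); // builds currently in memory
    private final IntPredicate dirExists;                        // stands in for "builds/<id> is present"

    LazyRunMapModel(IntPredicate dirExists) {
        this.dirExists = dirExists;
    }

    void put(int id, String build) {
        idOnDisk.add(id);
        loaded.put(id, build);
    }

    // The suspected bug: the in-memory entry goes away, the idOnDisk entry stays.
    void removeValue(int id) {
        loaded.remove(id);
    }

    // Lazy lookup: anything listed in idOnDisk but not in memory gets reloaded.
    String get(int id) {
        String b = loaded.get(id);
        if (b == null && idOnDisk.contains(id)) {
            if (!dirExists.test(id)) {
                return null;           // directory gone: the load simply fails
            }
            b = "reloaded #" + id;     // a fresh, second copy of the same build
            loaded.put(id, b);
        }
        return b;
    }
}
```

In this model a removed build comes back only while its directory still exists; once the directory is gone, get returns null. That matches the doubt expressed above: a stale idOnDisk entry alone does not explain a resurrected build whose directory was already deleted.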
Or perhaps the race condition is in Run.delete itself: the run is removed from its parent only after its directory has been deleted. The method is synchronized, but that does not help if two copies of the Run exist, which might happen because of other lazy-loading bugs.
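A toy model makes the "two copies" scenario concrete. This is a hypothetical sketch: RaceJob, RaceRun and dirsOnDisk are invented names, and the steps are run sequentially for determinism rather than on real threads. The point it illustrates is that a synchronized instance method locks only its own object, so a second in-memory copy of the same build is neither serialized with the first nor removed from the parent's list by the first copy's delete.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical): a job whose build list can hold two distinct
// in-memory objects for the same build number. Not thread-safe; illustration only.
final class RaceJob {
    final Set<Integer> dirsOnDisk = new HashSet<>(); // stands in for builds/<id> directories
    final List<RaceRun> runs = new ArrayList<>();    // stands in for the job's build list
}

final class RaceRun {
    final RaceJob parent;
    final int number;

    RaceRun(RaceJob parent, int number) {
        this.parent = parent;
        this.number = number;
    }

    // 'synchronized' locks this object only; another copy of the same build
    // has its own monitor, so the two deletes are not mutually exclusive.
    synchronized void delete() throws IOException {
        if (!parent.dirsOnDisk.remove(number)) {
            throw new IOException("#" + number + " looks to have already been deleted");
        }
        // Same order as described: leave the parent only after the directory is gone.
        // List.remove(Object) uses equals(), i.e. identity here, so the other copy stays listed.
        parent.runs.remove(this);
    }
}
```

After the first copy's delete succeeds, the second copy is still in the build list, so the next round of log rotation picks it up and gets the same "looks to have already been deleted" failure as in the stack trace.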
Diagnostics added to date:
- https://github.com/jenkinsci/jenkins/commit/446783fb882b41ac9b39d3dbd3c17c919acc6df2 (1.501)
- https://github.com/jenkinsci/jenkins/commit/f209a75d672e0bd318ba5e2458abcb208d2b6aa6 (1.526)
- https://github.com/jenkinsci/jenkins/commit/5d92b45f47e630b6d9790a7d91b877c4f0acc449 and https://github.com/jenkinsci/jenkins/commit/3eb3126e00f6e6ab42f59475c91d2b634b4dd8a7 (1.558)
Issue Links
- depends on
  - JENKINS-25788 WARNING: hudson.model.FreeStyleProject@9a4e77f[...] did not contain ... #584 to begin with (Resolved)
- is related to
  - JENKINS-19377 Deleting an external run does not immediately remove it from build list (Resolved)
  - JENKINS-17553 Archived artifacts of Maven Modules are no longer cleaned up (Resolved)
  - JENKINS-17508 The 'Discard Old Builds' advanced option - removal of only artifacts - does not work for me after 1.503. (Resolved)
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
core/src/main/java/jenkins/model/lazy/BuildReference.java
core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
http://jenkins-ci.org/commit/jenkins/4bfa16143705e219705ff667c6917a2d6a6a8939
Log:
[FIXED JENKINS-22395] redoing the fix in f1430a2

Based on the last few commits, I proved that the original fix in f1430a2
doesn't really address the problem.

That is, once b2 is deleted, and after sufficient garbage collection,
we can make b2.previousBuild.get() be null, and then
b2.getPreviousBuild().getNextBuild() ends up incorrectly returning b2.

In this commit, I roll back that part of f1430a2, and then fix the
problem differently.

I started thinking that the main problem we are trying to fix here
is that the deleted build object should be unreferenceable. That is,
it should behave almost as if the object has already been GCed.

The easiest way to do this is to clear a BuildReference object,
since we always use the same BuildReference object for all inbound
references.

This change allows us to clear BuildReference. Code like
b2.getPreviousBuild() will continue to try to update
b1.nextBuildR to b2, but it will only end up wiping out the field,
only to have b1.getNextBuild() recompute the correct value.

This fix makes both test cases pass in LazyBuildMixInTest.
(cherry picked from commit b6226ad2d1a332cb661ceb5c5f5b673771118e14)
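The bug and the fix described in the commit log can be modelled with a few toy classes. This is a hypothetical sketch: BuildRef, ToyJob and ToyBuild are made-up names that only mirror BuildReference and nextBuildR from the log, not the real jenkins.model.lazy code. Each build owns one reference object (selfRef) that every inbound link shares, so clearing its SoftReference on deletion makes the deleted build look garbage-collected to all holders at once. For brevity, getPreviousBuild() here always recomputes instead of caching, which corresponds to the case in the log where b2.previousBuild has already been collected.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a shared build reference (not the Jenkins class).
final class BuildRef<R> {
    final int id;
    private final SoftReference<R> ref;

    BuildRef(int id, R r) {
        this.id = id;
        this.ref = new SoftReference<>(r);
    }

    R get() {
        return ref.get();
    }

    // Same effect the garbage collector would have: every holder now sees null.
    void clear() {
        ref.clear();
    }
}

final class ToyJob {
    final TreeMap<Integer, ToyBuild> builds = new TreeMap<>();

    ToyBuild create(int n) {
        ToyBuild b = new ToyBuild(this, n);
        builds.put(n, b);
        return b;
    }

    // clearRef=false models the old behaviour, clearRef=true the idea in the commit.
    void delete(ToyBuild b, boolean clearRef) {
        builds.remove(b.number);
        if (clearRef) {
            b.selfRef.clear();
        }
    }
}

final class ToyBuild {
    final ToyJob job;
    final int number;
    final BuildRef<ToyBuild> selfRef; // the single reference shared by all inbound links
    BuildRef<ToyBuild> nextBuildR;    // cached link to the next build, possibly stale

    ToyBuild(ToyJob job, int number) {
        this.job = job;
        this.number = number;
        this.selfRef = new BuildRef<>(number, this);
    }

    ToyBuild getNextBuild() {
        ToyBuild b = nextBuildR == null ? null : nextBuildR.get();
        if (b == null) { // no cached link, or it was cleared: recompute from the build list
            Map.Entry<Integer, ToyBuild> e = job.builds.higherEntry(number);
            b = e == null ? null : e.getValue();
            nextBuildR = b == null ? null : b.selfRef;
        }
        return b;
    }

    ToyBuild getPreviousBuild() {
        Map.Entry<Integer, ToyBuild> e = job.builds.lowerEntry(number);
        ToyBuild p = e == null ? null : e.getValue();
        if (p != null) {
            p.nextBuildR = selfRef; // "continue to try to update b1.nextBuildR to b2"
        }
        return p;
    }
}
```

With clearRef set to false, b2.getPreviousBuild().getNextBuild() still returns the deleted b2. With clearRef set to true, the write from getPreviousBuild() stores an already-cleared reference, which effectively wipes the field, and b1.getNextBuild() recomputes b3.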