Type: Bug
Resolution: Fixed
Priority: Major
Platform: PC, OS: Windows XP
Executing the "Reload Configuration from Disk" while builds are running causes
information about those builds to be lost.
POSTing a new configuration to /job/JOBNAME/config.xml also causes the exact same problem. Yet editing job configuration interactively using the web form does not show the problem.
I had two builds running when I reloaded configuration to pick up manual changes
to one project (not running a build) config.xml file. When the browser
refreshed, the builds were still shown as actively building in the Build
Executor Status frame but the status indicators were no longer blinking and the
project pages did not show any builds running for those projects.
It appears that part of Hudson thought that the builds had been canceled because
they were queued, blocked by themselves. The original builds appeared to
finish normally and then the queued builds ran. There was no record of the
original build that had been running when the config reloaded.
Note: If Jenkins is completely restarted, by shutting down and starting up the service, the missing build re-appears. (Of course, doing that each time is unacceptable, just trying to help focus the debugging.)
- depends on
  - JENKINS-17337 Failed to load job on Matrix jobs (Resolved)
  - JENKINS-17341 Promoted builds throw NullPointerException after upgrade to 1.507 (Resolved)
- is duplicated by
  - JENKINS-10219 Jenkins loses track on build queue when reloading configuration while a build runs (Resolved)
  - JENKINS-12318 A current active build in the build history is lost if the job configuration XML uploaded (Resolved)
  - JENKINS-7885 Reloading configuration leads to lost build status (Resolved)
  - JENKINS-12171 Running build disappears forever when changing job configuration (Closed)
- is related to
  - JENKINS-32984 Reload Configuration from disk disrupts build elements and pipeline (Open)
  - JENKINS-15340 Add 'Are you sure' on Reload configuration from disk (Resolved)
[JENKINS-3265] Reload Configuration from Disk (or POSTing config.xml) loses info on running builds
Hi
I too have this problem, and the way I manage my builds, this is getting very troublesome (I have automatic processes to create jobs for me for build branches that are created).
Any time Hudson reloads configurations, not only is the info on running builds lost, but you also have to do another reload once a running job has completed in order for Hudson to recognize it.
Thanks!
Sam
We are also struggling with this problem: it makes it problematic to use the reload from disk option because we have a lot of long-running (>1hr) builds, and we don't want to lose visibility of currently executing jobs.
Same here.
In fact, in an attempt to get around this problem we tried to perform bulk configuration changes on jobs through the CLI jar file by calling get-job, updating its XML locally and then calling update-job to upload the new config... and the same behaviour occurs with running jobs :/ (more info about this at http://groups.google.com/group/jenkinsci-users/browse_frm/thread/9786fa572e9924b6)
Is there a status on this bug? It was filed over a year ago, and still repros. It's listed as "major"! Hello????
David - we also switched to using the cli to perform bulk changes. I currently use the following groovy method to make sure a job isn't running before doing the update:
import javax.xml.parsers.SAXParserFactory

// Checks the first <build> entry in the job's XML API response.
public boolean isJobRunning(String jobName) {
    URL jobDetails = new URL(JENKINS_URL + "/job/" + jobName + "/api/xml?depth=1")
    SAXParserFactory factory = SAXParserFactory.newInstance()
    // Do not fetch external DTDs while parsing the response.
    factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
    def details = new XmlSlurper(factory.newSAXParser()).parse(jobDetails.openStream())
    String building = details.build.building[0]
    return building.equals("true")
}
Not really an answer to the original bug, but might help you...
First of all thanks for sharing, pmv.
The truth is that we switched to the cli tool when we found that we had to stop jenkins to perform bulk configuration changes on xmls due to the problem with reloading the config. But if we have to wait for jobs not to be running for the update to work from the cli, we are losing the advantage we wanted to get from using cli in the first place: the possibility of doing "hot" bulk updates of jobs configurations in a fast way.
Nevertheless I will have it in mind should we switch back again to the cli approach.
I am currently evaluating Jenkins and ran into this problem as well.
I must repeat samweiss's concern that this major bug has been open for a long time, approaching two or three years now (hmmm what is "10/Mar/09" is that 2009 or 2010? use four-digit years please). Is this typical for Jenkins bugs?
I added to the description a couple of things I found: POSTing config.xml triggers the same problem; editing the configuration via web form does not trigger the problem; and completely restarting the Jenkins service makes the disappeared jobs re-appear.
My workaround at the moment is to have my application (a .net console app) create a WebBrowser object to load the interactive form, adjust its content, and emulate a "click" on the submit button. It is ugly, especially since I have to mess around with thread apartment model to get WebBrowser to work from a console app, but it might let me get past this problem right now. Hope that can help a few people.
Assigning to Kohsuke as I agree that this is a major issue which shouldn't stay open that long.
We're using the maven-jenkins-plugin (http://evgeny-goldin.com/wiki/Maven-jenkins-plugin) to have our build configurations under version control and to have the ability for jobs to inherit values from other jobs.
We too suffer from having to wait for all jobs to complete before we can re-load the configuration after the plugin runs. It's annoying. I'd love to see this bug fixed.
We were facing this problem when we wanted to trigger other jobs after a reload. I configured the reload job to run as a shell script like this, and it seems to work:
curl http://localhost:80/reload
sleep 25
curl "http://localhost:80/job/$Job_name/build?delay=0sec"
I did some investigation into this, and here's what's going on:
- The Reload Configuration operation from the UI attempts to load up all the jobs and runs it can find.
- It loads runs from disk based on the presence of a build.xml in the directory for a particular build number (cf. RunMap.load()).
- Runs that are in progress do not have a build.xml written to the directory in the course of normal operations (problem #1).
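The load rule in the list above can be sketched in a few lines. This is a standalone illustration, not the actual RunMap.load() code: the names BuildDirScan and loadableBuilds are made up for this example, and the only behavior taken from this comment is that a build directory is picked up only when it contains a build.xml.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not the real RunMap.load() implementation.
class BuildDirScan {
    /** Returns the names of build directories under buildsDir that contain a build.xml. */
    static List<String> loadableBuilds(Path buildsDir) throws IOException {
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(buildsDir)) {
            for (Path dir : dirs) {
                // A directory without build.xml (e.g. a build still in progress) is skipped.
                if (Files.isDirectory(dir) && Files.isRegularFile(dir.resolve("build.xml"))) {
                    found.add(dir.getFileName().toString());
                }
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

With a layout such as builds/1/build.xml plus an in-progress builds/2/ that has no build.xml yet, the scan returns only "1". That matches the reported symptom: the running build exists on disk but is not loaded.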
Attempt 1: write a piece of code that, at the start of the reload operation (Jenkins.reload()), looks at all running Executors and forces them to marshal their state to disk (creating a build.xml). This would then allow Jenkins to notice those jobs running when it restarts.
Then I discovered that the unmarshal procedure for build.xml -> Run assumes that any build.xml it sees is for a run that is in State.COMPLETED. The State is not itself persisted as there's no getState() for xstream to call.
Attempt 2: Make the State of a run persist appropriately so that it can be recovered when Jenkins reloads the configuration.
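A rough sketch of what Attempt 2 amounts to, under heavy simplification: Jenkins stores runs in build.xml via XStream, whereas this example uses java.util.Properties purely to show the round trip. RunStateRecord, its two-value State enum and the "state" key are hypothetical names, not part of the proposed patch.

```java
import java.util.Properties;

// Hypothetical sketch; the real change would live in the build.xml marshalling.
class RunStateRecord {
    // Simplified stand-in for the run states discussed above.
    enum State { BUILDING, COMPLETED }

    /** Writes the state next to the build number so it can survive a reload. */
    static Properties save(int buildNumber, State state) {
        Properties record = new Properties();
        record.setProperty("number", Integer.toString(buildNumber));
        record.setProperty("state", state.name());
        return record;
    }

    /** Reads the state back; a record without a state is treated as COMPLETED, as before. */
    static State load(Properties record) {
        String stored = record.getProperty("state");
        return stored == null ? State.COMPLETED : State.valueOf(stored);
    }
}
```

Defaulting to COMPLETED when no state is stored keeps records written before such a change readable, which is the existing assumption described above.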
This seems to work OK at least in limited tests, and I intend to put up a pull request to let people see the changes I'm proposing in the code. I do wonder, though, what the justification for doing it this way in the first place was; it seems likely that you would not in all cases want Jenkins to totally "trust" the state on disk when starting up, for example if there were a large time or configuration delta between the stop and start. I do, however, think that the specific case of mashing the "Reload Configuration" link should be able to assume that what was running before is still running. Things that could possibly go wrong now would mostly be in the arena of jobs that Jenkins now thinks are running but actually aren't anymore. In a reload case, you can probably expect that to happen minimally (if at all).
Another idea I had was based on the fact that the executors clearly are reporting back the fact that they are running a particular job, even if Jenkins doesn't believe that run actually exists due to this bug. It might be possible/better to, instead of persisting everything to disk on the Jenkins master side, have it query the slaves for running jobs when it comes back up and use the information it gets from them to reconstruct its own idea of what is currently running. I don't know how people feel about potential for abuse there, given that it would require the master to "trust" the slaves to tell it what they were working on when it restarted. A combination of the two approaches might be best (trust, but verify).
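One way to picture the "trust, but verify" combination is as a comparison of two sets: the builds persisted on the master as running, and the builds the executors report. The sketch below is only that comparison; RunningBuildReconciler and the string build identifiers are invented for illustration, and what to do with unconfirmed builds is left open.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names exist in Jenkins.
class RunningBuildReconciler {
    /** Builds that are persisted as running and also confirmed by an executor. */
    static Set<String> confirmed(Set<String> persistedAsRunning, Set<String> reportedByExecutors) {
        Set<String> result = new TreeSet<>(persistedAsRunning);
        result.retainAll(reportedByExecutors);
        return result;
    }

    /** Builds persisted as running that no executor claims; these need a policy decision. */
    static Set<String> unconfirmed(Set<String> persistedAsRunning, Set<String> reportedByExecutors) {
        Set<String> result = new TreeSet<>(persistedAsRunning);
        result.removeAll(reportedByExecutors);
        return result;
    }
}
```

In the plain reload case the unconfirmed set would be expected to stay empty or close to it, since whatever was running before the reload should still be running afterwards.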
It is nice to see some work being done on this issue. Thank you Matt.
There seems to be a question of what the best approach would be, for solving the problem. Perhaps this is naive, but... When loading a new config.xml, how about doing whatever the interactive web form does? When I reconfigure a job via html form (POSTing to /job/X/configSubmit) the job is reconfigured properly and the running build list maintains existing builds just fine. In fact my workaround for this problem is to emulate the web form posting, and it has been doing great. So, could the solution here be to duplicate the functionality of /job/X/configSubmit into the code that handles /job/X/config.xml (and a similar thing for /reload if possible)? I'm not the expert though, just hoping that thought sparks some ideas.
Cheers.
POSTing config.xml, which looks like it goes through AbstractItem.doConfigDotXml(), has similar behavior to the reload case as near as I can tell, which is to just load whatever exists in the stored state on disk (in the case of POSTing config.xml, that's whatever you just POSTed) without writing any information into the build.xml for the build numbers which might currently be in progress. So that wouldn't be affected by my change at present. I'm still trying to figure out what the other code paths are doing that is different, but it might be that the answer is "nothing in particular", and that by doing nothing special they keep their in-memory view of the current Jenkins object graph, which contains the running builds.
We don't use the POST of config.xml here but I could add the same logic in to write out the build.xml (and therefore the running build state) for running builds of that job when the POST occurs, which should accomplish the same end for that usage. I'm more interested to know what people think about the general idea about writing the build.xml state out to disk for non-terminated builds. It seems like an eminently reasonable idea to me, but changing that assumption will probably require other changes. Another possibility that I haven't explored is somehow preserving the portion of the Jenkins object graph that contains the running job information, but that would require some gymnastics around saving the RunMap objects off of the Jobs before dumping them and reloading them, but then we'd have to update all the references to point the Runs at the new Job objects and so on. That feels at least as dangerous to me as persisting the state of non-terminated builds.
updateByXml actually looks kind of bonkers to me anyway. It doesn't call AbstractItem.save() first, thereby ensuring other aspects of the item we might want to save prior to reloading the new config.xml will get lost. I'll have to come up with a more serious patch for the POST config.xml fix than I thought.
Any application can halt without warning, due to power out, hardware defect, OS bugs, malicious software, you-name-it, even well-intentioned code in the application itself. Therefore important state information must be persisted upon every change. Anything else risks losing that data. So by this logic, I agree that saving the state of running builds is "eminently reasonable" as you say. You have my vote.
I would also like to see this issue resolved by having the POST of config.xml be processed the same way as an edit of the job in the config page. I have many cases where the XML is accepted without an issue but the job fails for a weird reason; when I then open the config page and save it, the job runs just fine. I also have cases where settings are lost just by opening the config and saving it.
At the very least, do not let the administrator reload the configuration from disk without a warning saying that in-flight jobs will be lost.
We are running 1.479 and I still see this bug. I'd like to suggest that it should be reopened.
@kscaldef best to have a fully reproducible test case, ideally based on a trunk version.
Jesse, I'm a little unclear what constitutes a "fully reproducible test case" in this situation. The description of this bug seems pretty well detailed and matches exactly what I see on our installation. To summarize: while builds are running, click the "Reload configuration from disk" link, the existing builds continue to show as in progress on the front page, but are gone from the project page, and the links to the builds from the front page are broken (404). If I log into the machine, all the builds appear to be present, but those that were running when I reloaded the config aren't available from the UI.
The bug still manifests as described in the initial report, and in my previous message.
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
changelog.html
core/src/main/java/hudson/model/AbstractProject.java
core/src/main/java/jenkins/model/Jenkins.java
http://jenkins-ci.org/commit/jenkins/25084f528208a5ab8e0f1ebd9ae80527bf0099d1
Log:
[FIXED JENKINS-3265]
Made the in-flight build survive the reload from the disk.
Integrated in jenkins_main_trunk #2372
[FIXED JENKINS-3265] (Revision 25084f528208a5ab8e0f1ebd9ae80527bf0099d1)
Result = SUCCESS
kohsuke : 25084f528208a5ab8e0f1ebd9ae80527bf0099d1
Files :
- core/src/main/java/jenkins/model/Jenkins.java
- changelog.html
- core/src/main/java/hudson/model/AbstractProject.java
This change broke loading (not reloading!) of matrix projects:
java.lang.NullPointerException
    at hudson.matrix.MatrixProject.getItem(MatrixProject.java:669)
    at hudson.matrix.MatrixProject.getItem(MatrixProject.java:662)
    at hudson.matrix.MatrixProject.getItem(MatrixProject.java:98)
    at hudson.model.AbstractProject.onLoad(AbstractProject.java:291)
    at hudson.model.Project.onLoad(Project.java:83)
    at hudson.matrix.MatrixConfiguration.onLoad(MatrixConfiguration.java:89)
    at hudson.matrix.MatrixProject.loadConfigurations(MatrixProject.java:541)
    at hudson.matrix.MatrixProject.rebuildConfigurations(MatrixProject.java:585)
    at hudson.matrix.MatrixProject.onLoad(MatrixProject.java:476)
    at hudson.model.Items.load(Items.java:221)
    at jenkins.model.Jenkins$17.run(Jenkins.java:2552)
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/matrix/MatrixProject.java
http://jenkins-ci.org/commit/jenkins/3370e6edf5e0e7c7a8cf22e4a7a5decc24c45205
Log:
[FIXED JENKINS-3265] Hotfix for regression in matrix project loading from 25084f5.
Integrated in jenkins_main_trunk #2389
[FIXED JENKINS-3265] Hotfix for regression in matrix project loading from 25084f5. (Revision 3370e6edf5e0e7c7a8cf22e4a7a5decc24c45205)
Result = SUCCESS
Jesse Glick : 3370e6edf5e0e7c7a8cf22e4a7a5decc24c45205
Files :
- core/src/main/java/hudson/matrix/MatrixProject.java
Is this hotfix included in 1.507? I get that NullPointerException with 1.507 so either the fix is not there or it doesn't work.
@vige I am not sure what build gets the hotfix but I think not 1.507. This got filed as JENKINS-17337.
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/hudson/matrix/MatrixProject.java
http://jenkins-ci.org/commit/jenkins/6b1f8d1aaa2179c99af33a8d60851abd032442ac
Log:
[FIXED JENKINS-3265] Hotfix for regression in matrix project loading from 25084f5.(cherry picked from commit 3370e6edf5e0e7c7a8cf22e4a7a5decc24c45205)
Integrated in jenkins_main_trunk #2411
[FIXED JENKINS-3265] Hotfix for regression in matrix project loading from 25084f5.(cherry picked from commit 3370e6edf5e0e7c7a8cf22e4a7a5decc24c45205) (Revision 6b1f8d1aaa2179c99af33a8d60851abd032442ac)
Result = SUCCESS
kohsuke : 6b1f8d1aaa2179c99af33a8d60851abd032442ac
Files :
- core/src/main/java/hudson/matrix/MatrixProject.java
Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/hudson/plugins/promoted_builds/JobPropertyImpl.java
src/test/java/hudson/plugins/promoted_builds/ConfigurationRoundtripTest.java
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/config.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/builds/2012-10-08_10-29-01/build.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/config.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/promotions/OK/builds/2012-10-08_10-30-11/build.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/promotions/OK/config.xml
http://jenkins-ci.org/commit/promoted-builds-plugin/145a38922ef87bb0bea5edb372d7e11e07bdb6eb
Log:
[FIXED JENKINS-17341] NPE after JENKINS-3265 fix.
Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/hudson/plugins/promoted_builds/JobPropertyImpl.java
src/test/java/hudson/plugins/promoted_builds/ConfigurationRoundtripTest.java
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/config.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/builds/2012-10-08_10-29-01/build.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/config.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/promotions/OK/builds/2012-10-08_10-30-11/build.xml
src/test/resources/hudson/plugins/promoted_builds/ConfigurationRoundtripTest/testLoad/jobs/j/promotions/OK/config.xml
http://jenkins-ci.org/commit/promoted-builds-plugin/8147deb58e583ba696cd7feb451b1afabcb15a3d
Log:
Merge pull request #24 from jglick/NPE-JENKINS-17341
[FIXED JENKINS-17341] NPE after JENKINS-3265 fix.
Compare: https://github.com/jenkinsci/promoted-builds-plugin/compare/8f07c80a678d...8147deb58e58
We are still seeing this issue on 1.518 but only occasionally. Is anyone else seeing it?
Unfortunately we have not identified any particular characteristics of how or when this happens or if there is anything of note about the jobs it is happening with. If anyone has suggestions for things to look for to help diagnose the problem that would be most helpful.
Integrated in jenkins_main_trunk #2710
JENKINS-3265 JENKINS-17341 Even if older plugins throw an NPE during reload, continue loading job. (Revision ef08900292726bb35be269ce6d41d8dc141c6dea)
Result = SUCCESS
Jesse Glick : ef08900292726bb35be269ce6d41d8dc141c6dea
Files :
- core/src/main/java/hudson/model/AbstractProject.java
Hi all,
Exactly the same problem: reloading the config from disk causes currently running builds to disappear after the reload, although the build directory is still present on disk.
Environment: Jenkins 1.593 on RHEL 6.0
Please keep issues as old as this resolved/closed.
So please file a new issue. Make sure it also happens on the current weekly release, as there've been possibly relevant changes since 1.593. Provide all the information asked for on https://wiki.jenkins-ci.org/display/JENKINS/How+to+report+an+issue and try to provide a minimal test case with clear instructions how to reproduce the problem.
Which path does "Reload Configuration from Disk" load from if we have 3 Jenkins instances running on the same server?
I have the same problem.
I checked it with Hudson #3.369 and #3.383
How to reproduce the problem:
1) Define a free-style project.
2) Add the command sleep 300
3) Run this job.
4) While the job is running, execute "Reload Configuration from Disk"
5) Click on the progress bar of the running job (console output)
6) Get a 404 error page
7) In the history of this job, the currently running execution is not shown.