- Bug
- Resolution: Fixed
- Major
- Jenkins 1.597
Jan 23, 2015 11:43:52 PM hudson.model.Executor run
SEVERE: Unexpected executor death
java.lang.IllegalStateException: /MY_DIR/jenkins/home/jobs/MY_JOB/builds/29 already existed; will not overwite with MY_JOB #29
at hudson.model.RunMap.put(RunMap.java:187)
at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
at hudson.model.AbstractProject.newBuild(AbstractProject.java:1001)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1200)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor.run(Executor.java:213)
I tried to manually start a job via the run button. I see the Dead indicator in the executor status sidebar. I was able to restart the thread and then manually start the job again. This time it started with #30 and appears to be running OK.
Looking at the JENKINS_HOME directory on Linux, I see that there is a build directory for #29 and that the job ran and failed (for an unrelated reason). On the Jenkins web page for the job, the build list skips from 28 to (the now running) 30; #29 has no listing.
I know some items related to the major enhancement are not closed yet, like JENKINS-23152, so please close this if the behavior I'm seeing is expected until the next release. If this is unexpected - hopefully this report is informative - this issue happened 4x in a row for each job I had put in the build queue.
- screenshot-3.png (32 kB, philbeiler)
- screenshot-2.png (15 kB, philbeiler)
- screenshot-1.png (16 kB, philbeiler)
- config.xml (5 kB, Mark Sinclair)
- is blocking
  - JENKINS-24380 Use build numbers as IDs (Resolved)
- is duplicated by
  - JENKINS-27853 rebuilds failing with java.lang.IllegalStateException intermittently (Resolved)
  - JENKINS-29268 Unexpected executor death (Resolved)
- is related to
  - JENKINS-27530 ISE from RunMap.put after reloading configuration from disk (Resolved)
  - JENKINS-23152 builds getting lost due to GerritTrigger (Resolved)
  - JENKINS-28384 NPE from LoadStatistics$LoadStatisticsSnapshot$Builder.with (Resolved)
  - JENKINS-26739 ISE from AbstractLazyLoadRunMap.proposeNewNumber for concurrent matrix builds (Closed)
[JENKINS-26582] ISE from RunMap.put using /git/notifyCommit on a matrix project
Gerrit Trigger Plugin is not installed.
Yes, there is a build.xml for build29.
Does Jenkins show a build 29 that corresponds with the builds/29/build.xml file? Or does it stop at build 28?
On the Jenkins project web page, I see all builds through 28, 29 is missing, and then I see build 30.
In Linux JENKINS_HOME/jobs/MY_JOB/builds/ shows all builds, including 29.
What kind of projects (e.g. Freestyle, Maven, Matrix) are affected?
When you restart Jenkins, does build 29 appear afterwards?
After soft restart of Jenkins, the missing build re-appears (#29).
All projects are freestyle. I attached my config.xml as well if that will help (replaced some fields with XXX).
The only suspicious thing I see in the job config is the use of the Heavy Job plugin. Is this problem at all reproducible for you? If so, does skipping use of Heavy Job fix it?
The critical diagnostic which was not mentioned here was whether nextBuildNumber existed and if so what it said. It is supposed to point to the next build number which should be created. If for some reason it failed to be updated, Jenkins would try to recreate a build with the same number. In the past this mistake would have resulted in an earlier build being silently overwritten. As of 1.597 it is caught.
The same applies to bugs like JENKINS-23152, with a different cause (objects held in memory with stale contents).
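For anyone who wants to collect that diagnostic, here is a minimal sketch of a check that compares nextBuildNumber with the build directories on disk. It assumes the layout discussed in this thread (a `nextBuildNumber` file next to a `builds/` directory containing numbered entries); the function name `check_next_build_number` is made up for illustration and is not part of Jenkins.

```python
from pathlib import Path


def check_next_build_number(job_dir):
    """Return (next_number, highest_existing, is_stale) for one job directory.

    Assumed layout: <job_dir>/nextBuildNumber holds an integer and
    <job_dir>/builds/<N> exists for every build N that was created.
    """
    job_dir = Path(job_dir)
    marker = job_dir / "nextBuildNumber"
    # A missing file is reported as None rather than guessed at.
    next_number = int(marker.read_text().strip()) if marker.is_file() else None

    builds_dir = job_dir / "builds"
    numbers = []
    if builds_dir.is_dir():
        # Only purely numeric entries count; aliases such as
        # lastSuccessfulBuild are skipped.
        numbers = [int(p.name) for p in builds_dir.iterdir()
                   if p.is_dir() and p.name.isdecimal()]
    highest = max(numbers, default=0)

    # Stale: the next build would reuse a number that already has a directory.
    is_stale = next_number is not None and next_number <= highest
    return next_number, highest, is_stale
```

Running it against JENKINS_HOME/jobs/MY_JOB right after an executor death would show whether the counter was behind the newest build directory at that moment.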
I had posted a comment a couple of weeks ago, but it's not showing up here.
In any case I was just able to reproduce the problem today. It's been about 3 weeks since I last saw the problem. I got exactly the same thing to happen today.
Interesting to note, I was doing some configuration updates behind the scenes by editing config.xml for many jobs. Then to load the config I called out JenkinsURL/reload from my browser.
I wonder if it's picking up an old version of nextBuildNumber when the reload occurs? Some of the jobs were running when I reloaded. I wonder if nextBuildNumber only gets updated when the job completes?
No, nextBuildNumber is incremented the moment a new build is created, before it even really starts (right after it leaves the queue).
Reloading from disk is a plausible explanation; perhaps there are two copies of the Job sitting around temporarily, each with its own version of the field.
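To make that hypothesis concrete, here is a toy model (not Jenkins code, and not a confirmed diagnosis) of two in-memory copies of one job that each keep their own counter while sharing the same build directories. All class and function names are invented for the illustration.

```python
class JobCopy:
    """Toy in-memory copy of a job; not Jenkins code."""

    def __init__(self, next_build_number):
        self.next_build_number = next_build_number

    def assign_build_number(self):
        # Incremented the moment a build is created, as described above.
        number = self.next_build_number
        self.next_build_number += 1
        return number


def start_build(job, build_dirs):
    """Create a build on `job`, refusing to reuse an existing directory."""
    number = job.assign_build_number()
    if number in build_dirs:
        raise RuntimeError(f"builds/{number} already existed")
    build_dirs.add(number)
    return number


build_dirs = {27, 28}   # directories on disk, shared by both copies
stale = JobCopy(29)     # copy still referenced by an item in the queue
fresh = JobCopy(29)     # copy created by the reload

first = start_build(fresh, build_dirs)    # creates builds/29
try:
    start_build(stale, build_dirs)        # also proposes 29 and is rejected
    collided = False
except RuntimeError:
    collided = True
retry = start_build(stale, build_dirs)    # the stale counter has moved on
```

In this model the second attempt on the stale copy gets the next free number, which would be consistent with the failed job starting normally after the executor thread is restarted.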
There were many jobs in the queue and some running when the reload happened. Maybe it is safer to quietDown, clear the queue, and then reload?
Deals with freestyle projects so probably distinct from JENKINS-26739.
I am guessing the reload operation caused the problem. Do you happen to know how to reproduce?
I don't have a specific way to reproduce. Here are some conditions that were true both times the failure occurred:
- Multiple slaves all busy and about 50 jobs in the build queue. Some jobs had been sitting in the build queue for 12+ hours.
- Configuration changes made across many jobs by editing the config.xml files directly in Linux/emacs. config.xml is the only file touched.
- Configuration reloaded via <jenkins-url>/reload from my browser. Jenkins asks me to POST the command, so I hit the "Try POSTing" button and then it reloads.
After the reload, when an executor becomes free, it picks up a job from the queue and immediately fails with the dead indicator in the executor status sidebar. When the thread is restarted, it picks up another job from the queue and dies again. The process repeats until the queue is empty.
It is unclear if jobs that were not in the queue at the time of reload would die. I'm not sure if after the reload all jobs need to die one time or just the jobs that were in the queue at the time of the reload.
It is true that the problem is self-correcting: after a job dies one time, the build number gets corrected and the job will run properly the next time.
This doesn't happen every time I reload config.
Noticed this recently after an upgrade, and might be related, running 1.602
Reported here rather than in JENKINS-26739, since that issue is already closed and this bug relates more to RunMap.put than to lazyput. The matrix build below has one configuration, which runs on one node with 1 executor, so concurrency shouldn't be an issue here.
I get:
Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
INFO: Scheduling MY_JOB to build commit MY_ID
Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
INFO: Scheduling MY_JOB » MY_HOST to build commit MY_ID
Mar 11, 2015 4:07:24 PM hudson.model.Executor run
SEVERE: Unexpected executor death
java.lang.IllegalStateException: /MY_DIR/jobs/MY_JOB/builds/20 already existed; will not overwite with MY_JOB/label=MY_JOB #20
at hudson.model.RunMap.put(RunMap.java:187)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor.run(Executor.java:213)
I get this almost every triggered build. Manually restarting the thread sees the build succeed.
Interestingly, on the filesystem I see build 21 already exists when it fails:
MY_DIR/jobs/MY_JOB/builds$ ls -lR 20 21 ../nextBuildNumber
-rw-r--r-- 1 jenkins jenkins 3 Mar 11 16:07 ../nextBuildNumber
20:
total 16
-rw-r--r-- 1 jenkins jenkins 6488 Mar 11 14:11 build.xml
-rw-r--r-- 1 jenkins jenkins 478 Mar 11 14:04 changelog.xml
-rw-r--r-- 1 jenkins jenkins 2424 Mar 11 14:11 log
21:
total 8
-rw-r--r-- 1 jenkins jenkins 1723 Mar 11 16:07 changelog.xml
-rw-r--r-- 1 jenkins jenkins 2110 Mar 11 16:08 log
MY_DIR/jobs/MY_JOB/builds$ cat ../nextBuildNumber
22
Job 20 shows up in the gui as completed, job 21 still running (executor crashed)
So it looks like the wrong build number is being picked up (note the times of build 21 and the log message 4:07)
Hi, we get the following stack trace after updating from 1.596 to 1.605 (and 1.606):
Mar 25, 2015 12:57:18 PM SEVERE hudson.model.Executor run
Unexpected executor death
java.lang.IllegalStateException: /var/lib/jenkins/jobs/service-mysql-migrations_master/configurations/axis-BASE_TAG/prod/builds/218 already existed; will not overwite with service-mysql-migrations_master/BASE_TAG=prod #218
at hudson.model.RunMap.put(RunMap.java:187)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
Update to this issue: it definitely seems related to multiple new jobs starting at the same time.
Once the executor dies, jobs queue up. If I then restart the thread with no new jobs starting, the pending jobs queue will clear itself over time. If another job comes along at the right time, the thread death may occur.
I have been able to consistently reproduce the thread death with "Stash Webhook to Jenkins" (v2.6) triggering a Multi-Configuration job. Our installation is relatively new, and this is the first time we have set up anything to trigger jobs.
Log of session
Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
Scheduling BUILD: Code Deploy to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
Scheduling BUILD: Code Deploy » py27 to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
PostCommitHooks are disabled on DEPLOY: code_deploy
Mar 27, 2015 2:39:52 PM SEVERE hudson.model.Executor run
Unexpected executor death
java.lang.IllegalStateException: /var/lib/jenkins/jobs/build_code_deploy/configurations/axis-TOXENV/py27/builds/8 already existed; will not overwite with build_code_deploy/TOXENV=py27 #8
at hudson.model.RunMap.put(RunMap.java:187)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor.run(Executor.java:213)
Mar 27, 2015 2:40:20 PM INFO hudson.model.Run execute
build_code_deploy/TOXENV=py27 #9 main build action completed: SUCCESS
Mar 27, 2015 2:40:21 PM INFO hudson.model.Run execute
build_code_deploy #9 main build action completed: SUCCESS
Mar 27, 2015 2:40:44 PM INFO hudson.model.Run execute
deploy_code_deploy #11 main build action completed: SUCCESS
Of interest: build #8 is the previous successful run; build #9 is the current run and succeeds in a different thread.
Versions
| Component | Version |
|---|---|
| Jenkins | 1.606 |
| Stash | 3.1.0 |
| Stash Webhook to Jenkins | 2.60 |
| ShiningPanda plugin | 0.21 |
Job Configuration
Job Name: build_code_deploy
SCM: git (ssh)
Branches: */master
Poll SCM: H 0 1 1 *
Configuration Matrix: Tox; py27
Build: Tox Builder, $configuration_file=tox.ini, recreate (checked)
Post-build: trigger job= deploy_code_deploy, Publish JUnit test result, Notify Stash Instance
Stash Plugin:
Configured to only build from Master
Hmm, do not have Stash available, so I wonder if there is a way to reproduce without using proprietary tools. I would be tempted to ascribe this to yet another weird bug in the matrix-project plugin except that there are some reports from people using freestyle as well.
Since the next LTS baseline is looming and there is no progress in finding a test case for this, I filed a PR to at least make the error nonfatal.
The Stash Webhook plugin makes a GET request to the jenkins server. So one should just need Jenkins, the Git plugins (scm-api, git client plugin, git plugin), and something to serve git.
GET /git/notifyCommit?url=${URL_ENCODED_GIT_URL}&branches=master&sha1=${COMMIT}
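A small sketch of how that request URL can be assembled without Stash. The path and the `url`, `branches` and `sha1` parameters are the ones shown in the line above; the base address `http://localhost:8080` and the helper name `notify_commit_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode


def notify_commit_url(jenkins_base, git_url, branches=None, sha1=None):
    """Build the /git/notifyCommit URL with a URL-encoded repository address."""
    params = {"url": git_url}
    if branches:
        params["branches"] = branches
    if sha1:
        # Per the discussion in this thread, passing sha1 is what makes the
        # Git plugin schedule a build directly instead of polling.
        params["sha1"] = sha1
    return jenkins_base.rstrip("/") + "/git/notifyCommit?" + urlencode(params)


# Hypothetical local instance; substitute a real repository URL and commit.
example = notify_commit_url(
    "http://localhost:8080",
    "https://github.com/allegro/axion-release-plugin.git",
    branches="master",
)
```

Fetching the resulting URL with any HTTP client (or pasting it into a browser tab) sends the same kind of GET request the webhook plugin is described as making.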
We could reliably reproduce this issue by triggering a matrix configuration job via git plugin
Just curious if this is being actively worked or if there is any kind of timeline for resolution? This has completely killed my environment, as all of our jobs are Matrix builds (with automated feature branch/job creation and web hook pushes from GIT to kick off the jobs). Within an hour, all nodes on our farm are full of dead threads – Jenkins is basically down and unusable... serious bummer!
Are there any work-arounds? Or do I have to change all of my jobs to polling and give up on the hooks? Thanks for any information.
This is not being actively worked on because there is no known way to reproduce from scratch.
I did file a PR to downgrade the error to a warning, closer to the pre-1.597 state (which allowed builds to be overwritten silently), but it was rejected. No problem to reopen that discussion, though of course providing developers with a way to reproduce the problem would be far better.
Would be great if we knew how to reproduce this. Not something vague, but full steps to reproduce on a newly set up instance. If you're experiencing this issue, consider spinning up a second Jenkins instance (could be your desktop machine) to try to make this reproducible based on what you know about your production instance. Only once we know the circumstances for this to happen, we can investigate them and fix the underlying problem.
Fortunately, I was able to easily duplicate the problem in about 5 minutes, first crack! Hopefully you can too!
- Ubuntu 15.04 (does not matter; not what I'm running in prod)
- Jenkins 1.612 (does not matter; not what I'm running in prod). It seems like this problem started around 1.597, but I'm not really sure when.
- Install the Git plugin
- Two slaves, s1 and s2, both on localhost, each with 2 executors; s1 has label S1 and s2 has label S2 (I run no executors on master in prod, just an FYI)
- Create a Matrix job:
- Git repo: https://github.com/allegro/axion-release-plugin.git (use this repo, as I'm using a hash for this repository; no auth required)
- Branches to build: origin/master
- Additional Behaviours -> Local Branch Name: master
- Schedule: H 0 1 1 0
- Configuration Matrix: Label Expression, name label_exp, values S1 S2
- Build: Execute Shell, command echo "--------------->"$label_exp
Per the gentleman's comment above – simply post this url in another browser tab and watch your threads die... Keep submitting, and they all die...
SEVERE: Unexpected executor death
java.lang.IllegalStateException: /usr/share/tomcat8/.jenkins/jobs/test/configurations/axis-label_exp/S1/builds/3 already existed; will not overwite with test/label_exp=S1 #3
at hudson.model.RunMap.put(RunMap.java:189)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor$1.call(Executor.java:328)
at hudson.model.Executor$1.call(Executor.java:310)
at hudson.model.Queue._withLock(Queue.java:1251)
at hudson.model.Queue.withLock(Queue.java:1189)
at hudson.model.Executor.run(Executor.java:310)
Let me know if this does not work, or you want some config files. I hope it is easily reproducible..and quickly resolved – this is killing me! Thanks so much for your help...
This is even worse: I lose my node after a while. I saw this in production, but I did not think it was related.
Now I have these messages in the log...
May 08, 2015 7:15:45 AM hudson.triggers.SafeTimerTask run
SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@7c2aef8c failed
java.lang.NullPointerException
at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:332)
at hudson.slaves.NodeProvisioner$2.run(NodeProvisioner.java:261)
at hudson.model.Queue._withLock(Queue.java:1212)
at hudson.model.Queue.withLock(Queue.java:1148)
at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:208)
at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:57)
at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:778)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
May 08, 2015 7:15:51 AM hudson.triggers.SafeTimerTask run
SEVERE: Timer task hudson.model.LoadStatistics$LoadStatisticsUpdater@40623d93 failed
java.lang.NullPointerException
at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
at hudson.model.LoadStatistics$LoadStatisticsUpdater.doRun(LoadStatistics.java:394)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I restart tomcat and my slave returns.... that is not very cool! But at least it came back!!
Thanks again...
Was anyone able to replicate this problem? Just want to help out where I can, and am hoping for a quick resolution. Thanks.
I get this ISE the second and subsequent time I trigger the build. The first time I get something different (with matrix-project 1.4.1):
java.io.IOException: cannot start a build of JENKINS-26582/label_exp=S1 since its parent has no builds at all
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:276)
at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor$1.call(Executor.java:328)
at …
Will investigate both.
All the builds actually seem to run.
I filed the NPE as JENKINS-28384. It was probably triggered by the executor death here, but is an independent bug.
Here is what I have so far. Despite appearances, this is not a core bug. A change made as part of the implementation of JENKINS-24380 merely made an erroneous condition be displayed as such.
The critical point (and the reason why existing automated tests did not catch this) is that the bug is not manifested when you trigger a matrix build normally—only when you install the Git plugin and use /git/notifyCommit.
The main bug is in matrix-project: MatrixConfiguration.newBuild assumes without checking that the newly created MatrixRun in fact has a unique build number. If this was created as part of a MatrixBuild, then that will be true, since the MatrixProject uses the default (sane) implementation of newBuild. And that is exactly what happens—normally.
But it turns out that there is nothing blocking a MatrixRun from being created other ways. For example, if you ping <job>/label_exp=S1/build you will get this error, because the number of the last parent MatrixBuild is unchanged, and there was already a MatrixRun with that number. Of course you would not normally go to this URL—it is not exposed in the UI—but there is nothing stopping you from doing that, or I guess from using the CLI to do the same.
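The numbering problem described in the last two paragraphs can be shown with a toy model. This is a simplified sketch, not the matrix-project source: the class `ToyMatrixProject` and its methods are invented, and plain Python exceptions stand in for the IOException and IllegalStateException seen in the logs.

```python
class ToyMatrixProject:
    """Toy model: configuration runs reuse the parent's last build number."""

    def __init__(self):
        self.last_build_number = 0
        self.runs = {}  # configuration name -> set of run numbers

    def new_matrix_build(self, configurations):
        # Normal path: the parent takes a fresh number, then starts one run
        # per configuration under that same number.
        self.last_build_number += 1
        for configuration in configurations:
            self.new_configuration_run(configuration)
        return self.last_build_number

    def new_configuration_run(self, configuration):
        # Direct path: nothing advances the parent's number first.
        if self.last_build_number == 0:
            raise OSError("parent has no builds at all")
        number = self.last_build_number
        existing = self.runs.setdefault(configuration, set())
        if number in existing:
            raise RuntimeError(f"builds/{number} already existed")
        existing.add(number)
        return number
```

Calling `new_matrix_build(["S1", "S2"])` and then `new_configuration_run("S1")` on the same object raises the "already existed" error in this model, because the direct call reuses the parent's unchanged number.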
I presume the initial "…since its parent has no builds at all" exception has the same cause: an attempt to directly start a MatrixRun without a parent MatrixBuild.
Now the reason why this appears with Git notifications is that GitStatus.JenkinsAbstractProjectListener.onNotifyCommit is wrong, too. If you look at SubversionRepositoryStatus or MercurialStatus, you will see that they check whether there is an SCMTrigger for the project which does not ignorePostCommitHooks. If there is not, they print a message ("No subversion jobs using SCM polling or all jobs using SCM polling are ignoring post-commit hooks", "No SCMTrigger on …"), and then do nothing. And if sha1 is unspecified, GitStatus does the same. But if you specify sha1, it immediately schedules a build (with RevisionParameterAction), rather than using polling, even if there is no SCMTrigger.
For a top-level project, that is probably fine. But MatrixConfiguration.getScm delegates to its parent, so each configuration shows up in the list of projects matching the specified repo. This means that the Git plugin schedules not only the top-level MatrixProject into the queue, but each MatrixConfiguration! And when those children are run directly, the error appears because they are trying to create builds which duplicate an existing number: either that of the last matrix build (if the configurations get scheduled before the parent), or that of the current matrix build (if after).
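The selection logic described above can be sketched roughly as follows. This is a simplified model and not the Git plugin's code: `ToyProject`, `projects_to_schedule` and the `require_trigger` switch are invented names, and the model assumes that configurations report the parent's repository but carry no SCMTrigger of their own.

```python
from dataclasses import dataclass


@dataclass
class ToyProject:
    name: str
    scm_url: str               # configurations report the parent's SCM
    has_scm_trigger: bool
    ignores_post_commit_hooks: bool = False


def projects_to_schedule(projects, repo_url, sha1=None, require_trigger=True):
    """Return the names of projects a commit notification would act on.

    require_trigger=False models the behaviour described above, where
    supplying sha1 bypasses the SCMTrigger check; require_trigger=True
    models always insisting on an SCMTrigger.
    """
    selected = []
    for project in projects:
        if project.scm_url != repo_url:
            continue
        eligible = (project.has_scm_trigger
                    and not project.ignores_post_commit_hooks)
        needs_trigger = require_trigger or sha1 is None
        if needs_trigger and not eligible:
            continue
        selected.append(project.name)
    return selected


repo = "https://example.invalid/repo.git"  # placeholder repository URL
projects = [
    ToyProject("test", repo, has_scm_trigger=True),
    ToyProject("test/label_exp=S1", repo, has_scm_trigger=False),
    ToyProject("test/label_exp=S2", repo, has_scm_trigger=False),
]
```

With `sha1` supplied and `require_trigger=False`, the model selects the parent and both configurations; with `require_trigger=True` only the parent remains, which is the effect a trigger check is meant to have.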
msinclair sorry to hijack this issue but everyone else seems to be seeing a problem with matrix projects and Git notifications, whereas your case was actually something else, apparently rarer and probably needing some unrelated fix. If you still see it, file it separately (blocking JENKINS-24380).
Offering PRs for both plugins. Either fix will avoid the usual symptom, but it is best to have both.
The issue can be masked because not all plugins use onNotifyCommit: in github-plugin the migration is in progress, and other plugins just copy-pasted the algorithm/code from github-plugin.
Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/hudson/plugins/git/GitStatus.java
http://jenkins-ci.org/commit/git-plugin/de3117def8625c57a95126a200e990ab0481948e
Log:
[FIXED JENKINS-26582] To trigger a build from notifyCommit, the project must have an SCMTrigger.
This is true even if it has a matching SCM and sha1 is specified.
Otherwise we would be triggering MatrixConfiguration, which is illegal and cause errors.
Code changed in jenkins
User: Mark Waite
Path:
src/main/java/hudson/plugins/git/GitStatus.java
src/test/java/hudson/plugins/git/GitSCMTest.java
http://jenkins-ci.org/commit/git-plugin/b88b388aee1085e5d161c578c3f551953b27abf4
Log:
Merge pull request #319 from jglick/SCMTrigger-JENKINS-26582
JENKINS-26582 notifyCommit should ignore projects without SCMTrigger
Passed tests on multiple platforms.
Compare: https://github.com/jenkinsci/git-plugin/compare/6c1c49feefb3...b88b388aee10
I was having a hard time following the comments and working out whether these changes would actually fix the dead thread problem. Now that this fix has been integrated into the head (according to emails that I received from the Jenkins server), I decided to retry the test, but I get the same results.
philbeiler The fixes are in the Git Plugin and have not been released yet.
I had originally reported this issue. For my application (freestyle jobs), I see the dead thread problem after a config reload. This resolution won't solve the problem for those that are seeing the problem after reloading configuration (I don't have the Git Plugin).
msinclair Well, this is a mess then
Phil's explanation was so good and easily reproducible that Jesse went ahead and fixed it using this issue as reference. Notably, most commenters also used Matrix Projects and provided information and stack traces related to that, so it's understandable that the difference in the original report was missed.
The cleanest way to move forward would be to file your original issue a second time. Mention that it's not JENKINS-26582, as its resolution helps with Matrix projects and Git plugin, but your problem is with Freestyle projects, to protect it from being hijacked again. I understand this must be frustrating but using this issue going forward, when it was already used in the fixes to Git Plugin, would be too confusing IMO.
It would also be great if you could make your issue reliably reproducible in some way.
msinclair perhaps you missed my comment of May 13th where I said essentially the same thing, but perhaps not explained as well as Daniel.
Please note that there are fixes for this issue in both the Git and Matrix Project plugins. I believe either suffices to avoid the symptom. Not sure about release status, check plugin changelogs.
Thanks - hadn't noticed that. I will get the latest and give it a try when it's released.
I apologize for hijacking this issue. That was not my intent; the symptoms produced the same outcome, so it just seemed like the right place to jump in!
I hope I can ask one last question on this. Jesse stated that both the Git and Matrix plugins have been fixed. The Matrix plugin has not been released in almost a year, and the Git plugin was last released in February. Is there any plan for releasing them? (I'm sure they are not your responsibility, but this bug is killing me, or at least my threads!) Is there some way that I can monitor these fixes (related ticket numbers, etc.) as they go through the automated process, similar to the core Jenkins code? I would just like to know how long I have to deal with this, especially since you were so kind as to fix it weeks ago.
Thanks again. Phil
philbeiler the git plugin and git client plugin are being tested in hopes of releasing new versions before the end of June. If you're willing to assist with the testing, please download and install a pre-release build of the git client plugin and the git plugin. Problems detected in the pre-release should be e-mailed to MarkEWaite and ndeloof.
I wrote some test ideas if you would like suggestions of areas that need testing. The git plugin supports many different use cases and its automated tests only evaluate a very few of those use cases.
I ran through the steps described in this bug report with a matrix job running across slaves on multiple versions of Windows and multiple versions of Linux (CentOS, Debian, and Ubuntu). As far as I can tell, the bug is fixed by the changes made by jglick.
I've got 1.609.3 LTS installed with git plugin 2.4.0 and matrix project 1.6 and I'm getting lots of
java.lang.IllegalStateException: /var/lib/jenkins/jobs/java8-1-build-pro-java7-master/builds/1388 already existed; will not overwite with java8-1-build-pro-java7-master #1388
at hudson.model.RunMap.put(RunMap.java:187)
at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
at hudson.model.AbstractProject.newBuild(AbstractProject.java:1010)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1209)
at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
at hudson.model.Executor$1.call(Executor.java:335)
at hudson.model.Executor$1.call(Executor.java:317)
at hudson.model.Queue._withLock(Queue.java:1348)
at hudson.model.Queue.withLock(Queue.java:1213)
at hudson.model.Executor.run(Executor.java:317)
I recently started using github push triggers (manually configured). this ticket tldr. Did i miss something (sorry)? I'm limping along with this. Thanks.
prosegay if you have applied the relevant plugin updates then you are hitting a different bug with the same symptom but a distinct cause, which would be tracked separately. Without knowing how to reproduce we are unlikely to be able to help.
ya think? seems unlikely as the stack trace looked the same to me. what do you need?
Jenkins core is simply reporting an illegal condition which something else was responsible for producing. Other than the original report here, all cases have been tracked back to a combination of bugs in the Git and Matrix Project plugins, both of which were fixed. What is causing the problem in your case I do not know.
didn't answer the question: what do you need to isolate to root cause?
Usually, isolating a problem to root cause needs enough description of the distinct conditions which caused the problem so that someone else can duplicate the problem.
In this case, it may need a support bundle (to show the versions of various plugins installed), a copy of the job definition (or a detailed enough description of the job definition that someone can recreate the job from that description), and a description of the actions taken to show the problem.
Right, ideally a way to reproduce the problem from scratch. When that cannot be found, there may be some clues that are helpful (for example: only happens when a certain plugin is installed/configured), but in general the problem may not be fixable. For certain bugs of course the error message/stack trace suffices to guess at a diagnosis. Unfortunately that is not the case here.
Is this the best/current bug to discuss the 'dead node' issue? Should I be using JENKINS-29268 or something else instead?
This bug is affecting us pretty badly (team of ~15 developers) and we're willing to put some work in to help get it fixed.
Thanks in advance
I don't think a resolved issue is generally a good place to discuss an issue you're currently seeing. If there is an open issue that is the same, or appears strongly related, that would be a reasonable place for the discussion.
If the issue is still there, and this describes that issue, then I think this bug should be reopened.
Do not reopen this issue.
There is a known but unconfirmed occurrence when using reload-from-disk. If you know of some other means of reproducing, mention in JENKINS-27530.
This may be a legitimate bug revealed by the changes for JENKINS-24380 (I actually asked for this to be an error condition to prevent overriding existing builds). Are you using Gerrit Trigger Plugin?
Is there a build.xml in 29 (while Jenkins still thinks the next build should be 29)?