ISE from RunMap.put using /git/notifyCommit on a matrix project

      Jan 23, 2015 11:43:52 PM hudson.model.Executor run
      SEVERE: Unexpected executor death
      java.lang.IllegalStateException: /MY_DIR/jenkins/home/jobs/MY_JOB/builds/29 already existed; will not overwite with MY_JOB #29
      at hudson.model.RunMap.put(RunMap.java:187)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1001)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1200)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor.run(Executor.java:213)

      I tried to manually start a job via the run button. I see the Dead indicator in the executor status sidebar. I was able to restart the thread and then manually start the job again. This time it started with #30 and appears to be running OK.

      Looking at JENKINS_HOME dir in linux, I see that there is a build directory for #29 and the job ran and failed (unrelated). Looking at the Jenkins webpage for the job, the build skips from 28 to (the now running) 30. #29 has no listing.

      I know some items related to the major enhancement are not closed yet, like JENKINS-23152, so please close this if the behavior I'm seeing is expected until the next release. If it is unexpected, hopefully this report is informative: the issue happened 4 times in a row, once for each job I had put in the build queue.

        1. screenshot-3.png
          32 kB
          philbeiler
        2. screenshot-2.png
          15 kB
          philbeiler
        3. screenshot-1.png
          16 kB
          philbeiler
        4. config.xml
          5 kB
          Mark Sinclair

          [JENKINS-26582] ISE from RunMap.put using /git/notifyCommit on a matrix project

          Mark Sinclair added a comment -

          After soft restart of Jenkins, the missing build re-appears (#29).

          All projects are freestyle. I attached my config.xml as well if that will help (replaced some fields with XXX).

          Jesse Glick added a comment -

          The only suspicious thing I see in the job config is the use of the Heavy Job plugin. Is this problem at all reproducible for you? If so, does skipping use of Heavy Job fix it?

          The critical diagnostic which was not mentioned here was whether nextBuildNumber existed and if so what it said. It is supposed to point to the next build number which should be created. If for some reason it failed to be updated, Jenkins would try to recreate a build with the same number. In the past this mistake would have resulted in an earlier build being silently overwritten. As of 1.597 it is caught.

          The same applies to bugs like JENKINS-23152, with a different cause (objects held in memory with stale contents).
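
A minimal sketch of the diagnostic described above, assuming the on-disk layout reported in this issue: a nextBuildNumber file in the job directory, next to a builds/ directory of numbered build directories. The function name check_next_build_number is hypothetical and not part of Jenkins; it only reads files and reports whether the stored number points at a build that already exists.

```python
from pathlib import Path


def check_next_build_number(job_dir):
    """Compare nextBuildNumber with the numbered build directories on disk.

    Assumes <job_dir>/nextBuildNumber and <job_dir>/builds/<N>/ as seen in
    the listings in this issue. Returns a small report dictionary.
    """
    job_dir = Path(job_dir)
    marker = job_dir / "nextBuildNumber"
    # The file may be missing; report that as None rather than guessing.
    next_number = int(marker.read_text().strip()) if marker.exists() else None

    builds_dir = job_dir / "builds"
    existing = []
    if builds_dir.is_dir():
        # Ignore non-numeric entries such as lastSuccessfulBuild links.
        existing = sorted(
            int(p.name) for p in builds_dir.iterdir()
            if p.is_dir() and p.name.isdigit()
        )
    highest = existing[-1] if existing else 0

    # A stale value is one that would reuse an existing build number.
    stale = next_number is not None and next_number <= highest
    return {"next": next_number, "highest": highest, "stale": stale}
```

Running it against a job directory right after an executor death would show whether the stored number was at or below the highest existing build, which is the condition the 1.597 check rejects.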

          Mark Sinclair added a comment -

          I had posted a comment a couple of weeks ago, but it's not showing up here.

          In any case I was just able to reproduce the problem today. It's been about 3 weeks since I last saw the problem. I got exactly the same thing to happen today.

          Interesting to note, I was doing some configuration updates behind the scenes by editing config.xml for many jobs. Then to load the config I called out JenkinsURL/reload from my browser.

          I wonder if it's picking up an old version of nextBuildNumber when the reload occurs? Some of the jobs were running when I reloaded. I wonder if nextBuildNumber only gets updated when the job completes?

          Jesse Glick added a comment -

          No, nextBuildNumber is incremented the moment a new build is created, before it even really starts (right after it leaves the queue).

          Reloading from disk is a plausible explanation; perhaps there are two copies of the Job sitting around temporarily, each with its own version of the field.
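
The "two copies" hypothesis can be illustrated with a toy model. This is only a sketch under stated assumptions, not Jenkins code: JobCopy and BuildAlreadyExists are hypothetical names, the shared set stands in for the builds/ directory, and the counter is incremented when a build is created, as described above.

```python
class BuildAlreadyExists(Exception):
    """Stand-in for the IllegalStateException reported from RunMap.put."""


class JobCopy:
    """One in-memory copy of a job, holding its own next-build counter."""

    def __init__(self, name, next_build_number, builds_on_disk):
        self.name = name
        self.next_build_number = next_build_number
        # Shared between copies: models the single builds/ directory on disk.
        self.builds_on_disk = builds_on_disk

    def new_build(self):
        number = self.next_build_number
        # The counter moves on as soon as a build is created.
        self.next_build_number += 1
        if number in self.builds_on_disk:
            raise BuildAlreadyExists(
                f"builds/{number} already existed; "
                f"will not overwrite with {self.name} #{number}"
            )
        self.builds_on_disk.add(number)
        return number


if __name__ == "__main__":
    disk = {28}
    before_reload = JobCopy("MY_JOB", 29, disk)
    after_reload = JobCopy("MY_JOB", 29, disk)  # loaded with the same value
    print(before_reload.new_build())  # first copy creates a build
    try:
        after_reload.new_build()  # second copy tries the same number
    except BuildAlreadyExists as e:
        print("executor would die:", e)
    print(after_reload.new_build())  # the retry uses the next number
```

In this model the second copy fails once and then succeeds on the following attempt, which matches the self-correcting behavior reported later in the thread. Whether Jenkins actually holds two copies after a reload is not established here.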

          Mark Sinclair added a comment -

          There were many jobs in the queue and some running when the reload happened. Maybe it is safer to quietDown, clear the queue, then reload?
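
A rough sketch of the sequence suggested above, assuming the Jenkins instance exposes the quietDown and reload URLs as POST endpoints under its base URL. The helper name reload_requests is hypothetical; authentication, any CSRF crumb, waiting for running builds to finish, and clearing the queue are all left out, and it is not established in this thread that the sequence avoids the problem.

```python
from urllib.parse import urljoin
from urllib.request import Request


def reload_requests(base_url):
    """Build (but do not send) the POST requests for quietDown, then reload.

    base_url is the Jenkins root, e.g. http://localhost:8080/jenkins.
    Credentials and crumb headers are deliberately omitted in this sketch.
    """
    base = base_url.rstrip("/") + "/"
    return [
        Request(urljoin(base, path), data=b"", method="POST")
        for path in ("quietDown", "reload")
    ]
```

Each request could then be sent in order with urllib.request.urlopen once the appropriate headers for the installation have been added.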

          Jesse Glick added a comment -

          Deals with freestyle projects so probably distinct from JENKINS-26739.

          Jesse Glick added a comment -

          I am guessing the reload operation caused the problem. Do you happen to know how to reproduce?

          Mark Sinclair added a comment -

          I don't have a specific way to reproduce. Here are some conditions that were true both times the failure occurred:
          - Multiple slaves all busy and about 50 jobs in the build queue. Some jobs had been sitting in the build queue for 12+ hours.
          - Implemented configuration changes across many jobs by editing the config.xml files directly in linux/emacs. config.xml is the only file touched.
          - Reloaded configuration via <jenkins-url>/reload from my browser. Jenkins asks me to 'POST' the command, so I hit the "Try POSTing" button and then it reloads.

          After the reload, when an executor becomes free, it picks up a job from the queue and immediately fails with the dead indicator in the executor status sidebar. When the thread is restarted, it picks up another job from the queue and dies again. The process repeats until the queue is empty.

          It is unclear if jobs that were not in the queue at the time of reload would die. I'm not sure if after the reload all jobs need to die one time or just the jobs that were in the queue at the time of the reload.

          It is true that the problem is self-correcting: after a job dies one time, the build number gets corrected and it will run properly the next time.

          This doesn't happen every time I reload config.

          Benjamin Close added a comment -

          Noticed this recently after an upgrade, and it might be related; running 1.602.
          Reported here rather than JENKINS-26739 since that one is already closed and this bug relates more to RunMap.put than to lazyput. The matrix build below has one configuration that runs on one node with 1 executor, so concurrency here shouldn't be an issue.

          I get:
          Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
          INFO: Scheduling MY_JOB to build commit MY_ID
          Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
          INFO: Scheduling MY_JOB » MY_HOST to build commit MY_ID
          Mar 11, 2015 4:07:24 PM hudson.model.Executor run
          SEVERE: Unexpected executor death
          java.lang.IllegalStateException: /MY_DIR/jobs/MY_JOB/builds/20 already existed; will not overwite with MY_JOB/label=MY_JOB #20
          at hudson.model.RunMap.put(RunMap.java:187)
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
          at hudson.model.Executor.run(Executor.java:213)

          I get this on almost every triggered build. Manually restarting the thread sees the build succeed.
          Interestingly, on the filesystem I see build 21 already exists when it fails:

          MY_DIR/jobs/MY_JOB/builds$ ls -lR 20 21 ../nextBuildNumber
          -rw-r--r-- 1 jenkins jenkins 3 Mar 11 16:07 ../nextBuildNumber

          20:
          total 16
          -rw-r--r-- 1 jenkins jenkins 6488 Mar 11 14:11 build.xml
          -rw-r--r-- 1 jenkins jenkins 478 Mar 11 14:04 changelog.xml
          -rw-r--r-- 1 jenkins jenkins 2424 Mar 11 14:11 log

          21:
          total 8
          -rw-r--r-- 1 jenkins jenkins 1723 Mar 11 16:07 changelog.xml
          -rw-r--r-- 1 jenkins jenkins 2110 Mar 11 16:08 log
          MY_DIR/jobs/MY_JOB/builds$ cat ../nextBuildNumber
          22

          Job 20 shows up in the GUI as completed, job 21 as still running (executor crashed).
          So it looks like the wrong build number is being picked up (note the times of build 21 and the log message at 4:07).

          Greg BOUGEARD added a comment -

          Hi, we have the following stacktrace after updating a 1.596 version to 1.605 (and 1.606) :

          Mar 25, 2015 12:57:18 PM SEVERE hudson.model.Executor run
            Unexpected executor death
            java.lang.IllegalStateException: /var/lib/jenkins/jobs/service-mysql-migrations_master/configurations/axis-BASE_TAG/prod/builds/218 already existed; will not overwite with service-mysql-migrations_master/BASE_TAG=prod #218
          	at hudson.model.RunMap.put(RunMap.java:187)
          	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
          	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
          	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
          	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
          

          Benjamin Close added a comment -

          Update to this issue. It definitely seems related to multiple new jobs starting at the same time.
          Once the executor dies, jobs queue up. If I then restart the thread with no new jobs starting, the pending jobs queue will clear itself over time. If another job comes along at the right time, the thread death may occur.

          Damion Alexander added a comment -

          I have been able to consistently reproduce the thread death with "Stash Webhook to Jenkins" (v2.6) triggering a Multi-Configuration job. Our installation is relatively new, and this is the first time we have set up anything to trigger jobs.

          Log of session

          Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
          Scheduling BUILD: Code Deploy to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
          Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
          Scheduling BUILD: Code Deploy » py27 to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
          Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
          PostCommitHooks are disabled on DEPLOY: code_deploy
          Mar 27, 2015 2:39:52 PM SEVERE hudson.model.Executor run
          Unexpected executor death
          java.lang.IllegalStateException: /var/lib/jenkins/jobs/build_code_deploy/configurations/axis-TOXENV/py27/builds/8 already existed; will not overwite with build_code_deploy/TOXENV=py27 #8
            at hudson.model.RunMap.put(RunMap.java:187)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            at hudson.model.Executor.run(Executor.java:213)

          Mar 27, 2015 2:40:20 PM INFO hudson.model.Run execute
          build_code_deploy/TOXENV=py27 #9 main build action completed: SUCCESS
          Mar 27, 2015 2:40:21 PM INFO hudson.model.Run execute
          build_code_deploy #9 main build action completed: SUCCESS
          Mar 27, 2015 2:40:44 PM INFO hudson.model.Run execute
          deploy_code_deploy #11 main build action completed: SUCCESS

          Of interest: build #8 is the previous successful run; build #9 is the current run and succeeds in a different thread.

          Versions

          Component                   Version
          Jenkins                     1.606
          Stash                       3.1.0
          Stash Webhook to Jenkins    2.60
          ShiningPanda plugin         0.21

          Job Configuration
          Job Name: build_code_deploy
          SCM: git (ssh)
          Branches: */master
          Poll SCM: H 0 1 1 *
          Configuration Matrix: Tox; py27
          Build: Tox Builder, $configuration_file=tox.ini, recreate (checked)
          Post-build: trigger job= deploy_code_deploy, Publish JUnit test result, Notify Stash Instance

          Stash Plugin:
          Configured to only build from Master

          Jesse Glick added a comment -

          Hmm, do not have Stash available, so I wonder if there is a way to reproduce without using proprietary tools. I would be tempted to ascribe this to yet another weird bug in the matrix-project plugin except that there are some reports from people using freestyle as well.

          Jesse Glick added a comment -

          Since the next LTS baseline is looming and there is no progress in finding a test case for this, I filed a PR to at least make the error nonfatal.

          Damion Alexander added a comment -

          The Stash Webhook plugin makes a GET request to the Jenkins server. So one should just need Jenkins, the Git plugins (scm-api, git client plugin, git plugin), and something to serve git.

          GET /git/notifyCommit?url=${URL_ENCODED_GIT_URL}&branches=master&sha1=${COMMIT}

          Greg BOUGEARD added a comment -

          Yep, we're using a freestyle project with a matrix configuration

          Zuyang Kou added a comment -

          We could reliably reproduce this issue by triggering a matrix configuration job via git plugin

          philbeiler added a comment -

          Just curious if this is being actively worked or if there is any kind of timeline for resolution? This has completely killed my environment, as all of our jobs are Matrix builds (with automated feature branch/job creation and web hook pushes from GIT to kick off the jobs). Within an hour, all nodes on our farm are full of dead threads – Jenkins is basically down and unusable... serious bummer!

          Are there any work-arounds? Or do I have to change all of my jobs to polling and give up on the hooks? Thanks for any information.

          Jesse Glick added a comment -

          This is not being actively worked on because there is no known way to reproduce from scratch.

          I did file a PR to downgrade the error to a warning, closer to the pre-1.597 state (which allowed builds to be overwritten silently), but it was rejected. No problem to reopen that discussion, though of course providing developers with a way to reproduce the problem would be far better.

          Daniel Beck added a comment -

          Would be great if we knew how to reproduce this. Not something vague, but full steps to reproduce on a newly set up instance. If you're experiencing this issue, consider spinning up a second Jenkins instance (could be your desktop machine) to try to make this reproducible based on what you know about your production instance. Only once we know the circumstances for this to happen, we can investigate them and fix the underlying problem.

          philbeiler added a comment - - edited

          Fortunately, I was able to easily duplicate the problem in about 5 minutes, first crack! Hopefully you can too!

          Ubuntu 15.04 - Does not matter, not what I'm running in prod
          Jenkins 1.612 - Does not matter, not what I'm running in prod – seems like this problem started around 1.597??? Not really sure when.
          Install Git Plugin
          Two slaves, s1 and s2, both on localhost, each with 2 executors; s1 has label S1, s2 has label S2 (I run no executors on master in prod, just an FYI)
          Create a Matrix job
          GIT REPO https://github.com/allegro/axion-release-plugin.git (use this repo, as I'm using a hash for this repository - no auth required)
          Branches to build: origin/master
          Additional Behaviours -> Local Branch Name: master
          Schedule: H 0 1 1 0
          Configuration Matrix: Label Expression label_exp with values S1 S2

          Build: Execute Shell Command: echo "--------------->"$label_exp

          Per the gentleman's comment above – simply post this url in another browser tab and watch your threads die... Keep submitting, and they all die...

          http://localhost:8080/jenkins//git/notifyCommit?url=https%3A%2F%2Fgithub.com%2Fallegro%2Faxion-release-plugin.git&branches=master&sha1=a5de4d725814ff907a8cf1f5666b9f01e8361655
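
For anyone scripting the same trigger rather than pasting it into a browser, here is a small sketch that builds the notifyCommit URL from its parts. The function name notify_commit_url is hypothetical; the query parameters (url, branches, sha1) are the ones shown in the requests above, and the sketch emits a single slash before git/notifyCommit.

```python
from urllib.parse import urlencode


def notify_commit_url(jenkins_url, repo_url, branch, sha1):
    """Build a /git/notifyCommit URL with a URL-encoded repository address."""
    # urlencode percent-encodes ':' and '/' in the repository URL.
    query = urlencode({"url": repo_url, "branches": branch, "sha1": sha1})
    return f"{jenkins_url.rstrip('/')}/git/notifyCommit?{query}"


if __name__ == "__main__":
    print(notify_commit_url(
        "http://localhost:8080/jenkins",
        "https://github.com/allegro/axion-release-plugin.git",
        "master",
        "a5de4d725814ff907a8cf1f5666b9f01e8361655",
    ))
```

The resulting string can be fetched repeatedly with urllib.request.urlopen or any HTTP client to mimic the "keep submitting" step.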

          SEVERE: Unexpected executor death
          java.lang.IllegalStateException: /usr/share/tomcat8/.jenkins/jobs/test/configurations/axis-label_exp/S1/builds/3 already existed; will not overwite with test/label_exp=S1 #3
          at hudson.model.RunMap.put(RunMap.java:189)
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
          at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
          at hudson.model.Executor$1.call(Executor.java:328)
          at hudson.model.Executor$1.call(Executor.java:310)
          at hudson.model.Queue._withLock(Queue.java:1251)
          at hudson.model.Queue.withLock(Queue.java:1189)
          at hudson.model.Executor.run(Executor.java:310)

          Let me know if this does not work, or you want some config files. I hope it is easily reproducible..and quickly resolved – this is killing me! Thanks so much for your help...

          This is even worse... I lose my node after a while – I saw this in production but I did not think it was related...

          Now I have these messages in the log...

          May 08, 2015 7:15:45 AM hudson.triggers.SafeTimerTask run
          SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@7c2aef8c failed
          java.lang.NullPointerException
          at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
          at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
          at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
          at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:332)
          at hudson.slaves.NodeProvisioner$2.run(NodeProvisioner.java:261)
          at hudson.model.Queue._withLock(Queue.java:1212)
          at hudson.model.Queue.withLock(Queue.java:1148)
          at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:208)
          at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:57)
          at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:778)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)

          May 08, 2015 7:15:51 AM hudson.triggers.SafeTimerTask run
          SEVERE: Timer task hudson.model.LoadStatistics$LoadStatisticsUpdater@40623d93 failed
          java.lang.NullPointerException
          at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
          at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
          at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
          at hudson.model.LoadStatistics$LoadStatisticsUpdater.doRun(LoadStatistics.java:394)
          at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)

          I restart tomcat and my slave returns.... that is not very cool! But at least it came back!!
          Thanks again...

          philbeiler added a comment -

          Was anyone able to replicate this problem? Just want to help out where I can, and am hoping for a quick resolution. Thanks.

          Jesse Glick added a comment -

          I plan to try.

          Jesse Glick added a comment -

          I get this ISE the second and subsequent times I trigger the build. The first time I get something different (with matrix-project 1.4.1):

          java.io.IOException: cannot start a build of JENKINS-26582/label_exp=S1 since its parent has no builds at all
          	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:276)
          	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
          	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
          	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
          	at hudson.model.Executor$1.call(Executor.java:328)
          	at …
          

          Will investigate both.

          All the builds actually seem to run.

          Jesse Glick added a comment -

          I filed the NPE as JENKINS-28384. It was probably triggered by the executor death here, but is an independent bug.

          Jesse Glick added a comment -

          Here is what I have so far. Despite appearances, this is not a core bug. A change made as part of the implementation of JENKINS-24380 merely made an erroneous condition be displayed as such.

          The critical point (and the reason why existing automated tests did not catch this) is that the bug is not manifested when you trigger a matrix build normally—only when you install the Git plugin and use /git/notifyCommit.

          The main bug is in matrix-project: MatrixConfiguration.newBuild assumes without checking that the newly created MatrixRun in fact has a unique build number. If this was created as part of a MatrixBuild, then that will be true, since the MatrixProject uses the default (sane) implementation of newBuild. And that is exactly what happens—normally.

          But it turns out that there is nothing blocking a MatrixRun from being created other ways. For example, if you ping <job>/label_exp=S1/build you will get this error, because the number of the last parent MatrixBuild is unchanged, and there was already a MatrixRun with that number. Of course you would not normally go to this URL—it is not exposed in the UI—but there is nothing stopping you from doing that, or I guess from using the CLI to do the same.
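The collision described above can be sketched with a simplified model. This is not the actual matrix-project or core code; every class and method name below is hypothetical, and the only assumption carried over from the explanation is that a child run takes its number from the parent's last build while the build map refuses to overwrite an existing number.

```java
import java.util.TreeMap;

/** Simplified, hypothetical model of the collision; not the real Jenkins classes. */
public class BuildNumberCollision {

    /** Stand-in for a build map that refuses to overwrite an existing number. */
    static class Builds {
        private final TreeMap<Integer, String> byNumber = new TreeMap<>();

        void put(int number, String name) {
            if (byNumber.containsKey(number)) {
                throw new IllegalStateException(
                        "build " + number + " already existed; will not overwrite with " + name);
            }
            byNumber.put(number, name);
        }

        int lastNumber() {
            return byNumber.isEmpty() ? 0 : byNumber.lastKey();
        }
    }

    /** Parent: allocates a fresh number for every build. */
    static int newParentBuild(Builds parent) {
        int n = parent.lastNumber() + 1;
        parent.put(n, "parent #" + n);
        return n;
    }

    /** Child: reuses the parent's last build number without checking uniqueness. */
    static void newChildRun(Builds parent, Builds child) {
        int n = parent.lastNumber();
        child.put(n, "child #" + n);
    }

    /** Returns true if starting the child directly, with no new parent build, collides. */
    static boolean directTriggerCollides() {
        Builds parent = new Builds();
        Builds child = new Builds();
        newParentBuild(parent);          // parent #1
        newChildRun(parent, child);      // child #1, created as part of the parent build
        try {
            newChildRun(parent, child);  // direct trigger: the parent's last number is still 1
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("collision on direct trigger: " + directTriggerCollides());
    }
}
```

In this model the normal path never collides, because each child run is preceded by a new parent build; only the direct trigger repeats a number.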

          I presume the initial …since its parent has no builds at all exception has the same cause: an attempt to directly start a MatrixRun without a parent MatrixBuild.

          Now the reason why this appears with Git notifications is that GitStatus.JenkinsAbstractProjectListener.onNotifyCommit is wrong, too. If you look at SubversionRepositoryStatus or MercurialStatus, you will see that they check whether there is an SCMTrigger for the project which does not ignorePostCommitHooks. If there is not, they print a message (No subversion jobs using SCM polling or all jobs using SCM polling are ignoring post-commit hooks, No SCMTrigger on …), and then do nothing. And if sha1 is unspecified, GitStatus does the same. But if you specify sha1, it immediately schedules a build (with RevisionParameterAction), rather than using polling, even if there is no SCMTrigger.

          For a top-level project, that is probably fine. But MatrixConfiguration.getScm delegates to its parent, so each configuration shows up in the list of projects matching the specified repo. This means that the Git plugin schedules not only the top-level MatrixProject into the queue, but each MatrixConfiguration! And when those children are run directly, the error appears because they are trying to create builds which duplicate an existing number: either that of the last matrix build (if the configurations get scheduled before the parent), or that of the current matrix build (if after).
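A minimal sketch of the matching problem, and of a guard that requires a trigger before scheduling, might look like the following. None of these types are the Jenkins or Git plugin API; the Project class, its fields, and toSchedule are hypothetical. It also assumes that only the top-level matrix project carries an SCM trigger, and it ignores the ignorePostCommitHooks setting for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model; none of these types are the Jenkins or Git plugin API. */
public class NotifyCommitSketch {

    static final String REPO = "https://github.com/allegro/axion-release-plugin.git";

    /** Stand-in for a project whose SCM may be inherited from a parent. */
    static class Project {
        final String name;
        final String ownRepoUrl;      // null means "delegate to the parent"
        final boolean hasScmTrigger;  // assumption: only the top-level project has one
        final Project parent;

        Project(String name, String ownRepoUrl, boolean hasScmTrigger, Project parent) {
            this.name = name;
            this.ownRepoUrl = ownRepoUrl;
            this.hasScmTrigger = hasScmTrigger;
            this.parent = parent;
        }

        /** Mirrors the delegation described above: a child reports its parent's SCM. */
        String scmUrl() {
            if (ownRepoUrl != null) {
                return ownRepoUrl;
            }
            return parent != null ? parent.scmUrl() : null;
        }
    }

    /** One matrix project with two configurations, named after the reproduction steps. */
    static List<Project> sampleProjects() {
        Project matrix = new Project("test", REPO, true, null);
        List<Project> all = new ArrayList<>();
        all.add(matrix);
        all.add(new Project("test/label_exp=S1", null, false, matrix));
        all.add(new Project("test/label_exp=S2", null, false, matrix));
        return all;
    }

    /** Names of the projects that would be scheduled for a notification about url. */
    static List<String> toSchedule(List<Project> all, String url, boolean requireTrigger) {
        List<String> names = new ArrayList<>();
        for (Project p : all) {
            boolean matches = url.equals(p.scmUrl());
            if (matches && (!requireTrigger || p.hasScmTrigger)) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Without the guard, the children match through the parent's SCM.
        System.out.println(toSchedule(sampleProjects(), REPO, false));
        // With the guard, only the project that has a trigger is scheduled.
        System.out.println(toSchedule(sampleProjects(), REPO, true));
    }
}
```

With requireTrigger set to false the list contains the matrix project and both configurations; with it set to true only the matrix project remains.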

          Jesse Glick added a comment -

          msinclair sorry to hijack this issue but everyone else seems to be seeing a problem with matrix projects and Git notifications, whereas your case was actually something else, apparently rarer and probably needing some unrelated fix. If you still see it, file it separately (blocking JENKINS-24380).

          Jesse Glick added a comment -

          Offering PRs for both plugins. Either fix will avoid the usual symptom, but it is best to have both.

          Kanstantsin Shautsou added a comment -

          The issue can be masked because not all plugins use onNotifyCommit: the github-plugin migration is still in progress, and other plugins just copy-pasted the github-plugin algorithm/code.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/hudson/plugins/git/GitStatus.java
          http://jenkins-ci.org/commit/git-plugin/de3117def8625c57a95126a200e990ab0481948e
          Log:
          [FIXED JENKINS-26582] To trigger a build from notifyCommit, the project must have an SCMTrigger.
          This is true even if it has a matching SCM and sha1 is specified.
          Otherwise we would be triggering MatrixConfiguration, which is illegal and cause errors.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Mark Waite
          Path:
          src/main/java/hudson/plugins/git/GitStatus.java
          src/test/java/hudson/plugins/git/GitSCMTest.java
          http://jenkins-ci.org/commit/git-plugin/b88b388aee1085e5d161c578c3f551953b27abf4
          Log:
          Merge pull request #319 from jglick/SCMTrigger-JENKINS-26582

          JENKINS-26582 notifyCommit should ignore projects without SCMTrigger

          Passed tests on multiple platforms.

          Compare: https://github.com/jenkinsci/git-plugin/compare/6c1c49feefb3...b88b388aee10

          philbeiler added a comment - - edited

          I was having a hard time following the comments and working out whether these changes would actually fix the dead thread problem. Now that this fix has been integrated into the head (according to emails that I received from the Jenkins server), I decided to retry the test, but I get the same results.

          philbeiler added a comment -

          Daniel Beck added a comment -

          philbeiler The fixes are in the Git Plugin and have not been released yet.

          Mark Sinclair added a comment -

          I had originally reported this issue. For my application (freestyle jobs), I see the dead thread problem after a config reload. This resolution won't solve the problem for those that are seeing the problem after reloading configuration (I don't have the Git Plugin).

          Daniel Beck added a comment -

          msinclair Well, this is a mess then

          Phil's explanation was so good and easily reproducible that Jesse went ahead and fixed it using this issue as reference. Notably, most commenters also used Matrix Projects and provided information and stack traces related to that, so it's understandable that the difference in the original report was missed.

          The cleanest way to move forward would be to file your original issue a second time. Mention that it's not JENKINS-26582, as its resolution helps with Matrix projects and Git plugin, but your problem is with Freestyle projects, to protect it from being hijacked again. I understand this must be frustrating but using this issue going forward, when it was already used in the fixes to Git Plugin, would be too confusing IMO.

          It would also be great if you could make your issue reliably reproducible in some way.

          Jesse Glick added a comment -

          msinclair perhaps you missed my comment of May 13th where I said essentially the same thing, but perhaps not explained as well as Daniel.

          Jesse Glick added a comment -

          Please note that there are fixes for this issue in both the Git and Matrix Project plugins. I believe either suffices to avoid the symptom. Not sure about release status, check plugin changelogs.

          Mark Sinclair added a comment -

          Thanks - hadn't noticed that. I will get the latest and give it a try when it's released.

          philbeiler added a comment -

          I apologize for hijacking this issue. That was not my intent; the symptoms produced the same outcome, so it just seemed like the right place to jump in!

          I hope I can ask one last question on this. Jesse stated that both the Git and Matrix plugins have been fixed. The matrix plugin has not been released in almost a year, and the Git plugin was last released in February. Is there any plan for releasing them? (I'm sure they are not your responsibility, but this bug is killing me, or my threads!) Is there some way that I can monitor these fixes (related ticket numbers, etc.) as they go through the automated process, similar to the core Jenkins code? I would just like to know how long I have to deal with this, especially since you were so kind to fix it weeks ago.
          Thanks again. Phil

          Mark Waite added a comment - - edited

          philbeiler the git plugin and git client plugin are being tested in hopes of releasing new versions before the end of June. If you're willing to assist with the testing, please download and install a pre-release build of the git client plugin and the git plugin. Problems detected in the pre-release should be e-mailed to MarkEWaite and ndeloof.

          I wrote some test ideas if you would like suggestions of areas that need testing. The git plugin supports many different use cases and its automated tests only evaluate a very few of those use cases.

          I ran through the steps described in this bug report with a matrix job running across slaves on multiple versions of Windows and multiple versions of Linux (CentOS, Debian, and Ubuntu). As far as I can tell, the bug is fixed by the changes made by jglick.

          Jesse Glick added a comment -

          I have released Matrix Project 1.5 with this fix.

          Mark Waite added a comment -

          Also included in git plugin 2.4.0 released 18 July 2015

          philip rosegay added a comment -

          I've got 1.609.3 LTS installed with git plugin 2.4.0 and matrix project 1.6, and I'm getting lots of

          java.lang.IllegalStateException: /var/lib/jenkins/jobs/java8-1-build-pro-java7-master/builds/1388 already existed; will not overwite with java8-1-build-pro-java7-master #1388
          at hudson.model.RunMap.put(RunMap.java:187)
          at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
          at hudson.model.AbstractProject.newBuild(AbstractProject.java:1010)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1209)
          at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
          at hudson.model.Executor$1.call(Executor.java:335)
          at hudson.model.Executor$1.call(Executor.java:317)
          at hudson.model.Queue._withLock(Queue.java:1348)
          at hudson.model.Queue.withLock(Queue.java:1213)
          at hudson.model.Executor.run(Executor.java:317)

          I recently started using GitHub push triggers (manually configured). This ticket is tl;dr for me. Did I miss something (sorry)? I'm limping along with this. Thanks.

          Jesse Glick added a comment -

          prosegay if you have applied the relevant plugin updates then you are hitting a different bug with the same symptom but a distinct cause, which would be tracked separately. Without knowing how to reproduce we are unlikely to be able to help.

          philip rosegay added a comment - - edited

          ya think? seems unlikely as the stack trace looked the same to me. what do you need?

          Jesse Glick added a comment -

          Jenkins core is simply reporting an illegal condition which something else was responsible for producing. Other than the original report here, all cases have been tracked back to a combination of bugs in the Git and Matrix Project plugins, both of which were fixed. What is causing the problem in your case I do not know.

          philip rosegay added a comment -

          That didn't answer the question: what do you need to isolate the root cause?

          Mark Waite added a comment -

          Usually, isolating a problem to root cause needs enough description of the distinct conditions which caused the problem so that someone else can duplicate the problem.

          In this case, it may need a support bundle (to show the versions of various plugins installed), a copy of the job definition (or a detailed enough description of the job definition that someone can recreate the job from that description), and a description of the actions taken to show the problem.

          Jesse Glick added a comment -

          Right, ideally a way to reproduce the problem from scratch. When that cannot be found, there may be some clues that are helpful (for example: only happens when a certain plugin is installed/configured), but in general the problem may not be fixable. For certain bugs of course the error message/stack trace suffices to guess at a diagnosis. Unfortunately that is not the case here.

          David Harris added a comment -

          Is this the best/current bug to discuss the 'dead node' issue? Should I be using JENKINS-29268 or something else instead?

          This bug is affecting us pretty badly (team of ~15 developers) and we're willing to put some work in to help get it fixed.

          Thanks in advance

          Mark Waite added a comment -

          I don't think a resolved issue is generally a good place to discuss an issue you're currently seeing. If there is an open issue that is the same, or appears strongly related, that would be a reasonable place for the discussion.

          If the issue is still there, and this describes that issue, then I think this bug should be reopened.

          Jesse Glick added a comment -

          Do not reopen this issue.

          There is a known but unconfirmed occurrence when using reload-from-disk. If you know of some other means of reproducing, mention in JENKINS-27530.

            jglick Jesse Glick
            msinclair Mark Sinclair
            Votes: 6
            Watchers: 22