JENKINS-26582

ISE from RunMap.put using /git/notifyCommit on a matrix project

      Description

      Jan 23, 2015 11:43:52 PM hudson.model.Executor run
      SEVERE: Unexpected executor death
      java.lang.IllegalStateException: /MY_DIR/jenkins/home/jobs/MY_JOB/builds/29 already existed; will not overwite with MY_JOB #29
      at hudson.model.RunMap.put(RunMap.java:187)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1001)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1200)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor.run(Executor.java:213)

      I tried to manually start a job via the run button. I see the Dead indicator in the executor status sidebar. I was able to restart the thread and then manually start the job again. This time it started with #30 and appears to be running OK.

      Looking at the JENKINS_HOME directory on Linux, I see a build directory for #29, and that build ran and failed (for unrelated reasons). On the Jenkins web page for the job, the build list skips from 28 to 30 (now running); #29 is not listed.

      I know some items related to the major enhancement are not closed yet, like JENKINS-23152, so please close this if the behavior I'm seeing is expected until the next release. If this is unexpected - hopefully this report is informative - this issue happened 4x in a row for each job I had put in the build queue.

        Attachments

        1. config.xml (5 kB)
        2. screenshot-1.png (16 kB)
        3. screenshot-2.png (15 kB)
        4. screenshot-3.png (32 kB)

            Activity

            msinclair Mark Sinclair created issue -
            danielbeck Daniel Beck made changes -
            Field Original Value New Value
            Link This issue is related to JENKINS-24380 [ JENKINS-24380 ]
            danielbeck Daniel Beck added a comment -

            This may be a legitimate bug revealed by the changes for JENKINS-24380 (I actually asked for this to be an error condition to prevent overriding existing builds).

            Are you using Gerrit Trigger Plugin?

            Is there a build.xml in 29 (while Jenkins still thinks the next build should be 29)?

            msinclair Mark Sinclair added a comment -

            Gerrit Trigger Plugin is not installed.

            Yes, there is a build.xml for build 29.

            danielbeck Daniel Beck added a comment -

            Does Jenkins show a build 29 that corresponds with the builds/29/build.xml file? Or does it stop at build 28?

            msinclair Mark Sinclair added a comment -

            On the Jenkins project web page, I see all builds through 28; 29 is missing, and then I see build 30.
            On Linux, JENKINS_HOME/jobs/MY_JOB/builds/ shows all builds, including 29.

            danielbeck Daniel Beck added a comment -

            What kind of projects (e.g. Freestyle, Maven, Matrix) are affected?

            When you restart Jenkins, does build 29 appear afterwards?

            msinclair Mark Sinclair made changes -
            Attachment config.xml [ 28419 ]
            msinclair Mark Sinclair added a comment -

            After soft restart of Jenkins, the missing build re-appears (#29).

            All projects are freestyle. I attached my config.xml as well if that will help (replaced some fields with XXX).

            jglick Jesse Glick added a comment -

            The only suspicious thing I see in the job config is the use of the Heavy Job plugin. Is this problem at all reproducible for you? If so, does skipping use of Heavy Job fix it?

            The critical diagnostic which was not mentioned here was whether nextBuildNumber existed and if so what it said. It is supposed to point to the next build number which should be created. If for some reason it failed to be updated, Jenkins would try to recreate a build with the same number. In the past this mistake would have resulted in an earlier build being silently overwritten. As of 1.597 it is caught.

            The same applies to bugs like JENKINS-23152, with a different cause (objects held in memory with stale contents).
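
            The on-disk half of that diagnostic can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and it assumes the layout seen in this thread (a nextBuildNumber file next to a builds directory whose entries are named by build number). It reads nothing from Jenkins itself, so it cannot see a stale value held in memory; if the file on disk looks correct while the error still occurs, that would point toward the in-memory copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jenkins: compares <jobDir>/nextBuildNumber
// with the highest numbered entry under <jobDir>/builds.
class NextBuildNumberCheck {

    /** Highest numeric entry name under jobDir/builds, or 0 if there is none. */
    static int highestBuildOnDisk(Path jobDir) throws IOException {
        int max = 0;
        Path builds = jobDir.resolve("builds");
        if (!Files.isDirectory(builds)) {
            return max;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(builds)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Only purely numeric names count; links such as lastSuccessfulBuild are skipped.
                if (name.matches("\\d{1,9}")) {
                    max = Math.max(max, Integer.parseInt(name));
                }
            }
        }
        return max;
    }

    /** Number stored in jobDir/nextBuildNumber, or -1 if the file does not exist. */
    static int nextBuildNumberOnDisk(Path jobDir) throws IOException {
        Path file = jobDir.resolve("nextBuildNumber");
        if (!Files.exists(file)) {
            return -1;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Integer.parseInt(text.trim());
    }

    /** True when the stored number is greater than every build already on disk. */
    static boolean isConsistent(Path jobDir) throws IOException {
        return nextBuildNumberOnDisk(jobDir) > highestBuildOnDisk(jobDir);
    }

    public static void main(String[] args) throws IOException {
        Path jobDir = Paths.get(args[0]); // e.g. the job directory under JENKINS_HOME/jobs
        System.out.println("nextBuildNumber on disk: " + nextBuildNumberOnDisk(jobDir));
        System.out.println("highest build on disk:   " + highestBuildOnDisk(jobDir));
        System.out.println("consistent:              " + isConsistent(jobDir));
    }
}
```

            A missing nextBuildNumber file is reported as inconsistent here; that is a choice made for this sketch, not a statement about how Jenkins treats the case.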

            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-26739 [ JENKINS-26739 ]
            msinclair Mark Sinclair added a comment -

            I had posted a comment a couple of weeks ago, but it's not showing up here.

            In any case I was just able to reproduce the problem today. It's been about 3 weeks since I last saw the problem. I got exactly the same thing to happen today.

            Interesting to note, I was doing some configuration updates behind the scenes by editing config.xml for many jobs. Then to load the config I called out JenkinsURL/reload from my browser.

            I wonder if it's picking up an old version of nextBuildNumber when the reload occurs? Some of the jobs were running when I reloaded. I wonder if nextBuildNumber only gets updated when the job completes?

            jglick Jesse Glick added a comment -

            No, nextBuildNumber is incremented the moment a new build is created, before it even really starts (right after it leaves the queue).

            Reloading from disk is a plausible explanation; perhaps there are two copies of the Job sitting around temporarily, each with its own version of the field.
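
            To make the two-copies hypothesis concrete, here is a toy model in plain Java. It is not Jenkins code: ToyJob, its fields, and the shared set standing in for the builds directory are all invented for illustration. It only shows how two in-memory objects, each holding its own counter, would produce the symptoms reported here: one collision on the existing number, after which the stale copy moves on to the next number.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these types exist in Jenkins.
class StaleCounterDemo {

    /** Stand-in for a job object: every in-memory copy keeps its own counter. */
    static final class ToyJob {
        private final Set<Integer> buildsOnDisk; // shared, like the builds/ directory
        private int nextBuildNumber;             // per copy, like the in-memory field

        ToyJob(Set<Integer> buildsOnDisk, int nextBuildNumber) {
            this.buildsOnDisk = buildsOnDisk;
            this.nextBuildNumber = nextBuildNumber;
        }

        /** Claims the next number; fails if that build already exists on "disk". */
        int newBuild() {
            int number = nextBuildNumber++; // incremented as soon as the build is created
            if (!buildsOnDisk.add(number)) {
                throw new IllegalStateException("builds/" + number + " already existed");
            }
            return number;
        }
    }

    public static void main(String[] args) {
        Set<Integer> disk = new HashSet<>();
        for (int i = 1; i <= 28; i++) {
            disk.add(i);
        }
        // Both copies were loaded while the counter said 29.
        ToyJob beforeReload = new ToyJob(disk, 29);
        ToyJob afterReload = new ToyJob(disk, 29);

        System.out.println("new copy built #" + afterReload.newBuild());
        try {
            beforeReload.newBuild(); // stale copy still believes 29 is free
        } catch (IllegalStateException e) {
            System.out.println("stale copy failed: " + e.getMessage());
        }
        // The failed attempt already advanced the stale counter, so the retry works.
        System.out.println("stale copy retry built #" + beforeReload.newBuild());
    }
}
```

            Whether a reload really leaves two such copies alive at once is exactly the open question in this issue; the model just shows that the guess would be consistent with the observed one-failure-then-recovery pattern.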

            msinclair Mark Sinclair added a comment -

            There were many jobs in the queue and some running when the reload happened. Maybe it is safer to quietDown, clear the queue, and then reload?

            jglick Jesse Glick added a comment -

            Deals with freestyle projects so probably distinct from JENKINS-26739.

            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            Labels regression
            Summary: changed from "Unexpected Executor Death - build already existed - 1.597" to "ISE from RunMap.put after reloading config"
            jglick Jesse Glick made changes -
            Link This issue is blocking JENKINS-24380 [ JENKINS-24380 ]
            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-24380 [ JENKINS-24380 ]
            jglick Jesse Glick added a comment -

            I am guessing the reload operation caused the problem. Do you happen to know how to reproduce?

            msinclair Mark Sinclair added a comment -

            I don't have a specific way to reproduce. Here are some conditions that were true both times the failure occurred:
            - Multiple slaves, all busy, and about 50 jobs in the build queue. Some jobs had been sitting in the build queue for 12+ hours.
            - Configuration changes made across many jobs by editing the config.xml files directly on Linux with Emacs. config.xml is the only file touched.
            - Configuration reloaded via <jenkins-url>/reload from my browser. Jenkins asks me to POST the command, so I click the "Try POSTing" button and then it reloads.

            After the reload, when an executor becomes free, it picks up a job from the queue and immediately fails with the dead indicator in the executor status sidebar. When the thread is restarted, it picks up another job from the queue and dies again. The process repeats until the queue is empty.

            It is unclear if jobs that were not in the queue at the time of reload would die. I'm not sure if after the reload all jobs need to die one time or just the jobs that were in the queue at the time of the reload.

            The problem is self-correcting: after a job dies once, the build number gets corrected and the job runs properly the next time.

            This doesn't happen every time I reload config.

            benjsc Benjamin Close added a comment -

            I noticed this recently after an upgrade (now running 1.602); it might be related.
            I am reporting it here rather than in JENKINS-26739 since that issue is already closed and this bug relates more to RunMap.put than to lazyput. The matrix build below has one configuration, which runs on one node with 1 executor, so concurrency shouldn't be an issue here.

            I get:
            Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
            INFO: Scheduling MY_JOB to build commit MY_ID
            Mar 11, 2015 4:07:24 PM hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
            INFO: Scheduling MY_JOB » MY_HOST to build commit MY_ID
            Mar 11, 2015 4:07:24 PM hudson.model.Executor run
            SEVERE: Unexpected executor death
            java.lang.IllegalStateException: /MY_DIR/jobs/MY_JOB/builds/20 already existed; will not overwite with MY_JOB/label=MY_JOB #20
            at hudson.model.RunMap.put(RunMap.java:187)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            at hudson.model.Executor.run(Executor.java:213)

            I get this almost every triggered build. Manually restarting the thread sees the build succeed.
            Interestingly, on the filesystem I see build 21 already exists when it fails:

            MY_DIR/jobs/MY_JOB/builds$ ls -lR 20 21 ../nextBuildNumber
            -rw-r--r-- 1 jenkins jenkins 3 Mar 11 16:07 ../nextBuildNumber

            20:
            total 16
            -rw-r--r-- 1 jenkins jenkins 6488 Mar 11 14:11 build.xml
            -rw-r--r-- 1 jenkins jenkins 478 Mar 11 14:04 changelog.xml
            -rw-r--r-- 1 jenkins jenkins 2424 Mar 11 14:11 log

            21:
            total 8
            -rw-r--r-- 1 jenkins jenkins 1723 Mar 11 16:07 changelog.xml
            -rw-r--r-- 1 jenkins jenkins 2110 Mar 11 16:08 log
            MY_DIR/jobs/MY_JOB/builds$ cat ../nextBuildNumber
            22

            Build 20 shows up in the GUI as completed; build 21 shows as still running (its executor crashed).
            So it looks like the wrong build number is being picked up (note the timestamps on build 21 and the 4:07 log message).

            gbougeard Greg BOUGEARD added a comment -

            Hi, we have the following stacktrace after updating a 1.596 version to 1.605 (and 1.606) :

            Mar 25, 2015 12:57:18 PM SEVERE hudson.model.Executor run
              Unexpected executor death
              java.lang.IllegalStateException: /var/lib/jenkins/jobs/service-mysql-migrations_master/configurations/axis-BASE_TAG/prod/builds/218 already existed; will not overwite with service-mysql-migrations_master/BASE_TAG=prod #218
            	at hudson.model.RunMap.put(RunMap.java:187)
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            
            benjsc Benjamin Close added a comment -

            Update to this issue: it definitely seems related to multiple new jobs starting at the same time.
            Once the executor dies, jobs queue up. If I then restart the thread while no new jobs are starting, the queue of pending jobs clears itself over time. If another job comes along at the right time, the thread death may occur again.

            dalexander Damion Alexander added a comment -

            I have been able to consistently reproduce the thread death with "Stash Webhook to Jenkins" (v2.6) triggering a Multi-Configuration job. Our installation is relatively new, and this is the first time we have set up anything to trigger jobs.

            Log of session

            Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
            Scheduling BUILD: Code Deploy to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
            Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
            Scheduling BUILD: Code Deploy » py27 to build commit 7410c63de6e0431ee7c831bfdc1f2ba909b00385
            Mar 27, 2015 2:39:52 PM INFO hudson.plugins.git.GitStatus$JenkinsAbstractProjectListener onNotifyCommit
            PostCommitHooks are disabled on DEPLOY: code_deploy
            Mar 27, 2015 2:39:52 PM SEVERE hudson.model.Executor run
            Unexpected executor death
            java.lang.IllegalStateException: /var/lib/jenkins/jobs/build_code_deploy/configurations/axis-TOXENV/py27/builds/8 already existed; will not overwite with build_code_deploy/TOXENV=py27 #8
            	at hudson.model.RunMap.put(RunMap.java:187)
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            	at hudson.model.Executor.run(Executor.java:213)
            
            Mar 27, 2015 2:40:20 PM INFO hudson.model.Run execute
            build_code_deploy/TOXENV=py27 #9 main build action completed: SUCCESS
            Mar 27, 2015 2:40:21 PM INFO hudson.model.Run execute
            build_code_deploy #9 main build action completed: SUCCESS
            Mar 27, 2015 2:40:44 PM INFO hudson.model.Run execute
            deploy_code_deploy #11 main build action completed: SUCCESS
            

            Of interest: build #8 is the previous successful run; build #9 is the current run, and it succeeds in a different thread.

            Versions

            component version
            Jenkins 1.606
            Stash 3.1.0
            Stash Webhook to Jenkins 2.60
            ShiningPanda plugin 0.21

            Job Configuration
            Job Name: build_code_deploy
            SCM: git (ssh)
            Branches: */master
            Poll SCM: H 0 1 1 *
            Configuration Matrix: Tox; py27
            Build: Tox Builder, $configuration_file=tox.ini, recreate (checked)
            Post-build: trigger job= deploy_code_deploy, Publish JUnit test result, Notify Stash Instance

            Stash Plugin:
            Configured to only build from Master

            jglick Jesse Glick added a comment -

            Hmm, do not have Stash available, so I wonder if there is a way to reproduce without using proprietary tools. I would be tempted to ascribe this to yet another weird bug in the matrix-project plugin except that there are some reports from people using freestyle as well.

            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-23152 [ JENKINS-23152 ]
            jglick Jesse Glick added a comment -

            Since the next LTS baseline is looming and there is no progress in finding a test case for this, I filed a PR to at least make the error nonfatal.

            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 1630 (Web Link)" [ 12188 ]
            dalexander Damion Alexander added a comment -

            The Stash Webhook plugin makes a GET request to the jenkins server. So one should just need Jenkins, the Git plugins (scm-api, git client plugin, git plugin), and something to serve git.

            GET /git/notifyCommit?url=${URL_ENCODED_GIT_URL}&branches=master&sha1=${COMMIT}
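
            For anyone trying to reproduce without Stash, the request above can be assembled in a few lines of Java. This is a sketch under assumptions: the class name, the Jenkins base URL, and the repository URL in the usage note are placeholders, and only the path and the three query parameters (url, branches, sha1) come from the request shown in this comment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the notifyCommit URL described in this comment.
class NotifyCommitUrl {

    /** Returns <jenkinsBase>/git/notifyCommit with url, branches and sha1 query parameters. */
    static String build(String jenkinsBase, String gitUrl, String branch, String sha1)
            throws UnsupportedEncodingException {
        // Drop a trailing slash so the path is not doubled.
        String base = jenkinsBase.endsWith("/")
                ? jenkinsBase.substring(0, jenkinsBase.length() - 1)
                : jenkinsBase;
        return base + "/git/notifyCommit"
                + "?url=" + URLEncoder.encode(gitUrl, "UTF-8")
                + "&branches=" + URLEncoder.encode(branch, "UTF-8")
                + "&sha1=" + URLEncoder.encode(sha1, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Placeholder values; substitute a real Jenkins URL, repository URL and commit.
        System.out.println(build("http://jenkins.example/", "ssh://git@host/repo.git", "master", "abc123"));
    }
}
```

            The resulting string can be requested with any HTTP client; firing it at a matrix project is the step several commenters here report as the trigger.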
            
            gbougeard Greg BOUGEARD added a comment -

            Yep, we're using a freestyle project with a matrix configuration

            danielbeck Daniel Beck made changes -
            Link This issue is duplicated by JENKINS-27853 [ JENKINS-27853 ]
            leafduo Zuyang Kou added a comment -

            We could reliably reproduce this issue by triggering a matrix configuration job via git plugin

            philbeiler philbeiler added a comment -

            Just curious if this is being actively worked or if there is any kind of timeline for resolution? This has completely killed my environment, as all of our jobs are Matrix builds (with automated feature branch/job creation and web hook pushes from GIT to kick off the jobs). Within an hour, all nodes on our farm are full of dead threads – Jenkins is basically down and unusable... serious bummer!

            Are there any work-arounds? Or do I have to change all of my jobs to polling and give up on the hooks? Thanks for any information.

            jglick Jesse Glick added a comment -

            This is not being actively worked on because there is no known way to reproduce from scratch.

            I did file a PR to downgrade the error to a warning, closer to the pre-1.597 state (which allowed builds to be overwritten silently), but it was rejected. No problem to reopen that discussion, though of course providing developers with a way to reproduce the problem would be far better.

            danielbeck Daniel Beck added a comment -

            Would be great if we knew how to reproduce this. Not something vague, but full steps to reproduce on a newly set up instance. If you're experiencing this issue, consider spinning up a second Jenkins instance (could be your desktop machine) to try to make this reproducible based on what you know about your production instance. Only once we know the circumstances under which this happens can we investigate them and fix the underlying problem.

            philbeiler philbeiler made changes -
            Attachment screenshot-1.png [ 29753 ]
            philbeiler philbeiler made changes -
            Attachment screenshot-2.png [ 29754 ]
            philbeiler philbeiler added a comment - - edited

            Fortunately, I was able to easily duplicate the problem in about 5 minutes, first crack! Hopefully you can too!

            Ubuntu 15.04 - Does not matter, not what I'm running in prod
            Jenkins 1.612 - Does not matter, not what I'm running in prod – seems like this problem started around 1.597??? Not really sure when.
            Install Git Plugin
            Two slaves, s1 and s2, both on localhost, each with 2 executors; s1 has label S1, s2 has label S2 (I run no executors on master in prod, just an FYI)
            Create a Matrix job
            GIT REPO https://github.com/allegro/axion-release-plugin.git (use this repo, as I'm using a hash for this repository - no auth required)
            Branches to build: origin/master
            Additional Behaviours -> Local Branch Name: master
            Schedule: H 0 1 1 0
            Configuration Matrix: Label expression axis named label_exp with values S1 and S2

            Build: Execute shell command: echo "--------------->"$label_exp

            Per the gentleman's comment above – simply post this url in another browser tab and watch your threads die... Keep submitting, and they all die...

            http://localhost:8080/jenkins//git/notifyCommit?url=https%3A%2F%2Fgithub.com%2Fallegro%2Faxion-release-plugin.git&branches=master&sha1=a5de4d725814ff907a8cf1f5666b9f01e8361655

            SEVERE: Unexpected executor death
            java.lang.IllegalStateException: /usr/share/tomcat8/.jenkins/jobs/test/configurations/axis-label_exp/S1/builds/3 already existed; will not overwite with test/label_exp=S1 #3
            at hudson.model.RunMap.put(RunMap.java:189)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
            at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            at hudson.model.Executor$1.call(Executor.java:328)
            at hudson.model.Executor$1.call(Executor.java:310)
            at hudson.model.Queue._withLock(Queue.java:1251)
            at hudson.model.Queue.withLock(Queue.java:1189)
            at hudson.model.Executor.run(Executor.java:310)

            Let me know if this does not work, or you want some config files. I hope it is easily reproducible and quickly resolved – this is killing me! Thanks so much for your help...

            This is even worse: I lose my node after a while – I saw this in production but I did not think it was related...

            Now I have these messages in the log...

            May 08, 2015 7:15:45 AM hudson.triggers.SafeTimerTask run
            SEVERE: Timer task hudson.slaves.NodeProvisioner$NodeProvisionerInvoker@7c2aef8c failed
            java.lang.NullPointerException
            at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
            at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
            at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
            at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:332)
            at hudson.slaves.NodeProvisioner$2.run(NodeProvisioner.java:261)
            at hudson.model.Queue._withLock(Queue.java:1212)
            at hudson.model.Queue.withLock(Queue.java:1148)
            at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:208)
            at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:57)
            at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:778)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

            May 08, 2015 7:15:51 AM hudson.triggers.SafeTimerTask run
            SEVERE: Timer task hudson.model.LoadStatistics$LoadStatisticsUpdater@40623d93 failed
            java.lang.NullPointerException
            at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:624)
            at hudson.model.LoadStatistics$LoadStatisticsSnapshot$Builder.with(LoadStatistics.java:619)
            at hudson.model.LoadStatistics.computeSnapshot(LoadStatistics.java:352)
            at hudson.model.LoadStatistics$LoadStatisticsUpdater.doRun(LoadStatistics.java:394)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

            I restart tomcat and my slave returns.... that is not very cool! But at least it came back!!
            Thanks again...

            philbeiler philbeiler added a comment -

            Was anyone able to replicate this problem? Just want to help out where I can, and am hoping for a quick resolution. Thanks.

            jglick Jesse Glick added a comment -

            I plan to try.

            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jglick Jesse Glick added a comment -

            I get this ISE the second and subsequent time I trigger the build. The first time I get something different (with matrix-project 1.4.1):

            java.io.IOException: cannot start a build of JENKINS-26582/label_exp=S1 since its parent has no builds at all
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:276)
            	at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1210)
            	at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            	at hudson.model.Executor$1.call(Executor.java:328)
            	at …
            

            Will investigate both.

            All the builds actually seem to run.

            jglick Jesse Glick added a comment -

            I filed the NPE as JENKINS-28384. It was probably triggered by the executor death here, but is an independent bug.

            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-28384 [ JENKINS-28384 ]
            jglick Jesse Glick added a comment -

            Here is what I have so far. Despite appearances, this is not a core bug. A change made as part of the implementation of JENKINS-24380 merely made an erroneous condition be displayed as such.

            The critical point (and the reason why existing automated tests did not catch this) is that the bug is not manifested when you trigger a matrix build normally—only when you install the Git plugin and use /git/notifyCommit.

            The main bug is in matrix-project: MatrixConfiguration.newBuild assumes without checking that the newly created MatrixRun in fact has a unique build number. If this was created as part of a MatrixBuild, then that will be true, since the MatrixProject uses the default (sane) implementation of newBuild. And that is exactly what happens—normally.

            But it turns out that there is nothing blocking a MatrixRun from being created other ways. For example, if you ping <job>/label_exp=S1/build you will get this error, because the number of the last parent MatrixBuild is unchanged, and there was already a MatrixRun with that number. Of course you would not normally go to this URL—it is not exposed in the UI—but there is nothing stopping you from doing that, or I guess from using the CLI to do the same.

            I presume the initial …since its parent has no builds at all exception has the same cause: an attempt to directly start a MatrixRun without a parent MatrixBuild.
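The two failure modes described so far can be modelled in a few lines. This is only a simplified sketch, not the matrix-project or core code: `Runs`, `Parent` and `Configuration` are hypothetical stand-ins, and the one assumption carried over from the explanation above is that a configuration's run takes its number from the parent's last build.

```java
import java.io.IOException;
import java.util.TreeMap;

// Hypothetical model of the numbering scheme described above.
final class BuildNumberCollisionSketch {

    // Stand-in for a build map that refuses to overwrite an existing number.
    static final class Runs {
        private final TreeMap<Integer, String> byNumber = new TreeMap<>();

        void put(int number, String description) {
            if (byNumber.containsKey(number)) {
                throw new IllegalStateException(
                        "build " + number + " already existed; will not overwrite with " + description);
            }
            byNumber.put(number, description);
        }
    }

    // Stand-in for the parent matrix project: numbers increase on each new build.
    static final class Parent {
        int lastBuildNumber = 0;

        int newBuild() {
            return ++lastBuildNumber;
        }
    }

    // Stand-in for one matrix configuration: it reuses the parent's last number.
    static final class Configuration {
        private final Parent parent;
        private final Runs runs = new Runs();

        Configuration(Parent parent) {
            this.parent = parent;
        }

        int newBuild() throws IOException {
            if (parent.lastBuildNumber == 0) {
                throw new IOException("cannot start a build since its parent has no builds at all");
            }
            int number = parent.lastBuildNumber; // unchanged when started directly
            runs.put(number, "child #" + number);
            return number;
        }
    }
}
```

In this model, calling `Configuration.newBuild()` once per parent build succeeds, calling it a second time without a new parent build throws the `IllegalStateException`, and calling it before any parent build exists throws the `IOException`.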

            Now the reason why this appears with Git notifications is that GitStatus.JenkinsAbstractProjectListener.onNotifyCommit is wrong, too. If you look at SubversionRepositoryStatus or MercurialStatus, you will see that they check whether there is an SCMTrigger for the project which does not ignorePostCommitHooks. If there is not, they print a message (No subversion jobs using SCM polling or all jobs using SCM polling are ignoring post-commit hooks, No SCMTrigger on …), and then do nothing. And if sha1 is unspecified, GitStatus does the same. But if you specify sha1, it immediately schedules a build (with RevisionParameterAction), rather than using polling, even if there is no SCMTrigger.
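The check described for the Subversion and Mercurial listeners could be sketched roughly as follows. This is not code from any of those plugins: `Job`, its fields and `jobsToTrigger` are hypothetical names, and the model reduces "has an SCMTrigger which does not ignorePostCommitHooks" to two booleans.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "must have an SCM trigger" rule described above.
final class NotifyCommitFilterSketch {

    static final class Job {
        final String name;
        final boolean hasScmTrigger;          // an SCM polling trigger is configured
        final boolean ignoresPostCommitHooks; // that trigger opts out of commit hooks

        Job(String name, boolean hasScmTrigger, boolean ignoresPostCommitHooks) {
            this.name = name;
            this.hasScmTrigger = hasScmTrigger;
            this.ignoresPostCommitHooks = ignoresPostCommitHooks;
        }
    }

    // Returns the names of jobs that a commit notification may act on.
    // The input is assumed to contain only jobs whose SCM matches the repository.
    static List<String> jobsToTrigger(List<Job> jobsMatchingRepo) {
        List<String> result = new ArrayList<>();
        for (Job job : jobsMatchingRepo) {
            if (!job.hasScmTrigger) {
                continue; // no trigger at all: do nothing for this job
            }
            if (job.ignoresPostCommitHooks) {
                continue; // trigger present but configured to ignore hooks
            }
            result.add(job.name);
        }
        return result;
    }
}
```

Applying the same filter regardless of whether `sha1` is supplied would bring the Git behaviour in line with what the other two listeners are described as doing.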

            For a top-level project, that is probably fine. But MatrixConfiguration.getScm delegates to its parent, so each configuration shows up in the list of projects matching the specified repo. This means that the Git plugin schedules not only the top-level MatrixProject into the queue, but each MatrixConfiguration! And when those children are run directly, the error appears because they are trying to create builds which duplicate an existing number: either that of the last matrix build (if the configurations get scheduled before the parent), or that of the current matrix build (if after).
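To illustrate why the configurations end up in the list of matching projects, here is a small model of the delegation. It is a sketch only: `Project`, `Configuration`, `getRepoUrl` and `projectsMatching` are hypothetical names standing in for the real classes, and the SCM is reduced to a repository URL string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a child that delegates its SCM lookup to its parent.
final class ScmDelegationSketch {

    static class Project {
        final String name;
        private final String repoUrl;

        Project(String name, String repoUrl) {
            this.name = name;
            this.repoUrl = repoUrl;
        }

        String getRepoUrl() {
            return repoUrl;
        }
    }

    // A configuration has no SCM of its own; it reports whatever its parent uses.
    static final class Configuration extends Project {
        private final Project parent;

        Configuration(String name, Project parent) {
            super(name, null);
            this.parent = parent;
        }

        @Override
        String getRepoUrl() {
            return parent.getRepoUrl();
        }
    }

    // Lists every project whose reported repository equals the notified one.
    static List<String> projectsMatching(List<Project> all, String repoUrl) {
        List<String> names = new ArrayList<>();
        for (Project project : all) {
            if (repoUrl.equals(project.getRepoUrl())) {
                names.add(project.name);
            }
        }
        return names;
    }
}
```

With one parent and two configurations in the list, a match on repository alone returns all three, which corresponds to the parent and each child being queued separately.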

            jglick Jesse Glick made changes -
            Component/s git-plugin [ 15543 ]
            Component/s matrix-project-plugin [ 18765 ]
            Component/s core [ 15593 ]
            jglick Jesse Glick added a comment -

            Mark Sinclair sorry to hijack this issue but everyone else seems to be seeing a problem with matrix projects and Git notifications, whereas your case was actually something else, apparently rarer and probably needing some unrelated fix. If you still see it, file it separately (blocking JENKINS-24380).

            jglick Jesse Glick made changes -
            Summary ISE from RunMap.put after reloading config ISE from RunMap.put using /git/notifyCommit on a matrix project
            jglick Jesse Glick made changes -
            Remote Link This issue links to "git PR 319 (Web Link)" [ 12905 ]
            jglick Jesse Glick added a comment -

            Offering PRs for both plugins. Either fix will avoid the usual symptom, but it is best to have both.

            jglick Jesse Glick made changes -
            Remote Link This issue links to "matrix-project PR 19 (Web Link)" [ 12906 ]
            integer Kanstantsin Shautsou added a comment -

            The issue can be masked because not all plugins use onNotifyCommit: in github-plugin the migration is still in progress, and other plugins just copy-pasted the algorithm/code from github-plugin.

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            src/main/java/hudson/plugins/git/GitStatus.java
            http://jenkins-ci.org/commit/git-plugin/de3117def8625c57a95126a200e990ab0481948e
            Log:
            [FIXED JENKINS-26582] To trigger a build from notifyCommit, the project must have an SCMTrigger.
            This is true even if it has a matching SCM and sha1 is specified.
            Otherwise we would be triggering MatrixConfiguration, which is illegal and cause errors.

            scm_issue_link SCM/JIRA link daemon made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Mark Waite
            Path:
            src/main/java/hudson/plugins/git/GitStatus.java
            src/test/java/hudson/plugins/git/GitSCMTest.java
            http://jenkins-ci.org/commit/git-plugin/b88b388aee1085e5d161c578c3f551953b27abf4
            Log:
            Merge pull request #319 from jglick/SCMTrigger-JENKINS-26582

            JENKINS-26582 notifyCommit should ignore projects without SCMTrigger

            Passed tests on multiple platforms.

            Compare: https://github.com/jenkinsci/git-plugin/compare/6c1c49feefb3...b88b388aee10

            philbeiler philbeiler added a comment - - edited

            I was having a hard time following the comments and working out whether these changes would actually fix the dead thread problem. Now that this fix has been integrated into the head (according to emails that I received from the Jenkins server), I decided to retry the test, but I get the same results.

            philbeiler philbeiler made changes -
            Attachment screenshot-3.png [ 29808 ]
            philbeiler philbeiler added a comment -

            danielbeck Daniel Beck added a comment -

            philbeiler The fixes are in the Git Plugin and have not been released yet.

            msinclair Mark Sinclair added a comment -

            I had originally reported this issue. For my application (freestyle jobs), I see the dead thread problem after a config reload. This resolution won't solve the problem for those that are seeing the problem after reloading configuration (I don't have the Git Plugin).

            danielbeck Daniel Beck added a comment -

            Mark Sinclair Well, this is a mess then

            Phil's explanation was so good and easily reproducible that Jesse went ahead and fixed it using this issue as reference. Notably, most commenters also used Matrix Projects and provided information and stack traces related to that, so it's understandable that the difference in the original report was missed.

            The cleanest way to move forward would be to file your original issue a second time. Mention that it's not JENKINS-26582, as its resolution helps with Matrix projects and Git plugin, but your problem is with Freestyle projects, to protect it from being hijacked again. I understand this must be frustrating but using this issue going forward, when it was already used in the fixes to Git Plugin, would be too confusing IMO.

            It would also be great if you could make your issue reliably reproducible in some way.

            jglick Jesse Glick added a comment -

            Mark Sinclair perhaps you missed my comment of May 13th where I said essentially the same thing, but perhaps not explained as well as Daniel.

            jglick Jesse Glick added a comment -

            Please note that there are fixes for this issue in both the Git and Matrix Project plugins. I believe either suffices to avoid the symptom. Not sure about release status, check plugin changelogs.

            msinclair Mark Sinclair added a comment -

            Thanks - hadn't noticed that. I will get the latest and give it a try when it's released.

            philbeiler philbeiler added a comment -

            I apologize for hijacking this issue. That was not my intent; the symptoms produced the same outcome, so it just seemed like the right place to jump in!

            I hope I can ask one last question on this. Jesse stated that both the Git and Matrix plugins have been fixed. The Matrix plugin has not been released in almost a year, and the Git plugin was last released in February. Is there any plan for releasing them? (I'm sure they are not your responsibility, but this bug is killing me, or my threads!) Is there some way that I can monitor these fixes (related ticket numbers, etc.) as they go through the automated process, similar to the core Jenkins code? I would just like to know how long I have to deal with this, especially since you were so kind as to fix it weeks ago.
            Thanks again. Phil

            markewaite Mark Waite added a comment - - edited

            philbeiler the git plugin and git client plugin are being tested in hopes of releasing new versions before the end of June. If you're willing to assist with the testing, please download and install a pre-release build of the git client plugin and the git plugin. Problems detected in the pre-release should be e-mailed to Mark Waite and Nicolas De Loof.

            I wrote some test ideas if you would like suggestions of areas that need testing. The git plugin supports many different use cases, and its automated tests cover only a few of them.

            I ran through the steps described in this bug report with a matrix job running across slaves on multiple versions of Windows and multiple versions of Linux (CentOS, Debian, and Ubuntu). As far as I can tell, the bug is fixed by the changes made by Jesse Glick.

            jglick Jesse Glick added a comment -

            I have released Matrix Project 1.5 with this fix.

            markewaite Mark Waite made changes -
            Link This issue is duplicated by JENKINS-28865 [ JENKINS-28865 ]
            jglick Jesse Glick made changes -
            Link This issue is duplicated by JENKINS-28865 [ JENKINS-28865 ]
            markewaite Mark Waite added a comment -

            Also included in git plugin 2.4.0 released 18 July 2015

            markewaite Mark Waite made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            drivehappy Mark Harmer made changes -
            Link This issue is duplicated by JENKINS-29268 [ JENKINS-29268 ]
            prosegay philip rosegay added a comment -

            I've got 1.609.3 LTS installed with git plugin 2.4.0 and matrix project 1.6 and I'm getting lots of

            java.lang.IllegalStateException: /var/lib/jenkins/jobs/java8-1-build-pro-java7-master/builds/1388 already existed; will not overwite with java8-1-build-pro-java7-master #1388
            at hudson.model.RunMap.put(RunMap.java:187)
            at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
            at hudson.model.AbstractProject.newBuild(AbstractProject.java:1010)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1209)
            at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
            at hudson.model.Executor$1.call(Executor.java:335)
            at hudson.model.Executor$1.call(Executor.java:317)
            at hudson.model.Queue._withLock(Queue.java:1348)
            at hudson.model.Queue.withLock(Queue.java:1213)
            at hudson.model.Executor.run(Executor.java:317)

            I recently started using GitHub push triggers (manually configured). This ticket is too long for me to read in full. Did I miss something (sorry)? I'm limping along with this. Thanks.

            prosegay philip rosegay made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            jglick Jesse Glick added a comment -

            philip rosegay if you have applied the relevant plugin updates then you are hitting a different bug with the same symptom but a distinct cause, which would be tracked separately. Without knowing how to reproduce we are unlikely to be able to help.

            jglick Jesse Glick made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            prosegay philip rosegay added a comment - - edited

            ya think? seems unlikely as the stack trace looked the same to me. what do you need?

            jglick Jesse Glick added a comment -

            Jenkins core is simply reporting an illegal condition which something else was responsible for producing. Other than the original report here, all cases have been tracked back to a combination of bugs in the Git and Matrix Project plugins, both of which were fixed. What is causing the problem in your case I do not know.
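
The point that core only reports the illegal condition, rather than causing it, can be illustrated with a minimal sketch. This is a hypothetical, simplified model and not the real `hudson.model.RunMap`: the class `BuildRecordsSketch`, its methods, and the exception message are all invented for illustration, and the stale counter is only one assumed way the condition could arise.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified model; NOT the real hudson.model.RunMap.
class BuildRecordsSketch {
    // Build number -> display name of records that already exist.
    private final SortedMap<Integer, String> existing = new TreeMap<>();
    // Counter used to number the next build.
    private int nextBuildNumber = 1;

    /** Registers a new build, refusing to overwrite an existing record. */
    String newBuild(String jobName) {
        int number = nextBuildNumber++;
        String displayName = jobName + " #" + number;
        if (existing.containsKey(number)) {
            // The guard only detects the conflict; it cannot tell who caused it.
            throw new IllegalStateException(
                "builds/" + number + " already exists; refusing to overwrite with " + displayName);
        }
        existing.put(number, displayName);
        return displayName;
    }

    /** Simulates some other component leaving the counter at a stale value (assumption). */
    void resetCounter(int staleValue) {
        nextBuildNumber = staleValue;
    }

    public static void main(String[] args) {
        BuildRecordsSketch records = new BuildRecordsSketch();
        System.out.println(records.newBuild("MY_JOB"));
        records.resetCounter(1); // something outside the guard rewinds the counter
        try {
            records.newBuild("MY_JOB");
        } catch (IllegalStateException e) {
            System.out.println("Guard tripped: " + e.getMessage());
        }
        // The counter has already advanced, so the retry gets a fresh number.
        System.out.println(records.newBuild("MY_JOB"));
    }
}
```

In this sketch the stack trace would always point at the guard in `newBuild`, whichever caller rewound the counter, which is why a trace alone may not identify the root cause.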

            prosegay philip rosegay added a comment -

            didn't answer the question: what do you need to isolate to root cause?

            markewaite Mark Waite added a comment -

            Usually, isolating a problem to root cause needs enough description of the distinct conditions which caused the problem so that someone else can duplicate the problem.

            In this case, it may need a support bundle (to show the versions of various plugins installed), a copy of the job definition (or a detailed enough description of the job definition that someone can recreate the job from that description), and a description of the actions taken to show the problem.

            jglick Jesse Glick added a comment -

            Right, ideally a way to reproduce the problem from scratch. When that cannot be found, there may be some clues that are helpful (for example: only happens when a certain plugin is installed/configured), but in general the problem may not be fixable. For certain bugs of course the error message/stack trace suffices to guess at a diagnosis. Unfortunately that is not the case here.

            dbharris David Harris added a comment -

            Is this the best/current bug to discuss the 'dead node' issue? Should I be using JENKINS-29268 or something else instead?

            This bug is affecting us pretty badly (team of ~15 developers) and we're willing to put some work in to help get it fixed.

            Thanks in advance

            markewaite Mark Waite added a comment -

            I don't think a resolved issue is generally a good place to discuss an issue you're currently seeing. If there is an open issue that is the same, or appears strongly related, that would be a reasonable place for the discussion.

            If the issue is still there, and this describes that issue, then I think this bug should be reopened.

            jglick Jesse Glick made changes -
            Link This issue is related to JENKINS-27530 [ JENKINS-27530 ]
            jglick Jesse Glick added a comment -

            Do not reopen this issue.

            There is a known but unconfirmed occurrence when using reload-from-disk. If you know of some other means of reproducing, mention in JENKINS-27530.

            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 160654 ] JNJira + In-Review [ 196510 ]
            markewaite Mark Waite made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              msinclair Mark Sinclair
              Votes:
              6
              Watchers:
              22
