• Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • core
    • None
    • Jenkins ver. 1.618
      Red Hat Enterprise Linux Server release 5.10 (Tikanga)
      Linux 2.6.18-371.6.1.el5 #1 SMP x86_64 GNU/Linux

      Several times a day I encounter executor death with a stack trace like this:

      {{Unexpected executor death
      java.lang.IllegalStateException: /XXXXXXXXXXXX/jenkins_workdir/jobs/XXXXX-1.2.x (continuous integration)/builds/214 already existed; will not overwite with XXXXX-1.2.x (continuous integration) #214
      at hudson.model.RunMap.put(RunMap.java:189)
      at jenkins.model.lazy.LazyBuildMixIn.newBuild(LazyBuildMixIn.java:178)
      at hudson.model.AbstractProject.newBuild(AbstractProject.java:1017)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1216)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor$1.call(Executor.java:335)
      at hudson.model.Executor$1.call(Executor.java:317)
      at hudson.model.Queue._withLock(Queue.java:1345)
      at hudson.model.Queue.withLock(Queue.java:1210)
      at hudson.model.Executor.run(Executor.java:317)}}

        1. screenshot1.png (33 kB)
        2. screenshot2.png (61 kB)
        3. screenshot3.png (28 kB)
        4. screenshot4.png (32 kB)

          [JENKINS-29268] Unexpected executor death

          Mark Harmer added a comment -

          I linked the duplicate issues, since I am also running into this problem. It seems 26582 was closed as fixing a slightly different problem, but its original submission still appears to describe this same issue.

          Mark Harmer added a comment -

          We discovered this problem while migrating our configuration to a new Jenkins master. It seems to be a race condition between reloading the configuration and queuing a new build. I was able to reproduce it reliably with the following steps (msinclair's notes from 26582 helped with this):

          1. Start a fresh server (slave nodes don't need to be set up)
          2. Add a Freestyle Project
            1. This job needs to run for at least a few seconds; I set up a Unix shell build step that just runs "sleep 90"
          3. Start the first build (revision 1) of the new project
          4. Run the following bash script in the background; it continually reloads the configuration:
            #!/bin/bash
            while true; do
            	curl -X POST http://localhost:8080/reload
            	sleep 1
            done

          5. Attempt to queue additional builds of the same project; the build button can be spam-clicked for a few seconds
          6. Observe that the project page itself does not display the queued jobs
          7. Go to the main Jenkins page and observe that the Queue contains multiple jobs for the same project
          8. Cancel the initial job or let it fail; the next job in the queue will attempt to use the same build revision number and cause the above crash.
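
Steps 4 and 5 above can also be scripted so that the race does not depend on manual clicking. The following is a minimal sketch and not part of the original report: the job name `sleep-job`, the `RUN_REPRO` guard, and the helper function names are hypothetical. It assumes the server accepts unauthenticated POST requests without a CSRF crumb and that the job name needs no URL encoding.

```shell
#!/bin/bash
# Sketch only: automates steps 4 and 5 of the reproduction.
# JENKINS_URL, JOB_NAME and RUN_REPRO are illustrative names, not from the report.
JENKINS_URL="${JENKINS_URL:-http://localhost:8080}"
JOB_NAME="${JOB_NAME:-sleep-job}"

# Print the URL that queues a build of job $2 on server $1.
job_build_url() {
    printf '%s/job/%s/build\n' "$1" "$2"
}

# Reload the configuration from disk once per second until killed.
reload_loop() {
    while true; do
        curl -s -o /dev/null -X POST "$JENKINS_URL/reload"
        sleep 1
    done
}

# Queue $1 builds in quick succession, like spam-clicking the build button.
queue_builds() {
    remaining=$1
    while [ "$remaining" -gt 0 ]; do
        curl -s -o /dev/null -X POST "$(job_build_url "$JENKINS_URL" "$JOB_NAME")"
        remaining=$((remaining - 1))
    done
}

# Only touch the server when explicitly asked to.
if [ "${RUN_REPRO:-0}" = "1" ]; then
    reload_loop &
    reload_pid=$!
    trap 'kill "$reload_pid" 2>/dev/null' EXIT
    queue_builds 10
    sleep 5
fi
```

Saved under a name of your choice and started with `RUN_REPRO=1` while the first build is still sleeping, this should leave several queued items for the same project, which can then be checked against steps 6 to 8.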

          I tried to track down when the issue actually started showing up. The strange build queue behavior seems to go back to at least before April 2014. The crash itself seems to have been introduced by a change between November 11, 2014 and January 9, 2015 (these were the arbitrary commits I manually checked out). I think the crash is a related symptom, but the underlying cause is this weird queue behavior while reloading.

          Note: Although the reproduction steps here are simplified, I think this is observed in production when a reload coincides with a non-manual build trigger, whether a periodic build, an SCM trigger, or other means.

          Mark Harmer added a comment -

          screenshot1 shows the odd queue behavior; all of these jobs will have the same build revision.
          screenshot2 shows that these queued jobs are not seen in the project listing itself.
          screenshot3 and screenshot4 show the dead node problem once the exception fires because of the duplicate directory, since the same build revision is used.
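
The failure described above can be illustrated with a toy model. This is a sketch, not Jenkins code: `new_build` is a hypothetical stand-in for the check that throws the IllegalStateException in the stack trace, and the build number 214 is borrowed from that trace. It only shows why a second build carrying an already-used number has to fail; it does not model how the queue ends up reusing the number.

```shell
#!/bin/bash
# Toy model, not Jenkins code: a build directory may be created only once.
builds_dir=$(mktemp -d)
trap 'rm -rf "$builds_dir"' EXIT

# Create the directory for build number $1; refuse if it already exists.
new_build() {
    if [ -e "$builds_dir/$1" ]; then
        echo "$builds_dir/$1 already existed; will not overwrite with #$1" >&2
        return 1
    fi
    mkdir "$builds_dir/$1"
}

# Two queued items that both carry build number 214.
new_build 214 && first_status=0 || first_status=$?
new_build 214 && second_status=0 || second_status=$?

echo "first=$first_status second=$second_status"
```

The first call succeeds and the second is rejected, which corresponds to the point where the real executor dies instead of recovering.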

          Steffen Breitbach added a comment -

          We encounter this issue on freestyle builds that are triggered by both a cron expression and an upstream job. When those two triggers fire close to each other, it seems to "kill" the executor due to the IllegalStateException.

            Assignee: Unassigned
            Reporter: mkochano (Michał Kochanowicz)
            Votes: 6
            Watchers: 10
