[JENKINS-26739] ISE from AbstractLazyLoadRunMap.proposeNewNumber for concurrent matrix builds

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: matrix-project-plugin
    • Environment: master on Linux; 2-3 Linux slaves with 10 executors; 1 Windows slave with one executor. The problem only shows up on the Linux slaves, where we have the highest load (up to 30 parallel jobs).

      Thread has died

      java.lang.IllegalStateException: cannot create a build with number 8558 since that (or higher) is already in use among [8099, 8312, 8317, 8318, 8319, 8320, 8321, 8322, 8323, 8326, 8328, 8329, 8330, 8331, 8333, 8335, 8336, 8338, 8340, 8341, 8348, 8351, 8355, 8358, 8360, 8361, 8362, 8363, 8370, 8371, 8380, 8381, 8386, 8387, 8394, 8397, 8398, 8399, 8400, 8401, 8402, 8403, 8404, 8405, 8406, 8407, 8408, 8409, 8418, 8419, 8420, 8421, 8422, 8423, 8424, 8426, 8428, 8435, 8436, 8440, 8441, 8442, 8484, 8487, 8488, 8489, 8490, 8491, 8492, 8493, 8495, 8497, 8498, 8499, 8500, 8501, 8508, 8512, 8513, 8514, 8515, 8522, 8523, 8524, 8526, 8527, 8528, 8529, 8530, 8531, 8535, 8536, 8537, 8545, 8546, 8549, 8550, 8552, 8554, 8555, 8556, 8557, 8560, 8563]
      at jenkins.model.lazy.AbstractLazyLoadRunMap.proposeNewNumber(AbstractLazyLoadRunMap.java:361)
      at hudson.model.RunMap.put(RunMap.java:189)
      at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:284)
      at hudson.matrix.MatrixConfiguration.newBuild(MatrixConfiguration.java:74)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:1205)
      at hudson.model.AbstractProject.createExecutable(AbstractProject.java:144)
      at hudson.model.Executor.run(Executor.java:213)


          Thomas Müller created issue -

          Daniel Beck added a comment -

          Possible. Check the file named 'nextBuildNumber' in JENKINS_HOME/jobs/(jobname). It may be as simple as changing its contents.

          Anything in the logs to indicate how you got into this situation?
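
A minimal sketch of the check suggested above, in Java. It assumes the job directory contains a `nextBuildNumber` file holding a single integer and a `builds` directory whose numerically named entries correspond to existing builds; that layout is an assumption and may differ between Jenkins versions and for matrix configurations. The class and method names are hypothetical and not part of Jenkins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jenkins.
class NextBuildNumberCheck {

    // Returns the existing build numbers that are >= next, sorted ascending.
    // An empty list means 'next' is above every known build number.
    static List<Integer> conflicts(int next, Collection<Integer> existing) {
        List<Integer> out = new ArrayList<>();
        for (int n : existing) {
            if (n >= next) {
                out.add(n);
            }
        }
        Collections.sort(out);
        return out;
    }

    // Reads the integer stored in <jobDir>/nextBuildNumber (assumed layout).
    static int readNext(Path jobDir) throws IOException {
        byte[] raw = Files.readAllBytes(jobDir.resolve("nextBuildNumber"));
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    }

    // Collects purely numeric entry names under <jobDir>/builds (assumed layout).
    static List<Integer> readBuildNumbers(Path jobDir) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        Path builds = jobDir.resolve("builds");
        if (!Files.isDirectory(builds)) {
            return numbers;
        }
        try (Stream<Path> entries = Files.list(builds)) {
            entries.map(p -> p.getFileName().toString())
                   .filter(name -> name.matches("\\d{1,9}"))
                   .forEach(name -> numbers.add(Integer.parseInt(name)));
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: NextBuildNumberCheck <job directory>");
            return;
        }
        Path jobDir = Paths.get(args[0]);
        int next = readNext(jobDir);
        System.out.println("nextBuildNumber = " + next);
        System.out.println("builds at or above it: " + conflicts(next, readBuildNumbers(jobDir)));
    }
}
```

Run it with the job (or matrix configuration) directory as the only argument. An empty list means the value in the file is above every numeric entry that was found.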


          Thomas Müller added a comment -

          Thanks for your feedback, Daniel.

          nextBuildNumber currently holds 8610, which matches what the web console shows: https://ci.owncloud.org/job/pull-request-analyser-ng-simple/

          To me this feels like a runtime/concurrency issue that appears as soon as too many builds are triggered at the same time.
          This job is hooked up to GitHub, and builds are kicked off as soon as a pull request is created or new commits are pushed to the branches.
          As described in the environment field, this can be up to 30 jobs running in parallel. Under these circumstances the executors die.

          Is there a specific logger category that would help analyse this issue?

          THX

          Daniel Beck added a comment -

          Builds in question are gone. Would have been interesting to see whether they were created in quick succession (within a few seconds at most).

          Any further occurrences of this specific issue?


          Daniel Beck added a comment -

          Assigning to jglick as it's related to the new build numbering. It may be a concurrency issue when many builds are started in parallel (since they can now be started at a rate of more than one per second).

          Daniel Beck made changes -
          Assignee: Jesse Glick (jglick)
          Jesse Glick made changes -
          Link: This issue is blocking JENKINS-24380

          Jesse Glick added a comment -

          Well Job.assignBuildNumber is synchronized so new Run instances should never collide on number. But some sort of race condition seems like a likely explanation.
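
One way the two observations could fit together, sketched in Java under stated assumptions: number assignment is atomic, but registering the build is a separate step, so a build holding a lower number can reach the registration step after a build holding a higher one. The class below is a hypothetical stand-in, not Jenkins code. Its rejection rule (refuse a number if it or any higher number is already registered) is inferred from the exception message in the report. The interleaving is replayed sequentially so the outcome is deterministic.

```java
import java.util.TreeSet;

// Hypothetical model of the suspected race; not the Jenkins implementation.
class BuildNumberRaceSketch {
    private int next = 8558;
    private final TreeSet<Integer> inUse = new TreeSet<>();

    // Step 1: hand out a number. Synchronized, so two callers never get the same one.
    synchronized int assign() {
        return next++;
    }

    // Step 2: register the build. Rejects a number if that number,
    // or any higher one, has already been registered.
    synchronized void register(int number) {
        if (inUse.ceiling(number) != null) {
            throw new IllegalStateException("cannot create a build with number " + number
                    + " since that (or higher) is already in use among " + inUse);
        }
        inUse.add(number);
    }

    // Replays one unlucky ordering of two executors.
    static String replay() {
        BuildNumberRaceSketch job = new BuildNumberRaceSketch();
        int a = job.assign();   // executor A is handed 8558
        int b = job.assign();   // executor B is handed 8559
        job.register(b);        // B reaches step 2 first
        try {
            job.register(a);    // A arrives late and is rejected
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Each step is individually synchronized, yet the pair is not atomic, which is the kind of gap that would be consistent with the reported trace, where 8558 was rejected while 8560 and 8563 were already in use. Whether this is what actually happens in Jenkins is not established by this thread.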


          Jesse Glick added a comment -

          Similar to JENKINS-26582 but with a different stack trace, which may or may not be significant.

          Are you using any plugins which might do funny things with builds—Heavy Job, Gerrit Trigger (known to be buggy unless you install the new beta), etc.?

          Jesse Glick made changes -
          Link: This issue is related to JENKINS-26582

            Assignee: Jesse Glick (jglick)
            Reporter: Thomas Müller (deepdiver)
            Votes: 3
            Watchers: 13
