Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-27708

Concurrent build limits not honored on Jenkins 1.607

      After upgrading to 1.607 we noticed the throttle plugin doesn't always prevent jobs from running in parallel as expected.

      Steps to reproduce:
      1. Create a throttle category with global limit 1, per-node limit 1.
      2. Create 3 jobs using the category, restricted to a node with two executors, sleep 60 as a build step.
      3. Request builds of all 3 jobs.

      What should happen:
      4. Jobs run in sequence; 1 then 2 then 3.

      What actually happens:
      5. Job 1 starts building, jobs 2 and 3 wait in queue (OK).
      6. After job 1 finishes, both job 2 and 3 start running (not OK).

      Plugin version is 1.8.4. Issue does not appear in Jenkins 1.606 with the same version of the plugin. Issue is reproducible on a fresh install.

      We were forced to downgrade back to 1.606 as a workaround (which is unfortunately not trivial due to JENKINS-27700).

          [JENKINS-27708] Concurrent build limits not honored on Jenkins 1.607

          Tomasz Śniatowski created issue -
          Oleg Nenashev made changes -
          Link New: This issue is related to JENKINS-27565 [ JENKINS-27565 ]

          Oleg Nenashev added a comment -

          https://github.com/jenkinsci/jenkins/pull/1596 reworks thread synchronization approaches in the core. It may cause such issue.

          Oleg Nenashev added a comment - https://github.com/jenkinsci/jenkins/pull/1596 reworks thread synchronization approaches in the core. It may cause such issue.

          Actually I suspect that https://github.com/jenkinsci/jenkins/pull/1596 just exposes problems with throttle concurrent builds plugin that were masked.

          Throttle concurrent builds "works" by searching for jobs that are running during the QueueTaskDispatcher.canTake evaluation. In Jenkins core, pre: https://github.com/jenkinsci/jenkins/blob/jenkins-1.606/core/src/main/java/hudson/model/Queue.java#L1178 and post: https://github.com/jenkinsci/jenkins/blob/jenkins-607/core/src/main/java/hudson/model/Queue.java#L1352 the QueueTaskDispatcher methods are called for all Buildable jobs in the Queue before mapping any of them against an executor.

          Thus https://github.com/jenkinsci/throttle-concurrent-builds-plugin/blob/throttle-concurrents-1.8.4/src/main/java/hudson/plugins/throttleconcurrents/ThrottleQueueTaskDispatcher.java#L194 will always fail to count jobs that are being picked off the queue at the same time.

          What the TCB plugin should probably be doing is either:

          The code change in 1.607 just removed some of the threading issues that allowed the TCB plugin to work most of the time by accident this bug represents a logical flaw in TCB that could still occur in 1.536-1.606 (most likely if running on a single core master) if the thread sequencing falls just right as Queue.maintain evaluates all projects in Buildable in one chunk and then maps them against executors.

          With 1.536-1.606 the executor threads could pick up the tasks while the Queue was in the process of dividing out the atomic set of work assignments, and thus - as long as the executor threads were scheduled fast enough - the TCB plugin would see the work assignments in the process of being assigned (but on a heavily loaded system or a single core system this issue would still occur)

          With 1.607 we ensure that the work assignment from Queue.maintain is mapped as an atomic unit, and thus the bug in TCB is exposed always.

          At least that is my analysis

          Stephen Connolly added a comment - Actually I suspect that https://github.com/jenkinsci/jenkins/pull/1596 just exposes problems with throttle concurrent builds plugin that were masked. Throttle concurrent builds "works" by searching for jobs that are running during the QueueTaskDispatcher.canTake evaluation. In Jenkins core, pre: https://github.com/jenkinsci/jenkins/blob/jenkins-1.606/core/src/main/java/hudson/model/Queue.java#L1178 and post: https://github.com/jenkinsci/jenkins/blob/jenkins-607/core/src/main/java/hudson/model/Queue.java#L1352 the QueueTaskDispatcher methods are called for all Buildable jobs in the Queue before mapping any of them against an executor. Thus https://github.com/jenkinsci/throttle-concurrent-builds-plugin/blob/throttle-concurrents-1.8.4/src/main/java/hudson/plugins/throttleconcurrents/ThrottleQueueTaskDispatcher.java#L194 will always fail to count jobs that are being picked off the queue at the same time. What the TCB plugin should probably be doing is either: using ResourceLists https://github.com/jenkinsci/jenkins/blob/jenkins-1.606/core/src/main/java/hudson/model/ResourceController.java#L80 to prevent the build from actually starting or checking against both projects on or when counting builds on a node: https://github.com/jenkinsci/throttle-concurrent-builds-plugin/blob/throttle-concurrents-1.8.4/src/main/java/hudson/plugins/throttleconcurrents/ThrottleQueueTaskDispatcher.java#L194 it should actually be including also the Buildable projects against that computer https://github.com/jenkinsci/jenkins/blob/jenkins-1.607/core/src/main/java/hudson/model/Queue.java#L846 The code change in 1.607 just removed some of the threading issues that allowed the TCB plugin to work most of the time by accident this bug represents a logical flaw in TCB that could still occur in 1.536-1.606 (most likely if running on a single core master) if the thread sequencing falls just right as Queue.maintain evaluates all projects in Buildable in one chunk and then maps them against executors. With 1.536-1.606 the executor threads could pick up the tasks while the Queue was in the process of dividing out the atomic set of work assignments, and thus - as long as the executor threads were scheduled fast enough - the TCB plugin would see the work assignments in the process of being assigned (but on a heavily loaded system or a single core system this issue would still occur) With 1.607 we ensure that the work assignment from Queue.maintain is mapped as an atomic unit, and thus the bug in TCB is exposed always. At least that is my analysis

          Oleg Nenashev added a comment -

          Stephen, thanks for the analysis.
          I confirm the issue and hope to work on it soon if somebody does not take the issue. It seems to be a showstopper for jenkins-1.607+.

          Oleg Nenashev added a comment - Stephen, thanks for the analysis. I confirm the issue and hope to work on it soon if somebody does not take the issue. It seems to be a showstopper for jenkins-1.607+.

          > It seems to be a showstopper for jenkins-1.607+.
          the issue's priority should be changed to blocker then

          Alexander Petrov added a comment - > It seems to be a showstopper for jenkins-1.607+. the issue's priority should be changed to blocker then

          Daniel Beck added a comment -

          Could be worse. This is not breaking your Jenkins install (as in crashes).

          Daniel Beck added a comment - Could be worse. This is not breaking your Jenkins install (as in crashes).
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-27650 [ JENKINS-27650 ]
          Stephen Connolly made changes -
          Link New: This issue is related to JENKINS-27871 [ JENKINS-27871 ]

          Can you see if the "hack" I have made in http://repo.jenkins-ci.org/snapshots/org/jenkins-ci/main/jenkins-war/1.609-SNAPSHOT/jenkins-war-1.609-20150410.091817-1.war resolves this issue for you?

          diff --git a/core/src/main/java/hudson/model/Queue.java b/core/src/main/java/hudson/model/Queue.java
          index d776a57..6a5e006 100644
          --- a/core/src/main/java/hudson/model/Queue.java
          +++ b/core/src/main/java/hudson/model/Queue.java
          @@ -1335,7 +1335,10 @@ public class Queue extends ResourceController implements Saveable {
                       final QueueSorter s = sorter;
                       if (s != null)
                           s.sortBuildableItems(buildables);
          -
          +            
          +            // JENKINS-27708, JENKINS-27871, etc
          +            updateSnapshot();
          +            
                       // allocate buildable jobs to executors
                       for (BuildableItem p : new ArrayList<BuildableItem>(
                               buildables)) {// copy as we'll mutate the list in the loop
          @@ -1372,6 +1375,9 @@ public class Queue extends ResourceController implements Saveable {
                               makePending(p);
                           else
                               LOGGER.log(Level.FINE, "BuildableItem {0} with empty work units!?", p);
          +                                
          +                // JENKINS-27708, JENKINS-27871, etc
          +                updateSnapshot();
                       }
                   } finally { updateSnapshot(); } } finally {
                       lock.unlock();
          

          Stephen Connolly added a comment - Can you see if the "hack" I have made in http://repo.jenkins-ci.org/snapshots/org/jenkins-ci/main/jenkins-war/1.609-SNAPSHOT/jenkins-war-1.609-20150410.091817-1.war resolves this issue for you? diff --git a/core/src/main/java/hudson/model/Queue.java b/core/src/main/java/hudson/model/Queue.java index d776a57..6a5e006 100644 --- a/core/src/main/java/hudson/model/Queue.java +++ b/core/src/main/java/hudson/model/Queue.java @@ -1335,7 +1335,10 @@ public class Queue extends ResourceController implements Saveable { final QueueSorter s = sorter; if (s != null ) s.sortBuildableItems(buildables); - + + // JENKINS-27708, JENKINS-27871, etc + updateSnapshot(); + // allocate buildable jobs to executors for (BuildableItem p : new ArrayList<BuildableItem>( buildables)) { // copy as we'll mutate the list in the loop @@ -1372,6 +1375,9 @@ public class Queue extends ResourceController implements Saveable { makePending(p); else LOGGER.log(Level.FINE, "BuildableItem {0} with empty work units!?" , p); + + // JENKINS-27708, JENKINS-27871, etc + updateSnapshot(); } } finally { updateSnapshot(); } } finally { lock.unlock();

            oleg_nenashev Oleg Nenashev
            tsniatowski Tomasz Śniatowski
            Votes:
            2 Vote for this issue
            Watchers:
            17 Start watching this issue

              Created:
              Updated:
              Resolved: