[JENKINS-7667] Build waits for next available executor even if several are available

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: slave-squatter-plugin
    • Labels: None
    • Environment: Hudson 1.379, slave-squatter 1.1, heavy-job 1.0

      (Couldn't find slave-squatter nor heavy-job as components, so posted in core)
      I think there's a problem in the cooperation between these components.
      I started using them yesterday to replace a more complicated mix of "heavy-slaves".

      I have one slave with 8 executors. I configured it, using slave-squatter, to reserve 3 of these during office hours, leaving 5 for use. I then noticed a job with weight 3 claiming to wait for next available executor even though it was the only job in the queue and there were no builds on the slave (or anywhere, actually).
      I tried lowering the slave's reservation to 2 (i.e. 6 executors free), which caused the build to start, but having to keep an eye on these situations is not why I run Hudson.

      The only issue I could find that smells a bit like this was JENKINS-7033 (Job in build queue is not executed).


          torbent added a comment -

          They smell a bit similar, but are maybe not?


          torbent added a comment -

          slave-squatter definitely is involved here.

          Without reservations:

          • a job of weight 2 will happily run on a 2-executor slave.

          With reservations:

          • a job of weight 2 will NOT run on a 2-executor slave.
          • a job of weight 1 will NOT run on a 1-executor slave.
          • a job of weight 1 WILL run on a 2-executor slave.
          • a job of weight 2 requires 4 available executors.
          • a job of weight 3 requires 6 available executors.

          Smells like a doubling, or something being subtracted twice?
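A minimal model may make the pattern concrete. Assuming (as a later comment confirms) that the active reservation is counted twice, the numbers from the original report fall out. The class and method names below are purely illustrative, not Hudson code:

```java
// Illustrative model (not the actual Hudson scheduler code) of the suspected
// double-counting: the reservation is subtracted from capacity twice, so the
// scheduler sees fewer free executors than really exist.
public class DoubleCountModel {

    // Hypothetical buggy capacity estimate: reservation counted twice.
    static int freeWithBug(int executors, int running, int reserved) {
        return executors - running - 2 * reserved;
    }

    // What the estimate should be: reservation counted once.
    static int freeCorrect(int executors, int running, int reserved) {
        return executors - running - reserved;
    }

    public static void main(String[] args) {
        // Original report: 8 executors, 3 reserved, nothing running, job weight 3.
        System.out.println(freeCorrect(8, 0, 3)); // 5 -> the weight-3 job fits
        System.out.println(freeWithBug(8, 0, 3)); // 2 -> job waits for an executor
        // Lowering the reservation to 2 made the build start:
        System.out.println(freeWithBug(8, 0, 2)); // 4 >= 3 -> build starts
    }
}
```

Under this model a job of weight w effectively needs w plus one extra copy of the reservation to be free, which matches the "something being subtracted twice" hunch.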


          torbent added a comment -

          It gets better (or worse).
          The slave (with 8 executors) was running some builds; there was a single executor available. I then tried manually requesting a build of a simple job (weight 1).
          This caused all idle executor threads inside the Hudson server to crash! The build executor status showed "Dead!" with a link to this report:

          java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
          at java.util.SubList.&lt;init&gt;(Unknown Source)
          at java.util.RandomAccessSubList.&lt;init&gt;(Unknown Source)
          at java.util.AbstractList.subList(Unknown Source)
          at hudson.model.queue.MappingWorksheet.&lt;init&gt;(MappingWorksheet.java:311)
          at hudson.model.Queue.pop(Queue.java:753)
          at hudson.model.Executor.grabJob(Executor.java:175)
          at hudson.model.Executor.run(Executor.java:113)

          The running builds continued to completion without problems, and their threads appeared to stay functional. The dead threads were unsalvageable and I had to restart Hudson.
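The exception is easy to reproduce in isolation. The following is an illustrative sketch (not Hudson code) of the suspected mechanism: if the predicted load exceeds the number of executors and is never clamped, the "remaining capacity" index goes negative and `subList()` throws exactly this `IllegalArgumentException`:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative reproduction of the reported crash: an unclamped load
// prediction larger than the executor count produces a negative toIndex.
public class SubListCrash {

    static String grabJobOutcome(int executors, int predictedLoad) {
        List<Integer> slots = Arrays.asList(new Integer[executors]);
        try {
            // Analogous to taking the "still free" prefix of the executor list;
            // toIndex is negative once predictedLoad > executors.
            slots.subList(0, executors - predictedLoad);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 8 executors, but reservations double-counted into a predicted load of 9:
        System.out.println(grabJobOutcome(8, 9)); // fromIndex(0) > toIndex(-1)
        // Clamping the prediction to the executor count avoids the negative index:
        System.out.println(grabJobOutcome(8, Math.min(9, 8))); // ok
    }
}
```

The `Math.min` clamp in the last call is the kind of guard that JENKINS-8882 suggests is missing from MappingWorksheet.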


          jsiirola added a comment -

          Simple patch that corrects the reported bug for the situation where a slave only has a single active reservation. It will not work if the slave has multiple overlapping reservations.


          jsiirola added a comment -

          The root cause is that active reservations are being double-counted: once by the hudson.plugins.slave_squatter.LoadPredictorImpl, and again by hudson.model.queue.LoadPredictor.CurrentlyRunningTasks. That leads to two separate failures: the first is that double-counting the reservations makes the scheduler think that the machine is busier than it actually is (the original ticket). The second is that the double counting can cause hudson.model.queue.MappingWorksheet to generate a maximum predicted load that is greater than the total number of executors, but does not clamp the prediction down to the actual total number of executors (see JENKINS-8882), which leads to the "Dead!" executors.

          I have a simple patch (attached) that works as long as the slave only ever has a single active reservation: it will fail if the slave has overlapping reservations. Basically, the patch is a hack that reports "negative load" for currently running reservations to counter-balance the extra load reported due to the double counting. A better approach would be to fundamentally redesign the slave-squatter's LoadPredictor API so as to only ever report predicted load and to exclude all currently-running reservations.
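The arithmetic behind the counter-balancing hack can be sketched abstractly (this is not the real LoadPredictor API, just a model of the bookkeeping):

```java
// Abstract model of the patch's idea: if the core already counts the active
// reservation as running load and the plugin reports it again, the plugin can
// also report a matching negative contribution so the reservation is, in net,
// counted only once.
public class NegativeLoadSketch {

    static int predictedLoad(int running, int activeReservation) {
        int coreEstimate = running + activeReservation;  // core counts the reservation
        int pluginEstimate = activeReservation;          // plugin counts it again
        int counterBalance = -activeReservation;         // the patch's offset
        return coreEstimate + pluginEstimate + counterBalance;
    }

    public static void main(String[] args) {
        // 2 builds running, reservation of 3: net load 5, not the doubled 8.
        System.out.println(predictedLoad(2, 3)); // 5
    }
}
```

With overlapping reservations a single offset no longer matches what gets double-counted, which is presumably why the patch is limited to one active reservation per slave.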


          jsiirola added a comment -

          Assigning to Kohsuke (slave-squatter author) so he can comment on the best way to address the double-counting of reservations.


          cpringle added a comment -

          I've just installed this plugin onto Jenkins and am getting the same behaviour as described in the initial bug report. I have 6 executors, but only 2 are available during the day. I currently have executors 2 and 6 free (the rest are reserved), and there are 2 jobs stuck in the build queue even though there are 2 spare executors.

          Are we able to get a fix for this?


          François-Xavier Choinière added a comment -

          Same problem here:
          I have a Jenkins server on Ubuntu 32-bit and a slave on Windows 7 64-bit.
          The slave connects to the server correctly.
          When starting a job, the status always says "waiting for next available executor on host <HOSTNAME>".
          So... I'm unable to use any of my slaves!

          Please fix this ASAP!

          Daniel Beck added a comment -

          Can this issue still be reproduced on recent Jenkins + plugins versions?


          Slawomir Czarko added a comment -

          It is still happening with a fully updated LTS version.

            Assignee: Rajan (dhaliwal_rajann)
            Reporter: torbent
            Votes: 6
            Watchers: 6