JENKINS-28513: Build-Blocker-Plugin blocks on queued builds, leading to deadlock

      Assume two projects that must never be built in parallel: you may build project A, then project B, or vice versa, but never A and B at the same time.
      In project A you define project B as a blocker.
      In project B you define project A as a blocker.

      All is fine:
      While project A builds, project B is blocked.
      While project B builds, project A is blocked.

      ... until project A is queued because no build slot is available, and then project B is queued too. Now project A blocks project B while project B blocks project A, and neither will ever be built. Cause:
      The Build Blocker Plugin does not only take running builds into account when blocking other projects; it considers the queue as well. This leads to a deadlock as soon as both projects are queued. The plugin should only take running builds into account when blocking other projects.

          Lars Vateman added a comment -

          I had to downgrade to Jenkins 1.608 to make it work again. Something in 1.609 or 1.610 must have broken it.

          Kenneth Pronovici added a comment -

          I ran into this with the current stable release, v1.609-1. I can confirm that downgrading fixes the problem for me. I tested with v1.596.3 (works), v1.608 (works) and v1.617 (still broken).

          marlene cote added a comment -

          Is there a workaround for this? It is a major show-stopper for us.

          Daniel Beck added a comment -

          There may be a fix in 1.618 (it's for JENKINS-28926 but it looks similar).

          Kevin Phillips added a comment -

          We just recently (as in yesterday) upgraded to the latest LTS edition (1.609.1) and discovered this bug exists there as well.

          Downgrading to an older Jenkins version is not an option for us: we have already migrated all of our production servers to the new version, and rolling back would take significant effort. However, seeing as this bug is debilitating (i.e. it completely blocks dozens of jobs, affecting hundreds more that depend on them, and thus many different development teams), I would appreciate any feedback that would allow us to work around or fix the problem asap.

          Really, any input from anyone would be helpful!

          Kevin Phillips added a comment -

          NOTE: It appears the maintainer of this plugin is aware that deadlocks are possible (or, dare I say, guaranteed) when using this plugin. There is a TODO on the plugin information page stating just that:

          https://wiki.jenkins-ci.org/display/JENKINS/Build+Blocker+Plugin

          I am concerned that this may indicate a pre-existing bug that has somehow been exposed or exacerbated by changes to the Jenkins core. We never had this problem prior to our upgrade, but we were using a very old version of the core (1.532.3), so it may be hard to track down which change in which version is the culprit.

          Szymon Stasik added a comment -

          Actually, the plugin is unusable as long as the blocking check considers both the 'in queue' and 'building' states once a job has been put into the queue - a deadlock is all but inevitable for any generic rule.

          Kevin Phillips added a comment -

          ciekawy: My thoughts exactly.

          Trey Bohon added a comment - - edited

          This was pretty brutal for us. v1.609.1 is completely broken without a working queue system for Build Blocker - we use it extensively, along with job weights, to control safe parallelism on single nodes. After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of the build history, but we have some weird items... it seems any builds from v1.609.1 are identified as happening in the year 1969, and tons of jobs claim never to have run or failed even though they have, hundreds of times.

          I don't feel like v1.609.1 was delivered in the spirit of LTS. After remoting in for a few hours over the weekend to update our Jenkins master (it took 3 hours to mutate the build history), I got into the office to this bug. Why were such large changes rushed into an LTS release? Edit: If Kevin is correct and it is a pre-existing bug that happens to be exposed by core changes, then I can see how this streak of bad luck happened.

          Kevin Phillips added a comment -

          Why were such large changes rushed into an LTS release?

          My sentiments exactly.

          We hit at least 6 or 7 hugely critical, production-stopping bugs just like this during our latest upgrade, and we have had to devote significant time and effort just to get back to a state similar to where we were before the upgrade - and that work is still ongoing today, after weeks of effort.

          I'd like to say this is an isolated circumstance, but the last 2 or 3 upgrades we've tried since adopting the tool a couple of years ago have had similar results. In fact, I've even gone so far as to question whether there is any value whatsoever in adopting the LTS edition at all. Its benefit is supposed to be a stable working environment for production use, but insofar as upgrades are concerned, that seems far from the truth. Then, to make matters worse, fixes for critical bugs are expected to be rolled out and tested on the mainline first, meaning that fixes take much longer to get released on the LTS branch - which compounds the problem.

          Really, we love the tool - when it works. But managing upgrades is so painful that we may need to consider adopting a different tool in the longer term. We just can't afford the time and effort of managing the tool, nor the cost of the downtime it causes for our production teams.

          Daniel Beck added a comment -

          After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of build history, but we have some weird items...seems like any builds on v1.609.1 are identified as happening in year 1969, and tons of jobs claim never to have run or failed even though they have hundreds of times.

          https://wiki.jenkins-ci.org/display/JENKINS/JENKINS-24380+Migration

          Unfortunately we are currently not able to mention major changes like these in the LTS changelog; it is purely a backporting changelog. For the actual changes, you need to review the regular weekly changelog. I'll go annoy Kohsuke again about this, but don't hold your breath.

          Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.

          Kevin Phillips added a comment -

          Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.

          Thanks for the heads up. We actually had no alternative recourse but to attempt a downgrade to 1.596.3 and noticed this problem immediately. Luckily it was relatively easy to hack around by moving some XML declarations around in the configuration files.

          Trey Bohon added a comment -

          Anyone try 1.618? There are a few queue-deadlock-related fixes in that build; I'm wondering whether this issue is a duplicate of one of those.

          Manni added a comment -

          Yes, Trey, I just tried 1.618 and, with a couple of test jobs that do nothing but sleep for a while, this release looks very promising. None of the jobs got stuck in the queue. Tomorrow I'll try some real jobs that gave me trouble before, but I'm confident that the problem is gone. No idea which part of the changelog to thank, but I don't really care.

          Lars Vateman added a comment -

          1.618 did resolve the problem for me. Jobs are being blocked and released as they should again.

          Daniel Beck added a comment -

          Resolving as duplicate of JENKINS-28926 (which looks like the best candidate for this) after comments indicating this is fixed in 1.618.

          Thomas Schweikle added a comment -

          Seems to work again since 1.618.

          Felix Sperling added a comment -

          I have this issue with 1.654.

          Queue is full, all nodes idling.

          Timer task hudson.model.Queue$MaintainTask@75b41624 failed
          java.lang.NullPointerException
          	at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
          	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
          	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
          	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
          	at hudson.model.Queue$JobOffer.canTake(Queue.java:260)
          	at hudson.model.Queue.maintain(Queue.java:1529)
          	at hudson.model.Queue$MaintainTask.doRun(Queue.java:2719)
          	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
          	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:745)
          
          

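          A note on the stack trace above: the NullPointerException escapes from canTake(), which Jenkins calls inside Queue.maintain(), so the whole maintenance cycle is logged as failed - consistent with a full queue and idle nodes. Below is a minimal sketch of the kind of defensive guard a queue task dispatcher needs. It is hypothetical, not the plugin's code or its eventual fix; SafeBlockerDispatcher and checkAgainstBlockingJobs are illustrative names.

              // Hypothetical sketch of a QueueTaskDispatcher with defensive guards; not the
              // Build Blocker Plugin's code. An exception thrown from canTake() propagates
              // into Queue.maintain(), aborting the scheduling cycle while nodes sit idle.
              import hudson.model.Node;
              import hudson.model.Queue;
              import hudson.model.queue.CauseOfBlockage;
              import hudson.model.queue.QueueTaskDispatcher;

              public class SafeBlockerDispatcher extends QueueTaskDispatcher {

                  @Override
                  public CauseOfBlockage canTake(Node node, Queue.BuildableItem item) {
                      try {
                          // Guard against incomplete state during queue maintenance instead
                          // of letting a NullPointerException escape.
                          if (node == null || item == null || item.task == null) {
                              return null; // do not block on incomplete information
                          }
                          return checkAgainstBlockingJobs(node, item);
                      } catch (RuntimeException e) {
                          // Never let an exception kill the queue maintenance task.
                          return null;
                      }
                  }

                  // Placeholder for the real blocking check (illustrative only).
                  private CauseOfBlockage checkAgainstBlockingJobs(Node node, Queue.BuildableItem item) {
                      return null;
                  }
              }
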
          Felix Sperling added a comment - - edited

          The problem appears in v1.654.
          It seems 1.651.1 and 1.651.2 are also affected.

          Puneeth Nanjundaswamy added a comment -

          Any update on this? :/

          Jenkins: 2.7.1
          Build Blocker Plugin: 1.7.3

          malavika chintapanti added a comment - - edited

          I am also facing the same issue. Plugin version 1.7.3, Jenkins 2.19.3.
          Settings:
          block on node level = true
          check buildable queued builds = true
          Blocking Jobs = .*
          Two jobs keep blocking each other when both are queued.
          There should be some way for this to resolve automatically.

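          For what it's worth, that configuration makes the mutual block unavoidable: with Blocking Jobs = .* the pattern matches every job name, so once queued builds are scanned, any two queued jobs will each see the other as a blocker. A tiny illustrative check in plain Java (the names are made up, this is not the plugin's code):

              // With a catch-all blocking pattern, every queued job "blocks" every other one.
              import java.util.regex.Pattern;

              public class CatchAllBlocking {
                  public static void main(String[] args) {
                      Pattern blockingJobs = Pattern.compile(".*");
                      String[] queued = {"job-A", "job-B"};
                      for (String self : queued) {
                          for (String other : queued) {
                              if (!self.equals(other) && blockingJobs.matcher(other).matches()) {
                                  System.out.println(self + " is blocked by queued " + other);
                              }
                          }
                      }
                      // Prints both directions (job-A blocked by job-B and vice versa),
                      // so neither job ever leaves the queue.
                  }
              }
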
          Asaf M added a comment -

          Happens to me as well:

          Jan 05, 2017 9:12:17 AM hudson.triggers.SafeTimerTask run
          SEVERE: Timer task hudson.model.Queue$MaintainTask@63703c8 failed
          java.lang.NullPointerException
                  at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
                  at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
                  at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
                  at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
                  at hudson.model.Queue$JobOffer.canTake(Queue.java:258)
                  at hudson.model.Queue.maintain(Queue.java:1519)
                  at hudson.model.Queue$MaintainTask.doRun(Queue.java:2709)
                  at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                  at java.lang.Thread.run(Thread.java:745)
          

          Jenkins version 2.15

          Oleg Nenashev added a comment -

          https://github.com/jenkinsci/build-blocker-plugin/pull/9 is an attempt to fix that, but AFAIK there is no active maintainer

          Denis Mone added a comment -

          The pull request has been merged and released in version 1.7.5 of the plugin.
          Please verify that the fix covers your use case.
