JENKINS-28513

Build-Blocker-Plugin blocks on builds queued leading to deadlock

      Description

      Assume two projects that must never be built in parallel. You may build project A and then project B, or vice versa, but never A and B at the same time.
      In project A you define project B as a blocker.
      In project B you define project A as a blocker.

      All is fine as long as one of them is running:
      Project A builds, project B is blocked.
      Project B builds, project A is blocked.

      ... until project A is queued because no build slots are available, and then project B is queued too. Now project A blocks project B while project B blocks project A, and neither will ever be built. Cause:
      The Build Blocker Plugin does not only take running projects into account when blocking other projects; it takes the queue into account too. This leads to a deadlock as soon as both projects are queued. The plugin should only take running projects into account when blocking other projects.
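The scenario described above can be sketched as a small standalone model. This is not the plugin's code: the class `BlockerDeadlockSketch`, the `BLOCKERS` map, and the `checkQueue` flag are hypothetical names used only to illustrate why checking the queue makes two mutually blocking jobs wait forever, while checking running builds only does not.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BlockerDeadlockSketch {
    // Hypothetical configuration: job name -> names of the jobs defined as its blockers.
    static final Map<String, Set<String>> BLOCKERS = Map.of(
            "A", Set.of("B"),
            "B", Set.of("A"));

    /**
     * Returns true if any blocker of the given job is running, or,
     * when checkQueue is set, is merely waiting in the queue.
     */
    static boolean isBlocked(String job, Set<String> running, Set<String> queued,
                             boolean checkQueue) {
        for (String blocker : BLOCKERS.getOrDefault(job, Set.of())) {
            if (running.contains(blocker)) {
                return true;
            }
            if (checkQueue && queued.contains(blocker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Nothing is running; both jobs are waiting in the queue.
        Set<String> running = Set.of();
        Set<String> queued = Set.of("A", "B");
        for (String job : List.of("A", "B")) {
            System.out.println(job + " blocked when the queue is checked: "
                    + isBlocked(job, running, queued, true));
            System.out.println(job + " blocked when only running builds are checked: "
                    + isBlocked(job, running, queued, false));
        }
    }
}
```

With the queue check enabled, both jobs report as blocked although nothing is running, so neither can ever start. With the check limited to running builds, whichever job gets a slot first starts and the other waits for it to finish.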

            Activity

            tps800 Thomas Schweikle created issue -
            larvat Lars Vateman added a comment -

            I had to downgrade to Jenkins 1.608 to make it work again. Something in 1.609 or 1.610 must have broken it.

            pronovic Kenneth Pronovici added a comment -

            I ran into this with the current stable release, v1.609-1. I can confirm that downgrading fixes the problem for me. I tested with v1.596.3 (works), v1.608 (works) and v1.617 (still broken).

            funeeldy marlene cote added a comment -

            Is there a workaround for this? It is a major blocker for us.

            funeeldy marlene cote made changes -
            Field: Labels
            Original Value: blocking
            New Value: blocking queue
            danielbeck Daniel Beck added a comment -

            There may be a fix in 1.618 (it's for JENKINS-28926 but it looks similar).

            leedega Kevin Phillips added a comment -

            We just recently (like yesterday) upgraded to the latest LTS edition (1.609.1) and discovered this bug exists there as well.

            Downgrading to an older Jenkins version is not an option for us, as we've already migrated all of our production servers to the new version and rolling back would take significant effort. However, since this bug is debilitating (i.e. it completely blocks dozens of jobs, affecting hundreds more that depend on them and thus many different development teams), I would appreciate any feedback that would allow us to work around or fix the problem asap.

            Really, any input from anyone would be helpful!

            leedega Kevin Phillips added a comment -

            NOTE: It appears the maintainer of this plugin is aware that deadlocks are possible (or, dare I say, guaranteed) when using this plugin. There is a TODO on the plugin information page stating just that:

            https://wiki.jenkins-ci.org/display/JENKINS/Build+Blocker+Plugin

            I am concerned that this may indicate a pre-existing bug that has somehow been exposed or exacerbated by changes to the Jenkins core. We never had this problem prior to our upgrade, but we were using a very old version of the core (1.532.3), so it may be hard to track down which change in which version is the culprit.

            ciekawy Szymon Stasik added a comment -

            Actually, the plugin is unusable as long as the exclusion check covers both the 'in queue' and 'building' states: once any job has been put in the queue, a deadlock is practically inevitable for any generic rule.

            leedega Kevin Phillips added a comment -

            Szymon Stasik
            My thoughts exactly.

            treybohon Trey Bohon added a comment - - edited

            This was pretty brutal for us. v1.609.1 is completely broken without a working queue system for build blocker - we use it extensively along with job weight to control safe parallelism on single nodes. After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of build history, but we have some weird items...seems like any builds on v1.609.1 are identified as happening in year 1969, and tons of jobs claim never to have run or failed even though they have hundreds of times.

            I don't feel like v1.609.1 was delivered in the spirit of LTS. After remoting in for a few hours over the weekend to update our Jenkins master (it took 3 hours to mutate the build history), I came into the office to this bug. Why were such large changes rushed into an LTS release? Edit: If Kevin is correct and it is a pre-existing bug that happens to be exposed by core changes, then I could see how this streak of bad luck happened.

            leedega Kevin Phillips added a comment -

            Why were such large changes rushed into an LTS release?

            My sentiments exactly.

            We hit at least 6 or 7 hugely critical, production stop bugs just like this when doing our latest upgrade and we've had to devote significant time and effort just to get back to a state similar to what we were in before the upgrade - and that work is still ongoing today after weeks of effort.

            I'd like to say this is an isolated circumstance, but the last 2 or 3 upgrades we've tried since adopting the tool a couple of years ago have had similar results. In fact, I've even gone so far as to question whether there is any value whatsoever in adopting the LTS edition at all. Its benefit is supposed to be a stable working environment for production use, but insofar as upgrades are concerned, that seems far from the truth. Then, to make matters worse, fixes for critical bugs are expected to be rolled out and tested on the mainline first, meaning that fixes take way longer to get released on the LTS branch - which compounds the problem.

            Really, we love the tool - when it works. But managing upgrades is so painful that we may need to consider adopting a different tool in the longer term. We just can't afford to spend this amount of time and effort managing the tool, and the cost of the downtime it causes for our production teams.

            danielbeck Daniel Beck added a comment -

            After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of build history, but we have some weird items...seems like any builds on v1.609.1 are identified as happening in year 1969, and tons of jobs claim never to have run or failed even though they have hundreds of times.

            https://wiki.jenkins-ci.org/display/JENKINS/JENKINS-24380+Migration

            Unfortunately we are currently not able to mention major changes like these in the LTS changelog, it's purely a backporting changelog. For the actual changes, you need to review the regular weekly changelog. I'll go annoy Kohsuke again about this, but don't hold your breath.

            Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.

            leedega Kevin Phillips added a comment -

            Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.

            Thanks for the heads up. We actually had no alternative recourse but to attempt a downgrade to 1.596.3 and noticed this problem immediately. Luckily it was relatively easy to hack around by moving some XML declarations around in the configuration files.

            treybohon Trey Bohon added a comment -

            Anyone try 1.618? There are a few queue deadlock related fixes in that build, wondering if this is a duplicate issue with one of those.

            manni Manni Heumann added a comment -

            Yes, Trey, I just tried 1.618, and with a couple of test jobs that do nothing but sleep for a while, this release looks very promising. None of the jobs got stuck in the queue. Tomorrow I'll try some real jobs that gave me trouble before, but I'm confident that the problem is gone. No idea which part of the changelog to thank, but I don't really care.

            larvat Lars Vateman added a comment -

            1.618 did resolve the problem for me. Jobs are being blocked and released as they should again

            danielbeck Daniel Beck added a comment -

            Resolving as duplicate of JENKINS-28926 (which looks like the best candidate for this) after comments indicating this is fixed in 1.618.

            danielbeck Daniel Beck made changes -
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Resolved [ 5 ]
            tps800 Thomas Schweikle added a comment -

            Seems to work again since 1.618

            fxsp2 Felix Sperling added a comment -

            I have this issue with 1.654

            Queue is full, all nodes idling.

            Timer task hudson.model.Queue$MaintainTask@75b41624 failed
            java.lang.NullPointerException
            	at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
            	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
            	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
            	at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
            	at hudson.model.Queue$JobOffer.canTake(Queue.java:260)
            	at hudson.model.Queue.maintain(Queue.java:1529)
            	at hudson.model.Queue$MaintainTask.doRun(Queue.java:2719)
            	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            	at java.lang.Thread.run(Thread.java:745)
            
            
            fxsp2 Felix Sperling added a comment - - edited

            Problem is appearing in v1.654
            Seems like 1.651.1 and 1.651.2 are also affected.

            fxsp2 Felix Sperling made changes -
            Resolution Duplicate [ 3 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 163354 ] JNJira + In-Review [ 186288 ]
            puneeth_n Puneeth Nanjundaswamy added a comment -

            Any update on this? :/

            Jenkins: 2.7.1
            Build Blocker Plugin: 1.7.3

            malavikac malavika chintapanti added a comment - - edited

            I am also facing the same issue. Plugin version: 1.7.3, Jenkins version: 2.19.3.
            Settings:
            block on node level = true
            check buildable queued builds = true
            Blocking Jobs = .*
            Two jobs keep blocking each other if both are queued.
            There should be some way for this to resolve automatically.
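To illustrate why a "Blocking Jobs" value of `.*` is so prone to this, here is a minimal sketch using only `java.util.regex`. It assumes, without confirmation from the plugin source, that the configured expression is matched in full against the names of other queued jobs; the class `BlockingPatternSketch`, the method `matchingBlockers`, and the job names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BlockingPatternSketch {
    /**
     * Returns the names of the other queued jobs that the blocking-jobs regex
     * matches in full. Full matching and skipping the job itself are
     * assumptions of this sketch, not confirmed plugin behaviour.
     */
    static List<String> matchingBlockers(String job, String regex, List<String> queued) {
        Pattern pattern = Pattern.compile(regex);
        return queued.stream()
                .filter(name -> !name.equals(job))
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> queued = List.of("job-a", "job-b");
        // With ".*" every other queued job counts as a blocker, in both directions.
        System.out.println("job-a is blocked by " + matchingBlockers("job-a", ".*", queued));
        System.out.println("job-b is blocked by " + matchingBlockers("job-b", ".*", queued));
    }
}
```

Under these assumptions each of the two queued jobs sees the other as a blocker, which is the mutual blocking reported in the comment above.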

            asafm Asaf M added a comment -

            Happens to me as well:

            Jan 05, 2017 9:12:17 AM hudson.triggers.SafeTimerTask run
            SEVERE: Timer task hudson.model.Queue$MaintainTask@63703c8 failed
            java.lang.NullPointerException
                    at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
                    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
                    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
                    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
                    at hudson.model.Queue$JobOffer.canTake(Queue.java:258)
                    at hudson.model.Queue.maintain(Queue.java:1519)
                    at hudson.model.Queue$MaintainTask.doRun(Queue.java:2709)
                    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                    at java.lang.Thread.run(Thread.java:745)
            

            Jenkins version 2.15

            oleg_nenashev Oleg Nenashev added a comment -

            https://github.com/jenkinsci/build-blocker-plugin/pull/9 is an attempt to fix that, but AFAIK there is no active maintainer

            oleg_nenashev Oleg Nenashev made changes -
            Link This issue is duplicated by JENKINS-53384 [ JENKINS-53384 ]

              People

              Assignee: Unassigned
              Reporter: tps800 Thomas Schweikle
              Votes: 14
              Watchers: 20
