Bug
Resolution: Fixed
Major
Assume two projects that must never be built in parallel. You may build project A, then project B, or vice versa, but never project A and B together.
In Project A you define project B as a blocker.
In Project B you define project A as a blocker.
All will be fine:
Project A builds, Project B is blocked.
Project B builds, Project A is blocked.
... until Project A is queued because no build slots are available, and then Project B is queued too. Now Project A blocks Project B, while Project B blocks Project A. Neither will ever be built!
Cause:
"Build-Blocker-Plugin" does not only take running projects into account when blocking other projects, it takes the queue into account too! This leads to deadlocks as soon as both projects are queued. The plugin should only take running projects into account when blocking other projects.
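The mutual block described above can be reproduced with a small model. This is only a sketch of the reported behavior, not the plugin's actual code: the function `is_blocked`, the `check_queue` flag and the data layout are hypothetical names chosen for illustration.

```python
def is_blocked(job, blockers, running, queued, check_queue):
    """Return True if any job that `job` lists as a blocker is active.

    Hypothetical model: "active" means running, plus (optionally)
    every other job waiting in the queue.
    """
    active = set(running)
    if check_queue:
        # Queue-aware check: other queued jobs also count as blockers.
        active |= set(queued) - {job}
    return any(other in active for other in blockers[job])


# A and B name each other as blockers, as in the report.
blockers = {"A": {"B"}, "B": {"A"}}
running = set()
queued = ["A", "B"]  # both waiting, nothing running

# Queue-aware check: each job is blocked by the other, so neither starts.
stuck = [j for j in queued
         if is_blocked(j, blockers, running, queued, check_queue=True)]
print(stuck)

# Running-only check: neither is blocked, so one of them can start.
free = [j for j in queued
        if not is_blocked(j, blockers, running, queued, check_queue=False)]
print(free)

# Once A is running, B is still held back, which is the desired exclusion.
print(is_blocked("B", blockers, {"A"}, ["B"], check_queue=False))
```

With the queue-aware check both jobs stay in the queue forever. With the running-only check either job may start, and the other is blocked only while the first one is actually building.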
[JENKINS-28513] Build-Blocker-Plugin blocks on builds queued leading to deadlock
I ran into this with the current stable release, v1.609.1. I can confirm that downgrading fixes the problem for me. I tested with v1.596.3 (works), v1.608 (works) and v1.617 (still broken).
There may be a fix in 1.618 (it's for JENKINS-28926 but it looks similar).
We just recently (like yesterday) upgraded to the latest LTS edition (1.609.1) and discovered this bug exists there as well.
Downgrading to an older Jenkins version is not an option for us, as we've already migrated all of our production servers to the new version and it would take significant effort to go back. However, since this bug is debilitating (i.e., it completely blocks dozens of jobs, affecting hundreds more that depend on them, and thus many different development teams), I would appreciate any feedback that would allow us to work around or fix the problem as soon as possible.
Really, any input from anyone would be helpful!
NOTE: It appears the maintainer of this plugin is aware that deadlocks are possible (or, dare I say, guaranteed) when using this plugin. There is a TODO on the plugin information page stating just that:
https://wiki.jenkins-ci.org/display/JENKINS/Build+Blocker+Plugin
I am concerned that this may indicate a pre-existing bug that has somehow been exposed or exacerbated by changes to the Jenkins core. We never had this problem prior to our upgrade, but we were using a very old version of the core (1.532.3), so it may be hard to track down which change in which version is the culprit.
Actually, the plugin is unusable as long as the blocking check covers both the 'in queue' and 'building' states: once any job has been put in the queue, a deadlock is practically inevitable for any generic rule.
This was pretty brutal for us. v1.609.1 is completely broken without a working queue system for build blocker - we use it extensively along with job weight to control safe parallelism on single nodes. After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of build history, but we have some weird items...seems like any builds on v1.609.1 are identified as happening in year 1969, and tons of jobs claim never to have run or failed even though they have hundreds of times.
I don't feel like v1.609.1 was delivered in the spirit of LTS. After remoting in for a few hours over the weekend to update our Jenkins master (it took 3 hours to mutate build history), I came into the office to this bug. Why were such large changes rushed into an LTS release? Edit: If Kevin is correct and it is a pre-existing bug that happens to be exposed by core changes, then I could see how this streak of bad luck happened.
Why were such large changes rushed into an LTS release?
My sentiments exactly.
We hit at least 6 or 7 hugely critical, production stop bugs just like this when doing our latest upgrade and we've had to devote significant time and effort just to get back to a state similar to what we were in before the upgrade - and that work is still ongoing today after weeks of effort.
I'd like to say this is an isolated circumstance, but the last 2 or 3 upgrades we've tried since adopting the tool a couple of years ago have had similar results. In fact, I've even gone so far as to question whether there is any value whatsoever in adopting the LTS edition at all. Its benefit is supposed to be a stable working environment for production use, but insofar as upgrades are concerned, that seems far from the truth. Then, to make matters worse, fixes for the critical bugs are expected to be rolled out and tested on the mainline first, meaning that fixes take much longer to get released on the LTS branch, which compounds the problem.
Really, we love the tool - when it works. But managing upgrades is so painful that we may need to consider adopting a different tool in the longer term. We just can't afford to spend this amount of time and effort managing the tool, and the cost of the downtime it causes for our production teams.
After mutating all of our build history to v1.609.1, going back to v1.596.3 removed all build history visibility. v1.608 fixes the queue system and most of build history, but we have some weird items...seems like any builds on v1.609.1 are identified as happening in year 1969, and tons of jobs claim never to have run or failed even though they have hundreds of times.
https://wiki.jenkins-ci.org/display/JENKINS/JENKINS-24380+Migration
Unfortunately we are currently not able to mention major changes like these in the LTS changelog, it's purely a backporting changelog. For the actual changes, you need to review the regular weekly changelog. I'll go annoy Kohsuke again about this, but don't hold your breath.
Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.
Also, 1.609.1 (or rather 1.607) changed how slave configs are stored. Downgrading will probably lose those entirely.
Thanks for the heads up. We actually had no alternative recourse but to attempt a downgrade to 1.596.3 and noticed this problem immediately. Luckily it was relatively easy to hack around by moving some XML declarations around in the configuration files.
Anyone try 1.618? There are a few queue deadlock related fixes in that build, wondering if this is a duplicate issue with one of those.
Yes, Trey, I just tried 1.618, and with a couple of test jobs that do nothing but sleep for a while, this release looks very promising. None of the jobs got stuck in the queue. I'll try some real jobs that gave me trouble before tomorrow, but I'm confident that the problem is gone. No idea what part of the change log to thank, but I don't really care.
1.618 did resolve the problem for me. Jobs are being blocked and released as they should again.
Resolving as duplicate of JENKINS-28926 (which looks like the best candidate for this) after comments indicating this is fixed in 1.618.
I have this issue with 1.654
Queue is full, all nodes idling.
Timer task hudson.model.Queue$MaintainTask@75b41624 failed
java.lang.NullPointerException
    at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
    at hudson.model.Queue$JobOffer.canTake(Queue.java:260)
    at hudson.model.Queue.maintain(Queue.java:1529)
    at hudson.model.Queue$MaintainTask.doRun(Queue.java:2719)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Problem is appearing in v1.654
Seems like 1.651.1 and 1.651.2 are also affected.
Any update on this? :/
Jenkins: 2.7.1
Build Blocker Plugin: 1.7.3
I am also facing the same issue. Plugin version 1.7.3, Jenkins 2.19.3.
Setting: block on node level = true
check buildable queued builds = true
Blocking Jobs = .*
Two jobs keep blocking each other if these are queued.
There should be some way it resolves automatically.
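One conceivable way for this to resolve automatically, sketched here purely as an illustration and not as the plugin's behavior or its eventual fix, is to let a queued job be blocked only by running jobs and by jobs that entered the queue before it. The function name `blocked_by_earlier`, the oldest-first queue list, and the use of a full regex match are all assumptions made for this example.

```python
import re


def blocked_by_earlier(job, pattern, running, queue):
    """Hypothetical tie-break: only running jobs and jobs queued
    *before* `job` can block it. `queue` is ordered oldest first."""
    position = queue.index(job)
    candidates = list(running) + queue[:position]
    regex = re.compile(pattern)
    # A job never blocks itself, even if the pattern matches its own name.
    return any(regex.fullmatch(name) for name in candidates if name != job)


# Two queued jobs, both configured with "Blocking Jobs = .*".
queue = ["job-1", "job-2"]

# The oldest queue entry has nothing ahead of it, so it may start.
print(blocked_by_earlier("job-1", ".*", [], queue))

# The younger entry waits for the older one instead of deadlocking with it.
print(blocked_by_earlier("job-2", ".*", [], queue))

# After job-1 has started, job-2 is blocked by the running build.
print(blocked_by_earlier("job-2", ".*", ["job-1"], ["job-2"]))
```

Because the oldest queue entry can never be blocked by another queued entry under this rule, at least one job is always able to start, so the cycle between two mutually blocking jobs cannot form.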
Happens to me as well:
Jan 05, 2017 9:12:17 AM hudson.triggers.SafeTimerTask run
SEVERE: Timer task hudson.model.Queue$MaintainTask@63703c8 failed
java.lang.NullPointerException
    at hudson.plugins.buildblocker.BlockingJobsMonitor.checkNodeForQueueEntries(BlockingJobsMonitor.java:108)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkAccordingToProperties(BuildBlockerQueueTaskDispatcher.java:171)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.checkForBlock(BuildBlockerQueueTaskDispatcher.java:127)
    at hudson.plugins.buildblocker.BuildBlockerQueueTaskDispatcher.canTake(BuildBlockerQueueTaskDispatcher.java:110)
    at hudson.model.Queue$JobOffer.canTake(Queue.java:258)
    at hudson.model.Queue.maintain(Queue.java:1519)
    at hudson.model.Queue$MaintainTask.doRun(Queue.java:2709)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Jenkins version 2.15
https://github.com/jenkinsci/build-blocker-plugin/pull/9 is an attempt to fix that, but AFAIK there is no active maintainer.
The pull request has been merged and released in version 1.7.5 of the plugin.
Please verify that the fix covers your use case.
I had to downgrade to Jenkins 1.608 to make it work again. Something in 1.609 or 1.610 must have broken it.