Jenkins / JENKINS-38514

CauseOfBlockage from QueueTaskDispatcher.canTake discarded


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core

      Description

      If you have a QueueTaskDispatcher which returns a CauseOfBlockage from canRun, that becomes BlockedItem.getCauseOfBlockage, which is displayed in the queue widget.

      But if it returns a CauseOfBlockage from canTake (AFAICT the same applies to Node.canTake), JobOffer.canTake only checks that the result is non-null and throws away the actual object along with all of its diagnostics. You wind up with a BuildableItem whose cause is CauseOfBlockage.BecauseNodeIsBusy, which tells you nothing and may be totally misleading.

      Asking an implementation to return a @CheckForNull CauseOfBlockage rather than a simple boolean implies that a non-null return value will be displayed to the user. Currently that is not the case.
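To make the contrast concrete, here is a minimal self-contained Java sketch; it is not Jenkins code. Blockage, TakeCheck, OfferSketch and its methods are hypothetical stand-ins for CauseOfBlockage, the Node/QueueTaskDispatcher canTake checks, and JobOffer, with nodes and items reduced to strings. It shows the shape described above, where the check result is collapsed to a boolean and the queue can only fall back to a generic "node is busy" cause, next to a variant that hands the specific cause back unchanged.

```java
import java.util.List;

// Hypothetical stand-in for hudson.model.queue.CauseOfBlockage: only a description.
final class Blockage {
    final String description;

    Blockage(String description) {
        this.description = description;
    }
}

// Hypothetical stand-in for a Node.canTake or QueueTaskDispatcher.canTake check:
// returns null when the node may take the item, otherwise the reason it may not.
interface TakeCheck {
    Blockage canTake(String node, String item);
}

final class OfferSketch {
    // Generic fallback cause, analogous in spirit to CauseOfBlockage.BecauseNodeIsBusy.
    static final Blockage NODE_IS_BUSY = new Blockage("Waiting for next available executor");

    // Shape described in this issue: the checks collapse to a boolean, so the
    // specific Blockage object (and its diagnostics) is discarded.
    static boolean canTakeAsBoolean(List<TakeCheck> checks, String node, String item) {
        for (TakeCheck check : checks) {
            if (check.canTake(node, item) != null) {
                return false;
            }
        }
        return true;
    }

    // All the queue can report afterwards is the generic fallback cause.
    static Blockage whyBlockedBeforeFix(List<TakeCheck> checks, String node, String item) {
        return canTakeAsBoolean(checks, node, item) ? null : NODE_IS_BUSY;
    }

    // Retaining variant: return the first non-null cause to the caller unchanged.
    static Blockage whyBlockedRetained(List<TakeCheck> checks, String node, String item) {
        for (TakeCheck check : checks) {
            Blockage cause = check.canTake(node, item);
            if (cause != null) {
                return cause;
            }
        }
        return null;
    }
}
```

With a single check that refuses with "anonymous doesn't have a permission to run on agent1", whyBlockedBeforeFix returns the generic "Waiting for next available executor" cause, whereas whyBlockedRetained returns the permission message itself.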

      To add insult to injury, Support Core does not report the result of canTake.



            Activity

            jglick Jesse Glick created issue -
            jglick Jesse Glick made changes -
            Field Original Value New Value
            Link This issue relates to JENKINS-35403 [ JENKINS-35403 ]
            jglick Jesse Glick made changes -
            Component/s support-core-plugin [ 18146 ]
            jglick Jesse Glick made changes -
            Component/s support-core-plugin [ 18146 ]
            jglick Jesse Glick added a comment -

            Not clear that support-core can do anything, since canTake requires a specific Node.

            jglick Jesse Glick made changes -
            Assignee Jesse Glick [ jglick ]
            jglick Jesse Glick made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jglick Jesse Glick added a comment -

            Unfortunately it is not obvious how to identify the relevant CauseOfBlockage: numerous JobOffers may be considered, and we would expect most of them to refuse canTake, for example because of Node.LabelMissing. The “buildable” item stays in the queue when all of the offers are rejected, but how do we identify the one we expected to be accepted?

            We can certainly improve detail-level logging so that the issue can be tracked down, but it is less clear that BuildableItem.getWhy can be improved to display the ultimate problem in the UI (or in support bundles without a custom logger).
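One way to sidestep choosing a single offer is to keep every refusal and show a deduplicated summary. The minimal Java sketch below assumes reasons are plain strings rather than CauseOfBlockage objects; Refusal, CompositeSketch, and summarize are hypothetical names. The fix that later landed adds a CompositeCauseOfBlockage class (see the commit below), but this sketch does not claim to mirror its implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one rejected offer: which node refused, and why.
final class Refusal {
    final String node;
    final String reason;

    Refusal(String node, String reason) {
        this.node = node;
        this.reason = reason;
    }
}

final class CompositeSketch {
    // Summarize every refusal instead of guessing which offer "should" have succeeded:
    // each distinct reason appears once, followed by the nodes that gave it,
    // in the order in which the reasons were first seen.
    static String summarize(List<Refusal> refusals) {
        Map<String, List<String>> nodesByReason = new LinkedHashMap<>();
        for (Refusal r : refusals) {
            nodesByReason.computeIfAbsent(r.reason, k -> new ArrayList<>()).add(r.node);
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : nodesByReason.entrySet()) {
            parts.add(e.getKey() + " (" + String.join(", ", e.getValue()) + ")");
        }
        return String.join("; ", parts);
    }
}
```

For refusals (agent1, "label missing"), (agent2, "label missing"), and (agent3, "no permission"), summarize returns "label missing (agent1, agent2); no permission (agent3)". Grouping by reason keeps the summary short when most offers fail for the same expected reason, such as a missing label, while a single unexpected refusal still stands out.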

            jglick Jesse Glick added a comment -

            For those running current core builds who wish to diagnose such issues, try running the following in the script console (/script):

            // For each item that is buildable but still waiting in the queue...
            for (i in Jenkins.instance.queue.buildableItems) {
              println "considering ${i}"
              for (c in Jenkins.instance.computers) {
                println "found computer ${c}"
                EXEC: for (e in c.executors) {
                  // Only idle (parked) executors are offered work.
                  if (e.interrupted || !e.parking) continue
                  println "with executor ${e}"
                  // Build the same kind of offer the queue would make for this executor.
                  def o = new Queue.JobOffer(Jenkins.instance.queue, e, null)
                  if (!o.canTake(i)) {
                    println "${o} refused ${i}"
                    def node = o.node
                    if (node == null) {
                      println "no node associated with ${c}"
                      continue
                    }
                    // 1. The node itself may refuse (e.g. label mismatch or missing permission).
                    def cob = node.canTake(i)
                    if (cob != null) {
                      println "because of ${cob}"
                      continue
                    }
                    // 2. Otherwise ask each QueueTaskDispatcher extension in turn.
                    for (d in hudson.model.queue.QueueTaskDispatcher.all()) {
                      cob = d.canTake(node, i)
                      if (cob != null) {
                        println "because of ${cob} from ${d}"
                        continue EXEC
                      }
                    }
                    // 3. Otherwise the offer was refused because the executor is not available.
                    if (!o.available) {
                      println "${o} not available"
                      if (o.workUnit != null) println "has a workUnit ${o.workUnit}"
                      if (c.offline) println "${c} is offline"
                      if (!c.acceptingTasks) println "${c} is not accepting tasks"
                    }
                  }
                }
              }
            }

            In one reported case, the root cause was that the Authorize Project plugin was configured, so Node.canTake was returning “anonymous doesn’t have a permission to run on” [sic]; yet the build queue (and support bundle) displayed only “Waiting for next available executor”.

            jglick Jesse Glick made changes -
            Remote Link This issue links to "PR 2651 (Web Link)" [ 15074 ]
            jglick Jesse Glick made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            jglick Jesse Glick made changes -
            Link This issue relates to JENKINS-6598 [ JENKINS-6598 ]
            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            core/src/main/java/hudson/model/Node.java
            core/src/main/java/hudson/model/Queue.java
            core/src/main/java/hudson/model/queue/CauseOfBlockage.java
            core/src/main/java/jenkins/model/queue/CompositeCauseOfBlockage.java
            core/src/main/resources/hudson/model/Messages.properties
            core/src/main/resources/jenkins/model/queue/CompositeCauseOfBlockage/summary.jelly
            test/src/test/java/hudson/model/queue/QueueTaskDispatcherTest.java
            test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java
            http://jenkins-ci.org/commit/jenkins/8d23041d4b785947dee1bc02f54a41d86b59bdda
            Log:
            JENKINS-38514 Retain CauseOfBlockage from JobOffer (#2651)

            • Converted to JenkinsRule.
            • Improved messages from Node.canTake.
            • [FIXED JENKINS-38514] BuildableItem needs to retain information from JobOffer about why it is neither blocked nor building.
            • Converted to JenkinsRule.
            • Found an existing usage of BecauseNodeIsNotAcceptingTasks.
            • Original JENKINS-6598 test was checking behavior we want amended by JENKINS-38514.
            • Ensure that a BuildableItem which is simply waiting for a free executor reports that as its CauseOfBlockage.
            • Review comments from @oleg-nenashev.
            jglick Jesse Glick made changes -
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Resolved [ 5 ]
            oleg_nenashev Oleg Nenashev made changes -
            Link This issue is related to JENKINS-45927 [ JENKINS-45927 ]
            cloudbees CloudBees Inc. made changes -
            Remote Link This issue links to "CloudBees Internal OSS-1219 (Web Link)" [ 18757 ]

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              jglick Jesse Glick
              Votes:
              1
              Watchers:
              4
