CauseOfBlockage from QueueTaskDispatcher.canTake discarded

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core

      If you have a QueueTaskDispatcher which returns a CauseOfBlockage from canRun, that becomes BlockedItem.getCauseOfBlockage, which is displayed in the queue widget.

      But if it returns a CauseOfBlockage from canTake (AFAICT the same for Node.canTake), JobOffer.canTake sees that it is non-null, throws out the actual object with all of its diagnostics, and you wind up with a BuildableItem with CauseOfBlockage.BecauseNodeIsBusy which tells you nothing and may be totally misleading.

      By asking an implementation to return a @CheckForNull CauseOfBlockage rather than a simple boolean, the implication is that a non-null return value will be displayed to the user. Currently this is not the case.

      To add insult to injury, Support Core does not report the result of canTake.
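
The loss of information described above can be illustrated with a small self-contained model. This is only a sketch: `Cause`, `Dispatcher`, `whyLossy` and `whyPreserving` are hypothetical stand-ins invented for illustration, not the real Jenkins classes or the actual `JobOffer` code, and nodes and items are reduced to plain strings.

```java
public class DiscardedCauseSketch {
    /** Hypothetical stand-in for a blockage reason (not hudson.model.queue.CauseOfBlockage). */
    static final class Cause {
        final String message;
        Cause(String message) { this.message = message; }
    }

    /** Hypothetical stand-in for a dispatcher: returning null means "no objection". */
    interface Dispatcher {
        Cause canTake(String node, String item);
    }

    /** The generic message that ends up in the queue when the detail is lost. */
    static final Cause GENERIC = new Cause("Waiting for next available executor");

    /** Lossy variant: only "was it null?" survives, so the specific cause is thrown away. */
    static Cause whyLossy(Dispatcher d, String node, String item) {
        boolean accepted = d.canTake(node, item) == null;
        return accepted ? null : GENERIC;
    }

    /** Preserving variant: the dispatcher's own cause is passed through unchanged. */
    static Cause whyPreserving(Dispatcher d, String node, String item) {
        return d.canTake(node, item);
    }

    public static void main(String[] args) {
        // Assumed example veto; the wording is made up for this sketch.
        Dispatcher veto = (node, item) -> new Cause("anonymous lacks permission on " + node);
        System.out.println(whyLossy(veto, "agent-1", "my-job").message);
        System.out.println(whyPreserving(veto, "agent-1", "my-job").message);
    }
}
```

In this model the lossy variant always reports the generic message, whatever the dispatcher said, while the preserving variant reports the dispatcher's own diagnostic.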

          [JENKINS-38514] CauseOfBlockage from QueueTaskDispatcher.canTake discarded

          Jesse Glick created issue -
          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-35403 [ JENKINS-35403 ]
          Jesse Glick made changes -
          Component/s New: support-core-plugin [ 18146 ]
          Jesse Glick made changes -
          Component/s Original: support-core-plugin [ 18146 ]

          Jesse Glick added a comment -

          Not clear that support-core can do anything, since canTake requires a specific Node.

          Jesse Glick made changes -
          Assignee New: Jesse Glick [ jglick ]
          Jesse Glick made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

          Jesse Glick added a comment -

          Unfortunately it is not obvious how the relevant CauseOfBlockage can be identified: there can be numerous JobOffers which are considered, yet we would expect most of them to refuse canTake, for example because of Node.LabelMissing. The “buildable” item stays in queue when all of the offers are rejected, but how do we identify the one which we expected to be accepted?

          Can certainly improve detail-level logging to allow the issue to be tracked down, but it is less clear that BuildableItem.getWhy can be improved to display the ultimate problem in the UI (or in support bundles without a custom logger).
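
One way to sidestep the question of which offer was "expected" to be accepted is to report every distinct rejection reason together with the nodes that gave it. The following is a minimal sketch of that idea only; `Rejection`, `groupByReason` and `summarize` are hypothetical names, the reason strings are made up, and nothing here is taken from the eventual fix in Jenkins core.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RejectionSummarySketch {
    /** Hypothetical record of one rejected offer: which node refused, and why. */
    static final class Rejection {
        final String node;
        final String reason;
        Rejection(String node, String reason) { this.node = node; this.reason = reason; }
    }

    /** Groups node names by rejection reason, keeping reasons in first-seen order. */
    static Map<String, List<String>> groupByReason(List<Rejection> rejections) {
        Map<String, List<String>> byReason = new LinkedHashMap<>();
        for (Rejection r : rejections) {
            byReason.computeIfAbsent(r.reason, k -> new ArrayList<>()).add(r.node);
        }
        return byReason;
    }

    /** Builds a one-line summary such as "reason A (n1, n2); reason B (n3)". */
    static String summarize(List<Rejection> rejections) {
        if (rejections.isEmpty()) {
            return "no offers were considered";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : groupByReason(rejections).entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append(" (").append(String.join(", ", e.getValue())).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Rejection> rejections = Arrays.asList(
            new Rejection("agent-1", "label missing"),
            new Rejection("agent-2", "label missing"),
            new Rejection("agent-3", "anonymous lacks permission"));
        System.out.println(summarize(rejections));
    }
}
```

Grouping keeps the summary short when many nodes refuse for the same uninteresting reason, while still surfacing a rare reason from a single node. The reader is left to decide which reason matters.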


          Jesse Glick added a comment -

          For those running current core builds who wish to diagnose such issues, try running in /script:

          // For every buildable queue item, probe each idle executor and print why it refuses the item.
          for (i in Jenkins.instance.queue.buildableItems) {
            println "considering ${i}"
            for (c in Jenkins.instance.computers) {
              println "found computer ${c}"
              EXEC: for (e in c.executors) {
                // Skip executors that are interrupted or not idle (parked).
                if (e.interrupted || !e.parking) continue
                println "with executor ${e}"
                def o = new Queue.JobOffer(Jenkins.instance.queue, e, null)
                if (!o.canTake(i)) {
                  println "${o} refused ${i}"
                  def node = o.node
                  if (node == null) {
                    println "no node associated with ${c}"
                    continue
                  }
                  // First ask the node itself for its reason.
                  def cob = node.canTake(i)
                  if (cob != null) {
                    println "because of ${cob}"
                    continue
                  }
                  // Then ask each registered QueueTaskDispatcher.
                  for (d in hudson.model.queue.QueueTaskDispatcher.all()) {
                    cob = d.canTake(node, i)
                    if (cob != null) {
                      println "because of ${cob} from ${d}"
                      continue EXEC
                    }
                  }
                  // Nobody vetoed: report why the offer itself is unavailable.
                  if (!o.available) {
                    println "${o} not available"
                    if (o.workUnit != null) println "has a workUnit ${o.workUnit}"
                    if (c.offline) println "${c} is offline"
                    if (!c.acceptingTasks) println "${c} is not accepting tasks"
                  }
                }
              }
            }
          }
          

          In one reported case, the root issue was that the Authorize Project plugin was configured, so Node.canTake was returning “anonymous doesn’t have a permission to run on” [sic]; yet the build queue (and support bundle) displayed only “Waiting for next available executor”.

          Jesse Glick made changes -
          Remote Link New: This issue links to "PR 2651 (Web Link)" [ 15074 ]
