Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-6598

Let Node and NodeProperty have more control over whether a node can run a task

    • Icon: Patch Patch
    • Resolution: Done
    • Icon: Minor Minor
    • core
    • None

      Right now, the only logic to determine whether a Node can run a particular Queue.Task is in JobOffer.canTake(Task). The logic is as follows:

      1. Check if the task has an assigned label; if it does and this node is not in the label, the node can't take the task
      2. If the task does not have an assigned label and this node only allows tied jobs (Mode.EXCLUSIVE), the node can't take the task
      3. If the node is offline or not accepting tasks, the node can't take the task

      I would like to add Node.canTake(Task) and NodeProperty.canTake(Task) methods. The JobOffer.canTake(Task) method would be changed to call Node.canTake(), moving checks #1 and #2 into the Node.canTake() implementation. Node.canTake() would then call NodeProperty.canTake(Task) on all of its assigned properties; if any of them return false, Node.canTake(Task) will also return false. The default implementation in the NodeProperty base class will return true.

      This allows Node subclasses and custom NodeProperties to control whether or not a particular Task should go to a particular Node, making it possible to do things like capabilities-based job assignment as opposed to the manually-intensive use of tying and node labels.

      I'm attaching a patch I've made to our internal copy of Hudson to make this change. I believe I have commit privileges to commit this if nobody objects to this change, otherwise I can get one of the other Yahoo! folks to do it.

          [JENKINS-6598] Let Node and NodeProperty have more control over whether a node can run a task

          mdillon added a comment -

          I just wanted to point out one way this could be improved that I didn't include in the patch.

          As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor". The problem is that the existing JobOffer.canTake() only returns a boolean, so the code assumes that if there is no assigned label for the job and it was not taken by a online node, then it must be waiting for an executor.

          One approach to fixing this would be to have Node.canTake(Task) and in turn NodeProperty.canTake(Task) return a CauseOfBlockage. I don't think that it's possible in general to use this CauseOfBlockage as the queue item tooltip, because that would involved folding together multiple CauseOfBlockage instances from all blocking nodes, but it would be possible to show a message like "Rejected by all available executors". The same thing could also be accomplished by adding a Node.getCauseOfBlockage(Task) method, but then BuildableItem.getCauseOfBlockage() would have to call it on all nodes and the settings of the node could have changed since canTake() was called.

          mdillon added a comment - I just wanted to point out one way this could be improved that I didn't include in the patch. As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor". The problem is that the existing JobOffer.canTake() only returns a boolean, so the code assumes that if there is no assigned label for the job and it was not taken by a online node, then it must be waiting for an executor. One approach to fixing this would be to have Node.canTake(Task) and in turn NodeProperty.canTake(Task) return a CauseOfBlockage. I don't think that it's possible in general to use this CauseOfBlockage as the queue item tooltip, because that would involved folding together multiple CauseOfBlockage instances from all blocking nodes, but it would be possible to show a message like "Rejected by all available executors". The same thing could also be accomplished by adding a Node.getCauseOfBlockage(Task) method, but then BuildableItem.getCauseOfBlockage() would have to call it on all nodes and the settings of the node could have changed since canTake() was called.

          Andrew Bayer added a comment -

          +1 - this would make JENKINS-6586 much, much easier. Well, ok, it'd make it work, is probably the more accurate way to put it, given the bizarre problems I'm having with dynamically adding/removing labels and the resulting changes not actually mattering in terms of whether a job gets run. The code looks good to me, and the functionality will be an excellent addition.

          The CauseOfBlockage stuff could either go in the same change as this, or perhaps more cleanly, a separate change. I'd tend towards the latter.

          Andrew Bayer added a comment - +1 - this would make JENKINS-6586 much, much easier. Well, ok, it'd make it work , is probably the more accurate way to put it, given the bizarre problems I'm having with dynamically adding/removing labels and the resulting changes not actually mattering in terms of whether a job gets run. The code looks good to me, and the functionality will be an excellent addition. The CauseOfBlockage stuff could either go in the same change as this, or perhaps more cleanly, a separate change. I'd tend towards the latter.

          For the JENKINS-6586 use case, this change by itself is not suffice. You'd need an extension point not scoped to a node, something like:

          interface QueueTaskDispatcher extends ExtensionPoint {
            boolean canTake(Node,Task);
          }
          

          Technically speaking, this would make it possible for custom Node implementations and NodeProperty implementations to insert the canTake logic without the proposed changes, although there's not much harm in leaving it in, either.

          Kohsuke Kawaguchi added a comment - For the JENKINS-6586 use case, this change by itself is not suffice. You'd need an extension point not scoped to a node, something like: interface QueueTaskDispatcher extends ExtensionPoint { boolean canTake(Node,Task); } Technically speaking, this would make it possible for custom Node implementations and NodeProperty implementations to insert the canTake logic without the proposed changes, although there's not much harm in leaving it in, either.

          mdillon added a comment -

          I'd be happy with either approach. Dean Yu actually suggested an approach similar to this when we discussed the idea of involving Node and NodeProperty in the canTake decision. The reason I went with the approach I did was on analogy with the extensions that have been added the JobProperty over the years to allow it to participate more fully in the build lifecycle.

          mdillon added a comment - I'd be happy with either approach. Dean Yu actually suggested an approach similar to this when we discussed the idea of involving Node and NodeProperty in the canTake decision. The reason I went with the approach I did was on analogy with the extensions that have been added the JobProperty over the years to allow it to participate more fully in the build lifecycle.

          Code changed in hudson
          User: : kohsuke
          Path:
          trunk/hudson/main/core/src/main/java/hudson/model/Node.java
          trunk/hudson/main/core/src/main/java/hudson/model/Queue.java
          trunk/hudson/main/core/src/main/java/hudson/model/queue/QueueTaskDispatcher.java
          trunk/hudson/main/core/src/main/java/hudson/slaves/NodeProperty.java
          trunk/hudson/main/core/src/main/resources/hudson/model/Messages.properties
          trunk/hudson/main/test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java
          trunk/www/changelog.html
          http://jenkins-ci.org/commit/31304
          Log:
          [FIXED JENKINS-6598] applied a patch from Mike Dillon, plus the separate independent extension point. In 1.360.

          SCM/JIRA link daemon added a comment - Code changed in hudson User: : kohsuke Path: trunk/hudson/main/core/src/main/java/hudson/model/Node.java trunk/hudson/main/core/src/main/java/hudson/model/Queue.java trunk/hudson/main/core/src/main/java/hudson/model/queue/QueueTaskDispatcher.java trunk/hudson/main/core/src/main/java/hudson/slaves/NodeProperty.java trunk/hudson/main/core/src/main/resources/hudson/model/Messages.properties trunk/hudson/main/test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java trunk/www/changelog.html http://jenkins-ci.org/commit/31304 Log: [FIXED JENKINS-6598] applied a patch from Mike Dillon, plus the separate independent extension point. In 1.360.

          Jesse Glick added a comment -

          As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor".

          See JENKINS-38514.

          Jesse Glick added a comment - As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor". See JENKINS-38514 .

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/Node.java
          core/src/main/java/hudson/model/Queue.java
          core/src/main/java/hudson/model/queue/CauseOfBlockage.java
          core/src/main/java/jenkins/model/queue/CompositeCauseOfBlockage.java
          core/src/main/resources/hudson/model/Messages.properties
          core/src/main/resources/jenkins/model/queue/CompositeCauseOfBlockage/summary.jelly
          test/src/test/java/hudson/model/queue/QueueTaskDispatcherTest.java
          test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java
          http://jenkins-ci.org/commit/jenkins/8d23041d4b785947dee1bc02f54a41d86b59bdda
          Log:
          JENKINS-38514 Retain CauseOfBlockage from JobOffer (#2651)

          • Converted to JenkinsRule.
          • Improved messages from Node.canTake.
          • [FIXED JENKINS-38514] BuildableItem needs to retain information from JobOffer about why it is neither blocked nor building.
          • Converted to JenkinsRule.
          • Found an existing usage of BecauseNodeIsNotAcceptingTasks.
          • Ensure that a BuildableItem which is simply waiting for a free executor reports that as its CauseOfBlockage.
          • Review comments from @oleg-nenashev.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Node.java core/src/main/java/hudson/model/Queue.java core/src/main/java/hudson/model/queue/CauseOfBlockage.java core/src/main/java/jenkins/model/queue/CompositeCauseOfBlockage.java core/src/main/resources/hudson/model/Messages.properties core/src/main/resources/jenkins/model/queue/CompositeCauseOfBlockage/summary.jelly test/src/test/java/hudson/model/queue/QueueTaskDispatcherTest.java test/src/test/java/hudson/slaves/NodeCanTakeTaskTest.java http://jenkins-ci.org/commit/jenkins/8d23041d4b785947dee1bc02f54a41d86b59bdda Log: JENKINS-38514 Retain CauseOfBlockage from JobOffer (#2651) Converted to JenkinsRule. Improved messages from Node.canTake. [FIXED JENKINS-38514] BuildableItem needs to retain information from JobOffer about why it is neither blocked nor building. Converted to JenkinsRule. Found an existing usage of BecauseNodeIsNotAcceptingTasks. Original JENKINS-6598 test was checking behavior we want amended by JENKINS-38514 . Ensure that a BuildableItem which is simply waiting for a free executor reports that as its CauseOfBlockage. Review comments from @oleg-nenashev.

            kohsuke Kohsuke Kawaguchi
            md5 Mike Dillon
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: