Patch
Resolution: Done
Minor
None
Right now, the only logic to determine whether a Node can run a particular Queue.Task is in JobOffer.canTake(Task). The logic is as follows:
1. Check if the task has an assigned label; if it does and this node is not in the label, the node can't take the task
2. If the task does not have an assigned label and this node only allows tied jobs (Mode.EXCLUSIVE), the node can't take the task
3. If the node is offline or not accepting tasks, the node can't take the task
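As a rough illustration, the three checks above could be sketched as follows. The types here (`SketchTask`, `SketchNode`, `SketchMode`, `CurrentCheckSketch`) are simplified stand-ins invented for this example, not the real Hudson classes, and the actual JobOffer.canTake(Task) may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for illustration; these are not the real Hudson classes.
enum SketchMode { NORMAL, EXCLUSIVE }

final class SketchTask {
    final String assignedLabel; // null when the task is not tied to any label

    SketchTask(String assignedLabel) {
        this.assignedLabel = assignedLabel;
    }
}

final class SketchNode {
    final SketchMode mode;
    final boolean online;
    final boolean acceptingTasks;
    final Set<String> labels;

    SketchNode(SketchMode mode, boolean online, boolean acceptingTasks, String... labels) {
        this.mode = mode;
        this.online = online;
        this.acceptingTasks = acceptingTasks;
        this.labels = new HashSet<>(Arrays.asList(labels));
    }
}

final class CurrentCheckSketch {
    // Mirrors the three checks listed above, in the same order.
    static boolean canTake(SketchNode node, SketchTask task) {
        // 1. The task has an assigned label that this node does not carry.
        if (task.assignedLabel != null && !node.labels.contains(task.assignedLabel)) {
            return false;
        }
        // 2. The task is untied, but the node only accepts tied jobs.
        if (task.assignedLabel == null && node.mode == SketchMode.EXCLUSIVE) {
            return false;
        }
        // 3. The node must be online and accepting tasks.
        return node.online && node.acceptingTasks;
    }
}
```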
I would like to add Node.canTake(Task) and NodeProperty.canTake(Task) methods. JobOffer.canTake(Task) would be changed to call Node.canTake(Task), with checks #1 and #2 moving into the Node.canTake(Task) implementation. Node.canTake(Task) would then call NodeProperty.canTake(Task) on each of its assigned properties; if any of them returns false, Node.canTake(Task) also returns false. The default implementation in the NodeProperty base class returns true.
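A minimal sketch of the proposed delegation, assuming simplified stand-in types (`ProposedTask`, `ProposedNode`, `ProposedNodeProperty`, `ProposedJobOffer`, all invented for this example), might look like this. It is not the attached patch, only an outline of the call chain described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for illustration; the real Hudson classes have far more members.
enum ProposedMode { NORMAL, EXCLUSIVE }

class ProposedTask {
    final String assignedLabel; // null when the task is not tied to a label

    ProposedTask(String assignedLabel) {
        this.assignedLabel = assignedLabel;
    }
}

class ProposedNodeProperty {
    // Default implementation: a property does not object to any task.
    boolean canTake(ProposedTask task) {
        return true;
    }
}

class ProposedNode {
    final ProposedMode mode;
    final Set<String> labels;
    final List<ProposedNodeProperty> properties = new ArrayList<>();
    boolean online = true;
    boolean acceptingTasks = true;

    ProposedNode(ProposedMode mode, String... labels) {
        this.mode = mode;
        this.labels = new HashSet<>(Arrays.asList(labels));
    }

    boolean canTake(ProposedTask task) {
        // Check #1: the assigned label must include this node.
        if (task.assignedLabel != null && !labels.contains(task.assignedLabel)) {
            return false;
        }
        // Check #2: an EXCLUSIVE node only takes tied jobs.
        if (task.assignedLabel == null && mode == ProposedMode.EXCLUSIVE) {
            return false;
        }
        // Any single property can veto the task.
        for (ProposedNodeProperty property : properties) {
            if (!property.canTake(task)) {
                return false;
            }
        }
        return true;
    }
}

class ProposedJobOffer {
    final ProposedNode node;

    ProposedJobOffer(ProposedNode node) {
        this.node = node;
    }

    boolean canTake(ProposedTask task) {
        // Check #3 (online and accepting tasks) stays in the job offer.
        return node.canTake(task) && node.online && node.acceptingTasks;
    }
}
```

Node subclasses could override `canTake` in the same way, so both extension points feed into the single check the job offer makes.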
This allows Node subclasses and custom NodeProperties to control whether a particular Task should go to a particular Node, making it possible to do things like capabilities-based job assignment instead of the labor-intensive manual use of tying and node labels.
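To make the capabilities idea concrete, here is one hypothetical shape such a property could take. `CapTask`, `CapNodeProperty`, and `CapabilityProperty` are made up for this sketch, as is the notion of a task declaring required capabilities; nothing like this exists in Hudson today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical types for illustration; none of these exist in Hudson.
class CapTask {
    final Set<String> requiredCapabilities;

    CapTask(String... required) {
        this.requiredCapabilities = new HashSet<>(Arrays.asList(required));
    }
}

// Stand-in for the NodeProperty base class with the proposed default.
class CapNodeProperty {
    boolean canTake(CapTask task) {
        return true;
    }
}

// A property that only admits tasks whose requirements the node satisfies.
class CapabilityProperty extends CapNodeProperty {
    private final Set<String> provided;

    CapabilityProperty(String... provided) {
        this.provided = new HashSet<>(Arrays.asList(provided));
    }

    @Override
    boolean canTake(CapTask task) {
        // Reject the task if any required capability is missing on this node.
        return provided.containsAll(task.requiredCapabilities);
    }
}
```

With something like this, a node advertising "jdk6" and "maven" would accept a task that needs only "jdk6" and reject one that also needs "xcode", without anyone maintaining labels by hand.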
I'm attaching a patch I made to our internal copy of Hudson that implements this change. I believe I have commit privileges, so I can commit it if nobody objects; otherwise I can get one of the other Yahoo! folks to do it.
Relates to: JENKINS-38514 CauseOfBlockage from QueueTaskDispatcher.canTake discarded (Resolved)
I just wanted to point out one way this could be improved that I didn't include in the patch.
As it stands, if all nodes reject a task, it will sit in the queue as a BuildableItem (as it should), but its cause of blockage will be the generic message "Waiting for next available executor". The problem is that the existing JobOffer.canTake() only returns a boolean, so the code assumes that if there is no assigned label for the job and it was not taken by an online node, then it must be waiting for an executor.
One approach to fixing this would be to have Node.canTake(Task), and in turn NodeProperty.canTake(Task), return a CauseOfBlockage. I don't think it's possible in general to use this CauseOfBlockage as the queue item tooltip, because that would involve folding together multiple CauseOfBlockage instances from all blocking nodes, but it would be possible to show a message like "Rejected by all available executors". The same thing could also be accomplished by adding a Node.getCauseOfBlockage(Task) method, but then BuildableItem.getCauseOfBlockage() would have to call it on all nodes, and the settings of a node could have changed since canTake() was called.
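A rough sketch of the first approach, under the assumption that a null return means "the node can take the task": `CauseSketch`, `CauseNode`, and `QueueSketch` are hypothetical stand-ins, and tasks are reduced to a name to keep the example short.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the real Hudson types.
final class CauseSketch {
    final String message;

    CauseSketch(String message) {
        this.message = message;
    }
}

interface CauseNode {
    // Returns null when the node can take the task, otherwise the reason it cannot.
    CauseSketch canTake(String taskName);
}

final class QueueSketch {
    static final String REJECTED_BY_ALL = "Rejected by all available executors";

    // Returns null if at least one node accepts the task. Otherwise returns a
    // single generic cause, because the per-node causes are not merged into
    // one tooltip. An empty node list is also reported as a rejection here.
    static CauseSketch causeOfBlockage(List<CauseNode> nodes, String taskName) {
        for (CauseNode node : nodes) {
            if (node.canTake(taskName) == null) {
                return null;
            }
        }
        return new CauseSketch(REJECTED_BY_ALL);
    }
}
```

Because the cause is computed in the same pass that asks each node, it reflects the node settings at the time of the check, which avoids the staleness problem of a separate getCauseOfBlockage(Task) call.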