- Improvement
- Resolution: Unresolved
- Major
- None
This shows up mostly with the EC2 plugin, but it is a fundamental problem for any cloud plugin.
EC2 templates support NORMAL mode, in which a node accepts tasks with no label as well as tasks with a specific label.
The practical side effect is extra EC2 instances: for every SlaveTemplate in NORMAL mode, starting a label-bound job triggers two Cloud.provision() calls (one with a null label and one with the specific label), so two nodes are started.
This literally wastes money.
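The double launch described above can be reproduced in a toy model. This is a minimal sketch, not Jenkins code: ToyCloud, update() and simulate() are hypothetical names, and the assumption (taken from this report) is that both provisioners see the same queued job as excess workload.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; every class and method name here is hypothetical.
class DoubleProvisionDemo {

    /** Stand-in for a cloud with one NORMAL-mode template bound to a label. */
    static class ToyCloud {
        final String templateLabel;
        final List<String> launched = new ArrayList<>();

        ToyCloud(String templateLabel) {
            this.templateLabel = templateLabel;
        }

        // NORMAL mode: serve unlabeled requests (null) as well as the template's own label.
        boolean canProvision(String label) {
            return label == null || label.equals(templateLabel);
        }

        // The cloud only sees (label, workload); it cannot tell that two calls
        // describe the same queued job.
        void provision(String label, int workload) {
            for (int i = 0; i < workload; i++) {
                launched.add("node requested for label=" + label);
            }
        }
    }

    /** Stand-in for one provisioner's update pass for a single label. */
    static void update(ToyCloud cloud, String label, int excessWorkload) {
        if (excessWorkload > 0 && cloud.canProvision(label)) {
            cloud.provision(label, excessWorkload);
        }
    }

    /** One job tied to "linux", counted by both provisioners. */
    static int simulate() {
        ToyCloud cloud = new ToyCloud("linux");
        update(cloud, null, 1);     // stands in for Jenkins.unlabeledNodeProvisioner
        update(cloud, "linux", 1);  // stands in for Label.nodeProvisioner
        return cloud.launched.size();
    }

    public static void main(String[] args) {
        System.out.println("nodes launched for one job: " + simulate());
    }
}
```

In this model simulate() returns 2 for a single queued job, which is the waste reported here.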
The NodeProvisioner for unlabeled work (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) are both triggered for the same workload:
NodeProvisioner.update():
// Make sure this cloud actually can provision for this label.
if (c.canProvision(label)) {
    // provisioning a new node should be conservative --- for example if exceeWorkload is 1.4,
    // we don't want to allocate two nodes but just one.
    // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
    // something like 0.95, in which case we want to allocate one node.
    // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.

    int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));

    for (CloudProvisioningListener cl : CloudProvisioningListener.all())
        // consider displaying reasons in a future cloud ux
        if (cl.canProvision(c,label,workloadToProvision) != null)
            break CLOUD;

    Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);

    for (CloudProvisioningListener cl : CloudProvisioningListener.all())
        cl.onStarted(c, label, additionalCapacities);

    for (PlannedNode ac : additionalCapacities) {
        excessWorkload -= ac.numExecutors;
        LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
    }
    pendingLaunches.addAll(additionalCapacities);
}
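The rounding rule in the comments of that snippet can be checked with a few numbers. This is a sketch only: the class name is hypothetical and the margin value 0.1 is an assumption chosen for illustration, since the snippet does not show how m is computed.

```java
// Worked example of floor(excessWorkload + margin); names are hypothetical.
class WorkloadRounding {

    // Same expression as in the quoted snippet, with the margin passed in explicitly.
    static int workloadToProvision(double excessWorkload, double m) {
        return (int) Math.round(Math.floor(excessWorkload + m));
    }

    public static void main(String[] args) {
        double margin = 0.1; // assumed value, for illustration only
        double[] samples = {0.5, 0.95, 1.4, 1.95};
        for (double excess : samples) {
            System.out.println(excess + " -> " + workloadToProvision(excess, margin));
        }
    }
}
```

With this margin, an excess workload of 0.95 and one of 1.4 both request one node, 0.5 requests none, and 1.95 requests two. Each provisioner applies this rule to its own excess workload, so a single job counted by two provisioners still yields two separate requests.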
There is no clear fix for this inside the implementing cloud: it cannot distinguish workload reported against the null label from the same workload reported against a specific label.
The fix, therefore, should be in the NodeProvisioner. However, plannedCapacitiesEMA is both NodeProvisioner-specific and label-agnostic.
- is related to JENKINS-27034 BuildableItems (Subtasks) are miscounted in the queue (Open)