NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE



      This manifests itself with EC2 plugin mostly but is a fundamental problem for any cloud plugin.

      EC2 templates support NORMAL mode: the template accepts tasks either with no label or with its specific label.

      The observable side-effect of this issue is extra spawned EC2 instances: for every SlaveTemplate in NORMAL slave mode, whenever a label-bound job is started, two Cloud.provision() calls are made (one with a null label and one with the specific label), so two nodes are started for one job.
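
      To make the failure mode concrete, here is a minimal sketch of the double-provisioning path as the report describes it. The classes and method bodies below are hypothetical stand-ins, not EC2 plugin or Jenkins core code; the point is only that a NORMAL-mode template answers canProvision() affirmatively for both the null label and the specific label, so both provisioners issue a provision() call for the same queued build:

```java
// Hypothetical sketch (not Jenkins core code): a NORMAL-mode cloud accepts
// both a null label and its specific label, so the unlabeled provisioner
// and the label's provisioner each call provision() for the same workload.
import java.util.ArrayList;
import java.util.List;

class DoubleProvisionSketch {
    /** Stand-in for a NORMAL-mode SlaveTemplate: accepts null label or "linux". */
    static boolean canProvision(String label) {
        return label == null || label.equals("linux");
    }

    static List<String> provisionCalls = new ArrayList<>();

    static void provision(String label, int workload) {
        provisionCalls.add("provision(label=" + label + ", n=" + workload + ")");
    }

    public static void main(String[] args) {
        // One queued build bound to label "linux"; per the report, both
        // Jenkins.unlabeledNodeProvisioner (null) and Label.nodeProvisioner
        // ("linux") run against it.
        String[] provisionerLabels = {null, "linux"};
        for (String label : provisionerLabels) {
            if (canProvision(label)) {   // true for both in NORMAL mode
                provision(label, 1);     // => two nodes started for one build
            }
        }
        System.out.println(provisionCalls.size()); // prints 2
    }
}
```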

      This literally wastes money.

      The NodeProvisioner for the unspecified label (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) are both triggered for the same workload:

      NodeProvisioner.update():

                          // Make sure this cloud actually can provision for this label.
                          if (c.canProvision(label)) {
                            // provisioning a new node should be conservative --- for example if excessWorkload is 1.4,
                              // we don't want to allocate two nodes but just one.
                              // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
                              // something like 0.95, in which case we want to allocate one node.
                              // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.
      
                              int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  // consider displaying reasons in a future cloud ux
                                  if (cl.canProvision(c,label,workloadToProvision) != null)
                                      break CLOUD;
      
                              Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  cl.onStarted(c, label, additionalCapacities);
      
                              for (PlannedNode ac : additionalCapacities) {
                                  excessWorkload -= ac.numExecutors;
                                  LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
                              }
                              pendingLaunches.addAll(additionalCapacities);
                          }
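
      The rounding rule in the snippet above can be checked in isolation. With MARGIN = 0.1 (an assumed example value for m here; the actual margin is configurable in Jenkins), an exponentially decayed demand of 0.95 still provisions one node, while 1.4 conservatively provisions one node rather than two:

```java
// Sketch of the workload-rounding rule quoted from NodeProvisioner.update():
// workloadToProvision = floor(excessWorkload + MARGIN).
// MARGIN = 0.1 is an assumed example value, not necessarily the configured one.
class WorkloadRounding {
    static final float MARGIN = 0.1f;

    static int workloadToProvision(float excessWorkload) {
        return (int) Math.round(Math.floor(excessWorkload + MARGIN));
    }

    public static void main(String[] args) {
        System.out.println(workloadToProvision(0.95f)); // 1: decayed one-node demand still provisions one node
        System.out.println(workloadToProvision(1.4f));  // 1: conservative, not two nodes
        System.out.println(workloadToProvision(0.8f));  // 0: below the 1 - MARGIN threshold
    }
}
```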
      

      There is no clean fix for this inside the implementing cloud: the cloud cannot distinguish the workload supplied under the null label from the same workload supplied under a specific label.

      The fix should therefore be in the NodeProvisioner. The plannedCapacitiesEMA, however, is both NodeProvisioner-specific and label-agnostic.
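
      One direction such a fix could take (a sketch of the idea only, not a proposed patch; none of the names below exist in Jenkins core) is a shared, label-agnostic ledger of in-flight launches per cloud, consulted by every NodeProvisioner before provisioning. A launch planned by Label.nodeProvisioner would then be subtracted from the excess workload seen by Jenkins.unlabeledNodeProvisioner, and vice versa:

```java
// Hypothetical sketch: a shared per-cloud record of pending launches,
// independent of which NodeProvisioner initiated them. All class and
// method names here are invented for illustration.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedPendingLaunches {
    // cloud name -> executors already being provisioned, regardless of label
    private static final Map<String, Integer> pending = new ConcurrentHashMap<>();

    static void recordLaunch(String cloudName, int executors) {
        pending.merge(cloudName, executors, Integer::sum);
    }

    static void launchCompleted(String cloudName, int executors) {
        pending.merge(cloudName, -executors, Integer::sum);
    }

    /** Workload still unaccounted for after subtracting in-flight launches. */
    static int effectiveExcess(String cloudName, int excessWorkload) {
        return Math.max(0, excessWorkload - pending.getOrDefault(cloudName, 0));
    }
}
```

      With something like this, the second provisioner run for the same queue item would see an effective excess of zero and skip the duplicate provision() call; the open question raised above remains how to reconcile such shared state with the per-provisioner, label-agnostic plannedCapacitiesEMA.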

            Assignee:
            Unassigned
            Reporter:
            Arcadiy Ivanov
            Archiver:
            Jenkins Service Account
