Jenkins / JENKINS-25207

NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core
    • Labels: None

      This manifests itself mostly with the EC2 plugin, but it is a fundamental problem for any cloud plugin.

      In the EC2 plugin, templates support NORMAL mode: they accept tasks either with no label or with a specific label.
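
      To make that concrete, here is a minimal sketch of a NORMAL-mode cloud (the class and field names are hypothetical, not the actual EC2 plugin code). It shows why such a cloud answers canProvision() affirmatively both for the null label and for a matching specific label, making it a candidate for both provisioners:

          import hudson.model.Label;
          import hudson.model.Node;
          import hudson.slaves.Cloud;
          import hudson.slaves.NodeProvisioner.PlannedNode;

          import java.util.Collection;
          import java.util.Collections;

          // Hypothetical cloud, shown only to illustrate NORMAL-mode label matching;
          // it is not the actual EC2 plugin implementation.
          public class ExampleNormalModeCloud extends Cloud {

              private final Label templateLabel;               // e.g. the template's "linux" label
              private final Node.Mode mode = Node.Mode.NORMAL;

              public ExampleNormalModeCloud(String name, Label templateLabel) {
                  super(name);
                  this.templateLabel = templateLabel;
              }

              @Override
              public boolean canProvision(Label label) {
                  // NORMAL mode accepts jobs that carry no label at all...
                  if (label == null) {
                      return mode == Node.Mode.NORMAL;
                  }
                  // ...and jobs bound to a label the template satisfies, so both the
                  // unlabeled and the label-specific NodeProvisioner see this cloud
                  // as a candidate for the same queued item.
                  return label.matches(templateLabel.listAtoms());
              }

              @Override
              public Collection<PlannedNode> provision(Label label, int excessWorkload) {
                  // A real implementation would start instances here; with the matching
                  // behaviour above it gets called twice for the same workload.
                  return Collections.<PlannedNode>emptyList();
              }
          }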

      The visible side effect of this issue is extra EC2 instances being spawned: for every SlaveTemplate with a NORMAL slave mode, whenever a label-bound job is started, two Cloud.provision() calls are made (one with a null label and one with the specific label), so two nodes are started instead of one.

      This literally wastes money.

      The NodeProvisioner for the unspecified label (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) both get triggered for the same workload:

      NodeProvisioner.update():

                          // Make sure this cloud actually can provision for this label.
                          if (c.canProvision(label)) {
                          // provisioning a new node should be conservative --- for example if excessWorkload is 1.4,
                              // we don't want to allocate two nodes but just one.
                              // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
                              // something like 0.95, in which case we want to allocate one node.
                              // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.
      
                              int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  // consider displaying reasons in a future cloud ux
                                  if (cl.canProvision(c,label,workloadToProvision) != null)
                                      break CLOUD;
      
                              Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  cl.onStarted(c, label, additionalCapacities);
      
                              for (PlannedNode ac : additionalCapacities) {
                                  excessWorkload -= ac.numExecutors;
                                  LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
                              }
                              pendingLaunches.addAll(additionalCapacities);
                          }
      

      There is no clear fix for this inside the implementing cloud: the cloud cannot distinguish between the workload supplied with a null label and the same workload supplied again with a specific label.
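
      Indeed, both provisioners run the update() logic above independently against the same cloud. Roughly, the periodic invoker does the following (condensed from NodeProvisioner.NodeProvisionerInvoker; update() is internal to NodeProvisioner, so this is illustrative rather than standalone code):

          // Condensed: the periodic task reviews the unlabeled provisioner and then
          // every label-specific provisioner, so a NORMAL-mode cloud is asked about
          // the same queued work twice.
          Jenkins jenkins = Jenkins.getInstance();
          jenkins.unlabeledNodeProvisioner.update();        // pass 1: the null-label workload
          for (Label l : jenkins.getLabels()) {
              l.nodeProvisioner.update();                   // pass 2: the same workload per label
          }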

      The fix, therefore, should be in the NodeProvisioner. The plannedCapacitiesEMA, however, is both NodeProvisioner-specific and label-agnostic.
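
      To make that constraint concrete, one hypothetical direction (not code proposed in this issue; all names below are made up) would be to keep pending-launch bookkeeping per Cloud in state shared across provisioners, so the labeled and unlabeled provisioners can account for each other's planned capacity before calling provision():

          import hudson.slaves.Cloud;
          import hudson.slaves.NodeProvisioner.PlannedNode;

          import java.util.Collection;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.atomic.AtomicInteger;

          // Hypothetical cross-provisioner bookkeeping, shared by the unlabeled and all
          // label-specific NodeProvisioners. Purely a sketch, not actual Jenkins code.
          public class SharedPendingCapacity {

              // Executors already requested from each cloud but not yet online.
              private static final Map<Cloud, AtomicInteger> PENDING = new ConcurrentHashMap<>();

              // A provisioner would call this before asking a cloud for capacity,
              // subtracting whatever another provisioner has already requested.
              public static int adjustWorkload(Cloud cloud, int workloadToProvision) {
                  AtomicInteger pending = PENDING.get(cloud);
                  int alreadyPlanned = (pending == null) ? 0 : pending.get();
                  return Math.max(0, workloadToProvision - alreadyPlanned);
              }

              // A provisioner would call this after Cloud.provision() returns, so the
              // other provisioners see the capacity that is already on its way.
              public static void recordPlanned(Cloud cloud, Collection<PlannedNode> planned) {
                  int executors = 0;
                  for (PlannedNode p : planned) {
                      executors += p.numExecutors;
                  }
                  PENDING.computeIfAbsent(cloud, c -> new AtomicInteger()).addAndGet(executors);
              }
          }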

            Assignee: Unassigned
            Reporter: Arcadiy Ivanov (arcivanov)
            Votes: 0
            Watchers: 4

              Created:
              Updated: