JENKINS-25207

NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • core
    • None

      This manifests mostly with the EC2 plugin, but it is a fundamental problem for any cloud plugin.

      In the EC2 plugin, templates support a NORMAL mode, in which a node accepts tasks either with no label or with a specific label.

      The practical side effect is extra EC2 instances: for every SlaveTemplate in NORMAL mode, whenever a label-bound job starts, two Cloud.provision() calls are made (one with a null label and one with the specific label), so two nodes are started.

      This literally wastes money.
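
      The double provisioning described above can be illustrated with a toy model. This is a minimal sketch, not Jenkins code: ToyCloud, update() and simulate() are hypothetical stand-ins, and the sketch assumes (as this report claims) that both provisioners count the same queued job as their own excess workload.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the double provisioning; not Jenkins code. All names are hypothetical. */
public class DoubleProvisionDemo {

    /** Stand-in for a cloud whose template runs in NORMAL mode. */
    static class ToyCloud {
        final String templateLabel;
        final List<String> launched = new ArrayList<>();

        ToyCloud(String templateLabel) {
            this.templateLabel = templateLabel;
        }

        /** NORMAL mode: accepts unlabeled requests (null) as well as its own label. */
        boolean canProvision(String label) {
            return label == null || label.equals(templateLabel);
        }

        /** The cloud only sees (label, count); it cannot tell which queued job caused the request. */
        void provision(String label, int count) {
            for (int i = 0; i < count; i++) {
                launched.add("node requested for label=" + label);
            }
        }
    }

    /** Stand-in for one provisioner: asks the cloud whenever it sees excess workload. */
    static void update(ToyCloud cloud, String label, int excessWorkload) {
        if (excessWorkload > 0 && cloud.canProvision(label)) {
            cloud.provision(label, excessWorkload);
        }
    }

    /** One queued job that both provisioners are assumed to count as excess workload. */
    static List<String> simulate() {
        ToyCloud cloud = new ToyCloud("linux");
        update(cloud, null, 1);     // provisioner for unlabeled work
        update(cloud, "linux", 1);  // provisioner for the "linux" label
        return cloud.launched;
    }

    public static void main(String[] args) {
        List<String> launched = simulate();
        launched.forEach(System.out::println);
        System.out.println("nodes launched for one job: " + launched.size());
    }
}
```

      In this model the cloud receives two independent requests and has no information that would let it merge them.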

      The NodeProvisioner for the unspecified label (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) are both triggered for the same workload:

      NodeProvisioner.update():

                          // Make sure this cloud actually can provision for this label.
                          if (c.canProvision(label)) {
                              // provisioning a new node should be conservative --- for example if excessWorkload is 1.4,
                              // we don't want to allocate two nodes but just one.
                              // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
                              // something like 0.95, in which case we want to allocate one node.
                              // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.
      
                              int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  // consider displaying reasons in a future cloud ux
                                  if (cl.canProvision(c,label,workloadToProvision) != null)
                                      break CLOUD;
      
                              Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  cl.onStarted(c, label, additionalCapacities);
      
                              for (PlannedNode ac : additionalCapacities) {
                                  excessWorkload -= ac.numExecutors;
                                  LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
                              }
                              pendingLaunches.addAll(additionalCapacities);
                          }
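
      The rounding rule described in the comments above can be checked in isolation. This is a minimal sketch: the expression is copied from the excerpt, while the margin value 0.1 and the class and method names are assumptions chosen for illustration (the excerpt does not show how m is computed).

```java
/** Standalone check of the floor(excessWorkload + margin) rule; names and margin are hypothetical. */
public class ProvisionThresholdDemo {

    /** Same expression as in the excerpt, with the margin passed in explicitly. */
    static int workloadToProvision(double excessWorkload, double margin) {
        return (int) Math.round(Math.floor(excessWorkload + margin));
    }

    public static void main(String[] args) {
        double m = 0.1; // assumed margin, for illustration only
        for (double excess : new double[] {0.5, 0.95, 1.4}) {
            System.out.println(excess + " -> " + workloadToProvision(excess, m));
        }
    }
}
```

      With this margin, an excess workload of 0.95 rounds up to one node and 1.4 stays at one node. If two provisioners each observe about 0.95 for the same job, each one independently requests a node.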
      

      There is no clear fix for this inside the implementing cloud: the cloud cannot distinguish workload requested with a null label from the same workload requested with a specific label.

      The fix should therefore be in NodeProvisioner. However, plannedCapacitiesEMA is both NodeProvisioner-specific and label-agnostic.

          Thomas Suckow added a comment -

          I believe https://github.com/jenkinsci/jenkins/pull/1569 should fix the issue you describe

          Daniel Beck added a comment -

          raihaan Why close? Do you know of a fix?

          Raihaan Shouhell added a comment -

          danielbeck Sorry, I was clearing out some old EC2 issues and closed this by mistake. AFAICT this is not limited to EC2, so I'm going to leave this assigned to core.

            Assignee: Unassigned
            Reporter: Arcadiy Ivanov (arcivanov)