JENKINS-25207

NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels: None

      Description

      This manifests mostly with the EC2 plugin, but it is a fundamental problem for any cloud plugin.

      In the EC2 plugin, templates support NORMAL mode, in which a node accepts tasks with no label as well as tasks with a specific label.

      The practical side effect is extra EC2 instances: for every SlaveTemplate in NORMAL mode, starting a label-bound job triggers two Cloud.provision() calls (one with a null label and one with the specific label), so two nodes are started.

      This literally wastes money.

      The NodeProvisioner for the unspecified label (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) are both triggered for the same workload:

      NodeProvisioner.update():

                          // Make sure this cloud actually can provision for this label.
                          if (c.canProvision(label)) {
                              // provisioning a new node should be conservative --- for example if excessWorkload is 1.4,
                              // we don't want to allocate two nodes but just one.
                              // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
                              // something like 0.95, in which case we want to allocate one node.
                              // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.
      
                              int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  // consider displaying reasons in a future cloud ux
                                  if (cl.canProvision(c,label,workloadToProvision) != null)
                                      break CLOUD;
      
                              Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  cl.onStarted(c, label, additionalCapacities);
      
                              for (PlannedNode ac : additionalCapacities) {
                                  excessWorkload -= ac.numExecutors;
                                  LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
                              }
                              pendingLaunches.addAll(additionalCapacities);
                          }
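
The rounding rule described in the comments above can be checked with a small worked example. This is a minimal sketch: the class name, the method name, and the margin value of 0.1 are assumptions made for illustration and are not taken from Jenkins core. With that margin, an excess workload of 1.4 yields one node, and 0.95 also yields one node.

```java
// Worked example of the rounding rule quoted from NodeProvisioner.update().
// The margin value used in main() is an illustrative assumption.
final class ProvisionThresholdSketch {

    /** Mirrors the quoted expression: (int) Math.round(Math.floor(excessWorkload + m)). */
    static int workloadToProvision(double excessWorkload, double margin) {
        return (int) Math.round(Math.floor(excessWorkload + margin));
    }

    public static void main(String[] args) {
        double margin = 0.1; // assumed value, for illustration only
        for (double excess : new double[] {0.5, 0.95, 1.4, 1.95}) {
            System.out.println(excess + " -> " + workloadToProvision(excess, margin));
        }
    }
}
```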
      

      There is no clear fix for this inside the implementing cloud: the cloud cannot distinguish workload reported for the null label from the same workload reported for a specific label.
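
The double provisioning can be reproduced in a toy model. This is a rough sketch and not Jenkins code: FakeCloud, update(), and the "linux" label are hypothetical stand-ins, and the assumption that both provisioners see the same queued build as an excess workload of 1 follows the description in this report.

```java
import java.util.ArrayList;
import java.util.List;

final class DoubleProvisionSketch {

    /** Hypothetical stand-in for a cloud whose template runs in NORMAL mode. */
    static final class FakeCloud {
        final String templateLabel;
        final List<String> launched = new ArrayList<>();

        FakeCloud(String templateLabel) {
            this.templateLabel = templateLabel;
        }

        // NORMAL mode: serves unlabeled work (null) and work for its own label.
        boolean canProvision(String label) {
            return label == null || label.equals(templateLabel);
        }

        // The cloud sees only (label, workload); it cannot tell that two calls
        // describe the same queued build.
        void provision(String label, int workload) {
            for (int i = 0; i < workload; i++) {
                launched.add("node-for-" + (label == null ? "<no label>" : label));
            }
        }
    }

    /** Hypothetical stand-in for one provisioner pass; each pass acts independently. */
    static void update(FakeCloud cloud, String label, int excessWorkload) {
        if (excessWorkload > 0 && cloud.canProvision(label)) {
            cloud.provision(label, excessWorkload);
        }
    }

    static List<String> simulate() {
        FakeCloud cloud = new FakeCloud("linux");
        // One queued build tied to "linux", counted by both provisioners.
        update(cloud, null, 1);    // stands in for Jenkins.unlabeledNodeProvisioner
        update(cloud, "linux", 1); // stands in for Label.nodeProvisioner
        return cloud.launched;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this model a single queued build leaves two entries in the launched list, because neither call carries any information that would let the cloud deduplicate them.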

      The fix should therefore be in the NodeProvisioner. However, plannedCapacitiesEMA is both NodeProvisioner-specific and label-agnostic, so capacity planned by one provisioner is not visible to the other.
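
One conceivable direction, shown only to make the idea concrete, is to track pending capacity in a place shared by all provisioners rather than per NodeProvisioner. The sketch below is purely hypothetical: the class, its methods, and the per-template keying are assumptions and do not describe any existing Jenkins API or change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared ledger of executors that have been requested but are
// not yet online, keyed by template label. Not part of Jenkins.
final class SharedPendingCapacitySketch {

    private final Map<String, Integer> pendingByTemplate = new HashMap<>();

    /** Returns how many executors still have to be requested to cover the demand. */
    synchronized int reserve(String templateLabel, int demand) {
        int pending = pendingByTemplate.getOrDefault(templateLabel, 0);
        int missing = Math.max(0, demand - pending);
        pendingByTemplate.merge(templateLabel, missing, Integer::sum);
        return missing;
    }

    /** Called once requested executors come online (or the launch fails). */
    synchronized void release(String templateLabel, int executors) {
        int pending = pendingByTemplate.getOrDefault(templateLabel, 0);
        pendingByTemplate.put(templateLabel, Math.max(0, pending - executors));
    }
}
```

With such a ledger, the second provisioner to ask for one executor on the same template would be told that nothing more is needed. The sketch deliberately ignores the hard part: it cannot tell whether two demands refer to the same build or to two different builds, so it would under-provision in the latter case.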

            Activity

            thomassuckow Thomas Suckow added a comment -

            I believe https://github.com/jenkinsci/jenkins/pull/1569 should fix the issue you describe

            danielbeck Daniel Beck added a comment -

            Raihaan Shouhell, why was this closed? Do you know of a fix?

            raihaan Raihaan Shouhell added a comment -

            Daniel Beck, sorry, I was clearing out some old EC2 issues and closed this by mistake. AFAICT this is not limited to EC2, so I'm going to leave this assigned to core.


              People

              Assignee: Unassigned
              Reporter: arcivanov Arcadiy Ivanov
              Votes: 0
              Watchers: 4
