  1. Jenkins
  2. JENKINS-25207

NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: core
    • Labels: None

    Description

      This manifests itself mostly with the EC2 plugin, but it is a fundamental problem for any cloud plugin.

      In the EC2 plugin, templates support NORMAL mode, in which a node accepts tasks with no label as well as tasks with a specific label.

      The actual side effect of this issue is extra spawned EC2 instances: for every SlaveTemplate in NORMAL slave mode, whenever a label-bound job is started, two Cloud.provision() calls are made (one with a null label and one with the specific label), causing two nodes to be started.

      This literally wastes money.
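
      The double launch described above can be reproduced with a toy model. This is only a sketch, not Jenkins code: ToyCloud, ToyProvisioner, DoubleProvisionDemo and the label "ec2-linux" are hypothetical names, and the model simply assumes, as this report states, that both provisioners see the same queued job as excess workload.

```java
// Toy model, NOT Jenkins code: every class, method and label name is hypothetical.
final class ToyCloud {
    int launched = 0;

    // Same shape as Cloud.canProvision(label): a NORMAL-mode template accepts
    // unlabeled work (null) as well as work bound to its own label.
    boolean canProvision(String label) {
        return label == null || label.equals("ec2-linux");
    }

    // Same shape as Cloud.provision(label, workload): the cloud sees only a label
    // and a number, never which queued job produced the demand.
    int provision(String label, int workload) {
        launched += workload;
        return workload;
    }
}

final class ToyProvisioner {
    private final String label; // null stands for the unlabeled provisioner

    ToyProvisioner(String label) {
        this.label = label;
    }

    // Assumption taken from the report: each provisioner independently counts
    // the same queued job as its own excess workload.
    void update(ToyCloud cloud, int excessWorkload) {
        if (excessWorkload > 0 && cloud.canProvision(label)) {
            cloud.provision(label, excessWorkload);
        }
    }
}

final class DoubleProvisionDemo {
    static int run() {
        ToyCloud cloud = new ToyCloud();
        int queuedJobs = 1; // a single job bound to "ec2-linux"
        new ToyProvisioner(null).update(cloud, queuedJobs);        // unlabeled provisioner
        new ToyProvisioner("ec2-linux").update(cloud, queuedJobs); // label provisioner
        return cloud.launched;
    }

    public static void main(String[] args) {
        System.out.println("Nodes launched for 1 queued job: " + run());
    }
}
```

      In this model one queued job leads to two launched nodes, because nothing connects the two update() calls.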

      The NodeProvisioner for the unspecified label (Jenkins.unlabeledNodeProvisioner) and the NodeProvisioner for a specific label (Label.nodeProvisioner) are both triggered for the same workload:

      NodeProvisioner.update():

                          // Make sure this cloud actually can provision for this label.
                          if (c.canProvision(label)) {
                              // provisioning a new node should be conservative --- for example if excessWorkload is 1.4,
                              // we don't want to allocate two nodes but just one.
                              // OTOH, because of the exponential decay, even when we need one slave, excess workload is always
                              // something like 0.95, in which case we want to allocate one node.
                              // so the threshold here is 1-MARGIN, and hence floor(excessWorkload+MARGIN) is needed to handle this.
      
                              int workloadToProvision = (int) Math.round(Math.floor(excessWorkload + m));
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  // consider displaying reasons in a future cloud ux
                                  if (cl.canProvision(c,label,workloadToProvision) != null)
                                      break CLOUD;
      
                              Collection<PlannedNode> additionalCapacities = c.provision(label, workloadToProvision);
      
                              for (CloudProvisioningListener cl : CloudProvisioningListener.all())
                                  cl.onStarted(c, label, additionalCapacities);
      
                              for (PlannedNode ac : additionalCapacities) {
                                  excessWorkload -= ac.numExecutors;
                                  LOGGER.info("Started provisioning "+ac.displayName+" from "+c.name+" with "+ac.numExecutors+" executors. Remaining excess workload:"+excessWorkload);
                              }
                              pendingLaunches.addAll(additionalCapacities);
                          }
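
      The rounding rule in the comments above can be checked in isolation. In this sketch the margin of 0.1 is an assumed value chosen for illustration (the excerpt does not show how m is computed), and WorkloadRounding is a hypothetical helper class, not part of Jenkins.

```java
// Standalone illustration of floor(excessWorkload + margin); not Jenkins code.
final class WorkloadRounding {
    // Same expression as in the excerpt, with the margin passed in explicitly.
    static int workloadToProvision(double excessWorkload, double margin) {
        return (int) Math.round(Math.floor(excessWorkload + margin));
    }

    public static void main(String[] args) {
        double margin = 0.1; // assumed value, for illustration only
        double[] samples = {0.5, 0.95, 1.4, 1.95};
        for (double excess : samples) {
            System.out.println(excess + " -> " + workloadToProvision(excess, margin));
        }
    }
}
```

      With that assumed margin, an excess workload of 1.4 yields one node rather than two, and 0.95 still yields one node rather than zero, which matches the intent stated in the comments.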
      

      There is no clear fix for this inside the implementing cloud: the cloud cannot distinguish between workload supplied for the null label and the same workload supplied for a specific label.

      The fix, therefore, should be in NodeProvisioner. However, plannedCapacitiesEMA is both NodeProvisioner-specific and label-agnostic.
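
      One possible direction, shown only as a hypothetical sketch, is for all provisioners to consult a shared record of capacity that is already planned. SharedPlannedCapacity and SharedLedgerDemo are invented names, this is not how Jenkins or any proposed patch works, and the sketch deliberately ignores which labels a planned node can serve, which a real fix would have to track.

```java
// Hypothetical sketch of a shared planned-capacity ledger; not Jenkins code.
final class SharedPlannedCapacity {
    private int plannedExecutors = 0;

    // Returns how many executors still need provisioning after subtracting what
    // any provisioner has already planned, and records the newly planned amount.
    synchronized int claim(int excessWorkload) {
        int needed = Math.max(0, excessWorkload - plannedExecutors);
        plannedExecutors += needed;
        return needed;
    }

    // To be called once a planned node is online or has failed to launch.
    synchronized void release(int executors) {
        plannedExecutors = Math.max(0, plannedExecutors - executors);
    }
}

final class SharedLedgerDemo {
    static int run() {
        SharedPlannedCapacity ledger = new SharedPlannedCapacity();
        int launched = 0;
        launched += ledger.claim(1); // unlabeled provisioner sees the job first
        launched += ledger.claim(1); // label provisioner sees the same job
        return launched;
    }

    public static void main(String[] args) {
        System.out.println("Nodes launched for 1 queued job: " + run());
    }
}
```

      Here the second claim finds the job already covered and requests nothing, so only one node is planned for the single queued job.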

          Activity

            arcivanov Arcadiy Ivanov created issue -
            arcivanov Arcadiy Ivanov made changes -
            Field | Original Value | New Value
            Summary | NodeProvisioner: multiple instances spawned is Cloud is not EXCLUSIVE | NodeProvisioner: multiple instances spawned if Cloud is not EXCLUSIVE
            thomassuckow Thomas Suckow added a comment -

            I believe https://github.com/jenkinsci/jenkins/pull/1569 should fix the issue you describe

            thomassuckow Thomas Suckow made changes -
            Link This issue is related to JENKINS-27034 [ JENKINS-27034 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 159119 ] JNJira + In-Review [ 179876 ]
            raihaan Raihaan Shouhell made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            danielbeck Daniel Beck added a comment -

            raihaan Why close? Do you know of a fix?

            raihaan Raihaan Shouhell made changes -
            Assignee Francis Upton [ francisu ]
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]

            raihaan Raihaan Shouhell added a comment -

            danielbeck Sorry, I was clearing out some old issues for EC2 and closed this by mistake. AFAICT this is not limited to EC2, so I'm going to leave this assigned to core.
            raihaan Raihaan Shouhell made changes -
            Component/s ec2-plugin [ 15625 ]
            danielbeck Daniel Beck made changes -
            Status Reopened [ 4 ] Open [ 1 ]

            People

              Assignee: Unassigned
              Reporter: arcivanov Arcadiy Ivanov
              Votes: 0
              Watchers: 4