JENKINS-76171

ec2 plugin overprovisions nodes when rapidly scheduling builds

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: ec2-plugin
    • None

      The EC2 plugin massively over-provisions nodes when using the NoDelayProvisionerStrategy. The severity varies based on configuration, but is most extreme with maxTotalUses=1 combined with minimumNumberOfInstances=0 and minimumNumberOfSpareInstances=0.

      Severity by configuration:

      • maxTotalUses=1 (nodes terminate after one build): 5-6x over-provisioning (100 builds → 500-600 surplus nodes on top of the 100 needed)
      • maxTotalUses > 1: Less severe but still present due to capacity accounting gaps
      • minimumNumberOfSpareInstances > 0: Additional over-provisioning from race conditions in MinimumInstanceChecker

      The configuration maxTotalUses=1 with minimum instances set to 0 represents the worst-case scenario because:
      1. Nodes rapidly accept tasks and immediately suspend, triggering concurrent provisioning checks
      2. The capacity accounting gaps are most visible during rapid node turnover
      3. Multiple provisioning code paths (NoDelayProvisionerStrategy and MinimumInstanceChecker) race against each other
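
      The race in point 3 is a check-then-act problem: each code path compares demand against capacity and launches the shortfall without seeing what the other has just requested. As a minimal sketch of one common remedy (not the plugin's actual code or a proposed patch), the hypothetical `ProvisioningLedger` below makes the comparison and the reservation a single atomic step. The class name, the `plannedNodes` counter, and both methods are assumptions introduced for illustration only.

      ```java
      public class ProvisioningLedger {
          // Hypothetical bookkeeping: nodes already requested by any code path.
          private int plannedNodes = 0;

          /**
           * Atomically compares demand with what is already planned and
           * reserves only the shortfall. Returns the number of new nodes
           * the caller should launch.
           */
          public synchronized int reserve(int demand) {
              int shortfall = Math.max(0, demand - plannedNodes);
              plannedNodes += shortfall;
              return shortfall;
          }

          /** Gives capacity back once nodes have terminated. */
          public synchronized void release(int nodes) {
              plannedNodes = Math.max(0, plannedNodes - nodes);
          }

          public static void main(String[] args) {
              ProvisioningLedger ledger = new ProvisioningLedger();
              System.out.println(ledger.reserve(5)); // first caller launches the full shortfall
              System.out.println(ledger.reserve(5)); // second caller sees the same demand, launches nothing
              System.out.println(ledger.reserve(8)); // demand grew by 3, so only 3 more are launched
          }
      }
      ```

      With unsynchronized check-then-act, the first two callers in `main` would each launch 5 nodes for the same 5 builds, which is the doubling pattern described above.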

      However, the underlying capacity accounting bugs affect all configurations using NoDelayProvisionerStrategy:

      • Offline nodes are never counted (affects all configs)
      • Busy executors are not counted (affects all configs)
      • Race conditions occur whenever multiple agents accept tasks simultaneously (most severe with maxTotalUses=1)
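
      To show why the first two gaps matter, here is a toy model of the accounting, assuming demand is measured as outstanding builds (queued plus running). It is a sketch of the behavior this report describes, not code from the EC2 plugin; `CapacityModel`, `Agent`, and all method names are hypothetical.

      ```java
      import java.util.List;

      public class CapacityModel {
          /** Hypothetical, simplified view of one agent. */
          static final class Agent {
              final boolean online;
              final int executors;
              final int busyExecutors;

              Agent(boolean online, int executors, int busyExecutors) {
                  this.online = online;
                  this.executors = executors;
                  this.busyExecutors = busyExecutors;
              }
          }

          /** Narrow accounting: only idle executors on online agents count. */
          static int idleOnlineCapacity(List<Agent> agents) {
              int total = 0;
              for (Agent a : agents) {
                  if (a.online) {
                      total += a.executors - a.busyExecutors;
                  }
              }
              return total;
          }

          /** Full accounting: every provisioned executor counts, busy or still booting. */
          static int allProvisionedCapacity(List<Agent> agents) {
              int total = 0;
              for (Agent a : agents) {
                  total += a.executors;
              }
              return total;
          }

          /** Number of new nodes to launch for the given demand and counted capacity. */
          static int nodesToProvision(int demand, int countedCapacity) {
              return Math.max(0, demand - countedCapacity);
          }

          public static void main(String[] args) {
              // 4 agents busy with builds, 3 agents launched but still offline (booting).
              List<Agent> agents = List.of(
                      new Agent(true, 1, 1), new Agent(true, 1, 1),
                      new Agent(true, 1, 1), new Agent(true, 1, 1),
                      new Agent(false, 1, 0), new Agent(false, 1, 0), new Agent(false, 1, 0));
              int outstandingBuilds = 10; // queued plus running
              System.out.println(nodesToProvision(outstandingBuilds, idleOnlineCapacity(agents)));
              System.out.println(nodesToProvision(outstandingBuilds, allProvisionedCapacity(agents)));
          }
      }
      ```

      In this model the narrow accounting requests 10 new nodes even though 7 are already serving or about to serve those builds, while the full accounting requests only 3.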

      Environment

      • Jenkins EC2 Plugin
      • Configuration showing the most severe over-provisioning:
        • NoDelayProvisionerStrategy enabled (noDelayProvisioning=true)
        • 1 executor per node (numExecutors=1)
        • Nodes terminate after one use (maxTotalUses=1)
        • No minimum instance requirements (minimumNumberOfInstances=0, minimumNumberOfSpareInstances=0)

      Reproduction steps:

      1. Stand up a Jenkins instance and configure a valid EC2 cloud using the values above
      2. Create a simple "hello world" pipeline named test (the name the script below expects) and restrict it to a label served by that EC2 cloud
      3. Run the following in the script console to queue a large number of builds in quick succession

      ```
      import jenkins.model.Jenkins
      import org.jenkinsci.plugins.workflow.job.WorkflowJob
      import hudson.model.ParametersAction
      import hudson.model.StringParameterValue

      def triggerCount = 100      // number of builds to queue per job
      def jobNames = ['test']     // full names of the pipeline jobs to trigger
      def jenkins = Jenkins.instance

      jobNames.each { jobName ->
          def job = jenkins.getItemByFullName(jobName, WorkflowJob.class)
          if (job == null) {
              println "Job '${jobName}' not found, skipping."
              return
          }
          for (int i = 1; i <= triggerCount; i++) {
              // A distinct parameter value per build keeps the queue from
              // treating the requests as duplicates of one pending build.
              def parameterAction = new ParametersAction(
                  new StringParameterValue("unique", "unique_${i}")
              )
              job.scheduleBuild2(0, parameterAction)
              println "Triggered build #${i} of '${jobName}' with unique=unique_${i}"
              Thread.sleep(2)     // 2 ms pause between submissions
          }
          println "Done triggering ${triggerCount} builds of '${jobName}'."
      }
      ```

      Expected
      100 nodes created, 100 builds completed, no running nodes afterward

      Actual
      600-700 nodes created, 100 builds completed, 500-600 nodes left over!

            mikecirioli mike cirioli