Bug
Resolution: Unresolved
Minor
None
The EC2 plugin massively over-provisions nodes when using the NoDelayProvisionerStrategy. The severity varies based on configuration, but is most extreme with maxTotalUses=1 combined with minimumNumberOfInstances=0 and minimumNumberOfSpareInstances=0.
Severity by configuration:
- maxTotalUses=1 (nodes terminate after one build): 5-6x over-provisioning (100 builds → 500-600 surplus nodes)
- maxTotalUses > 1: Less severe but still present due to capacity accounting gaps
- minimumNumberOfSpareInstances > 0: Additional over-provisioning from race conditions in MinimumInstanceChecker
The configuration maxTotalUses=1 with minimum instances set to 0 represents the worst-case scenario because:
1. Nodes rapidly accept tasks and immediately suspend, triggering concurrent provisioning checks
2. The capacity accounting gaps are most visible during rapid node turnover
3. Multiple provisioning code paths (NoDelayProvisionerStrategy and MinimumInstanceChecker) race against each other
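The effect of concurrent checks can be illustrated with a small standalone model. This is only a sketch under assumptions, not the plugin's code: the class and method names are hypothetical, and the "race" is modeled deterministically as several checks that all read the same snapshot before any launch has been recorded.

```java
public class ProvisioningRaceSketch {

    // Hypothetical model: every check reads the same stale snapshot
    // (no planned nodes yet), so each one launches the full deficit.
    static int launchedWithStaleSnapshot(int queuedBuilds, int concurrentChecks) {
        int plannedSnapshot = 0; // taken before any check records its launches
        int launched = 0;
        for (int i = 0; i < concurrentChecks; i++) {
            launched += Math.max(0, queuedBuilds - plannedSnapshot);
        }
        return launched;
    }

    // Hypothetical model: checks run one at a time and each one sees
    // the launches recorded by the checks before it.
    static int launchedWithSharedCounter(int queuedBuilds, int concurrentChecks) {
        int planned = 0;
        for (int i = 0; i < concurrentChecks; i++) {
            planned += Math.max(0, queuedBuilds - planned);
        }
        return planned;
    }

    public static void main(String[] args) {
        int queuedBuilds = 10;
        int concurrentChecks = 6;
        System.out.println("stale snapshot: "
                + launchedWithStaleSnapshot(queuedBuilds, concurrentChecks) + " nodes");
        System.out.println("shared counter: "
                + launchedWithSharedCounter(queuedBuilds, concurrentChecks) + " nodes");
    }
}
```

In this model, six checks racing over ten queued builds launch 60 nodes, while serialized checks launch 10. The numbers are illustrative and are not a measurement of the plugin.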
However, the underlying capacity accounting bugs affect all configurations using NoDelayProvisionerStrategy:
- Offline nodes are never counted (affects all configs)
- Busy executors are not counted (affects all configs)
- Race conditions occur whenever multiple agents accept tasks simultaneously (most severe with maxTotalUses=1)
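The accounting gaps listed above can be sketched with a similar standalone model. Everything here is hypothetical and simplified: `Node`, `countedCapacityAsReported`, and `countedCapacityAssumedFix` are illustrative names rather than plugin types, and demand is assumed to mean every build that still needs or already holds an executor.

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityAccountingSketch {

    // Hypothetical stand-in for an agent; not a Jenkins or EC2 plugin class.
    static final class Node {
        final boolean online;
        final int executors;
        final int busyExecutors;

        Node(boolean online, int executors, int busyExecutors) {
            this.online = online;
            this.executors = executors;
            this.busyExecutors = busyExecutors;
        }
    }

    // Accounting as described in this report: offline nodes and busy
    // executors contribute nothing to the counted capacity.
    static int countedCapacityAsReported(List<Node> nodes) {
        int total = 0;
        for (Node n : nodes) {
            if (n.online) {
                total += n.executors - n.busyExecutors;
            }
        }
        return total;
    }

    // Assumed alternative: every executor that already exists is counted,
    // whether its node is still booting or already running a build.
    static int countedCapacityAssumedFix(List<Node> nodes) {
        int total = 0;
        for (Node n : nodes) {
            total += n.executors;
        }
        return total;
    }

    static int nodesToProvision(int demand, int countedCapacity) {
        return Math.max(0, demand - countedCapacity);
    }

    // Illustrative fleet: 40 nodes still booting (offline), 20 online and busy.
    static List<Node> exampleFleet() {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 40; i++) nodes.add(new Node(false, 1, 0));
        for (int i = 0; i < 20; i++) nodes.add(new Node(true, 1, 1));
        return nodes;
    }

    public static void main(String[] args) {
        List<Node> fleet = exampleFleet();
        int demand = 100; // builds running or waiting, one executor each
        System.out.println("as reported: "
                + nodesToProvision(demand, countedCapacityAsReported(fleet)) + " more nodes");
        System.out.println("assumed fix: "
                + nodesToProvision(demand, countedCapacityAssumedFix(fleet)) + " more nodes");
    }
}
```

With 60 single-executor nodes already launched for 100 builds, the accounting as reported asks for 100 more nodes, while counting all existing executors asks for 40.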
Environment
- Jenkins EC2 Plugin
- Configuration showing the most severe over-provisioning:
  - NoDelayProvisionerStrategy enabled (noDelayProvisioning=true)
  - 1 executor per node (numExecutors=1)
  - Nodes terminate after one use (maxTotalUses=1)
  - No minimum instance requirements (minimumNumberOfInstances=0, minimumNumberOfSpareInstances=0)
Reproduction steps:
- Stand up a Jenkins instance and configure a valid EC2 cloud using the values above.
- Create a simple "hello world" test pipeline named `test` and give it a label that only the EC2 agents provide.
- Run the following in the script console to queue a large number of builds in quick succession:
```
import jenkins.model.Jenkins
import org.jenkinsci.plugins.workflow.job.WorkflowJob
import hudson.model.ParametersAction
import hudson.model.StringParameterValue

def triggerCount = 100
def jobNames = ['test']  // full names of the pipeline jobs to trigger
def jenkins = Jenkins.instance

jobNames.each { jobName ->
    def job = jenkins.getItemByFullName(jobName, WorkflowJob.class)
    if (job == null) {
        println "Job '${jobName}' not found, skipping."
        return
    }
    for (int i = 1; i <= triggerCount; i++) {
        // A distinct parameter value per build keeps the queue from
        // merging identical pending items into one.
        def parameterAction = new ParametersAction(
            new StringParameterValue("unique", "unique_${i}")
        )
        // Quiet period of 0 so each item becomes buildable immediately.
        job.scheduleBuild2(0, parameterAction)
        println "Triggered build #${i} of '${jobName}' with unique=unique_${i}"
        Thread.sleep(2)
    }
    println "Done triggering ${triggerCount} builds of '${jobName}'."
}
```
Expected
100 nodes created, 100 builds completed, no running nodes afterward
Actual
600-700 nodes created, 100 builds completed, 500-600 nodes left over!