Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Component/s: ec2-plugin
- Labels: None
- Environment: Jenkins ver. 2.176.1, 2.204.2; ec2 plugin 1.43, 1.44, 1.45, 1.49.1
- Released As: ec2 1.51
Description
Sometimes after a Jenkins restart the plugin is unable to spawn more agents. It just loops on this:
    SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
    May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
    SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
    May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
    Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
If I go to the EC2 console and terminate the instance manually, the plugin spawns a new one and uses it.
There seems to be a mismatch in the plugin logic: the part responsible for counting instances and checking the cap sees the EC2 instance, yet the part responsible for picking up running EC2 instances can't find it. A sketch of the suspected disagreement follows.
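A rough Java sketch of the two paths that appear to be out of sync (hypothetical names and simplified types, not the plugin's actual code; the real logic lives in EC2Cloud and SlaveTemplate):

    import java.util.List;
    import java.util.Set;

    // Hypothetical sketch, not the plugin's actual code: the cap check and
    // the reuse lookup consult different views of the world, so an instance
    // that survived a restart blocks the cap without ever being picked up.
    class ProvisioningSketch {
        int instanceCap = 1;

        // Capacity accounting sees every matching EC2 instance, including
        // ones left running across a Jenkins restart.
        boolean hasCapacity(List<String> liveEc2Instances) {
            return liveEc2Instances.size() < instanceCap;
        }

        // Reuse only considers instances Jenkins currently tracks as nodes,
        // so an orphaned instance is invisible here.
        String findReusable(List<String> liveEc2Instances, Set<String> trackedNodes) {
            for (String id : liveEc2Instances) {
                if (trackedNodes.contains(id)) {
                    return id;
                }
            }
            return null; // the orphan exists in EC2 but is never found
        }
    }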
We use a single subnet, security group, and VPC (I've seen some reports about this causing problems).
We set instanceCap = 1 while testing the plugin, which may make this problem more visible than a higher cap would.
Comments
We've identified the cause of our issue. The orphan re-attachment logic is tied to EC2Cloud's provision method, but the issue occurs once the number of existing AWS instances has hit the instance cap (i.e. no more nodes can be provisioned). Because the cap has been hit, provisioning isn't even attempted, so the orphan re-attachment logic is never triggered; a sketch of the fix direction follows. Submitted a PR here: https://github.com/jenkinsci/ec2-plugin/pull/448
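As a minimal sketch of the fix direction described in that comment (hypothetical names; see the linked PR for the actual change), the idea is to attempt orphan re-attachment before the cap check can short-circuit provisioning:

    import java.util.List;
    import java.util.Set;

    // Hypothetical model of the control-flow fix, not the PR's literal code.
    class ProvisionFlow {
        int instanceCap = 1;

        String provision(List<String> liveEc2, Set<String> trackedNodes) {
            // Before the fix, the cap check ran first, so the re-attachment
            // loop below was unreachable once the cap was full:
            //
            //   if (liveEc2.size() >= instanceCap) return null;

            // Fix direction: try to re-attach orphaned instances first.
            for (String id : liveEc2) {
                if (!trackedNodes.contains(id)) {
                    trackedNodes.add(id); // re-attach the orphan as a Jenkins node
                    return id;
                }
            }

            // Only then enforce the cap for genuinely new instances.
            if (liveEc2.size() >= instanceCap) {
                return null; // "Cannot provision - no capacity for instances"
            }
            String fresh = "i-new"; // placeholder for a real RunInstances call
            liveEc2.add(fresh);
            trackedNodes.add(fresh);
            return fresh;
        }
    }

With this ordering, an orphan that counts against the cap is reclaimed instead of wedging provisioning, which matches the behavior observed after manually terminating the instance.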