- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Labels: None
- Environment: Jenkins ver. 2.176.1, 2.204.2; ec2 plugin 1.43, 1.44, 1.45, 1.49.1
- Released As: ec2 1.51
Sometimes, after a Jenkins restart, the plugin is unable to spawn more agents.
It just keeps looping on this:
{code}
SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
{code}
If I go to the EC2 console and terminate the instance manually, the plugin will spawn a new one and use it.
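For reference, the same manual cleanup can be scripted. This is only a rough sketch using the AWS SDK for Java v1 with a placeholder instance ID; it is not something the plugin does itself:
{code:java}
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.TerminateInstancesRequest;
import com.amazonaws.services.ec2.model.TerminateInstancesResult;

public class TerminateOrphanedAgent {
    public static void main(String[] args) {
        // Credentials and region come from the default provider chain.
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

        // Placeholder ID: substitute the orphaned instance shown in the EC2 console.
        TerminateInstancesResult result = ec2.terminateInstances(
                new TerminateInstancesRequest().withInstanceIds("i-0123456789abcdef0"));

        result.getTerminatingInstances().forEach(change ->
                System.out.println(change.getInstanceId() + " -> " + change.getCurrentState().getName()));
    }
}
{code}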
It seems like there is some mismatch in the plugin logic: the part responsible for calculating the number of instances and checking the cap sees the EC2 instance, but the part responsible for picking up running EC2 instances doesn't seem to be able to find it.
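To illustrate what I suspect is happening, here is a minimal, self-contained sketch (hypothetical names and values, not the plugin's actual code): the cap check counts every instance still running in EC2, while the reuse check only considers agents Jenkins still has registered after the restart, so an orphaned instance blocks new launches without ever being reattached.
{code:java}
import java.util.List;
import java.util.Set;

public class OrphanedInstanceSketch {

    /** Instances currently running in EC2 for this template (what the cap check sees). */
    static int countRunningInstances(List<String> runningInEc2) {
        return runningInEc2.size();
    }

    /** True if any running instance is still known to Jenkins as an agent (what the reuse path checks). */
    static boolean canReuseExisting(List<String> runningInEc2, Set<String> registeredAgents) {
        return runningInEc2.stream().anyMatch(registeredAgents::contains);
    }

    public static void main(String[] args) {
        int instanceCap = 1;                                // instanceCap = 1, as in this report
        List<String> runningInEc2 = List.of("i-0abc1234");  // instance survives the restart...
        Set<String> registeredAgents = Set.of();            // ...but Jenkins forgot about it

        boolean capReached = countRunningInstances(runningInEc2) >= instanceCap;
        boolean reusable = canReuseExisting(runningInEc2, registeredAgents);

        if (capReached && !reusable) {
            // This is the loop from the log: no capacity to launch, nothing to reuse.
            System.out.println("Cannot provision - no capacity for instances: 0");
        }
    }
}
{code}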
We use a single subnet, security group, and VPC (I've seen some reports of this combination causing problems).
We use an instanceCap = 1 setting as we are testing the plugin; this might make the problem more visible than a higher cap would.
- links to [JENKINS-57795] Orphaned EC2 instances after Jenkins restart
Description |
Original:
Sometimes after a Jenkins restart the plugin won't be able to spawn more agents. The plugin will just loop on this: {code}SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0 May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'} {code} If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it. It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it. We use a single subnet, security group and vpc (I've seen some reports about this causing problems). It seems the problems do not occur when I do a `/safeRestart` but they do if I use e.g. "restart Jenkins when no jobs are running" from the Update Center. We use instanceCap = 1 setting as we are testing the plugin, this might make this problem more visible than with a higher cap. |
New:
Sometimes after a Jenkins restart the plugin won't be able to spawn more agents. The plugin will just loop on this: {code}SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0 May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'} {code} If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it. It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it. We use a single subnet, security group and vpc (I've seen some reports about this causing problems). It seems the problems do not occur when I do a {{/safeRestart}} but they do if I use e.g. "restart Jenkins when no jobs are running" from the Update Center. We use instanceCap = 1 setting as we are testing the plugin, this might make this problem more visible than with a higher cap. |
Description |
Original:
Sometimes after a Jenkins restart the plugin won't be able to spawn more agents. The plugin will just loop on this: {code}SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0 May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'} {code} If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it. It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it. We use a single subnet, security group and vpc (I've seen some reports about this causing problems). It seems the problems do not occur when I do a {{/safeRestart}} but they do if I use e.g. "restart Jenkins when no jobs are running" from the Update Center. We use instanceCap = 1 setting as we are testing the plugin, this might make this problem more visible than with a higher cap. |
New:
Sometimes after a Jenkins restart the plugin won't be able to spawn more agents. The plugin will just loop on this: {code}SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0 May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'} {code} If I go to the EC2 console and terminate the instance manually the plugin will spawn a new one and use it. It seems like there is some mismatch in the plugin logic. The part responsible for calculating the number of instances and checking the cap sees the EC2 instance. However the part responsible for picking up running EC2 instances doesn't seem to be able to find it. We use a single subnet, security group and vpc (I've seen some reports about this causing problems). We use instanceCap = 1 setting as we are testing the plugin, this might make this problem more visible than with a higher cap. |
Environment |
New:
Jenkins ver. 2.176.1 ec2 plugin 1.43, 1.44 |
Environment |
Original:
Jenkins ver. 2.176.1 ec2 plugin 1.43, 1.44 |
New:
Jenkins ver. 2.176.1 ec2 plugin 1.43, 1.44, 1.45 |
Attachment | New: jenkins_201909121030.log [ 48722 ] |
Attachment | New: jenkins.temp_dsl.log [ 48747 ] |
Attachment | New: ec2.hpi [ 48761 ] |
Attachment | Original: ec2.hpi [ 48761 ] |
Attachment | New: start_fresh_1.46-rc1050.43f9773eed95.txt [ 48816 ] |
Environment |
Original:
Jenkins ver. 2.176.1 ec2 plugin 1.43, 1.44, 1.45 |
New:
Jenkins ver. 2.176.1 ec2 plugin 1.43, 1.44, 1.45, 1.49.1 |