[JENKINS-23850] PATCH: EC2-plugin always starting new slaves instead of restarting existing - Jenkins Jira

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: ec2-plugin
Labels:
- demand-launch
- ec2
- patch
- regression
Environment:
Jenkins 1.541, EC2 plugin 1.23, Node Iterator API plugin 1.5

Every time I start a build, Jenkins launches a new slave rather than restarting one of the stopped instances. It does successfully stop each instance when it reaches the idle timeout.

During idle, the log shows lots of this (the _check, idleTimeout, stop entries appear once for every instance currently registered):

Jul 16, 2014 12:40:47 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started EC2 alive slaves monitor
Jul 16, 2014 12:40:48 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished EC2 alive slaves monitor. 1172 ms
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2RetentionStrategy _check
INFO: Idle timeout: edifestivalsapi build slave (i-ce32a08c)
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2AbstractSlave idleTimeout
INFO: EC2 instance idle time expired: i-ce32a08c
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2AbstractSlave stop
INFO: EC2 instance stopped: i-ce32a08c

Then a build is triggered and the idle timeout checks run again (again, one set of entries for every instance):

Jul 16, 2014 12:44:23 PM com.cloudbees.jenkins.GitHubPushTrigger$1 run
INFO: SCM changes detected in edifestivalsapi-master. Triggering  #36
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2RetentionStrategy _check
INFO: Idle timeout: edifestivalsapi build slave (i-ce32a08c)
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2AbstractSlave idleTimeout
INFO: EC2 instance idle time expired: i-ce32a08c
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2AbstractSlave stop
INFO: EC2 instance stopped: i-ce32a08c
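
To confirm that the stop entries repeat for an instance that is already stopped, the log can be tallied per instance ID. The following is a minimal sketch, assuming the two-line log format shown above; the `count_stops` helper and the regular expression are illustrative and not part of Jenkins or the EC2 plugin.

```python
import re
from collections import Counter

# Matches the message line written by EC2AbstractSlave.stop in the log above.
STOPPED = re.compile(r"INFO: EC2 instance stopped: (i-[0-9a-f]+)")


def count_stops(log_text):
    """Count 'EC2 instance stopped' entries per instance ID."""
    return Counter(STOPPED.findall(log_text))


# Sample taken from the two log excerpts in this report.
sample = """\
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2AbstractSlave stop
INFO: EC2 instance stopped: i-ce32a08c
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2AbstractSlave stop
INFO: EC2 instance stopped: i-ce32a08c
"""

for instance_id, stops in count_stops(sample).items():
    print(instance_id, stops)
```

A count above 1 for the same instance ID with no restart in between would suggest the retention check keeps firing on a node that is already stopped.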

And then the plugin starts to provision a new instance - apparently without any attempt to restart a stopped slave.

Jul 16, 2014 12:46:33 PM hudson.plugins.ec2.EC2Cloud provision
INFO: Excess workload after pending Spot instances: 1
Jul 16, 2014 12:46:33 PM hudson.plugins.ec2.EC2Cloud addProvisionedSlave
INFO: Provisioning for AMI ami-57ea3d20; Estimated number of total slaves: 0; Estimated number of slaves for ami ami-57ea3d20: 0
Launching ami-57ea3d20 for template edifestivalsapi build slave
Jul 16, 2014 12:46:33 PM hudson.slaves.NodeProvisioner update
INFO: Started provisioning edifestivalsapi build slave (ami-57ea3d20) from ec2-eu-west-1 with 1 executors. Remaining excess workload:0.0
Looking for existing instances: {InstanceIds: [],Filters: [{Name: image-id,Values: [ami-57ea3d20]}, {Name: group-name,Values: [jenkins-build-slave]}, {Name: key-name,Values: [build-slave]}, {Name: instance-type,Values: [t1.micro]}, {Name: tag:Name,Values: [edifestivalsapi-build-slave]}, {Name: tag:Project,Values: [edifestivalsapi]}, {Name: instance-state-name,Values: [stopped, stopping]}],}
No existing instance found - created: {InstanceId: i-eb35a8a9,ImageId: ami-57ea3d20,State: {Code: 0,Name: pending},"**REDACTED**}
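
The "Looking for existing instances" query combines seven filters. EC2 joins multiple filters with AND and joins multiple values within one filter with OR, so a stopped instance is returned only if it satisfies every filter at once. The sketch below models that matching rule in plain Python to show how a single mismatched attribute would hide a stopped instance. It is not the plugin's code. The instance records are hypothetical, the missing `tag:Project` is an assumed cause chosen only for illustration, and real instances can belong to several security groups, which this model simplifies to one value.

```python
def mismatched_filters(instance, filters):
    """Return the names of the filters the instance fails.

    Each filter maps an attribute name to a list of accepted values;
    an instance passes a filter when its value is one of them.
    """
    return [name for name, values in filters.items()
            if instance.get(name) not in values]


def matches(instance, filters):
    """An instance matches only if it fails no filter (AND across filters)."""
    return not mismatched_filters(instance, filters)


# Filters copied from the "Looking for existing instances" log line.
filters = {
    "image-id": ["ami-57ea3d20"],
    "group-name": ["jenkins-build-slave"],
    "key-name": ["build-slave"],
    "instance-type": ["t1.micro"],
    "tag:Name": ["edifestivalsapi-build-slave"],
    "tag:Project": ["edifestivalsapi"],
    "instance-state-name": ["stopped", "stopping"],
}

# Hypothetical stopped instance whose attributes all match the template.
stopped = {
    "image-id": "ami-57ea3d20",
    "group-name": "jenkins-build-slave",
    "key-name": "build-slave",
    "instance-type": "t1.micro",
    "tag:Name": "edifestivalsapi-build-slave",
    "tag:Project": "edifestivalsapi",
    "instance-state-name": "stopped",
}

# The same instance with one tag missing (assumed, for illustration only).
untagged = {k: v for k, v in stopped.items() if k != "tag:Project"}

print(matches(stopped, filters))
print(matches(untagged, filters), mismatched_filters(untagged, filters))
```

Comparing each stopped instance's actual attributes against the logged filters in this way could narrow down which filter, if any, excludes it.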

Then another block of the idle timeout checks while the instance is launched, and then this:

Jul 16, 2014 12:47:44 PM hudson.slaves.NodeProvisioner update
INFO: edifestivalsapi build slave (ami-57ea3d20) provisioningE successfully completed. We have now 8 computer(s)
Jul 16, 2014 12:47:47 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
WARNING: Making edifestivalsapi build slave (i-ce32a08c) offline because it’s not responding

The UI lists all the slaves that were launched for previous jobs, but shows them as offline with "Time out for last 5 try". When I manually start the instance (by opening the slave's page and clicking "Launch slave agent"), the stopped instance is restarted and comes online as expected.

It does successfully run a subsequent build on the correct existing node if there is one still running.

So my hunch is that Jenkins somehow isn't detecting that it has a stopped instance for the given AMI.

I have two slave types configured, both using the same AMI but with different labels.

I initially posted this as a comment on https://issues.jenkins-ci.org/browse/JENKINS-23787, but it sounds like it may not be related: that reporter is seeing the plugin wait indefinitely for the instance to start, whereas I'm seeing it skip the stopped instance and spin up a new node straight away, leaving the old one marked as not available.

is related to

JENKINS-23787 EC2-plugin not spooling up stopped nodes - "still in the queue ... all nodes of label ... are offline"

Closed
