Type: Bug
Resolution: Fixed
Priority: Major
Environment: Jenkins 1.572, EC2 plugin 1.21, Node Iterator API Plugin 1.5
The Jenkins EC2 plugin no longer starts stopped nodes. Unfortunately I'm not sure exactly when it stopped working - I didn't realise this was the issue until later, because of unrelated problems caused by too many nodes spawning and having to be killed.
If I use Manage Jenkins -> Manage Nodes to start a stopped EC2 node that a build is waiting on manually, the build proceeds.
Builds succeed when the EC2 plugin spawns a new node for the first time. It's only a problem if the node is stopped for idleness - the plugin doesn't seem to restart it.
Builds get stuck with output like:
Triggering bdr_linux » x64,debian7
Triggering bdr_linux » x86,amazonlinux201209
Triggering bdr_linux » x86,debian6
Triggering bdr_linux » x64,amazonlinux201209
Configuration bdr_linux » x86,amazonlinux201209 is still in the queue: Amazon Linux 2012.09 EBS 32-bit (i-b848fbfa) is offline
Configuration bdr_linux » x86,amazonlinux201209 is still in the queue: All nodes of label ‘amazonlinux201209&&x86’ are offline
where there's at least one node with that label stopped, ready to start and use.
There's no sign that any attempt is made to start the node.
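For context, the step that appears to be missing - find a stopped instance of the template's AMI and start it rather than launch a fresh one - would look roughly like the sketch below. This is a minimal illustration using the AWS SDK for Java 1.x that the plugin builds on, not the plugin's actual code; the method name and the filter choices are simplified assumptions.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.InstanceStateName;
import com.amazonaws.services.ec2.model.Reservation;
import com.amazonaws.services.ec2.model.StartInstancesRequest;

public class StoppedNodeLookupSketch {
    /**
     * Look for a stopped (or stopping) instance of the template's AMI and
     * restart it; returns its id, or null if nothing restartable exists.
     * "ami" stands in for whatever AMI the slave template is configured with.
     */
    static String startStoppedInstanceIfAny(AmazonEC2 ec2, String ami) {
        DescribeInstancesRequest req = new DescribeInstancesRequest().withFilters(
                new Filter("image-id").withValues(ami),
                new Filter("instance-state-name").withValues(
                        InstanceStateName.Stopped.toString(),
                        InstanceStateName.Stopping.toString()));

        for (Reservation r : ec2.describeInstances(req).getReservations()) {
            for (Instance i : r.getInstances()) {
                // Found an idle-stopped node: restart it instead of paying for a new one.
                ec2.startInstances(new StartInstancesRequest()
                        .withInstanceIds(i.getInstanceId()));
                return i.getInstanceId();
            }
        }
        return null; // the caller would fall back to RunInstances for a fresh node
    }
}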
Is related to:
JENKINS-23792 PATCH: EC2 plugin idles-down nodes that are still launching (Resolved)
JENKINS-23850 PATCH: EC2-plugin always starting new slaves instead of restarting existing (Closed)
[JENKINS-23787] EC2-plugin not spooling up stopped nodes - "still in the queue ... all nodes of label ... are offline"
I think I am seeing a similar issue with EC2 1.23 and Node Iterator 1.5.
Every time I start a build, Jenkins launches a new slave rather than restarting one of the stopped instances. It is successfully stopping the instance when it hits the idle time.
During idle, the log shows lots of this (the _check, idleTimeout, stop entries appear once for every instance currently registered):
Jul 16, 2014 12:40:47 PM hudson.model.AsyncPeriodicWork$1 run INFO: Started EC2 alive slaves monitor
Jul 16, 2014 12:40:48 PM hudson.model.AsyncPeriodicWork$1 run INFO: Finished EC2 alive slaves monitor. 1172 ms
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2RetentionStrategy _check INFO: Idle timeout: edifestivalsapi build slave (i-ce32a08c)
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2AbstractSlave idleTimeout INFO: EC2 instance idle time expired: i-ce32a08c
Jul 16, 2014 12:41:53 PM hudson.plugins.ec2.EC2AbstractSlave stop INFO: EC2 instance stopped: i-ce32a08c
Then a build is triggered and the idle timeout checks run again (again, one set of entries for every instance):
Jul 16, 2014 12:44:23 PM com.cloudbees.jenkins.GitHubPushTrigger$1 run INFO: SCM changes detected in edifestivalsapi-master. Triggering #36
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2RetentionStrategy _check INFO: Idle timeout: edifestivalsapi build slave (i-ce32a08c)
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2AbstractSlave idleTimeout INFO: EC2 instance idle time expired: i-ce32a08c
Jul 16, 2014 12:45:53 PM hudson.plugins.ec2.EC2AbstractSlave stop INFO: EC2 instance stopped: i-ce32a08c
And then the plugin starts to provision a new instance - apparently without any attempt to restart a stopped slave.
Jul 16, 2014 12:46:33 PM hudson.plugins.ec2.EC2Cloud provision INFO: Excess workload after pending Spot instances: 1
Jul 16, 2014 12:46:33 PM hudson.plugins.ec2.EC2Cloud addProvisionedSlave INFO: Provisioning for AMI ami-57ea3d20; Estimated number of total slaves: 0; Estimated number of slaves for ami ami-57ea3d20: 0
Launching ami-57ea3d20 for template edifestivalsapi build slave
Jul 16, 2014 12:46:33 PM hudson.slaves.NodeProvisioner update INFO: Started provisioning edifestivalsapi build slave (ami-57ea3d20) from ec2-eu-west-1 with 1 executors. Remaining excess workload:0.0
Looking for existing instances: {InstanceIds: [],Filters: [{Name: image-id,Values: [ami-57ea3d20]}, {Name: group-name,Values: [jenkins-build-slave]}, {Name: key-name,Values: [build-slave]}, {Name: instance-type,Values: [t1.micro]}, {Name: tag:Name,Values: [edifestivalsapi-build-slave]}, {Name: tag:Project,Values: [edifestivalsapi]}, {Name: instance-state-name,Values: [stopped, stopping]}],}
No existing instance found - created: {InstanceId: i-eb35a8a9,ImageId: ami-57ea3d20,State: {Code: 0,Name: pending},"**REDACTED**}
Then another block of the idle timeout checks while the instance is launched, and then this:
Jul 16, 2014 12:47:44 PM hudson.slaves.NodeProvisioner update INFO: edifestivalsapi build slave (ami-57ea3d20) provisioningE successfully completed. We have now 8 computer(s)
Jul 16, 2014 12:47:47 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor WARNING: Making edifestivalsapi build slave (i-ce32a08c) offline because it’s not responding
The UI shows all the slaves that were launched for previous jobs, but marks them offline with "Time out for last 5 try". When I manually start the instance (by opening the slave page and clicking "Launch slave agent") the stopped instance is restarted and comes online as expected.
So my hunch is that Jenkins somehow isn't detecting that it has a stopped instance for the given AMI?
> So my hunch is that Jenkins somehow isn't detecting that it has a stopped instance for the given AMI?
That used to happen due to a bug when label support was added, but it's fixed in 1.21 IIRC.
I'm not seeing the same behaviour - rather, it's just waiting indefinitely for the stopped node to start.
It might be a good idea to file a separate issue for what you're discussing here, then comment to mention the issue number here in case they prove to be related.
Thanks Craig, I've filed it as a separate issue at https://issues.jenkins-ci.org/browse/JENKINS-23850. Do you have an instance cap? It's possible we're seeing the same thing if your Jenkins has hit the instance cap and therefore can't start a new node (so the build stalls), while mine is uncapped so just goes ahead and makes a new one.
I do have an instance cap, but neither the per-node-type nor global instance caps are being reached. It's an issue with restarting existing stopped nodes, not with starting new ones.
Here's an example slave log from such a case.
Starting existing instance: i-185cce5a result:{StartingInstances: [{InstanceId: i-185cce5a,CurrentState: {Code: 0,Name: pending},PreviousState: {Code: 80,Name: stopped}}]}
Starting existing instance: i-185cce5a result:{StartingInstances: [{InstanceId: i-185cce5a,CurrentState: {Code: 0,Name: pending},PreviousState: {Code: 80,Name: stopped}}]}
Connecting to 10.0.1.17 on port 22, with timeout 10000.
Connected via SSH.
bootstrap()
Getting keypair...
Using key: jenkins_ec2_key 3c:05:cc:a3:75:e9:2b:01:01:a4:b0:a2:0b:bd:18:32:d3:59:ae:5d
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAl5L6kI5Dzmvo6gYt7FgWnt/Z//ybp+l2YB0F8CkoZMM0wj16wCXwJvDlZOeC
QAt3FoKQn8s/OTqcaUMKKhBefGgQCmI+ZaKE3/Xu5AGMxu66bqI
Authenticating as admin
take over connection
Executing init script
sudo: unable to resolve host ip-10-0-1-17
sudo: unable to resolve host ip-10-0-1-17
Hit http://ftp.ie.debian.org squeeze Release.gpg
Ign http://ftp.ie.debian.org/debian/ squeeze/main Translation-en
Hit http://ftp.ie.debian.org squeeze Release
Hit http://security.debian.org squeeze/updates Release.gpg
Ign http://security.debian.org/ squeeze/updates/main Translation-en
Hit http://ftp.ie.debian.org squeeze/main Sources
Hit http://ftp.ie.debian.org squeeze/main amd64 Packages
Hit http://security.debian.org squeeze/updates Release
Hit http://security.debian.org squeeze/updates/main Sources
Hit http://security.debian.org squeeze/updates/main amd64 Packages
Reading package lists...
Reading package lists...
Building dependency tree...
Broadcast message from root@ip-10-0-1-17 (Fri Jul 18 04:44:23 2014):
The system is going down for system halt NOW!
Jenkins thinks the state of the node is "Offline" with message "This node is being launched" but the log clearly shows that it's actually shut down.
In this case the node was launched by a previous job. It looks like it must've been idling out when the job began, though there aren't any timestamps in the job console output to confirm:
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building on master in workspace /var/lib/jenkins/workspace/bdr_linux
.... git spam here ....
Triggering bdr_linux » x64,debian7
Triggering bdr_linux » x64,amazonlinux201209
Triggering bdr_linux » x86,amazonlinux201209
Triggering bdr_linux » x86,debian6
Configuration bdr_linux » x64,amazonlinux201209 is still in the queue: Amazon Linux 2012.09 EBS 64-bit (i-205dcf62) is offline
Configuration bdr_linux » x64,amazonlinux201209 is still in the queue: All nodes of label ‘amazonlinux201209&&x64’ are offline
Configuration bdr_linux » x64,amazonlinux201209 is still in the queue: Waiting for next available executor on amazonlinux201209&&x64
bdr_linux » x64,amazonlinux201209 completed with result SUCCESS
bdr_linux » x86,amazonlinux201209 completed with result SUCCESS
Configuration bdr_linux » x86,debian6 is still in the queue: Debian6_x86 EBS (i-185cce5a) is offline
The build start timestamp for the matrix build is Jul 18, 2014 4:34:14.
The EC2 plugin logs show it being idled down:
Jul 18, 2014 10:43:11 AM INFO hudson.plugins.ec2.EC2RetentionStrategy _check Idle timeout: Debian6_x86 EBS (i-185cce5a)
Jul 18, 2014 10:43:11 AM INFO hudson.plugins.ec2.EC2AbstractSlave idleTimeout EC2 instance idle time expired: i-185cce5a
Jul 18, 2014 10:43:11 AM INFO hudson.plugins.ec2.EC2AbstractSlave stop EC2 instance stopped: i-185cce5a
The worker node is in a different timezone to Jenkins, hence the mismatched times. You can see the node went down at :44, moments after idle stop was requested.
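That timing suggests the idle-timeout check is racing the launch: the retention strategy stops the instance while the agent is still bootstrapping over SSH. A guard in the spirit of the JENKINS-23792 patch linked above might look like the sketch below; this is purely illustrative, and the names are not the plugin's real fields or methods.

// Illustrative only: parameter names are assumptions, not the plugin's real code.
public class IdleTimeoutGuardSketch {
    /**
     * Decide whether an EC2 node may be idle-stopped. The point of the guard is
     * that a node whose agent has not yet connected (e.g. the init script is
     * still running) should never be treated as "idle".
     */
    static boolean mayStop(boolean agentConnected,
                           long idleSinceMillis,
                           long idleTimeoutMillis,
                           long now) {
        if (!agentConnected) {
            // Still bootstrapping - stopping now produces exactly the
            // "system is going down for system halt" race shown in the log above.
            return false;
        }
        return now - idleSinceMillis > idleTimeoutMillis;
    }
}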
I'm beginning to suspect that this issue is related to matrix builds, and specifically to matrix builds with a touchstone build that runs first.
We had this same issue - or at least the same symptoms. If your issue is specific to matrix builds, then it's not the same. But I'll add this here in case someone else stumbles over the same problem:
The latest stable release of Jenkins just looked at the instance AMI ID to determine whether any slaves were running. Since our master had the same AMI as our slave, Jenkins figured we had reached our maximum number of slaves (1).
Having hit the maximum number of slaves, it never tried provisioning any new ones - and the EC2 plugin fires up stopped slaves as part of the provision call.
Upgrading to a SNAPSHOT version of Jenkins solved the issue. It now adds a tag to all slaves and uses that when counting.
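For anyone hitting the same trap: the difference is essentially counting by a dedicated slave tag rather than by image-id, roughly as in the sketch below (AWS SDK for Java 1.x). The tag key and value used here are assumptions for illustration, not necessarily the ones the plugin actually writes.

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import com.amazonaws.services.ec2.model.Reservation;

public class SlaveCountSketch {
    /**
     * Count instances that belong to a given slave template. Counting by a
     * dedicated tag avoids miscounting unrelated machines (such as a master
     * built from the same AMI), which is the trap described above.
     * The "jenkins_slave_type" tag key is an illustrative assumption.
     */
    static int countTemplateSlaves(AmazonEC2 ec2, String templateDescription) {
        DescribeInstancesRequest req = new DescribeInstancesRequest().withFilters(
                new Filter("tag:jenkins_slave_type").withValues(templateDescription),
                new Filter("instance-state-name").withValues("pending", "running"));

        int count = 0;
        for (Reservation r : ec2.describeInstances(req).getReservations()) {
            count += r.getInstances().size();
        }
        return count;
    }
}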
I'm having the same issue with Jenkins 1.626 and ec2 plugin 1.29. None of the offline ec2 slaves are starting up when the matrix job starts. Instead I have to manually start them up. Does anyone have a solution for this?
I have just made a fix for this in unreleased 1.30; if you can get the SNAPSHOT version of the ec2-plugin, hopefully this is fixed. Please give it a try and let me know.
francisu I found the main issue was with using the ec2 plugin with the Throttle Concurrent Builds plugin:
https://wiki.jenkins-ci.org/display/JENKINS/Throttle+Concurrent+Builds+Plugin
If I set a job to be the only job executing on a node with 5 executors, other triggered jobs wait in the queue for it to finish instead of spinning up another instance that's in a stopped state.
Thanks for letting me know. I will close this one and reopen my duplicated report, as that has to do with hooking up to EC2 instances that are actually running (even though the corresponding Jenkins slave is offline).
I think commit https://github.com/jenkinsci/ec2-plugin/commit/c9d69a5da8c9be094701d4c191ba7b1d06c200c9 breaks the plugin: users can no longer launch multiple instances for the same AMI, because DescribeInstancesRequest now returns all instances instead of only stopped ones, and no new instances can be launched if any are already running.
The line below was removed:
diFilters.add(new Filter("instance-state-name").withValues(InstanceStateName.Stopped.toString(), InstanceStateName.Stopping.toString()));
@Ted, I don't see how that line broke it. Note that I changed EC2Cloud to look up any running (or pending) instances and check that they are known. Only if they are known will they count against the existing instances. I will test it again though, to try multiple instances with the same AMI. The reason the filter was removed is that we want to see if there is a running instance that's not known, and if so, use that one.
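To make the intended behaviour concrete: the old code only ever described stopped/stopping instances (the removed filter above), while the approach described here also looks at running/pending instances and only counts the ones Jenkins already knows about. A rough sketch of that check follows; the knownInstanceIds parameter is an illustrative stand-in for the plugin's own bookkeeping, not its real API.

import java.util.Set;

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;

public class RunningInstanceCheckSketch {
    /**
     * Count running/pending instances of an AMI, but only those Jenkins already
     * knows about. An unknown running instance is then a candidate for reuse
     * rather than a reason to refuse to launch.
     */
    static int countKnownRunning(AmazonEC2 ec2, String ami, Set<String> knownInstanceIds) {
        DescribeInstancesRequest req = new DescribeInstancesRequest().withFilters(
                new Filter("image-id").withValues(ami),
                new Filter("instance-state-name").withValues("pending", "running"));

        int known = 0;
        for (Reservation r : ec2.describeInstances(req).getReservations()) {
            for (Instance i : r.getInstances()) {
                if (knownInstanceIds.contains(i.getInstanceId())) {
                    known++;
                }
            }
        }
        return known;
    }
}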
Fixed a problem related to this where the instance caps for on-demand nodes were not being respected.
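Conceptually, that cap check boils down to comparing the per-template and cloud-wide counts against their configured caps before provisioning, as in this trivial sketch (names are illustrative, not the plugin's API):

public class OnDemandCapCheckSketch {
    /**
     * Only provision another on-demand node while both the template count and
     * the overall cloud count stay below their configured caps.
     */
    static boolean mayProvisionOnDemand(int templateInstances, int templateCap,
                                        int cloudInstances, int cloudCap) {
        return templateInstances < templateCap && cloudInstances < cloudCap;
    }
}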
After updating to 1.23 I instead get the behaviour in JENKINS-23788: manual node launch no longer works.