Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
-
Jenkins 1.481
Description
Every day or so I get errors like this when starting a slave:
ERROR: The instance ID 'i-6fdc2512' does not exist
Status Code: 400, AWS Service: AmazonEC2, AWS Request ID: 195aae09-8d42-4f24-9a30-2d31b7447818, AWS Error Code: InvalidInstanceID.NotFound, AWS Error Message: The instance ID 'i-6fdc2512' does not exist
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:547)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:284)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:169)
at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:5684)
at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:2543)
at hudson.plugins.ec2.EC2Computer._describeInstance(EC2Computer.java:97)
at hudson.plugins.ec2.EC2Computer.getState(EC2Computer.java:76)
at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:33)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:200)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
and I have to manually delete them to unblock jobs. The odd part is that this instance actually does exist in the EC2 management console. Any idea what this is about and how I can resolve it?
-John
Activity
Field | Original Value | New Value |
---|---|---|
Summary | Error startign slave nodes | Failure starting slave nodes |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Resolution | | Fixed [ 1 ] |
Status | Open [ 1 ] | Resolved [ 5 ] |
Workflow | JNJira [ 146042 ] | JNJira + In-Review [ 191756 ] |
I have found a few reports of this type of error on the Amazon forums, and it seems to be a race condition that occurs when describeInstances() is called within a few seconds of creating the instance. I guess the instance hasn't yet propagated through the AWS system? I was thinking the simple solution here might be to catch this exception and add a configurable retry counter with a delay. Thoughts?
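To make the retry idea concrete, here is a rough sketch of what I have in mind. It is not the plugin's actual code: Ec2ServiceException is a hypothetical stand-in for the SDK's service exception (the real one exposes the error code through getErrorCode()), and NotFoundRetry, maxRetries and delayMillis are names I made up for illustration. It only retries when the error code is InvalidInstanceID.NotFound, as in the trace above, and rethrows anything else immediately.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the SDK's service exception; only the error code matters here. */
class Ec2ServiceException extends RuntimeException {
    private final String errorCode;

    Ec2ServiceException(String errorCode, String message) {
        super(message);
        this.errorCode = errorCode;
    }

    String getErrorCode() {
        return errorCode;
    }
}

/** Retries a call while it keeps failing with InvalidInstanceID.NotFound. */
final class NotFoundRetry {
    static final String NOT_FOUND = "InvalidInstanceID.NotFound";

    private NotFoundRetry() {}

    /**
     * Runs the action, retrying up to maxRetries times with a fixed delay
     * when the instance is not yet visible. Other errors are rethrown at once.
     */
    static <T> T call(Callable<T> action, int maxRetries, long delayMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (Ec2ServiceException e) {
                boolean retryable = NOT_FOUND.equals(e.getErrorCode());
                if (!retryable || attempt >= maxRetries) {
                    throw e; // different error, or retries exhausted
                }
                Thread.sleep(delayMillis); // give AWS time to propagate the new instance
            }
        }
    }
}
```

In the plugin this would presumably wrap the instance lookup in EC2Computer, with the retry count and delay read from the cloud configuration. A fixed delay keeps the sketch simple; a backoff could be swapped in if a few seconds turns out not to be enough.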