Resolution: Fixed
* EC2 plugin version 1.26.
* Jenkins 1.580.2 running inside the official Jenkins Docker LTS image.
* Host O/S: Ubuntu 14.04 LTS 64-bit on an EC2 master.
* EC2 rights are conferred via an EC2 InstanceProfile.
After Jenkins first starts, it is able to launch EC2 slaves, both manually and when jobs request the slave label.
A few hours later (not sure how long, maybe 24 hours?) slaves no longer start, manually or automatically. In "Manage Jenkins -> System Log -> All Jenkins Logs" the following error occurs repeatedly. Restarting Jenkins solves the problem.
Started EC2 alive slaves monitor
Feb 09, 2015 5:14:47 AM INFO hudson.model.AsyncPeriodicWork$1 run
Finished EC2 alive slaves monitor. 0 ms
Feb 09, 2015 5:15:51 AM INFO hudson.plugins.ec2.EC2Cloud provision
Excess workload after pending Spot instances: 1
Feb 09, 2015 5:15:53 AM WARNING hudson.plugins.ec2.EC2Cloud provision
Failed to count the # of live instances on EC2
com.amazonaws.AmazonServiceException: Request has expired. (Service: AmazonEC2; Status Code: 400; Error Code: RequestExpired; Request ID: 59f7935f-15f0-455c-a6f1-f6057f5ffc77)
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:886)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:484)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:256)
    at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:8798)
    at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:4137)
    at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:8087)
    at hudson.plugins.ec2.EC2Cloud.countCurrentEC2Slaves(EC2Cloud.java:228)
    at hudson.plugins.ec2.EC2Cloud.addProvisionedSlave(EC2Cloud.java:299)
    at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:389)
    at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:281)
    at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:51)
    at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:368)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
I also don't understand the log statement "Excess workload after pending Spot instances: 1", as I have not ticked the "Use Spot instance" tick box.
In my cloud settings I have ticked the "Use EC2 instance profile to obtain credentials" and have set both the access key and secret key values to "THIS VALUE IS NOT USED - THE INSTANCE PROFILE IS USED INSTEAD".
Thanks for the info, Martin.
I spent some more time looking into this tonight and I think I found the cause. Even better, I think the fix is quite simple. At the moment, in EC2Cloud, we create an AmazonEC2Client like so:
AmazonEC2 client = new AmazonEC2Client(credentialsProvider.getCredentials(), config);
According to the Amazon SDK source this creates a StaticCredentialsProvider using the given credentials. From what I can tell, StaticCredentialsProvider never refreshes its credentials, leading to expiration.
Instead, you can create an AmazonEC2Client with a credentials provider directly. This should, as far as I can tell, refresh the credentials as needed.
AmazonEC2 client = new AmazonEC2Client(credentialsProvider, config);
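To make the difference between the two constructors concrete, here is a small sketch that does not use the AWS SDK at all. Every name in it (RotatingProvider, SnapshotClient, ProviderClient, the "token-N" strings) is hypothetical and only models the behaviour I believe the SDK has: one client copies the credentials once at construction, the other asks the provider on every request.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CredentialRefreshSketch {

    /** Hypothetical stand-in for an instance-profile provider:
     *  each call returns whichever token is currently valid. */
    static final class RotatingProvider implements Supplier<String> {
        private final AtomicInteger generation = new AtomicInteger(1);

        @Override
        public String get() {
            return "token-" + generation.get();
        }

        /** Simulates the temporary credentials being rotated. */
        void rotate() {
            generation.incrementAndGet();
        }
    }

    /** Models a client built from getCredentials(): it keeps a one-time copy. */
    static final class SnapshotClient {
        private final String token;

        SnapshotClient(String token) {
            this.token = token;
        }

        String tokenForRequest() {
            return token; // never changes after construction
        }
    }

    /** Models a client built from the provider itself: it asks on every request. */
    static final class ProviderClient {
        private final Supplier<String> provider;

        ProviderClient(Supplier<String> provider) {
            this.provider = provider;
        }

        String tokenForRequest() {
            return provider.get(); // always the provider's current token
        }
    }

    public static void main(String[] args) {
        RotatingProvider provider = new RotatingProvider();
        SnapshotClient snapshot = new SnapshotClient(provider.get());
        ProviderClient live = new ProviderClient(provider);

        provider.rotate(); // the old token is now stale

        System.out.println("snapshot client sends: " + snapshot.tokenForRequest());
        System.out.println("provider client sends: " + live.tokenForRequest());
    }
}
```

After the rotation, the snapshot client still sends the first token while the provider-backed client sends the new one. If the real SDK behaves the same way, that would explain why requests start failing with RequestExpired some hours after startup and why a restart clears it.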
This is further supported by this Amazon documentation, which states:
[emphasis mine]
I just uploaded a version of the plugin with this change to our Jenkins server. I'll let it run and report back tomorrow if I see any errors. If it works, I'll create a pull request.