The ec2 plugin for Jenkins generates too many AWS API calls.
Analysis of the AWS API traffic showed that most calls (99%) were caused by the following line of code:
This means that on every job step execution the ec2 plugin calls the ec2:DescribeInstances AWS API with the instance ID of the Jenkins slave as a filter.
In our environment of about 108 slaves this results in roughly 5.5 AWS API invocations per second (more than 25% of the allowed AWS rate limit), or about 183 calls per hour (3 calls per minute) per slave, which is far too high.
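A quick sanity check of the numbers above, using the observed aggregate rate and slave count from our environment:

```java
public class CallRateCheck {
    public static void main(String[] args) {
        double callsPerSecond = 5.5; // observed aggregate DescribeInstances rate
        int slaves = 108;            // slaves in our environment
        // Per-slave rate: aggregate rate spread evenly across all slaves.
        double perSlavePerHour = callsPerSecond * 3600 / slaves;
        System.out.printf("%.0f calls/hour/slave (%.1f per minute)%n",
                perSlavePerHour, perSlavePerHour / 60);
    }
}
```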
Moreover, the retry mechanics are very poor. The retry count is effectively hard-coded to 80 attempts (5 soft retries from https://issues.jenkins-ci.org/browse/JENKINS-15319 multiplied by 16 API retries from https://issues.jenkins-ci.org/browse/JENKINS-26800).
As a result, the plugin effectively DoSes the AWS API exactly when AWS is already throttling, so nobody can access the AWS API properly while this happens.
What is required:
- Cache slave instance information so it is not refreshed on every access (e.g. a configurable update interval defaulting to one minute, with a refresh forced on a slave access error).
- Configure retries in a more conventional way, with a non-linearly growing delay (e.g. exponential backoff) before the next AWS API call.
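The two requirements above could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the class name `InstanceInfoCache`, the generic `Supplier` fetcher standing in for the real describeInstances call, and the specific retry/delay constants are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: TTL cache of per-instance information plus exponential backoff
// on the underlying AWS call. The TTL corresponds to the configurable
// update interval requested above (defaulting to one minute).
public class InstanceInfoCache<V> {
    private static final class Entry<V> {
        final V value;
        final Instant fetchedAt;
        Entry(V value, Instant fetchedAt) { this.value = value; this.fetchedAt = fetchedAt; }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Duration ttl;

    public InstanceInfoCache(Duration ttl) { this.ttl = ttl; }

    public V get(String instanceId, Supplier<V> fetcher) {
        Entry<V> e = cache.get(instanceId);
        if (e != null && Instant.now().isBefore(e.fetchedAt.plus(ttl))) {
            return e.value; // still fresh: no AWS API call at all
        }
        V fresh = fetchWithBackoff(fetcher, 5, 100);
        cache.put(instanceId, new Entry<>(fresh, Instant.now()));
        return fresh;
    }

    // Exponential backoff: the delay doubles after each failed attempt,
    // instead of hammering the API with a fixed-rate burst of 80 retries.
    static <T> T fetchWithBackoff(Supplier<T> fetcher, int maxAttempts, long baseDelayMs) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fetcher.get();
            } catch (RuntimeException ex) {
                last = ex;
                if (attempt < maxAttempts - 1) {
                    try {
                        Thread.sleep(baseDelayMs << attempt); // 100, 200, 400, ... ms
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw ex;
                    }
                }
            }
        }
        throw last;
    }
}
```

With such a cache in place, repeated lookups of the same slave within the update interval hit the cache instead of the AWS API, and a throttled API is retried with growing pauses rather than a fixed burst.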