Jenkins is scheduling jobs on machines that the EC2 plugin is terminating.
Jenkins Logs:
{"message":"2021-06-10 00:13:44.883+0000 [id=2488021]\tINFO\th.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Removed EC2 instance from jenkins controller: i-009304530cc4685ec"}
{"message":"2021-06-10 00:13:41.832+0000 [id=2488021]\tINFO\th.plugins.ec2.EC2OndemandSlave#lambda$terminate$0: Terminated EC2 instance (terminated): i-009304530cc4685ec"}
{"message":"2021-06-10 00:13:41.560+0000 [id=97]\tINFO\th.p.ec2.EC2RetentionStrategy#postJobAction: Agent i-009304530cc4685ec is terminated due to maxTotalUses (1)"}
{"message":"2021-06-10 00:13:40.455+0000 [id=2488307]\tINFO\th.p.ec2.EC2RetentionStrategy#taskAccepted: maxTotalUses drained - suspending agent i-009304530cc4685ec"}
{"message":"2021-06-10 00:12:12.929+0000 [id=402]\tINFO\th.p.ec2.EC2RetentionStrategy#postJobAction: Agent i-009304530cc4685ec is still in use by more than one (1) executers."}
{"message":"2021-06-10 00:12:10.237+0000 [id=2487762]\tINFO\th.p.ec2.EC2RetentionStrategy#taskAccepted: Agent i-009304530cc4685ec has 1 builds left"}
{"message":"2021-06-10 00:05:55.538+0000 [id=2487599]\tINFO\th.p.ec2.EC2RetentionStrategy#taskAccepted: Agent i-009304530cc4685ec has 2 builds left"}
{"message":"2021-06-10 00:04:43.156+0000 [id=2486970]\tINFO\th.p.ec2.EC2RetentionStrategy#taskAccepted: Agent i-009304530cc4685ec has 3 builds left"}
Build Log:
[2021-06-10T00:13:40.457Z] Running on EC2 (go-runner) - go-runner-ami (i-009304530cc4685ec) in /var/jenkins_home/workspace/go-mega-build_PR-28557
...
[2021-06-10T00:13:41.532Z] The recommended git tool is: git
Remote call on EC2 (go-runner) - go-runner-ami (i-009304530cc4685ec) failed
Putting these into a single timeline, it appears that:
00:13:40.455 - Suspend - EC2RetentionStrategy#taskAccepted: maxTotalUses drained - suspending agent i-009304530cc4685ec
00:13:40.457 - Schedule - Running on EC2 (go-runner) - go-runner-ami (i-009304530cc4685ec) in /var/jenkins_home/workspace/go-mega-build_PR-28557
00:13:41.560 - Terminate - EC2RetentionStrategy#postJobAction: Agent i-009304530cc4685ec is terminated due to maxTotalUses (1)
The suspend and the schedule are only 2 ms apart, and the termination follows about 1.1 s later. This looks like a race condition, unless the way jobs are counted has changed unexpectedly.
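The gaps between the three events can be checked directly from the timestamps. The following is a minimal sketch, not part of Jenkins or the EC2 plugin: the event labels and the `gaps_ms` helper are names made up for illustration, and the timestamps are copied by hand from the log excerpts above.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Jenkins and build logs above (all UTC).
# The labels "suspend", "schedule" and "terminate" are illustrative only.
EVENTS = [
    ("suspend", "2021-06-10T00:13:40.455"),
    ("schedule", "2021-06-10T00:13:40.457"),
    ("terminate", "2021-06-10T00:13:41.560"),
]


def gaps_ms(events):
    """Return (from, to, whole milliseconds) for each consecutive pair of events."""
    parsed = [(name, datetime.fromisoformat(ts)) for name, ts in events]
    result = []
    for (a, ta), (b, tb) in zip(parsed, parsed[1:]):
        # Integer division by a 1 ms timedelta avoids float rounding.
        result.append((a, b, (tb - ta) // timedelta(milliseconds=1)))
    return result


for a, b, ms in gaps_ms(EVENTS):
    print(f"{a} -> {b}: {ms} ms")
```

The point of the arithmetic is that the build was placed on the agent within a couple of milliseconds of the agent being suspended, which is the window where a race would have to occur.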