Bug
Resolution: Unresolved
Minor
None
Jenkins Version: Jenkins 2.277.1
EC2 Fleet Plugin: 2.5.1
ssh-slaves-plugin: 1.31.5
We use the EC2 Fleet plugin (ec2-fleet) to spin up Jenkins agents in EC2. The plugin manages an Auto Scaling group. It adjusts the group's desired capacity and terminates instances as needed, based on the jobs in the build queue.
Intermittently, Jenkins agents are disconnected and terminated while jobs are still running on them. When this happens, jenkins.log shows the following:
2022-04-20 11:04:28.423+0000 [id=995292] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel i-0cc23ec232cdf0ea8
java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2832)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3307)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
    at hudson.remoting.Command.readFrom(Command.java:142)
    at hudson.remoting.Command.readFrom(Command.java:128)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
2022-04-20 11:04:28.426+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: DISCONNECTED: FleetCloud i-0cc23ec232cdf0ea8
2022-04-20 11:04:28.426+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Start retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
2022-04-20 11:04:28.427+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@602b0ba2[] - WITH ACTIONS: []
2022-04-20 11:04:28.429+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@40bf5615[] - WITH ACTIONS: []
2022-04-20 11:04:28.430+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Finished retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] described instances: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] jenkins nodes: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-0cc23ec232cdf0ea8, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] terminated Fleet instance(s): [i-0cc23ec232cdf0ea8]
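In this log, i-0cc23ec232cdf0ea8 is listed under "jenkins nodes" but not under "described instances", and EC2FleetCloud then reports it as a terminated Fleet instance. To check whether other occurrences show the same mismatch, here is a minimal Python sketch that scans jenkins.log text for each described-instances/jenkins-nodes pair and returns the node IDs missing from the fleet description. The regular expression is an assumption based only on the line format quoted in this issue. The function names and the abbreviated sample are hypothetical.

```python
import re
from typing import List, Set, Tuple

# Assumed shape of the EC2FleetCloud INFO lines quoted in this issue, e.g.
#   FleetCloud [ec2-fleet] described instances: [i-..., i-...]
#   FleetCloud [ec2-fleet] jenkins nodes: [i-..., i-...]
LIST_RE = re.compile(
    r"FleetCloud \[(?P<fleet>[^\]]+)\] "
    r"(?P<kind>described instances|jenkins nodes): \[(?P<ids>[^\]]*)\]"
)


def parse_ids(raw: str) -> Set[str]:
    """Split the body of a logged "[a, b, c]" list into a set of instance IDs."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def nodes_missing_from_fleet(log_text: str) -> List[Tuple[str, List[str]]]:
    """For each "described instances" line followed by a "jenkins nodes" line,
    return (fleet name, sorted Jenkins node IDs absent from the described set)."""
    findings = []
    described = None
    for m in LIST_RE.finditer(log_text):
        ids = parse_ids(m.group("ids"))
        if m.group("kind") == "described instances":
            described = ids
        elif described is not None:
            missing = sorted(ids - described)
            if missing:
                findings.append((m.group("fleet"), missing))
            described = None  # wait for the next described/nodes pair
    return findings


# Abbreviated from the log excerpt in this issue (hypothetical sample).
SAMPLE = (
    "2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: "
    "FleetCloud [ec2-fleet] described instances: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46]\n"
    "2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: "
    "FleetCloud [ec2-fleet] jenkins nodes: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-0cc23ec232cdf0ea8]\n"
)
```

Because it uses `finditer` over the whole text, the sketch also works on logs where entries were run together on one line, as in the original paste. Against a real file, `nodes_missing_from_fleet(open("jenkins.log").read())` would be one way to call it; the path is a placeholder.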
This happens several times a day, at irregular intervals. We have tried the following, none of which helped:
1. Disabled the "Response Time" check for Jenkins agents at the computer/configure URL.
2. Upgraded the EC2 Fleet plugin.
3. Disabled Agent Access Control.
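Because the failures are sporadic, it may help to check whether every occurrence follows the sequence in the log above: the remoting channel hits an EOF, EC2FleetAutoResubmitComputerLauncher logs DISCONNECTED, and EC2FleetCloud then reports the same instance as terminated. The Python sketch below pairs each DISCONNECTED line with a later "terminated Fleet instance(s)" line for the same instance ID. It reports the gap in seconds, or None if no termination line follows. The regular expressions are assumptions derived from the log lines quoted in this issue. The function name and sample lines are hypothetical.

```python
import re
from datetime import datetime

# Timestamp format seen in jenkins.log, e.g. 2022-04-20 11:04:28.426+0000
TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})"
# Assumed line shapes, taken from the log excerpt in this issue.
DISCONNECT_RE = re.compile(TS + r" .*?DISCONNECTED: FleetCloud (?P<id>i-[0-9a-f]+)")
TERMINATED_RE = re.compile(TS + r" .*?terminated Fleet instance\(s\): \[(?P<ids>[^\]]*)\]")


def parse_ts(raw):
    """Parse a jenkins.log timestamp into a timezone-aware datetime."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f%z")


def disconnect_to_termination(lines):
    """Return (instance_id, disconnected_at, gap_seconds) tuples; gap_seconds
    is None when no termination line followed the disconnect."""
    pending = {}  # instance ID -> time of its DISCONNECTED line
    results = []
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            pending[m.group("id")] = parse_ts(m.group("ts"))
            continue
        m = TERMINATED_RE.search(line)
        if m:
            terminated_at = parse_ts(m.group("ts"))
            for instance_id in (s.strip() for s in m.group("ids").split(",")):
                if instance_id in pending:
                    started = pending.pop(instance_id)
                    gap = (terminated_at - started).total_seconds()
                    results.append((instance_id, started, gap))
    # Disconnects with no matching termination line in the scanned range.
    for instance_id, started in pending.items():
        results.append((instance_id, started, None))
    return results


# Two lines copied from the log excerpt in this issue (hypothetical sample).
SAMPLE_LINES = [
    "2022-04-20 11:04:28.426+0000 [id=995292] INFO c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: DISCONNECTED: FleetCloud i-0cc23ec232cdf0ea8",
    "2022-04-20 11:04:28.506+0000 [id=38] INFO c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] terminated Fleet instance(s): [i-0cc23ec232cdf0ea8]",
]
```

For the two sample lines, the gap is about 0.08 seconds. Passing an open file, e.g. `disconnect_to_termination(open("jenkins.log"))`, would iterate over a real log line by line; the path is a placeholder.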