Jenkins / JENKINS-68350

Intermittent/Sporadic I/O errors to Jenkins Agents From Controller Nodes


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • ec2-fleet-plugin
    • None
    • Jenkins Version: Jenkins 2.277.1
      EC2 Fleet Plugin: 2.5.1
      ssh-slaves-plugin: 1.31.5

      We currently use the ec2-fleet plugin to spin up Jenkins agents in EC2. The plugin manages an Auto Scaling group: it adjusts the desired count and destroys instances as required, depending on the jobs in the queue.

      Intermittently, Jenkins agents are disconnected and terminated while jobs are still running. When this happens, the jenkins.log file contains the following:

      2022-04-20 11:04:28.423+0000 [id=995292]        INFO    h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel i-0cc23ec232cdf0ea8
      java.io.EOFException
              at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2832)
              at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3307)
              at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
              at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
              at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
              at hudson.remoting.Command.readFrom(Command.java:142)
              at hudson.remoting.Command.readFrom(Command.java:128)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
      Caused: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
      2022-04-20 11:04:28.426+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: DISCONNECTED: FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.426+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Start retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.427+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@602b0ba2[] - WITH ACTIONS: []
      2022-04-20 11:04:28.429+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@40bf5615[] - WITH ACTIONS: []
      2022-04-20 11:04:28.430+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Finished retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] described instances: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] jenkins nodes: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-0cc23ec232cdf0ea8, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] terminated Fleet instance(s): [i-0cc23ec232cdf0ea8]
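
      One detail in the log above may be worth checking across the whole file: i-0cc23ec232cdf0ea8 appears in the "jenkins nodes" list but not in the "described instances" list logged at the same timestamp. Whether that is a cause or a consequence of the disconnect is not established here. The sketch below is only an illustration of how that comparison could be automated. It assumes the plugin always logs the two lists in pairs, as in the excerpt, and the function names (parse_ids, orphaned_nodes) are made up for this example.

```python
import re

# Patterns follow the EC2FleetCloud#info lines in the excerpt above.
DESCRIBED = re.compile(r"described instances: \[([^\]]*)\]")
NODES = re.compile(r"jenkins nodes: \[([^\]]*)\]")


def parse_ids(raw):
    """Split a comma-separated instance list into a set of IDs."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def orphaned_nodes(log_text):
    """For each logged pair of lists, return the node IDs that Jenkins
    still tracks but that are absent from 'described instances'.

    Assumption: the two lists are logged in pairs and in the same order,
    so the n-th 'described instances' entry belongs with the n-th
    'jenkins nodes' entry.
    """
    described = [parse_ids(m) for m in DESCRIBED.findall(log_text)]
    nodes = [parse_ids(m) for m in NODES.findall(log_text)]
    return [n - d for d, n in zip(described, nodes)]
```

      Reading jenkins.log into a string and passing it to orphaned_nodes gives one set per update cycle. A non-empty set that matches a later "terminated Fleet instance(s)" entry would point to the same pattern as in the excerpt.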
       

      We see this several times a day, with no apparent pattern. We have tried the following, none of which helped:
      1. Disabled the "Response Time" check at the computer/configure URL for Jenkins agents.
      2. Upgraded the ec2-fleet plugin.
      3. Disabled Agent Access Control.
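
      To put a number on "multiple times during the day", the I/O error entries can be counted per hour. The following is a rough sketch, not a tested diagnostic: the regular expression is derived only from the single ReaderThread line shown in the log, and disconnects_per_hour is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the ReaderThread entry from the log excerpt. The timestamp layout
# (YYYY-MM-DD HH:MM:SS.mmm+0000) is taken from that same line.
IO_ERROR = re.compile(
    r"(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\.\d{3}\+\d{4}\s+\[id=\d+\]\s+INFO\s+"
    r"h\.r\.SynchronousCommandTransport\$ReaderThread#run: "
    r"I/O error in channel (\S+)"
)


def disconnects_per_hour(log_text):
    """Count 'I/O error in channel' entries, bucketed by date and hour."""
    counts = Counter()
    for day, hour, _channel in IO_ERROR.findall(log_text):
        counts[f"{day} {hour}:00"] += 1
    return counts
```

      If the counts cluster around particular hours, that would be worth comparing with scaling activity on the Auto Scaling group for the same period.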

            Assignee: Chad Schmutzer (schmutze)
            Reporter: Sarath Pillai (sarath_pillai)