Jenkins / JENKINS-68350

Intermittent I/O Errors to Jenkins Agents from Controller Nodes


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: ec2-fleet-plugin
    • Labels: None
    • Environment:
      Jenkins Version: 2.277.1
      EC2 Fleet Plugin: 2.5.1
      ssh-slaves-plugin: 1.31.5

    Description

      We use the EC2 Fleet plugin (ec2-fleet-plugin) to spin up Jenkins agents in EC2. The plugin is backed by an Auto Scaling group: it adjusts the group's desired capacity, or terminates instances, as needed based on the jobs in the queue.
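      The queue-driven scaling described above can be pictured with a small model. The sketch below is hypothetical: the class name, the method, and the formula (ceiling of demand over executors per node, clamped to the group's min/max size) are assumptions for illustration, not the EC2 Fleet plugin's actual implementation.

```java
/**
 * Hypothetical model of queue-driven scaling. NOT the EC2 Fleet plugin's
 * code: the names and the formula are illustrative assumptions.
 */
final class FleetCapacitySketch {

    private FleetCapacitySketch() {}

    /**
     * Returns a desired instance count for the Auto Scaling group.
     *
     * @param queuedItems      builds waiting in the Jenkins queue
     * @param busyExecutors    executors currently running builds
     * @param executorsPerNode executors each agent instance provides
     * @param minSize          lower bound of the group
     * @param maxSize          upper bound of the group
     */
    static int targetCapacity(int queuedItems, int busyExecutors,
                              int executorsPerNode, int minSize, int maxSize) {
        if (executorsPerNode <= 0) {
            throw new IllegalArgumentException("executorsPerNode must be positive");
        }
        int demand = queuedItems + busyExecutors;
        // Ceiling division: enough nodes to give every queued or running build an executor.
        int nodes = (demand + executorsPerNode - 1) / executorsPerNode;
        // Clamp to the group's configured bounds.
        return Math.max(minSize, Math.min(maxSize, nodes));
    }

    public static void main(String[] args) {
        System.out.println(targetCapacity(3, 4, 2, 1, 10));  // 7 builds, 2 per node -> 4
        System.out.println(targetCapacity(0, 0, 2, 1, 10));  // idle -> clamped to minSize 1
        System.out.println(targetCapacity(50, 0, 2, 1, 10)); // capped at maxSize 10
    }
}
```

      Under these assumptions, a shrinking queue lowers the target, and it is that scale-down step that leads to instances being terminated.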

      Intermittently, Jenkins agents get disconnected and terminated in the middle of job execution. When this happens, jenkins.log contains the following:

      2022-04-20 11:04:28.423+0000 [id=995292]        INFO    h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel i-0cc23ec232cdf0ea8
      java.io.EOFException
              at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2832)
              at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3307)
              at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934)
              at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396)
              at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
              at hudson.remoting.Command.readFrom(Command.java:142)
              at hudson.remoting.Command.readFrom(Command.java:128)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
      Caused: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
      2022-04-20 11:04:28.426+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: DISCONNECTED: FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.426+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Start retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.427+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@602b0ba2[] - WITH ACTIONS: []
      2022-04-20 11:04:28.429+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: RETRIGGERING: org.jenkinsci.plugins.workflow.job.WorkflowJob@40bf5615[] - WITH ACTIONS: []
      2022-04-20 11:04:28.430+0000 [id=995292]        INFO    c.a.j.e.EC2FleetAutoResubmitComputerLauncher#afterDisconnect: Finished retriggering executors for FleetCloud i-0cc23ec232cdf0ea8
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] described instances: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] jenkins nodes: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, i-0cc23ec232cdf0ea8, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]
      2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: FleetCloud [ec2-fleet] terminated Fleet instance(s): [i-0cc23ec232cdf0ea8]
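      The innermost frames of that trace can be reproduced in isolation. The `java.io.ObjectInputStream` constructor reads the 4-byte serialization stream header (magic number plus version) immediately, so if no bytes arrive, `readStreamHeader` fails with `java.io.EOFException`. That is consistent with the controller's reader thread hitting end-of-stream on the agent connection, which remoting then reports as "Unexpected termination of the channel". The minimal sketch below feeds empty, truncated, and valid headers to the constructor; `EofOnClosedChannel` and `tryOpen` are made-up names for illustration, not part of Jenkins remoting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

/**
 * Illustrative reproduction (hypothetical class, not Jenkins code): the
 * ObjectInputStream constructor reads the stream header right away, so an
 * empty or truncated input makes it throw EOFException from readStreamHeader.
 */
final class EofOnClosedChannel {

    /** Returns the IOException thrown while opening, or null if the header was read. */
    static IOException tryOpen(byte[] received) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(received))) {
            return null; // header (0xACED, version 5) was present and valid
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Zero bytes: the "other end went away" case.
        System.out.println(tryOpen(new byte[0]).getClass().getName()); // java.io.EOFException
        // A well-formed 4-byte header opens without error.
        System.out.println(tryOpen(new byte[] {(byte) 0xAC, (byte) 0xED, 0x00, 0x05})); // null
    }
}
```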
       
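      In the excerpt, i-0cc23ec232cdf0ea8 appears in the "jenkins nodes" list but not in the "described instances" list logged at 11:04:28.506, about 80 ms after the DISCONNECTED line. Checking whether the same holds for every occurrence may help narrow down whether the instance had already dropped out of the fleet's description when the plugin removed the node. A minimal sketch, assuming the `EC2FleetCloud#info` lines keep the bracketed, comma-separated format shown; `FleetLogDiff` and its methods are hypothetical helpers, not part of the plugin.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic helper: given the "described instances" and
 * "jenkins nodes" log lines, lists Jenkins nodes the fleet did not report.
 * Assumes the instance IDs are in the last [...] group at the end of each line.
 */
final class FleetLogDiff {

    // Matches the final bracketed group on a line, e.g. "[i-a, i-b]".
    private static final Pattern LAST_LIST = Pattern.compile("\\[([^\\[\\]]*)\\]\\s*$");

    static Set<String> parseIds(String logLine) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = LAST_LIST.matcher(logLine);
        if (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(trimmed);
                }
            }
        }
        return ids;
    }

    /** Nodes Jenkins still tracks that are absent from the fleet's description. */
    static List<String> missingFromFleet(String describedLine, String nodesLine) {
        Set<String> described = parseIds(describedLine);
        List<String> missing = new ArrayList<>();
        for (String node : parseIds(nodesLine)) {
            if (!described.contains(node)) {
                missing.add(node);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String described = "2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: "
                + "FleetCloud [ec2-fleet] described instances: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, "
                + "i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, i-0a9b67281da8a465a, "
                + "i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]";
        String nodes = "2022-04-20 11:04:28.506+0000 [id=38]    INFO    c.a.j.ec2fleet.EC2FleetCloud#info: "
                + "FleetCloud [ec2-fleet] jenkins nodes: [i-093eb1fe7798a4da6, i-0f1a6b1d1ca61cf46, "
                + "i-0cc23ec232cdf0ea8, i-033c1bb525f125dcb, i-0eec19a1626895af8, i-05667bfe3f3c94454, "
                + "i-0a9b67281da8a465a, i-09a49b1aa04e1e4c2, i-09eacb3761c8a02c5, i-0db8a502c2bfcb1e3]";
        System.out.println(missingFromFleet(described, nodes)); // [i-0cc23ec232cdf0ea8]
    }
}
```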

      We see this several times a day, sporadically. We have tried several things, but nothing has helped. Some of the things we tried:
      1. Disabled the "Response Time" check on the computer/configure page for the Jenkins agents.
      2. Upgraded the EC2 Fleet plugin.
      3. Disabled Agent Access Control.


        Activity

          Sarath Pillai created issue
          Housni Yakoob added a comment:

          I have the same problem. A resolution to this issue would be most welcome!

          schmutze, I'm curious whether you have seen this problem before.

          Ivan Fernandez Calvo made changes:
          Field Original Value New Value
          Component/s ssh-slaves-plugin [ 15578 ]

          People

            Assignee: Chad Schmutzer (schmutze)
            Reporter: Sarath Pillai (sarath_pillai)
            Votes: 1
            Watchers: 2
