[JENKINS-24213] Jenkins slaves repeatedly disconnect and reconnect during startup, related to remoting/nio or the Swarm plugin

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: core, remoting
    • Environment: Jenkins LTS 1.565.1
      OpenJDK Java 1.7.0_65 on Debian wheezy 7.6, amd64
      Swarm plugin v1.16

      I'm not yet sure whether this is related to JENKINS-18781; JENKINS-22758 seems to be solved already. I'm experiencing strange issues with the connection between the Jenkins master and its slaves (connected using the Swarm plugin) during startup. All of the slaves (more than 25) repeatedly disconnect and automatically reconnect for quite some time:

      grep -c 'WARNING: Channel reader thread: .* terminated' ~log/jenkins/jenkins.log
        2882
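To see whether every slave is affected rather than just a total, the grep above can be turned into a per-node breakdown. This is a minimal Java sketch, not part of Jenkins: the class and method names are hypothetical, and it ASSUMES that the first whitespace-delimited token after `Channel reader thread: ` is the node name (as with `docker1` in the log excerpt below). The log path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class ChannelTerminationCounter {
    // Same regex as the grep command, plus a capture group for the first token
    // after "thread: ", which this sketch ASSUMES is the node name.
    static final Pattern TERMINATED =
            Pattern.compile("WARNING: Channel reader thread: (\\S+).* terminated");

    // Counts matching lines per node; find() matches anywhere in a line, like grep.
    static Map<String, Long> countPerNode(Stream<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        lines.forEach(line -> {
            Matcher m = TERMINATED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1L, Long::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the master's jenkins.log. ISO-8859-1 accepts any byte
        // sequence, so stray non-UTF-8 bytes cannot abort the scan.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            Map<String, Long> perNode = countPerNode(lines);
            long total = perNode.values().stream().mapToLong(Long::longValue).sum();
            System.out.println("total: " + total);
            perNode.forEach((node, n) -> System.out.println(node + "\t" + n));
        }
    }
}
```

The printed total should normally equal the `grep -c` result; it can only differ for lines where nothing but whitespace follows `thread: `, which the capture group does not accept.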

      The data for the slaves at $JENKINS_URL/computer/ is missing for quite some time, but eventually the connections persist and the slaves then seem to work fine. Was some timeout decreased recently that might explain this issue?

      Quoting the Jenkins master's log:

      Aug 12, 2014 12:43:08 AM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
      WARNING: Channel reader thread: docker1 for + docker1 terminated
      java.nio.channels.AsynchronousCloseException
      at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:412)
      at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:33)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
      at java.io.InputStream.read(InputStream.java:101)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
      at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      at hudson.remoting.Command.readFrom(Command.java:92)
      at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:70)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      On the client (slave), the log shows:

      Aug 12, 2014 12:43:06 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: I/O error in channel channel
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
      at hudson.remoting.Command.readFrom(Command.java:92)
      at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:71)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Kohsuke Kawaguchi added a comment - Do you have the whole master log? The AsynchronousCloseException indicates that another thread in the master closed the connection (and the EOFException on the slave is consistent with that). Hopefully that other thread (perhaps the ping thread, or perhaps the NIO selector thread) left a record of it around the same time.

          If you scroll up and down around that AsynchronousCloseException, you may find it.
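The mechanism Kohsuke describes can be reproduced outside Jenkins. The following is a minimal sketch using plain java.nio on the loopback interface, not Jenkins remoting code (class and thread names are made up; Java 8+ for the lambda). One thread blocks in `SocketChannel.read()`, a second thread closes that same channel, and the blocked reader fails with `AsynchronousCloseException`, like the master's channel reader thread in the log above. The other end of the connection just reads end-of-stream (`-1`); in remoting that end-of-stream is consumed through `ObjectInputStream`, which reports it as `java.io.EOFException`, matching the client log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicReference;

class AsyncCloseDemo {
    static final class Result {
        final Throwable readerFailure; // what the blocked reader thread saw
        final int peerRead;            // what read() returned on the other end
        Result(Throwable readerFailure, int peerRead) {
            this.readerFailure = readerFailure;
            this.peerRead = peerRead;
        }
    }

    static Result run() throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // any free loopback port
            try (SocketChannel peer = SocketChannel.open(server.getLocalAddress()); // "slave" end
                 SocketChannel local = server.accept()) {                          // "master" end
                AtomicReference<Throwable> failure = new AtomicReference<>();
                // Stand-in for the master's channel reader thread: blocks in read().
                Thread reader = new Thread(() -> {
                    try {
                        local.read(ByteBuffer.allocate(16));
                    } catch (IOException e) {
                        failure.set(e);
                    }
                }, "reader");
                reader.start();

                Thread.sleep(200);  // give the reader time to block inside read()
                local.close();      // a different thread closes the channel
                reader.join();

                // The other end sees an orderly end-of-stream: read() returns -1.
                int n = peer.read(ByteBuffer.allocate(16));
                return new Result(failure.get(), n);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = run();
        // Typically java.nio.channels.AsynchronousCloseException
        System.out.println("reader thread: " + r.readerFailure);
        System.out.println("peer read(): " + r.peerRead);
    }
}
```

If `close()` happens to run before the reader has entered `read()`, the reader gets a plain `ClosedChannelException` instead (the superclass of `AsynchronousCloseException`), which is why the sketch sleeps briefly first. The sketch cannot tell you which Jenkins thread performs the close; that still has to come from the master log.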

          Michael Prokop added a comment - Hi Kohsuke, I've uploaded the whole Jenkins log for you (be warned, it's 19 MB):

          http://michael-prokop.at/sipwise/jenkins.log

          (I'll keep the file around until you've had a chance to investigate, then I'll remove it again.)
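With a log this size, the "scroll up and down around that AsynchronousCloseException" step can be approximated by printing a fixed window of lines around every occurrence and merging windows that overlap, similar to GNU `grep -B N -A N`. This is a minimal Java sketch; the class name and the default radius of 40 lines are arbitrary choices, and it only narrows the search: spotting the thread that actually closed the channel still means reading the output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ContextAroundException {
    // Returns [from, to) line-index windows within `radius` lines of any line
    // containing `needle`. Matches are visited in order, so a new window can only
    // overlap (or touch) the previous one, which is then extended instead.
    static List<int[]> windows(List<String> lines, String needle, int radius) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(needle)) continue;
            int from = Math.max(0, i - radius);
            int to = Math.min(lines.size(), i + radius + 1);
            int[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && from <= last[1]) {
                last[1] = Math.max(last[1], to);
            } else {
                out.add(new int[] {from, to});
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to jenkins.log; args[1] (optional): context radius in lines.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        int radius = args.length > 1 ? Integer.parseInt(args[1]) : 40;
        for (int[] w : windows(lines, "AsynchronousCloseException", radius)) {
            System.out.println("---- lines " + (w[0] + 1) + "-" + w[1] + " ----");
            for (int i = w[0]; i < w[1]; i++) {
                System.out.println(lines.get(i));
            }
        }
    }
}
```

Reading the whole file into memory is fine at 19 MB; for much larger logs a streaming variant with a ring buffer would be the next step.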

          Michael Prokop added a comment - I'm still seeing this with Jenkins 1.565.2. Is there anything I can do to help get this issue resolved?

          Andy Chen added a comment - Please, please fix this bug. Since upgrading to build 1.587, I see this issue every 3 days on my 30+ slaves. It also happened on previous builds, but not as frequently.

            Assignee: Unassigned
            Reporter: Michael Prokop (mika)
            Votes: 5
            Watchers: 9
