    • Type: Bug
    • Resolution: Incomplete
    • Priority: Blocker
    • Component/s: core, remoting

      Chronic, intermittent slave disconnects affect many Windows slaves. The jobs typically take 10 hours, and the disconnects occur around 4 to 5 hours into a job. Stack trace follows:

      Slave went offline during the build
      ERROR: Connection was broken: java.io.IOException: Connection reset by peer
      at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
      at sun.nio.ch.IOUtil.read(IOUtil.java:197)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
      at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
      at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
      at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
      at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          [JENKINS-32808] windows 7 slaves intermittent disconnect

          Paul Buckley added a comment -

          I had the same issue here with 20 slaves on a Jenkins cluster running 107k JUnit tests all at once. The issue turned out to be the high ping time between the slave and the master. After I put them onto a dedicated switch with the same VLAN upstream, my problem went away. I also noted that high load on my build machine causes a considerably higher ping time in Jenkins only, so this is possibly to do with the way the slave is written and nothing to do with the network.
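
One way to check whether a high ping time comes from the network or from load on the machine is to time a plain TCP connect to the master from the slave host, once while idle and once during a heavy build, and compare both with the ping time Jenkins reports. The following is a minimal sketch under stated assumptions: the function names, host, and port are hypothetical, and a TCP handshake time is only a rough stand-in for the ping Jenkins measures over its own channel.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.perf_counter()
    # create_connection completes the TCP handshake before returning.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def sample_latency(host, port, samples=5, pause=1.0):
    """Collect several connect times so idle and loaded runs can be compared."""
    results = []
    for _ in range(samples):
        results.append(tcp_connect_ms(host, port))
        time.sleep(pause)
    return results
```

For example, `sample_latency("jenkins.example.com", 50000)` (placeholder host and port) could be run on the slave before and during a build. If the connect times stay low while the ping time shown in Jenkins rises, the delay is more likely on the agent side than in the network.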

          Oleg Nenashev added a comment -

          It is not possible to diagnose this issue without an agent log from the remote side.

          Over the last year there were many fixes in the Remoting stability area, and JNLP4 changed the Remoting NIO management significantly. I am going to close this issue as a duplicate of a similar one. Please provide more details there if it still happens on the new version.

          If you use vMotion by any chance, there are known issues with it.

          Oleg Nenashev added a comment -

          Closing as incomplete since I cannot diagnose the root cause. If the issue still happens on new Remoting versions, please reopen it.

          Bilel Yahia added a comment - edited

          oleg_nenashev I'm re-opening this ticket as I have the same problem.

          Logs from server side:

          Connection was broken: java.io.IOException: Connection reset by peer
                          at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
                          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
                          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
                          at sun.nio.ch.IOUtil.read(IOUtil.java:197)
                          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
                          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:142)
                          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:359)
                          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:564)
          Caused: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@1ab8ef91[name=Channel to /]
                          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:210)
                          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:635)
                          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
                          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                          at java.lang.Thread.run(Thread.java:745)
          

          Logs from slave side:

          Jun 12, 2017 7:13:17 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          
          Jun 12, 2017 7:13:17 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jun 12, 2017 7:13:27 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
          INFO: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@63a236
          Jun 12, 2017 7:13:37 AM hudson.remoting.jnlp.Main createEngine
          
          2017-06-12 07:13:32 - Stopping jenkinsslave-D__jenkins
          2017-06-12 07:13:32 - ProcessKill 1748
          2017-06-12 07:13:32 - Stopping process 1748
          2017-06-12 07:13:32 - Send SIGINT 1748
          2017-06-12 07:13:32 - SIGINT to1748 successful
          2017-06-12 07:13:32 - Finished jenkinsslave-D__jenkins
          2017-06-12 07:13:36 - Starting C:\Program Files (x86)\Java\jre1.8.0_121\bin\java.exe -Xrs -jar "D:\jenkins\slave.jar" -jnlpUrl https://jenkins/computer/jenkinsslave001/slave-agent.jnlp -secret f939a20894cb58d0e15a9269e859b7c8b2e8abf2b1ebda4a25f90f1e20ab30cf
          2017-06-12 07:13:36 - Started 2656

          Oleg Nenashev added a comment - edited

          byahia Hmm...

          Jun 12, 2017 7:13:17 AM hudson.remoting.jnlp.Main$CuiListener status
           INFO: Terminated
           Jun 12, 2017 7:13:27 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
           INFO: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@63a236
           Jun 12, 2017 7:13:37 AM hudson.remoting.jnlp.Main createEngine

          It means that the agent is being restarted after the Windows service installation. In such a case, a "Connection reset" error on the agent is expected, though the diagnostics could be better.

          • Are you running any logic for agent installation as a service (Windows slaves plugin, Groovy scripts, manual actions, etc.)?
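
The pattern quoted above (a "Terminated" status followed by a "Restarting agent via" message) can be searched for in a longer agent log. The following is a rough sketch only: `find_restarts` is a hypothetical helper, and the timestamp format and message texts are assumed to match the lines quoted in this ticket.

```python
import re

# Timestamp format as in the quoted agent log, e.g. "Jun 12, 2017 7:13:17 AM".
HEADER = re.compile(r"^\s*(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) ")


def find_restarts(lines):
    """Return (terminated_at, restarted_at) timestamp pairs from an agent log."""
    restarts = []
    current_ts = None      # timestamp of the most recent header line
    terminated_at = None   # set once "INFO: Terminated" has been seen
    for line in lines:
        match = HEADER.match(line)
        if match:
            current_ts = match.group(1)
            continue
        text = line.strip()
        if text == "INFO: Terminated":
            terminated_at = current_ts
        elif text.startswith("INFO: Restarting agent via") and terminated_at is not None:
            restarts.append((terminated_at, current_ts))
            terminated_at = None
    return restarts
```

Calling `find_restarts(open(path).read().splitlines())` on an agent log (where `path` is wherever that log is stored) lists each termination that was followed by a service restart, which helps tell a deliberate restart apart from an unexplained connection reset.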

           

          Oleg Nenashev added a comment -

          byahia ping

          Alex Earl added a comment -

          No response to question

            Assignee: Unassigned
            Reporter: Donald Duncan (cypress)
            Votes: 0
            Watchers: 6