Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-23852

slave builds repeatedly broken by "hudson.remoting.ChannelClosedException: channel is already closed"

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Major Major
    • remoting
    • Master: Jenkins 1.572 on OsX 10.7.5, Java: 1.6.0_65-b14-462-11M4609
      Slave: Slave version 2.43 (as reported by the versioncolumn plugin) on Windows 2008 R2 Standard (64bit), Java 7 update 55 (32 bit)

      I have a similar issue to JENKINS-14332, although that says there is no problem on a windows slave (because it eventually reconnects?), so I'm logging this seperatly.

      The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

      Because these seem to be connection issues, so might be imeout related, here's some setup info, even though I doubt it will be relevant: all those builds are matrix builds. My setup consist of an OSX master YYY that does the build for the 'mac' axis, and a slave XXX that does the build of the 'windows' axis. The windows machine is significantly slower than the mac, so some the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues here, it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other, later and earlier builds run for an hour without problems.

      In the middle of a build, the build log (as seen on the master) suddenly says:

      ==========================================
      21:07:10 FATAL: channel is already closed
      21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
      21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
      21:07:10 at hudson.remoting.Request.call(Request.java:129)
      21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
      21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
      21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
      21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
      21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
      21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
      21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
      21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
      21:07:10 at hudson.model.Run.execute(Run.java:1732)
      21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
      21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
      21:07:10 at hudson.model.Executor.run(Executor.java:234)
      21:07:10 Caused by: java.io.IOException: Failed to abort
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
      21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      21:07:10 at java.lang.Thread.run(Thread.java:695)
      21:07:10 Caused by: java.io.IOException: Operation timed out
      21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
      21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      21:07:10 ... 7 more
      ==========================================

      This combines with this in the slave error log:

      ==========================================
      Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: I/O error in channel channel
      java.net.SocketException: Software caused connection abort: recv failed
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.io.BufferedInputStream.fill(Unknown Source)
      at java.io.BufferedInputStream.read(Unknown Source)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
      at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
      at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
      WARNING: Failed to obtain killers
      hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:541)
      at hudson.remoting.Request.call(Request.java:129)
      at hudson.remoting.Channel.call(Channel.java:739)
      at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
      at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
      at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
      at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
      at hudson.Proc$LocalProc.destroy(Proc.java:379)
      at hudson.Proc$LocalProc.join(Proc.java:352)
      at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
      at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.net.SocketException: Software caused connection abort: recv failed
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.io.BufferedInputStream.fill(Unknown Source)
      at java.io.BufferedInputStream.read(Unknown Source)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
      at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
      at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
      INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
      at hudson.remoting.Engine.run(Engine.java:276)

      [repeated reconnection attempts, +- 2 times/sec]

      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
      at hudson.remoting.Engine.run(Engine.java:276)

      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      ==========================================

      and the wrapper log says:

      ==========================================
      2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
      2014-07-16 21:01:59 - ProcessKill 3664
      2014-07-16 21:01:59 - Send SIGINT 3664
      2014-07-16 21:01:59 - SIGINT to3664 successful
      2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
      2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:02:01 - Started 3264
      2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:02:06 - Started 1364

      [many more of these reconnection attempts]

      2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:04 - Started 3448
      2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:05 - Started 1384
      2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:06 - Started 3140
      [nothing after that, so this is the reconnection attempt that worked, I think.]
      ==========================================

      Finally in the server log:

      ==========================================
      TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

      Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

      Accepted connection #1154 from /IPofXXX:52406

      Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

      TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

      Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

      TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

      Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

      Communication problem
      java.io.IOException: Operation timed out
      at sun.nio.ch.FileDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:695)

      Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

      NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting 1 for + XXX terminated
      java.io.IOException: Failed to abort
      at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:695)
      Caused by: java.io.IOException: Operation timed out
      at sun.nio.ch.FileDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      ... 7 more
      ==========================================

      Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:

      • slave detects a failed recv, which closes the channel
      • next communication attempt detects the closed channel
      • slave tries to reconnect, +- 2 times/sec
      • server refuses this because it thinks it is still connected
      • after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
      • at that point, the server accepts the next connection attempt from the slave

      If the cause of the first failed recv can't be found or prevented, it seems the communication between client and server needs some sort of 'reconnection' mechanism (so the client can reconnect right away, instead of having to abort the build and wait until the server times out the connection). Then, the client could reconnect and retry what it was doing, instead of aborting the build on a failed connection?

          [JENKINS-23852] slave builds repeatedly broken by "hudson.remoting.ChannelClosedException: channel is already closed"

          Arnt Witteveen created issue -
          Arnt Witteveen made changes -
          Description Original: I have a similar issue to JENKINS-14332, although that says there is no problem on a windows slave (because it eventually reconnects?), so I'm logging this seperatly.

          The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

          Because these seem to be connection issues, so might be imeout related, here's some setup info, even though I doubt it will be relevant: all those builds are matrix builds. My setup consist of an OSX master YYY that does the build for the 'mac' axis, and a slave XXX that does the build of the 'windows' axis. The windows machine is significantly slower than the mac, so some the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues here, it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other, later and earlier builds run for an hour without problems.


          In the middle of a build, the build log suddenly says:

          ==========================================
          21:07:10 FATAL: channel is already closed
          21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
          21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
          21:07:10 at hudson.remoting.Request.call(Request.java:129)
          21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
          21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
          21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
          21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
          21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
          21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
          21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
          21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
          21:07:10 at hudson.model.Run.execute(Run.java:1732)
          21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
          21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
          21:07:10 at hudson.model.Executor.run(Executor.java:234)
          21:07:10 Caused by: java.io.IOException: Failed to abort
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          21:07:10 at java.lang.Thread.run(Thread.java:695)
          21:07:10 Caused by: java.io.IOException: Operation timed out
          21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
          21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          21:07:10 ... 7 more
          ==========================================

          This combines with this in the slave error log:

          ==========================================
          Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
          WARNING: Failed to obtain killers
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:541)
          at hudson.remoting.Request.call(Request.java:129)
          at hudson.remoting.Channel.call(Channel.java:739)
          at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
          at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
          at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
          at hudson.Proc$LocalProc.destroy(Proc.java:379)
          at hudson.Proc$LocalProc.join(Proc.java:352)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
          at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.lang.reflect.Method.invoke(Unknown Source)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
          at hudson.remoting.UserRequest.perform(UserRequest.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:48)
          at hudson.remoting.Request$2.run(Request.java:328)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at hudson.remoting.Engine$1$1.run(Engine.java:63)
          at java.lang.Thread.run(Unknown Source)
          Caused by: java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          [repeated reconnection attempts, +- 2 times/sec]

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          ==========================================

          and the wrapper log says:

          ==========================================
          2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
          2014-07-16 21:01:59 - ProcessKill 3664
          2014-07-16 21:01:59 - Send SIGINT 3664
          2014-07-16 21:01:59 - SIGINT to3664 successful
          2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
          2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:01 - Started 3264
          2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:06 - Started 1364

          [many more of these reconnection attempts]

          2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:04 - Started 3448
          2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:05 - Started 1384
          2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:06 - Started 3140
          [nothing after that, so this is the reconnection attempt that worked, I think.]
          ==========================================


          Finally in the server log:

          ==========================================
          TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

          Accepted connection #1154 from /IPofXXX:52406

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

          Communication problem
          java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)

          Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

          NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting [#1] for + XXX terminated
          java.io.IOException: Failed to abort
          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)
          Caused by: java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          ... 7 more
          ==========================================

          Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:
          - slave detects a failed recv, which closes the channel
          - next communication attempt detects the closed channel
          - slave tries to reconnect, +- 2 times/sec
          - server refuses this because it thinks it is still connected
          - after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
          - at that point, the server accepts the next connection attempt from the slave

          If the cause of the first failed recv can't be found or prevented, it seems the communication between client and server needs some sort of 'reconnection' mechanism (so the client can reconnect right away, instead of having to abort the build and wait until the server times out the connection). Then, the client could reconnect and retry what it was doing, instead of aborting the build on a failed connection?
          New: I have a similar issue to JENKINS-14332, although that says there is no problem on a windows slave (because it eventually reconnects?), so I'm logging this seperatly.

          The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

          Because these seem to be connection issues, so might be imeout related, here's some setup info, even though I doubt it will be relevant: all those builds are matrix builds. My setup consist of an OSX master YYY that does the build for the 'mac' axis, and a slave XXX that does the build of the 'windows' axis. The windows machine is significantly slower than the mac, so some the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues here, it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other, later and earlier builds run for an hour without problems.


          In the middle of a build, the build log (as seen on the master) suddenly says:

          ==========================================
          21:07:10 FATAL: channel is already closed
          21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
          21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
          21:07:10 at hudson.remoting.Request.call(Request.java:129)
          21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
          21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
          21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
          21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
          21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
          21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
          21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
          21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
          21:07:10 at hudson.model.Run.execute(Run.java:1732)
          21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
          21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
          21:07:10 at hudson.model.Executor.run(Executor.java:234)
          21:07:10 Caused by: java.io.IOException: Failed to abort
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          21:07:10 at java.lang.Thread.run(Thread.java:695)
          21:07:10 Caused by: java.io.IOException: Operation timed out
          21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
          21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          21:07:10 ... 7 more
          ==========================================

          This combines with this in the slave error log:

          ==========================================
          Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
          WARNING: Failed to obtain killers
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:541)
          at hudson.remoting.Request.call(Request.java:129)
          at hudson.remoting.Channel.call(Channel.java:739)
          at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
          at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
          at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
          at hudson.Proc$LocalProc.destroy(Proc.java:379)
          at hudson.Proc$LocalProc.join(Proc.java:352)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
          at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.lang.reflect.Method.invoke(Unknown Source)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
          at hudson.remoting.UserRequest.perform(UserRequest.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:48)
          at hudson.remoting.Request$2.run(Request.java:328)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at hudson.remoting.Engine$1$1.run(Engine.java:63)
          at java.lang.Thread.run(Unknown Source)
          Caused by: java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          [repeated reconnection attempts, +- 2 times/sec]

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          ==========================================

          and the wrapper log says:

          ==========================================
          2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
          2014-07-16 21:01:59 - ProcessKill 3664
          2014-07-16 21:01:59 - Send SIGINT 3664
          2014-07-16 21:01:59 - SIGINT to3664 successful
          2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
          2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:01 - Started 3264
          2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:06 - Started 1364

          [many more of these reconnection attempts]

          2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:04 - Started 3448
          2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:05 - Started 1384
          2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:06 - Started 3140
          [nothing after that, so this is the reconnection attempt that worked, I think.]
          ==========================================


          Finally in the server log:

          ==========================================
          TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

          Accepted connection #1154 from /IPofXXX:52406

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

          Communication problem
          java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)

          Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

          NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting [#1] for + XXX terminated
          java.io.IOException: Failed to abort
          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)
          Caused by: java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          ... 7 more
          ==========================================

          Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:
          - slave detects a failed recv, which closes the channel
          - next communication attempt detects the closed channel
          - slave tries to reconnect, +- 2 times/sec
          - server refuses this because it thinks it is still connected
          - after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
          - at that point, the server accepts the next connection attempt from the slave

          If the cause of the first failed recv can't be found or prevented, it seems the communication between client and server needs some sort of 'reconnection' mechanism (so the client can reconnect right away, instead of having to abort the build and wait until the server times out the connection). Then, the client could reconnect and retry what it was doing, instead of aborting the build on a failed connection?
          Arnt Witteveen made changes -
          Environment Original: Master: Jenkins 1.572 on OsX 10.7.5, Java: 1.6.0_65-b14-462-11M4609
          Slave: Slave version 2.43 (as reported by the versioncolumn plugin) on Windows 2008 R2 Standard, Java 7 update 55
          New: Master: Jenkins 1.572 on OsX 10.7.5, Java: 1.6.0_65-b14-462-11M4609
          Slave: Slave version 2.43 (as reported by the versioncolumn plugin) on Windows 2008 R2 Standard (64bit), Java 7 update 55 (32 bit)
          Arnt Witteveen made changes -
          Description Original: I have a similar issue to JENKINS-14332, although that says there is no problem on a windows slave (because it eventually reconnects?), so I'm logging this seperatly.

          The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

          Because these seem to be connection issues, so might be imeout related, here's some setup info, even though I doubt it will be relevant: all those builds are matrix builds. My setup consist of an OSX master YYY that does the build for the 'mac' axis, and a slave XXX that does the build of the 'windows' axis. The windows machine is significantly slower than the mac, so some the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues here, it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other, later and earlier builds run for an hour without problems.


          In the middle of a build, the build log (as seen on the master) suddenly says:

          ==========================================
          21:07:10 FATAL: channel is already closed
          21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
          21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
          21:07:10 at hudson.remoting.Request.call(Request.java:129)
          21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
          21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
          21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
          21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
          21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
          21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
          21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
          21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
          21:07:10 at hudson.model.Run.execute(Run.java:1732)
          21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
          21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
          21:07:10 at hudson.model.Executor.run(Executor.java:234)
          21:07:10 Caused by: java.io.IOException: Failed to abort
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          21:07:10 at java.lang.Thread.run(Thread.java:695)
          21:07:10 Caused by: java.io.IOException: Operation timed out
          21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
          21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          21:07:10 ... 7 more
          ==========================================

          This combines with this in the slave error log:

          ==========================================
          Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
          WARNING: Failed to obtain killers
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:541)
          at hudson.remoting.Request.call(Request.java:129)
          at hudson.remoting.Channel.call(Channel.java:739)
          at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
          at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
          at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
          at hudson.Proc$LocalProc.destroy(Proc.java:379)
          at hudson.Proc$LocalProc.join(Proc.java:352)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
          at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.lang.reflect.Method.invoke(Unknown Source)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
          at hudson.remoting.UserRequest.perform(UserRequest.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:48)
          at hudson.remoting.Request$2.run(Request.java:328)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at hudson.remoting.Engine$1$1.run(Engine.java:63)
          at java.lang.Thread.run(Unknown Source)
          Caused by: java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          [repeated reconnection attempts, +- 2 times/sec]

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          ==========================================

          and the wrapper log says:

          ==========================================
          2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
          2014-07-16 21:01:59 - ProcessKill 3664
          2014-07-16 21:01:59 - Send SIGINT 3664
          2014-07-16 21:01:59 - SIGINT to3664 successful
          2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
          2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:01 - Started 3264
          2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:06 - Started 1364

          [many more of these reconnection attempts]

          2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:04 - Started 3448
          2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:05 - Started 1384
          2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://rdmacbuild05:8080/computer/rdvmbuild92/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:06 - Started 3140
          [nothing after that, so this is the reconnection attempt that worked, I think.]
          ==========================================


          Finally in the server log:

          ==========================================
          TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

          Accepted connection #1154 from /IPofXXX:52406

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

          Communication problem
          java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)

          Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

          NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting [#1] for + XXX terminated
          java.io.IOException: Failed to abort
          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)
          Caused by: java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          ... 7 more
          ==========================================

          Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:
          - slave detects a failed recv, which closes the channel
          - next communication attempt detects the closed channel
          - slave tries to reconnect, +- 2 times/sec
          - server refuses this because it thinks it is still connected
          - after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
          - at that point, the server accepts the next connection attempt from the slave

          If the cause of the first failed recv can't be found or prevented, it seems the communication between client and server needs some sort of 'reconnection' mechanism (so the client can reconnect right away, instead of having to abort the build and wait until the server times out the connection). Then, the client could reconnect and retry what it was doing, instead of aborting the build on a failed connection?
          New: I have a similar issue to JENKINS-14332, although that says there is no problem on a windows slave (because it eventually reconnects?), so I'm logging this seperatly.

          The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

          Because these seem to be connection issues, so might be imeout related, here's some setup info, even though I doubt it will be relevant: all those builds are matrix builds. My setup consist of an OSX master YYY that does the build for the 'mac' axis, and a slave XXX that does the build of the 'windows' axis. The windows machine is significantly slower than the mac, so some the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues here, it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other, later and earlier builds run for an hour without problems.


          In the middle of a build, the build log (as seen on the master) suddenly says:

          ==========================================
          21:07:10 FATAL: channel is already closed
          21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
          21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
          21:07:10 at hudson.remoting.Request.call(Request.java:129)
          21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
          21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
          21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
          21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
          21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
          21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
          21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
          21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
          21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
          21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
          21:07:10 at hudson.model.Run.execute(Run.java:1732)
          21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
          21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
          21:07:10 at hudson.model.Executor.run(Executor.java:234)
          21:07:10 Caused by: java.io.IOException: Failed to abort
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          21:07:10 at java.lang.Thread.run(Thread.java:695)
          21:07:10 Caused by: java.io.IOException: Operation timed out
          21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
          21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          21:07:10 ... 7 more
          ==========================================

          This combines with this in the slave error log:

          ==========================================
          Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
          WARNING: Failed to obtain killers
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:541)
          at hudson.remoting.Request.call(Request.java:129)
          at hudson.remoting.Channel.call(Channel.java:739)
          at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
          at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
          at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
          at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
          at hudson.Proc$LocalProc.destroy(Proc.java:379)
          at hudson.Proc$LocalProc.join(Proc.java:352)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
          at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.lang.reflect.Method.invoke(Unknown Source)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
          at hudson.remoting.UserRequest.perform(UserRequest.java:118)
          at hudson.remoting.UserRequest.perform(UserRequest.java:48)
          at hudson.remoting.Request$2.run(Request.java:328)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at hudson.remoting.Engine$1$1.run(Engine.java:63)
          at java.lang.Thread.run(Unknown Source)
          Caused by: java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
          at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
          at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
          at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
          at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          [repeated reconnection attempts, +- 2 times/sec]

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: XXX
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://YYY:8080/]
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to YYY:62768
          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          ==========================================

          and the wrapper log says:

          ==========================================
          2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
          2014-07-16 21:01:59 - ProcessKill 3664
          2014-07-16 21:01:59 - Send SIGINT 3664
          2014-07-16 21:01:59 - SIGINT to3664 successful
          2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
          2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:01 - Started 3264
          2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:02:06 - Started 1364

          [many more of these reconnection attempts]

          2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:04 - Started 3448
          2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:05 - Started 1384
          2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
          2014-07-16 21:07:06 - Started 3140
          [nothing after that, so this is the reconnection attempt that worked, I think.]
          ==========================================


          Finally in the server log:

          ==========================================
          TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

          Accepted connection #1154 from /IPofXXX:52406

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

          Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

          TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

          Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

          Communication problem
          java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)

          Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

          NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting [#1] for + XXX terminated
          java.io.IOException: Failed to abort
          at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
          at java.lang.Thread.run(Thread.java:695)
          Caused by: java.io.IOException: Operation timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
          at sun.nio.ch.IOUtil.read(IOUtil.java:171)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
          at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
          at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
          at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
          ... 7 more
          ==========================================

          Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:
          - slave detects a failed recv, which closes the channel
          - next communication attempt detects the closed channel
          - slave tries to reconnect, +- 2 times/sec
          - server refuses this because it thinks it is still connected
          - after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
          - at that point, the server accepts the next connection attempt from the slave

          If the cause of the first failed recv can't be found or prevented, it seems the communication between client and server needs some sort of 'reconnection' mechanism (so the client can reconnect right away, instead of having to abort the build and wait until the server times out the connection). Then, the client could reconnect and retry what it was doing, instead of aborting the build on a failed connection?

          JENKINS-19513 may also be related.

          Arnt Witteveen added a comment - JENKINS-19513 may also be related.
          Daniel Beck made changes -
          Labels Original: connection exception jenkins slave New: remoting

          Thanks for the detailed log analysis. I agree with your order of events list, although there are some details that are still suspicious. For example, if the client has decided to shutdown the connection and that properly arrived at the server, we should see something other than "java.io.IOException: Operation timed out" in the read. And we don't set any read timeout, so I'm puzzled as to why we see this exception during the read.

          In any case, according to that theory, the first event that caused this is the following:

          Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: Software caused connection abort: recv failed
          at java.net.SocketInputStream.socketRead0(Native Method)
          

          This appears to be an error unique to Windows, and I found this stackoverflow conversation helpful to understand what it means, as well as this KB article.

          If I understand this error correctly, it means that Jenkins slave has given WinSock some data to send, and WinSock has sent some packets over the network, but it's not getting the data through to the other side, and so it has given up and declared the connection lost. That happens much later after the write is called, so the next read that came in gets hit by this problem.

          The links I mentioned above do not really discuss what that "data transmission time-out" is about. It could be anything from WinSock not getting TCK ACK from the other side, TCP flow control, or self-imposed timeout unique to WinSock.

          I'd have normally suspected the network issue, but the succssive reconnection attempts seem to exclude this possibility.

          Another possibility could be that this is related to issues like JENKINS-24050, where an unrelated problem kills the NIO selector thread. When that happens, all the sockets stopped getting serviced, so data that the client sent will not be picked up by the application on the server side. TCP ACK should still come in this case, since kernel is getting the packets all right, but perhaps in this situation WinSock might decide to time out, perhaps due to a self-imposed timeout. This would need some experiments.

          I've just fixed JENKINS-24050, so if possible I'd love to have you try this fix, which should make it into 1.580. Another useful test is to disable the use of NIO on the server side for managing JNLP slaves with the system property '-Djenkins.slaves.NioChannelSelector.disabled=true' on the master. If the problem goes away with it, that'd be an useful input.

          For anyone else seeing problems, please check relevant logs on both the server and the slave and share it with us, so that we can resolve this problem efficiently.

          Kohsuke Kawaguchi added a comment - Thanks for the detailed log analysis. I agree with your order of events list, although there are some details that are still suspicious. For example, if the client has decided to shutdown the connection and that properly arrived at the server, we should see something other than "java.io.IOException: Operation timed out" in the read. And we don't set any read timeout, so I'm puzzled as to why we see this exception during the read. In any case, according to that theory, the first event that caused this is the following: Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run SEVERE: I/O error in channel channel java.net.SocketException: Software caused connection abort: recv failed at java.net.SocketInputStream.socketRead0(Native Method) This appears to be an error unique to Windows, and I found this stackoverflow conversation helpful to understand what it means, as well as this KB article . If I understand this error correctly, it means that Jenkins slave has given WinSock some data to send, and WinSock has sent some packets over the network, but it's not getting the data through to the other side, and so it has given up and declared the connection lost. That happens much later after the write is called, so the next read that came in gets hit by this problem. The links I mentioned above do not really discuss what that "data transmission time-out" is about. It could be anything from WinSock not getting TCK ACK from the other side, TCP flow control, or self-imposed timeout unique to WinSock. I'd have normally suspected the network issue, but the succssive reconnection attempts seem to exclude this possibility. Another possibility could be that this is related to issues like JENKINS-24050 , where an unrelated problem kills the NIO selector thread. When that happens, all the sockets stopped getting serviced, so data that the client sent will not be picked up by the application on the server side. TCP ACK should still come in this case, since kernel is getting the packets all right, but perhaps in this situation WinSock might decide to time out, perhaps due to a self-imposed timeout. This would need some experiments. I've just fixed JENKINS-24050 , so if possible I'd love to have you try this fix, which should make it into 1.580. Another useful test is to disable the use of NIO on the server side for managing JNLP slaves with the system property '-Djenkins.slaves.NioChannelSelector.disabled=true' on the master. If the problem goes away with it, that'd be an useful input. For anyone else seeing problems, please check relevant logs on both the server and the slave and share it with us, so that we can resolve this problem efficiently.

          You are correct that this was a network issue: after some time I noticed this always happened around 9PM and mentioned this to a network admin. He said that something similar had happened once, and we let a ping run for a whole day. It seems that there were some network interruptions (of up to a minute IIRC) at the specific time this happened. The machine got switched to another port on the router (or another router), and the issue has not reoccurred since.

          Sorry for not mentioning this earlier, I forgot about having logged this.

          Arnt Witteveen added a comment - You are correct that this was a network issue: after some time I noticed this always happened around 9PM and mentioned this to a network admin. He said that something similar had happened once, and we let a ping run for a whole day. It seems that there were some network interruptions (of up to a minute IIRC) at the specific time this happened. The machine got switched to another port on the router (or another router), and the issue has not reoccurred since. Sorry for not mentioning this earlier, I forgot about having logged this.

          I have been seeing similar problems to this. In my situation one of my slaves is a Windows 7 desktop computer that is set to suspend after a period of inactivity. I've had this running for a few years without problem and the slave has always reconnected successfully when the desktop is woken up.

          However since the NIO selector thread has been introduced the slaves have ended up being offline for prolonged periods of time and require a manual restart of the Jenkins slave service to bring them back online (see JENKINS-24272). One other symptom that I have observed during testing of this is the server rejecting connections similar to that reported here namely:

          Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
          at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
          at hudson.remoting.Engine.run(Engine.java:276)
          

          Having investigated I'm seeing that the slave has decided that the connection is broken and has tore down its TCP connection to the master but the master never sees the TCP RST for this so it continues to retry the TCP packets for 10 or so minutes until the TCP connection times out and then the master tears down the connection.

          The really strange this is that this behaviour seems to be different between NIO and threaded methods on the master. With threaded the TCP RST is sent by W7 and kills the connection before the slave reconnects but in NIO no RST is ever sent. That happens down at the network packet level so it shouldn't really be affected by NIO but I wonder if it a slight different configuration in the network stack (non-blocking vs blocking) that causes the different behaviour.

          I also suspect that this may be related to the change to re-start JNLP slaves using a new JVM rather than just restarting in the same process. It may be that the Windows TCP stack just immediately forgets the old TCP socket when the old JVM process exits and hence ignores retries rather than RSTing the connection.

          I'm planning on waiting for the current crop of JNLP slave fixes to get integrated and will then experiment a bit further to see if I can work out what is stopping the NIO selector getting notified that the slave has closed the TCP connection.

          Richard Mortimer added a comment - I have been seeing similar problems to this. In my situation one of my slaves is a Windows 7 desktop computer that is set to suspend after a period of inactivity. I've had this running for a few years without problem and the slave has always reconnected successfully when the desktop is woken up. However since the NIO selector thread has been introduced the slaves have ended up being offline for prolonged periods of time and require a manual restart of the Jenkins slave service to bring them back online (see JENKINS-24272 ). One other symptom that I have observed during testing of this is the server rejecting connections similar to that reported here namely: Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection. java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection. at hudson.remoting.Engine.onConnectionRejected(Engine.java:304) at hudson.remoting.Engine.run(Engine.java:276) Having investigated I'm seeing that the slave has decided that the connection is broken and has tore down its TCP connection to the master but the master never sees the TCP RST for this so it continues to retry the TCP packets for 10 or so minutes until the TCP connection times out and then the master tears down the connection. The really strange this is that this behaviour seems to be different between NIO and threaded methods on the master. With threaded the TCP RST is sent by W7 and kills the connection before the slave reconnects but in NIO no RST is ever sent. That happens down at the network packet level so it shouldn't really be affected by NIO but I wonder if it a slight different configuration in the network stack (non-blocking vs blocking) that causes the different behaviour. I also suspect that this may be related to the change to re-start JNLP slaves using a new JVM rather than just restarting in the same process. It may be that the Windows TCP stack just immediately forgets the old TCP socket when the old JVM process exits and hence ignores retries rather than RSTing the connection. I'm planning on waiting for the current crop of JNLP slave fixes to get integrated and will then experiment a bit further to see if I can work out what is stopping the NIO selector getting notified that the slave has closed the TCP connection.

          Thinking about this some more, I still think it would be nice that a temporary network outage of half a minute or so :

          • does not abort a build (that may have been halfway through a build process of several hours). I wouldn't mind seeing, in the log on the server, a message saying [network was out, log may be missing a part here] in the middle of the log, but why stop the build completely right away? Even in cases where the slave needs the server, there's little point in aborting the build, because the slave is not going to get new work from the server if it can't connect to it anyway?
          • does not cause the client to be offline for that long. If the server receives a new connection attempt of a slave node, and there already is a connection of the same slave node, who not just disconnect the old connection and continue with the new one?

          Arnt Witteveen added a comment - Thinking about this some more, I still think it would be nice that a temporary network outage of half a minute or so : does not abort a build (that may have been halfway through a build process of several hours). I wouldn't mind seeing, in the log on the server, a message saying [network was out, log may be missing a part here] in the middle of the log, but why stop the build completely right away? Even in cases where the slave needs the server, there's little point in aborting the build, because the slave is not going to get new work from the server if it can't connect to it anyway? does not cause the client to be offline for that long. If the server receives a new connection attempt of a slave node, and there already is a connection of the same slave node, who not just disconnect the old connection and continue with the new one?

            Unassigned Unassigned
            legolas Arnt Witteveen
            Votes:
            41 Vote for this issue
            Watchers:
            52 Start watching this issue

              Created:
              Updated: