Jenkins / JENKINS-23852

slave builds repeatedly broken by "hudson.remoting.ChannelClosedException: channel is already closed"

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: remoting
    • Master: Jenkins 1.572 on OS X 10.7.5, Java 1.6.0_65-b14-462-11M4609
      Slave: slave version 2.43 (as reported by the versioncolumn plugin) on Windows 2008 R2 Standard (64-bit), Java 7 update 55 (32-bit)

      I have a similar issue to JENKINS-14332, although that report says there is no problem on a Windows slave (because it eventually reconnects?), so I'm logging this separately.

      The message "hudson.remoting.ChannelClosedException: channel is already closed" breaks my builds on a windows slave. It doesn't happen on every build, but a nightly run of 7 builds seems to have at least one each time.

      Because these seem to be connection issues, and so might be timeout related, here's some setup info, even though I doubt it will be relevant: all of these builds are matrix builds. My setup consists of an OS X master YYY that does the build for the 'mac' axis, and a slave XXX that does the build for the 'windows' axis. The Windows machine is significantly slower than the Mac, so some of the builds spend quite some time in a state of '1 axis done, other axis queued'. I don't really see issues there; it seems to work fine for most of the builds most of the time: e.g. these 7 builds are all queued at 20:00, and the example below fails shortly after 21:00, but other builds finish fine later on (so after a longer wait). The failing build on XXX started at 20:50, so it fails after 17 minutes. Other builds, earlier and later, run for an hour without problems.

      In the middle of a build, the build log (as seen on the master) suddenly says:

      ==========================================
      21:07:10 FATAL: channel is already closed
      21:07:10 hudson.remoting.ChannelClosedException: channel is already closed
      21:07:10 at hudson.remoting.Channel.send(Channel.java:541)
      21:07:10 at hudson.remoting.Request.call(Request.java:129)
      21:07:10 at hudson.remoting.Channel.call(Channel.java:739)
      21:07:10 at hudson.EnvVars.getRemote(EnvVars.java:404)
      21:07:10 at hudson.model.Computer.getEnvironment(Computer.java:912)
      21:07:10 at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:29)
      21:07:10 at hudson.model.Run.getEnvironment(Run.java:2250)
      21:07:10 at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:907)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.impl.EnvironmentVariableMacro.evaluate(EnvironmentVariableMacro.java:23)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.DataBoundTokenMacro.evaluate(DataBoundTokenMacro.java:189)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:182)
      21:07:10 at org.jenkinsci.plugins.tokenmacro.TokenMacro.expand(TokenMacro.java:154)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.setDisplayName(BuildNameSetter.java:50)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter.access$000(BuildNameSetter.java:26)
      21:07:10 at org.jenkinsci.plugins.buildnamesetter.BuildNameSetter$1.tearDown(BuildNameSetter.java:42)
      21:07:10 at hudson.model.Build$BuildExecution.doRun(Build.java:171)
      21:07:10 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:535)
      21:07:10 at hudson.model.Run.execute(Run.java:1732)
      21:07:10 at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
      21:07:10 at hudson.model.ResourceController.execute(ResourceController.java:88)
      21:07:10 at hudson.model.Executor.run(Executor.java:234)
      21:07:10 Caused by: java.io.IOException: Failed to abort
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
      21:07:10 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      21:07:10 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      21:07:10 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      21:07:10 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      21:07:10 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      21:07:10 at java.lang.Thread.run(Thread.java:695)
      21:07:10 Caused by: java.io.IOException: Operation timed out
      21:07:10 at sun.nio.ch.FileDispatcher.read0(Native Method)
      21:07:10 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      21:07:10 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      21:07:10 at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      21:07:10 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      21:07:10 at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      21:07:10 at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      21:07:10 ... 7 more
      ==========================================

      This coincides with the following in the slave error log:

      ==========================================
      Jul 16, 2014 9:01:57 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: I/O error in channel channel
      java.net.SocketException: Software caused connection abort: recv failed
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.io.BufferedInputStream.fill(Unknown Source)
      at java.io.BufferedInputStream.read(Unknown Source)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
      at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
      at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      Jul 16, 2014 9:01:57 PM hudson.util.ProcessTree getKillers
      WARNING: Failed to obtain killers
      hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:541)
      at hudson.remoting.Request.call(Request.java:129)
      at hudson.remoting.Channel.call(Channel.java:739)
      at hudson.util.ProcessTree.getKillers(ProcessTree.java:162)
      at hudson.util.ProcessTree$OSProcess.killByKiller(ProcessTree.java:221)
      at hudson.util.ProcessTree$Windows$1.killRecursively(ProcessTree.java:413)
      at hudson.util.ProcessTree.killAll(ProcessTree.java:149)
      at hudson.Proc$LocalProc.destroy(Proc.java:379)
      at hudson.Proc$LocalProc.join(Proc.java:352)
      at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1116)
      at sun.reflect.GeneratedMethodAccessor61.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:309)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:290)
      at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:249)
      at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      at hudson.remoting.Request$2.run(Request.java:328)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      at java.util.concurrent.FutureTask.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at hudson.remoting.Engine$1$1.run(Engine.java:63)
      at java.lang.Thread.run(Unknown Source)
      Caused by: java.net.SocketException: Software caused connection abort: recv failed
      at java.net.SocketInputStream.socketRead0(Native Method)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.net.SocketInputStream.read(Unknown Source)
      at java.io.BufferedInputStream.fill(Unknown Source)
      at java.io.BufferedInputStream.read(Unknown Source)
      at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
      at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
      at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      Jul 16, 2014 9:01:57 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Jul 16, 2014 9:01:57 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
      INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@18f3cc5
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:02:05 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
      at hudson.remoting.Engine.run(Engine.java:276)

      [repeated reconnection attempts, +- 2 times/sec]

      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      java.lang.Exception: The server rejected the connection: XXX is already connected to this master. Rejecting this connection.
      at hudson.remoting.Engine.onConnectionRejected(Engine.java:304)
      at hudson.remoting.Engine.run(Engine.java:276)

      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: XXX
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among http://YYY:8080/
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to YYY:62768
      Jul 16, 2014 9:07:06 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Jul 16, 2014 9:07:07 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      ==========================================

      and the wrapper log says:

      ==========================================
      2014-07-16 21:01:59 - Stopping jenkinsslave-e__JenkinsHome
      2014-07-16 21:01:59 - ProcessKill 3664
      2014-07-16 21:01:59 - Send SIGINT 3664
      2014-07-16 21:01:59 - SIGINT to3664 successful
      2014-07-16 21:01:59 - Finished jenkinsslave-e__JenkinsHome
      2014-07-16 21:02:01 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:02:01 - Started 3264
      2014-07-16 21:02:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:02:06 - Started 1364

      [many more of these reconnection attempts]

      2014-07-16 21:07:04 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:04 - Started 3448
      2014-07-16 21:07:05 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:05 - Started 1384
      2014-07-16 21:07:06 - Starting C:\Program Files (x86)\Java\jre7\bin\java.exe -Xrs -jar "E:\JenkinsHome\slave.jar" -jnlpUrl http://YYY:8080/computer/XXX/slave-agent.jnlp -secret 69619b5aede9e043c5697ef11f4a51e679140506c282aefdd7f728b8a7ac7d2f
      2014-07-16 21:07:06 - Started 3140
      [nothing after that, so this is the reconnection attempt that worked, I think.]
      ==========================================

      Finally in the server log:

      ==========================================
      TCP slave agent connection handler #1153 with /IPofXXX:52399 is aborted: Unrecognized name: XXX

      Jul 16, 2014 9:07:10 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run

      Accepted connection #1154 from /IPofXXX:52406

      Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

      TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: XXX is already connected to this master. Rejecting this connection.

      Jul 16, 2014 9:07:10 PM WARNING jenkins.slaves.JnlpSlaveHandshake error

      TCP slave agent connection handler #1154 with /IPofXXX:52406 is aborted: Unrecognized name: XXX

      Jul 16, 2014 9:07:10 PM WARNING org.jenkinsci.remoting.nio.NioChannelHub run

      Communication problem
      java.io.IOException: Operation timed out
      at sun.nio.ch.FileDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:695)

      Jul 16, 2014 9:07:11 PM WARNING jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed

      NioChannelHub keys=1 gen=1952690: Computer.threadPoolForRemoting 1 for + XXX terminated
      java.io.IOException: Failed to abort
      at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:195)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:581)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      at java.lang.Thread.run(Thread.java:695)
      Caused by: java.io.IOException: Operation timed out
      at sun.nio.ch.FileDispatcher.read0(Native Method)
      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
      at sun.nio.ch.IOUtil.read(IOUtil.java:171)
      at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
      at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
      at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
      at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:514)
      ... 7 more
      ==========================================

      Note that the clocks on both machines are within a few seconds of each other, so it seems that the order of events is:

      • slave detects a failed recv, which closes the channel
      • next communication attempt detects the closed channel
      • slave tries to reconnect, +- 2 times/sec
      • server refuses this because it thinks it is still connected
      • after +-5 minutes, the server pings the slave (some sort of keepalive?), which finds out the connection is already dead
      • at that point, the server accepts the next connection attempt from the slave

      If the cause of the initial failed recv can't be found or prevented, the communication between slave and master seems to need some sort of reconnection mechanism, so the slave can reconnect right away instead of having to abort the build and wait until the server times out the connection. The slave could then reconnect and retry what it was doing, rather than aborting the build on a failed connection.
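
      For illustration only, here is a minimal job-level approximation of that "reconnect and retry" idea, sketched as a scripted Pipeline (the 'windows' label and the steps inside the node block are placeholders, not part of this report's setup): if the channel dies mid-build, the node block fails and retry allocates a fresh executor instead of aborting the whole run.

      // Sketch only: re-run the whole node block if the agent channel drops mid-build.
      retry(3) {
          node('windows') {            // placeholder label for the Windows slave
              checkout scm             // placeholder checkout step
              bat 'build.cmd'          // placeholder build step
          }
      }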

          [JENKINS-23852] slave builds repeatedly broken by "hudson.remoting.ChannelClosedException: channel is already closed"

          Kevin R. added a comment -

          bump


          Thorsten Meinl added a comment -

          Since we switched from freestyle jobs to pipelines (a very good decision, if you ask me), about every second job fails with this error, which makes Windows slaves pretty much unusable. It was never a problem before, so my guess is that it has something to do with pipelines. Here's a stack trace, which always looks the same:

          java.io.IOException: remote file operation failed: C:\Users\jenkins\slave\workspace\g.knime.product.full_master-6YACVMZPMCFFZKR6NNQZEQAJGOVHWVVS2MSXJGRRFPLNAY2WNRWQ at hudson.remoting.Channel@2eed4a92:Channel to /xxx.xxx.xxx.xxx hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.FilePath.act(FilePath.java:992)
          	at hudson.FilePath.act(FilePath.java:974)
          	at hudson.FilePath.mkdirs(FilePath.java:1157)
          	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:77)
          	at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:65)
          	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:49)
          	at hudson.security.ACL.impersonate(ACL.java:221)
          	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:46)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          	at hudson.remoting.Channel.send(Channel.java:604)
          	at hudson.remoting.Request.call(Request.java:130)
          	at hudson.remoting.Channel.call(Channel.java:821)
          	at hudson.FilePath.act(FilePath.java:985)
          	... 12 more
          Caused by: java.io.IOException: Unexpected EOF while receiving the data from the channel. FIFO buffer has been already closed
          	at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:617)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	... 5 more
          Caused by: org.jenkinsci.remoting.nio.FifoBuffer$CloseCause: Buffer close has been requested
          	at org.jenkinsci.remoting.nio.FifoBuffer.close(FifoBuffer.java:426)
          	at org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport.closeR(NioChannelHub.java:332)
          	at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:565)
          	... 6 more

          I'm happy to debug this further if someone tells me what to do.

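          One way to gather more detail (a sketch, not an official debugging procedure): raise the remoting loggers to FINE on the master, either via Manage Jenkins → System Log or from the script console, so the next disconnect leaves more context in the Jenkins log. Equivalent java.util.logging settings on the agent JVM would be needed to see the agent's side.

          // Script console sketch: turn up remoting logging on the master.
          import java.util.logging.Level
          import java.util.logging.Logger

          // Keep references to the loggers; java.util.logging may otherwise
          // garbage-collect them and silently drop the level override.
          remotingLoggers = ['hudson.remoting', 'org.jenkinsci.remoting'].collect { name ->
              def logger = Logger.getLogger(name)
              logger.level = Level.FINE
              logger
          }
          println "Remoting loggers set to: ${remotingLoggers*.level}"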

          Daniel Sobral added a comment -

          My Maven builds were plagued by this, so it's not pipeline-related. I don't see it much with pipelines, though I wrote my pipelines with retry from the start because I expected this very problem to persist.


          Thorsten Meinl added a comment -

          I have a vague feeling that this problem occurs when there is more than one job running on the slave. If I have time, I will try to restrict the Windows slaves to one executor slot and see if that makes the problem go away. Maybe some of you can try this, too.


          Mihai Stoichitescu added a comment -

          sithmein we encounter this error on machines that have only one executor configured.

          Maybe this happens if there are flyweight executors running on the node?


          Stuart Smith added a comment -

          We have frequent disconnects on two of our slave machines. Both are in the same geographic location, and no other machines in any other location suffer from this, so we suspect it may be something firewall-related in that location. The infra guys initially suggested that the firewall was killing idle connections and increased the timeout, but we are still seeing frequent disconnections. Is the JNLP connection really idle between slave and master if no builds are running? Our call stack is as follows (Jenkins master 2.46.1):

          JNLP agent connected from xx
          Slave.jar version: 3.4.1
          This is a Windows agent
          Agent successfully connected and online
          ERROR: Connection terminated
          java.nio.channels.ClosedChannelException
          	at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
          	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
          	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          	at java.lang.Thread.run(Unknown Source)


          Stephen Connolly added a comment -

          If no builds are running, the connection can be idle. However, given that you are running JNLP4, the TLS transport layer should be trickling some data periodically... OTOH, perhaps that periodic trickle is actually what detects the network issue.


          Oleg Nenashev added a comment -

          stuartjsmith "RecvClosed()" means that the Tcp session is interrupted by the operating system or by any router between Jenkins master and agent. My first recommendation would be to increase Tcp retransmission timeout in the operating system. If it does not help, then you need to identify which node in your network interrupts the session. It can be done by analyzing Tcp dumps 


          Doesn't the "ping thread" prevent the connection from becoming idle?

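          For reference, the keepalive is the master-side channel ping (hudson.slaves.ChannelPinger); its interval has historically been configurable via system properties (hudson.slaves.ChannelPinger.pingInterval, in minutes, on older cores; pingIntervalSeconds and pingTimeoutSeconds on newer ones), with a default of roughly five minutes, which would be consistent with the ~5-minute delay in the original report. A quick script-console sketch to see whether any override is in effect (absence of a property means the built-in default applies):

          // Print any ChannelPinger overrides; '(default)' means the built-in value applies.
          ['hudson.slaves.ChannelPinger.pingInterval',
           'hudson.slaves.ChannelPinger.pingIntervalSeconds',
           'hudson.slaves.ChannelPinger.pingTimeoutSeconds'].each { name ->
              println "${name} = ${System.getProperty(name) ?: '(default)'}"
          }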

          Heiko Nardmann added a comment -

          I also get this problem. Just one slave is affected (Windows 10), and this slave is running two agents out of separate folders. One agent is intended to serve as a build slave, the other as a smoke-test slave (which does not need high performance).


            Assignee: Unassigned
            Reporter: Arnt Witteveen (legolas)
            Votes: 41
            Watchers: 52