  1. Jenkins
  2. JENKINS-12235

FATAL, Unable to delete script file, IOException2, remote file operation failed, unexpected termination of channel


Details

    • Bug
    • Status: Resolved (View Workflow)
    • Critical
    • Resolution: Duplicate
    • core, remoting
    • None

    Description

      Below is the stacktrace.

      It happened when I ran two jobs on a master. After running for a while, both jobs crashed with this exception.
      I think this might be caused by a brief network connectivity glitch, but I didn't notice any disconnection.
      Another cause may be the heavy load on Jenkins:

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      25942 hudson 15 0 6902m 5.8g 5720 S 0.3 74.3 401:22.30 java

      Does Jenkins run its own garbage collector at some specified time?
      We have to restart every few days because it gets slower and slower until it hangs.

      FATAL: Unable to delete script file /tmp/hudson8303731085225956739.sh
      hudson.util.IOException2: remote file operation failed: /tmp/hudson8303731085225956739.sh at hudson.remoting.Channel@30e472f4:build@autom-1
      at hudson.FilePath.act(FilePath.java:781)
      at hudson.FilePath.act(FilePath.java:767)
      at hudson.FilePath.delete(FilePath.java:1022)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
      at hudson.model.Run.run(Run.java:1404)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:499)
      at hudson.remoting.Request.call(Request.java:110)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.FilePath.act(FilePath.java:774)
      ... 13 more
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy29.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:859)
      at hudson.Launcher$ProcStarter.join(Launcher.java:345)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
      at hudson.model.Run.run(Run.java:1404)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)

          Activity

            dumghen Ghenadie Dumitru added a comment -

            This issue is bothersome, especially when running on Windows slaves connected via the JNLP agent.

            On Windows slaves, the JNLP socket connection seems quite sensitive to connection quality, even when it is not used at 100%.

            Maybe the solution for this is to use an ssh server on Windows?

            Thanks
            erik_purins Erik Purins added a comment -

            Hitting this frequently on windows jenkins slaves. Similar call stack attached.

            brianharris brianharris added a comment -

            Suspected duplicate: JENKINS-6817


            richardtaylor Richard Taylor added a comment -

            We started getting this recently. It is hard to pin down exactly when, but I think around version 1.455.

            • If there is no obvious fix, can the exception be caught so that it does not fail an otherwise successful build?

            I think this would be an adequate workaround for most people. At the moment this issue is causing random builds to fail, which is a significant annoyance for the developers.

            Thanks.
            Rich.
            krikar Kristian Karl added a comment - - edited

            I get this problem every so often on Windows machines. I run Jenkins ver. 1.463
            See attached stacktrace.txt

            brianfromoregon Brian Harris added a comment -

            For us, the cause of this error was our build slaves (VMs) running out of memory and self-rebooting.

            erik_purins Erik Purins added a comment -

            Disconnects with large stack traces are still occurring in the latest 1.471. So far we have encountered this on Windows, but historically we have also seen it on OSX and Linux. It looks slightly different in my latest run: I get a socket reset exception, but the failure is still first hit when deleting the script file after a long build (over 3 hours).

            erik_purins Erik Purins added a comment -

            This is intermittent, but if it's a sign of a client problem (out of memory or whatever), a more useful error caught earlier on would be an improvement.

            jvoegele jvoegele added a comment -

            We have also been seeing this problem intermittently. For us it is not only Windows; our SUSE and Red Hat Linux slaves have also been suffering from the same problem.

            Can anything be done to prevent this problem from failing a build? The spurious failures are a distraction.

            forever_xt Zhijun Xu added a comment - - edited

            I encountered this problem recently. My job runs on a Linux slave and, for various reasons, contains many 'wait' steps that take a long time. I think that during those wait steps the master has no communication with the slave, so the slave (ssh server) drops the connection to the master (ssh client), and then the problem occurs.

            So I configured the ssh server to send messages to the ssh client every minute to keep the connection alive, and the problem was resolved.
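
The idea behind this workaround, sending periodic traffic so that an idle connection is not dropped, can be sketched in Java. This is only an illustration under assumptions: `KeepAliveSketch`, `startKeepAlive`, and the `sendPing` callback are hypothetical names, and this is not how Jenkins remoting or sshd implement their keepalives.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of an application-level keepalive timer. */
class KeepAliveSketch {

    /**
     * Calls sendPing at a fixed interval on a daemon thread.
     * sendPing is a hypothetical callback that would write a small
     * message to the otherwise idle connection.
     */
    static ScheduledExecutorService startKeepAlive(
            Runnable sendPing, long interval, TimeUnit unit) {
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "keep-alive");
                    t.setDaemon(true); // do not keep the JVM alive just for pings
                    return t;
                });
        timer.scheduleAtFixedRate(sendPing, interval, interval, unit);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger pings = new AtomicInteger();
        // A counter stands in for real network traffic in this demo.
        ScheduledExecutorService timer =
                startKeepAlive(pings::incrementAndGet, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        timer.shutdownNow();
        System.out.println("pings sent: " + pings.get());
    }
}
```

In a real setup the interval would be chosen to be shorter than whatever idle timeout drops the connection, as with the one-minute interval mentioned above.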


            shobhad Shobha Dashottar added a comment -

            We have been facing this issue for some time. Yesterday, we upgraded to 1.466 and the issue persists.
            Jenkins master is running on Windows and the slaves are mainly Windows but a few are Linux. The issue randomly appears on any slave.

            Thanks
            Shobha

            script file c:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\hudson920936561807305456.bat
            hudson.util.IOException2: remote file operation failed: c:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\hudson920936561807305456.bat at hudson.remoting.Channel@5f1603a2: Slave123
            at hudson.FilePath.act(FilePath.java:835)
            at hudson.FilePath.act(FilePath.java:821)
            at hudson.FilePath.delete(FilePath.java:1126)
            at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
            at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
            at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:710)
            at hudson.model.Build$RunnerImpl.build(Build.java:178)
            at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
            at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:480)
            at hudson.model.Run.run(Run.java:1438)
            at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            at hudson.model.ResourceController.execute(ResourceController.java:88)
            at hudson.model.Executor.run(Executor.java:239)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            c_welch Chris Welch added a comment -

            We consistently get this error on a Linux build slave, on jobs that produce little output and take a long time (typically > 2 hours). We are using ssh, and I suspect the problem is due to there being no traffic on the ssh link for this long period. Using Jenkins 1.434.

            c_welch Chris Welch added a comment -

            We are able to work around the problem by adding:

            ClientAliveInterval 60

            to /etc/ssh/sshd_config on the Jenkins host

            aikipooh Yury Pukhalsky added a comment - - edited

            Jenkins 1.489 here and it happens too. The master and slave are RHEL5.8.
            The task runs for 15 minutes in my case. Output is produced continuously, and setting ClientAliveInterval, ClientAliveCountMax, or TCPKeepAlive did not help.
            It started to happen after I joined two "execute shell" steps into one. The numbers of sockets and processes are well within limits.

            The slave appears to exit (crash?). In the slave log there is:

            ...
            Evacuated stdout
            Slave successfully connected and online
            ERROR: Connection terminated
            java.io.IOException: Unexpected termination of the channel
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException

            Perhaps it's somewhere in the JVM internals? I'll now try different JVMs and settings, as this is a blocking issue for me.


            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Nicolas De Loof
            Path:
            core/src/main/java/hudson/tasks/CommandInterpreter.java
            http://jenkins-ci.org/commit/jenkins/8e74242d8b961a78d5d498b55e1f3797f92bb8a1
            Log:
            JENKINS-12235 root cause is hidden by by script deletion failure
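
The commit message above describes a cleanup failure masking the original error. A minimal Java sketch of one way to avoid that kind of masking follows. It is not the code from the commit: `CleanupSketch`, `Step`, and `runThenCleanUp` are hypothetical names, and the exception messages in `main` are only borrowed from the traces in this issue for illustration.

```java
import java.io.IOException;

/** Hypothetical sketch; not the actual CommandInterpreter code. */
class CleanupSketch {

    /** Hypothetical stand-in for a build step or a cleanup action. */
    interface Step {
        void run() throws IOException;
    }

    /**
     * Runs the step, then the cleanup. If both fail, the step's exception
     * is rethrown and the cleanup failure is attached as a suppressed
     * exception, so the first failure stays visible.
     */
    static void runThenCleanUp(Step step, Step cleanup) throws IOException {
        IOException primary = null;
        try {
            step.run();
        } catch (IOException e) {
            primary = e;
        }
        try {
            cleanup.run();
        } catch (IOException e) {
            if (primary == null) {
                throw e; // the cleanup was the only failure
            }
            primary.addSuppressed(e);
        }
        if (primary != null) {
            throw primary;
        }
    }

    public static void main(String[] args) {
        try {
            runThenCleanUp(
                () -> { throw new IOException("Unexpected termination of the channel"); },
                () -> { throw new IOException("Unable to delete script file"); });
        } catch (IOException e) {
            System.out.println("reported: " + e.getMessage());
            System.out.println("suppressed: " + e.getSuppressed()[0].getMessage());
        }
    }
}
```

With this arrangement the channel failure is what gets reported, and the deletion failure remains available for diagnosis through `getSuppressed()`.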
            dogfood dogfood added a comment -

            Integrated in jenkins_main_trunk #2141
            JENKINS-12235 root cause is hidden by by script deletion failure (Revision 8e74242d8b961a78d5d498b55e1f3797f92bb8a1)

            Result = SUCCESS
            Nicolas De Loof : 8e74242d8b961a78d5d498b55e1f3797f92bb8a1
            Files :

            • core/src/main/java/hudson/tasks/CommandInterpreter.java
            rb2k Marc Seeger added a comment - - edited

            I get this on Linux -> Linux with Jenkins ver. 1.504
            Different data centers though, so probably not the most stable network connection.

            FATAL: Unable to delete script file /tmp/hudson9103641402954770242.sh
            hudson.util.IOException2: remote file operation failed: /tmp/hudson9103641402954770242.sh at hudson.remoting.Channel@3ce3262f:django
            	at hudson.FilePath.act(FilePath.java:861)
            	at hudson.FilePath.act(FilePath.java:838)
            	at hudson.FilePath.delete(FilePath.java:1223)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:593)
            	at hudson.model.Run.execute(Run.java:1567)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:237)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            	at hudson.remoting.Channel.send(Channel.java:494)
            	at hudson.remoting.Request.call(Request.java:129)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.FilePath.act(FilePath.java:854)
            	... 13 more
            Caused by: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:850)
            	at hudson.remoting.Channel$2.handle(Channel.java:435)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
            Caused by: Command close created at
            	at hudson.remoting.Command.<init>(Command.java:56)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:844)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:842)
            	at hudson.remoting.Channel.close(Channel.java:909)
            	at hudson.remoting.Channel.close(Channel.java:892)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:849)
            	... 2 more
            FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Request.call(Request.java:174)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
            	at $Proxy46.join(Unknown Source)
            	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:593)
            	at hudson.model.Run.execute(Run.java:1567)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:237)
            Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Request.abort(Request.java:299)
            	at hudson.remoting.Channel.terminate(Channel.java:732)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:850)
            	at hudson.remoting.Channel$2.handle(Channel.java:435)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
            Caused by: hudson.remoting.Channel$OrderlyShutdown
            	... 3 more
            Caused by: Command close created at
            	at hudson.remoting.Command.<init>(Command.java:56)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:844)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:842)
            	at hudson.remoting.Channel.close(Channel.java:909)
            	at hudson.remoting.Channel.close(Channel.java:892)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:849)
            	... 2 more
            

            The file is still around:

            # ls -lash /tmp/hudson9103641402954770242.sh
            4.0K -rw-rw-r-- 1 jenkins jenkins 96 Mar  8 14:50 /tmp/hudson9103641402954770242.sh
            
            erik_purins Erik Purins added a comment -

            We still get this from time to time with debian squeeze master 1.504 jenkins, osx 10.8 client. It would be nice to resolve this, or at least handle this common error, with less stack trace, more human-readable text. Alternately, it would be nice if we could have a more fault-tolerant delete temporary file command, that either retries or schedules a cleanup of the temp file when it can.
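
A retrying delete along the lines suggested here might look like the following sketch. It deletes a local file with `java.nio.file.Files`; in Jenkins the script file lives on the slave and is deleted over the remoting channel, so this only illustrates the retry idea and is not a proposed patch. `RetryDelete` and `deleteWithRetry` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a more fault-tolerant temp-file delete. */
class RetryDelete {

    /**
     * Tries to delete the file up to maxAttempts times, sleeping between
     * attempts. Returns true once the file is gone, false if every attempt
     * failed; in that case deletion is also requested at JVM exit.
     */
    static boolean deleteWithRetry(Path file, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(file);
                return true;
            } catch (IOException e) {
                // Short, human-readable message instead of a full stack trace.
                System.err.println("Delete attempt " + attempt + " of "
                        + maxAttempts + " failed for " + file + ": " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Last resort: ask the JVM to remove the file when it exits.
        file.toFile().deleteOnExit();
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path script = Files.createTempFile("hudson", ".sh");
        boolean deleted = deleteWithRetry(script, 3, 100);
        System.out.println("deleted: " + deleted);
    }
}
```

Returning a boolean rather than throwing lets the caller decide whether a leftover temporary file should fail the build.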

            shinsato shinsato added a comment -

            Have seen this issue for some time intermittently. It might make the jenkins slave feature unusable for many applications if we can't find a workaround.

            x29a x29a added a comment - - edited

            Unfortunately, I experience this (very prominent!) problem as well. Here is some more information and a possible workaround.

            Setup:

            • Jenkins 1.512 on tomcat 6.0.36, in a VirtualBox Windows 7 Guest, JRE 1.7.0
            • Slave connected via JNLP client in a VirtualBox Windows 7 Guest, JRE 1.7.0

            The slave is started as follows, in order to see the log messages:
            java -Xmx512m -jar slave.jar -jnlpUrl http://jenkins/computer/slave/slave-agent.jnlp

            First off, setting various values to these variables (in catalina on tomcat) did not seem to improve the behaviour:
            -Dhudson.remoting.Launcher.pingTimeoutSec
            -Dhudson.remoting.Launcher.pingIntervalSec
            -Dhudson.slaves.ChannelPinger.pingInterval

            I was getting the "channel already closed" exception quite frequently, mostly at the same spot during script execution. The job on the slave (a Windows batch file that runs for between 12h and 16h) generates large amounts of documentation via doxygen and pipes the output into a log file, so it uses a lot of CPU and does not echo progress. Throttling the CPU so that the NIC would not suffer from the overload did not help, though. I also ran continuous pings to the slave (from the master and back), and ping requests failed only rarely (normal network tolerances).

            To say this first: although Jenkins failed with the above-mentioned exception, the slave continued to perform its job "in the background". So if the exception came after 1h, I would still see the updated documentation after 16h, although Jenkins had already declared the job failed.

            For the chronology, these are the log excerpts:

            In the live console on the jenkins WebUI i see (THE FIRST LINE IS THE LAST OUTPUT BY MY SCRIPT):

            Jenkins WebUI
            2013-05-03 18:36:50 - Processing: documentationA
            FATAL: Unable to delete script file c:\temp\hudson3125329676016517230.bat
            hudson.util.IOException2: remote file operation failed: c:\temp\hudson3125329676016517230.bat at hudson.remoting.Channel@1f12b9f:slave
            	at hudson.FilePath.act(FilePath.java:900)
            	at hudson.FilePath.act(FilePath.java:877)
            	at hudson.FilePath.delete(FilePath.java:1262)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
            	at hudson.model.Run.execute(Run.java:1575)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:237)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            	at hudson.remoting.Channel.send(Channel.java:494)
            	at hudson.remoting.Request.call(Request.java:129)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.FilePath.act(FilePath.java:893)
            	... 13 more
            Caused by: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.Request.call(Request.java:174)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
            	at $Proxy52.join(Unknown Source)
            	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
            	at hudson.model.Run.execute(Run.java:1575)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:237)
            Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.Request.abort(Request.java:299)
            	at hudson.remoting.Channel.terminate(Channel.java:732)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            Caused by: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            

            In the JNLP client log on my slave I got:

            JNLP Client on slave
            Mai 03, 2013 7:16:43 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
            SEVERE: I/O error in channel channel
            java.net.SocketTimeoutException: Read timed out
                    at java.net.SocketInputStream.socketRead0(Native Method)
                    at java.net.SocketInputStream.read(Unknown Source)
                    at java.net.SocketInputStream.read(Unknown Source)
                    at java.io.BufferedInputStream.fill(Unknown Source)
                    at java.io.BufferedInputStream.read(Unknown Source)
                    at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
                    at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
                    at java.io.ObjectInputStream.readObject0(Unknown Source)
                    at java.io.ObjectInputStream.readObject(Unknown Source)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            Mai 03, 2013 7:16:43 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Terminated
            Mai 03, 2013 7:16:56 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Locating server among [http://jenkins/]
            

            And on the Tomcat server:

            Tomcat server
            May 03, 2013 7:16:42 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
            SEVERE: I/O error in channel slave
            java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
                    at java.io.ObjectInputStream.readObject0(Unknown Source)
                    at java.io.ObjectInputStream.readObject(Unknown Source)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            May 03, 2013 7:16:42 PM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
            WARNING: Channel reader thread: slave for + slave terminated
            java.io.IOException: Unexpected termination of the channel
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
                    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
                    at java.io.ObjectInputStream.readObject0(Unknown Source)
                    at java.io.ObjectInputStream.readObject(Unknown Source)
                    at hudson.remoting.Command.readFrom(Command.java:92)
                    at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
                    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            May 03, 2013 7:16:53 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
            INFO: Accepted connection #78 from /10.0.0.2:49300
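
As a rough cross-check of the idle period, the timestamps in the excerpts above can be subtracted: the last script output in the console (18:36:50) and the channel termination logged by Tomcat (7:16:42 PM). This is a minimal sketch, assuming the console and the Tomcat log use the same clock and time zone; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Last line printed by the build script in the Jenkins console.
last_output = datetime.strptime("2013-05-03 18:36:50", FMT)

# "May 03, 2013 7:16:42 PM" from the Tomcat log, rewritten in 24-hour form.
# Assumption: both timestamps come from synchronized clocks.
channel_closed = datetime.strptime("2013-05-03 19:16:42", FMT)

idle = channel_closed - last_output
idle_minutes = idle.total_seconds() / 60
print(f"idle for {idle_minutes:.1f} minutes")  # prints: idle for 39.9 minutes
```

Under that assumption, the channel went roughly 40 minutes without script output before it was terminated.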
            

            It seems that the channel gets closed when no data is going through the connection (hence the experiments with the ping settings mentioned above). I can't say definitely how long it stays open, but in the case shown above it was about 45min without output. I therefore modified the script to call doxygen in a thread and output a "." every 15s. So far, no more closed channels. If you can't modify your script to generate continuous output, you could pipe your command (in the batch file) to some program that passes the output through or otherwise triggers continuous output. I also noticed that the dots generated by my script modification are not shown in the web UI until a newline is sent. Nevertheless, the channel did not get closed.
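
The workaround described above (run the quiet command in the background and print a "." every 15s) could look roughly like the following. This is only a sketch under assumptions: the original script is a Windows batch file that is not shown in this thread, and `run_with_heartbeat`, `log_path` and `interval_s` are hypothetical names chosen for illustration. Whether a heartbeat actually prevents the disconnect depends on the real cause, which this issue has not established.

```python
import subprocess
import sys
import threading


def run_with_heartbeat(cmd, log_path, interval_s=15.0, out=sys.stdout):
    """Run cmd with its output appended to log_path, while writing a
    heartbeat line to `out` every interval_s seconds until cmd exits.
    Returns the exit code of cmd."""
    done = threading.Event()

    def heartbeat():
        # Event.wait() returns False on timeout and True once the event is set,
        # so the loop ends promptly when the command has finished.
        while not done.wait(interval_s):
            # Each dot ends with a newline, because the comment above notes
            # that the web UI only shows output once a newline is sent.
            out.write(".\n")
            out.flush()

    with open(log_path, "ab") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        ticker = threading.Thread(target=heartbeat, daemon=True)
        ticker.start()
        try:
            return proc.wait()
        finally:
            done.set()
            ticker.join()
```

A build step could then call, for example, `run_with_heartbeat(["doxygen", "Doxyfile"], "doxygen.log")`, where the command line and log file name are placeholders for whatever the job really runs.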

            I hope that this investigation delivers some clues to fix this problem and make distributed working with Jenkins more stable!

            x29a added a comment - edited

            I get this problem every day.

            Jenkins 1.518 on Tomcat 7.0.35, on Windows XP, JRE 1.7.0_13
            Slave PC: Windows XP 32-bit.

            FATAL: Unable to delete script file C:\DOCUME~1\dg\LOCALS~1\Temp\hudson229281249267934971.bat
            hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\dg\LOCALS~1\Temp\hudson229281249267934971.bat at hudson.remoting.Channel@7ccfde:PC_068_LX760
            	at hudson.FilePath.act(FilePath.java:901)
            	at hudson.FilePath.act(FilePath.java:878)
            	at hudson.FilePath.delete(FilePath.java:1263)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1576)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:241)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            	at hudson.remoting.Channel.send(Channel.java:494)
            	at hudson.remoting.Request.call(Request.java:129)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.FilePath.act(FilePath.java:894)
            	... 13 more
            Caused by: java.net.SocketException: Connection reset
            	at java.net.SocketInputStream.read(Unknown Source)
            	at java.net.SocketInputStream.read(Unknown Source)
            	at java.io.BufferedInputStream.fill(Unknown Source)
            	at java.io.BufferedInputStream.read(Unknown Source)
            	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
            	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            	at hudson.remoting.Request.call(Request.java:174)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
            	at sun.proxy.$Proxy72.join(Unknown Source)
            	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1576)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:241)
            Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            	at hudson.remoting.Request.abort(Request.java:299)
            	at hudson.remoting.Channel.terminate(Channel.java:732)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            Caused by: java.net.SocketException: Connection reset
            	at java.net.SocketInputStream.read(Unknown Source)
            	at java.net.SocketInputStream.read(Unknown Source)
            	at java.io.BufferedInputStream.fill(Unknown Source)
            	at java.io.BufferedInputStream.read(Unknown Source)
            	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
            	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            yesing Ryang Woo Park added a comment

            We are hitting this problem around once a day as well.

            Jenkins: Jenkins ver. 1.517 on Windows Server 2008 R2, 8GB RAM, JRE 1.7.0_21
            Slave PC: Windows 7, 4GB RAM, JRE 1.7.0_21

            What's interesting is that the exception (when it does occur) always seems to happen while the exact same unit test is executing. That test in particular spawns a new process and then kills just that process and all of its child processes. Below is the stack trace:

            16:54:43 FATAL: Unable to delete script file C:\Users\****\AppData\Local\Temp\hudson8255606542971992250.ps1
            16:54:50 hudson.util.IOException2: remote file operation failed: C:\Users\****\AppData\Local\Temp\hudson8255606542971992250.ps1 at hudson.remoting.Channel@fde4a0:scheduler_tests
            16:54:51 	at hudson.FilePath.act(FilePath.java:901)
            16:54:55 	at hudson.FilePath.act(FilePath.java:878)
            16:54:55 	at hudson.FilePath.delete(FilePath.java:1263)
            16:54:55 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            16:55:19 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            16:55:19 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            16:55:19 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            16:55:19 	at hudson.model.Build$BuildExecution.build(Build.java:199)
            16:55:19 	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            16:55:19 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            16:55:19 	at hudson.model.Run.execute(Run.java:1576)
            16:55:19 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            16:55:19 	at hudson.model.ResourceController.execute(ResourceController.java:88)
            16:55:19 	at hudson.model.Executor.run(Executor.java:241)
            16:55:19 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            16:55:19 	at hudson.remoting.Channel.send(Channel.java:494)
            16:55:19 	at hudson.remoting.Request.call(Request.java:129)
            16:55:19 	at hudson.remoting.Channel.call(Channel.java:672)
            16:55:19 	at hudson.FilePath.act(FilePath.java:894)
            16:55:19 	... 13 more
            16:55:19 Caused by: java.net.SocketException: Connection reset
            16:55:19 	at java.net.SocketInputStream.read(Unknown Source)
            16:55:19 	at java.net.SocketInputStream.read(Unknown Source)
            16:55:19 	at java.io.BufferedInputStream.fill(Unknown Source)
            16:55:19 	at java.io.BufferedInputStream.read(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream.readObject0(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream.readObject(Unknown Source)
            16:55:19 	at hudson.remoting.Command.readFrom(Command.java:92)
            16:55:19 	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            16:55:19 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            16:55:19 FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            16:55:19 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            16:55:19 	at hudson.remoting.Request.call(Request.java:174)
            16:55:19 	at hudson.remoting.Channel.call(Channel.java:672)
            16:55:19 	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
            16:55:19 	at com.sun.proxy.$Proxy41.join(Unknown Source)
            16:55:19 	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            16:55:19 	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            16:55:19 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            16:55:19 	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            16:55:19 	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            16:55:19 	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            16:55:19 	at hudson.model.Build$BuildExecution.build(Build.java:199)
            16:55:19 	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            16:55:19 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            16:55:19 	at hudson.model.Run.execute(Run.java:1576)
            16:55:19 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            16:55:19 	at hudson.model.ResourceController.execute(ResourceController.java:88)
            16:55:19 	at hudson.model.Executor.run(Executor.java:241)
            16:55:19 Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
            16:55:19 	at hudson.remoting.Request.abort(Request.java:299)
            16:55:19 	at hudson.remoting.Channel.terminate(Channel.java:732)
            16:55:19 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            16:55:19 Caused by: java.net.SocketException: Connection reset
            16:55:19 	at java.net.SocketInputStream.read(Unknown Source)
            16:55:19 	at java.net.SocketInputStream.read(Unknown Source)
            16:55:19 	at java.io.BufferedInputStream.fill(Unknown Source)
            16:55:19 	at java.io.BufferedInputStream.read(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream.readObject0(Unknown Source)
            16:55:19 	at java.io.ObjectInputStream.readObject(Unknown Source)
            16:55:19 	at hudson.remoting.Command.readFrom(Command.java:92)
            16:55:19 	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            16:55:19 	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            x29a x29a added a comment -

            Can you try my possible workaround, or give estimates of how long the individual steps take (and whether they produce no output)? I think we have enough stack traces; as long as new ones do not provide more detailed information, it would be better to collect background information on what led to this error, such as:

            • does it always fail at the same spot in the job
            • does that spot take a long time (and how long)
            • is output generated (and propagated back to the Jenkins master)
            • does the possible fix work (producing output in between to keep the channel open)

            If you want to provide stack traces, output from the Tomcat server and the JNLP client (both with timestamps) along with the output from the Jenkins master could also help.
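
The "producing output in between" idea can be sketched as a small wrapper that runs the real build command and prints a heartbeat line while it waits. This is only an illustration in Python: the function name `run_with_heartbeat`, the 10-second default interval, and the message format are assumptions, not part of Jenkins or of the workaround as originally posted, and nothing here establishes that periodic output actually prevents the channel from closing.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=10.0, out=sys.stdout):
    """Run cmd and write a heartbeat line to `out` every `interval` seconds.

    Returns a tuple (exit_code, heartbeat_count).
    """
    proc = subprocess.Popen(cmd)
    beats = 0
    while True:
        try:
            # wait() returns as soon as the child exits; otherwise it raises
            # TimeoutExpired after `interval` seconds and we emit a heartbeat.
            code = proc.wait(timeout=interval)
            return code, beats
        except subprocess.TimeoutExpired:
            beats += 1
            out.write("( %s running )\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
            out.flush()  # make sure the line reaches the build log promptly
```

A build step could call `run_with_heartbeat(["some-long-command", "arg"])` (a hypothetical command) and exit with the returned code, so that the job status still reflects the wrapped command.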

            dekay Doug Konrad added a comment -

            I was seeing this once or twice a day when our slaves were overloaded. Since I fixed the overload problem, I've only seen it once.

            In addition to fixing the overload, on all our slaves, I made the following changes to /etc/ssh/sshd_config:

            ClientAliveCountMax 99
            ClientAliveInterval 60
            

            On half of the slaves, I also set

            TCPKeepAlive no
            

            (It had been 'yes' on all the slaves.)

            The only failure I've seen since these changes has been on a machine with

            TCPKeepAlive yes
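
As a rough aid for comparing settings like these, the sketch below computes how long sshd waits before dropping an unresponsive client, using the rule from the sshd_config manual that the timeout is approximately ClientAliveInterval multiplied by ClientAliveCountMax. The helper names are hypothetical, the parser is deliberately simplified (plain integer seconds only, whole-line comments only, no `Match` blocks), and it says nothing about whether these options are the cause of the channel closing.

```python
def parse_sshd_options(text):
    """Return a dict of lowercased option name -> value from sshd_config-style text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd keeps the first value it sees for a keyword.
            options.setdefault(parts[0].lower(), parts[1].strip())
    return options


def client_alive_timeout(options):
    """Approximate seconds before an unresponsive client is dropped, or None if disabled."""
    interval = int(options.get("clientaliveinterval", 0))   # default 0: no probes sent
    count_max = int(options.get("clientalivecountmax", 3))  # default 3
    if interval <= 0:
        return None
    return interval * count_max


config = """
ClientAliveCountMax 99
ClientAliveInterval 60
TCPKeepAlive no
"""
print(client_alive_timeout(parse_sshd_options(config)))  # 5940
```

With the values above, sshd would tolerate roughly 99 minutes of missed probes; with the defaults (interval 0) the client-alive mechanism is off and the function returns `None`.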
            
            yesing Ryang Woo Park added a comment - - edited

            I switched to connecting to the slave PC via OpenSSH, but the error still occurs.

            Same environment as in the comment above:

            Jenkins 1.518 on Tomcat 7.0.35, on Windows XP, JRE 1.7.0_13
            Slave PC: Windows XP 32-bit.

            FATAL: Unable to delete script file C:\DOCUME~1\dg\LOCALS~1\Temp\hudson8470529757775576764.bat
            hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\dg\LOCALS~1\Temp\hudson8470529757775576764.bat at hudson.remoting.Channel@1293e35:PC_067_LX760
            	at hudson.FilePath.act(FilePath.java:901)
            	at hudson.FilePath.act(FilePath.java:878)
            	at hudson.FilePath.delete(FilePath.java:1263)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1576)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:241)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            	at hudson.remoting.Channel.send(Channel.java:494)
            	at hudson.remoting.Request.call(Request.java:129)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.FilePath.act(FilePath.java:894)
            	... 13 more
            Caused by: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.Request.call(Request.java:174)
            	at hudson.remoting.Channel.call(Channel.java:672)
            	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
            	at sun.proxy.$Proxy70.join(Unknown Source)
            	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1576)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:241)
            Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.Request.abort(Request.java:299)
            	at hudson.remoting.Channel.terminate(Channel.java:732)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
            Caused by: java.io.IOException: Unexpected termination of the channel
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
            Caused by: java.io.EOFException
            	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
            	at java.io.ObjectInputStream.readObject0(Unknown Source)
            	at java.io.ObjectInputStream.readObject(Unknown Source)
            	at hudson.remoting.Command.readFrom(Command.java:92)
            	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
            
            rb2k Marc Seeger added a comment - - edited

            I also run into this quite frequently with a long-running job that sometimes doesn't print anything to stdout for several minutes.

            edit: I just noticed that I've already commented on this issue... it's late...

            sanga sanga added a comment -

            We just hit this in 1.509.2, i.e. the current Jenkins stable release, and it's causing about half our builds to fail. Our CI system is pretty much screwed at the moment. Raising to critical.

            rb2k Marc Seeger added a comment -

            A small note:
            This happened for us when the master was heavily overloaded (swapping). I reduced the number of executors on the master and started a slave with more CPU/RAM to take care of the jobs.

            sanga sanga added a comment - - edited

            I don't see our master swapping, and it doesn't do much except serve web pages and farm jobs out to the slaves. We do have many slaves though (on the order of 60 or so, I guess).

            guyr Guy Rozendorn added a comment - - edited

            This happens across all our slaves: Windows, Red Hat, Ubuntu.
            We have a little over 100 jobs, and we keep logs for the last 10 runs of each job.
            Over those last 10 runs, we've seen this 14 times on Windows and 26 times on the other platforms.
            Everything runs over SSH (Cygwin on Windows) with the default settings:

            • TCPKeepAlive: yes
            • ClientAliveCountMax: 3
            • ClientAliveInterval: 0

            It doesn't look related to the output: this fails randomly in different steps, some with no output for minutes, some with no output for only a few seconds.

            guyr Guy Rozendorn added a comment -

            I added a print every ten seconds; it still happens:

            ( 2013-29-27 17:29:07 running )
            ( 2013-29-27 17:29:17 running )
            ( 2013-29-27 17:29:27 running )
            ( 2013-29-27 17:29:37 running )
            ( 2013-29-27 17:29:47 running )
            ( 2013-29-27 17:29:57 running )
            ( 2013-30-27 17:30:07 running )
            FATAL: Unable to delete script file C:\Users\Administrator\hudson7142602309142296785.py
            hudson.util.IOException2: remote file operation failed: C:\Users\Administrator\hudson7142602309142296785.py at hudson.remoting.Channel@6e47e0a6:host-ci38
            	at hudson.FilePath.act(FilePath.java:901)
            	at hudson.FilePath.act(FilePath.java:878)
            	at hudson.FilePath.delete(FilePath.java:1263)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.plugins.templateproject.ProxyBuilder.perform(ProxyBuilder.java:87)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1593)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:242)
            Caused by: hudson.remoting.ChannelClosedException: channel is already closed
            	at hudson.remoting.Channel.send(Channel.java:524)
            	at hudson.remoting.Request.call(Request.java:129)
            	at hudson.remoting.Channel.call(Channel.java:722)
            	at hudson.FilePath.act(FilePath.java:894)
            	... 14 more
            Caused by: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:900)
            	at hudson.remoting.Channel$2.handle(Channel.java:465)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
            Caused by: Command close created at
            	at hudson.remoting.Command.<init>(Command.java:56)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:894)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:892)
            	at hudson.remoting.Channel.close(Channel.java:975)
            	at hudson.remoting.Channel.close(Channel.java:958)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:899)
            	... 2 more
            FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Request.call(Request.java:174)
            	at hudson.remoting.Channel.call(Channel.java:722)
            	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:162)
            	at sun.proxy.$Proxy38.join(Unknown Source)
            	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
            	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
            	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
            	at hudson.plugins.templateproject.ProxyBuilder.perform(ProxyBuilder.java:87)
            	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
            	at hudson.model.Build$BuildExecution.build(Build.java:199)
            	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
            	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
            	at hudson.model.Run.execute(Run.java:1593)
            	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
            	at hudson.model.ResourceController.execute(ResourceController.java:88)
            	at hudson.model.Executor.run(Executor.java:242)
            Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
            	at hudson.remoting.Request.abort(Request.java:299)
            	at hudson.remoting.Channel.terminate(Channel.java:782)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:900)
            	at hudson.remoting.Channel$2.handle(Channel.java:465)
            	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
            Caused by: hudson.remoting.Channel$OrderlyShutdown
            	... 3 more
            Caused by: Command close created at
            	at hudson.remoting.Command.<init>(Command.java:56)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:894)
            	at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:892)
            	at hudson.remoting.Channel.close(Channel.java:975)
            	at hudson.remoting.Channel.close(Channel.java:958)
            	at hudson.remoting.Channel$CloseCommand.execute(Channel.java:899)
            	... 2 more
            
            rb2k Marc Seeger added a comment -

            We started printing "tweet" every minute or so; it still happened.
            Since we reduced the number of executors on the master (AWS m1.large) to 2 and upsized the slave, I haven't seen another crash.
            (It hasn't been a long timespan, though.)

            guyr Guy Rozendorn added a comment - - edited

            Although our master node has 10 executors (left for tied jobs only), this happens even when only a single job is running across the whole Jenkins instance, and it's running (and failing) on a different slave.

            sanga sanga added a comment - - edited

            As suggested earlier in this issue, we switched from TCP keepalives to SSH keepalives, i.e. we set the following on the slaves (in /etc/ssh/sshd_config):

            #to work-around jenkins slave connection dropouts
            ClientAliveCountMax 10
            ClientAliveInterval 60

            and this appears to have fixed the problem for us.
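As a rough illustration of what these two settings imply: for positive values, sshd sends a keepalive probe every ClientAliveInterval seconds and drops the connection after ClientAliveCountMax consecutive probes go unanswered. The helper below is a hypothetical name used only for this sketch; it computes the resulting tolerance window for the values quoted in this thread.

```python
def client_alive_timeout(interval_s: int, count_max: int) -> int:
    """Approximate seconds of unanswered probes before sshd disconnects.

    Assumes both values are positive; this is a back-of-the-envelope
    estimate, not a reimplementation of sshd's logic.
    """
    return interval_s * count_max


# Values from the sshd_config fragment above.
print(client_alive_timeout(60, 10))  # 600
# A larger ClientAliveCountMax of 99 with the same interval.
print(client_alive_timeout(60, 99))  # 5940
```

So with the settings above, a slave can be unresponsive for roughly ten minutes before sshd gives up on the connection.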

            guyr Guy Rozendorn added a comment -

            We set the following on all our slaves:

            ClientAliveCountMax 99
            ClientAliveInterval 60
            TCPKeepAlive no
            

            We rebooted them all and are still getting this exception.

            forever_xt Zhijun Xu added a comment -

            @Rozendorn, I think you should set TCPKeepAlive to yes. Try it.

            guyr Guy Rozendorn added a comment -

            forever_xt, TCPKeepAlive yes is the default, which doesn't work either

            sanga sanga added a comment -

            As an update to this: we had a bug in a script of ours which redeployed our build slaves. But even after this (and with both TCP and SSH keepalives enabled) we're occasionally seeing this bug. One possible explanation is that it is related to the load on the Jenkins master. That has been mentioned earlier in this issue as a possible cause, and we noticed it too: updating from 1.489 to 1.509.2 significantly increased the load on our Jenkins master. So we've given the master some more resources and tweaked the JVM options a bit to see if that improves things at all.

            @Zhijun: out of curiosity, how is the load on your Jenkins master? Is it swapping at all?

            guyr Guy Rozendorn added a comment -

            Looking at our master while Jenkins is alive (no job is running), Jenkins' Java process takes 100% of one of the CPUs:

            Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
            %Cpu(s): 51.4 us,  0.0 sy,  0.0 ni, 48.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
            KiB Mem:   8178392 total,  7231500 used,   946892 free,   381796 buffers
            KiB Swap:  8386556 total,    20888 used,  8365668 free,  5538224 cached
            
              PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
            12077 jenkins   20   0 5229m 873m 7428 S  99.7 10.9  11621:35 java
            

            Does anyone here know how to debug this, or have references on how to do so?

            guyr Guy Rozendorn added a comment -

            After:

            • applying the SSH settings mentioned above on all our slaves
            • adding more RAM and CPUs to the master
            • spreading our nightly runs over a longer timeframe so that as few jobs as possible run concurrently

            we're no longer seeing this issue. However, when we start running jobs in parallel (globally in Jenkins, not on individual slaves, which each have only 1 executor), the issue returns.

            rb2k Marc Seeger added a comment -

            I just witnessed it live on a slave today.
            Some findings:

            1. Once the slave started failing, subsequent (different) jobs failed too. (I tested 3 jobs; all of them failed with the same error.)
            2. Just disconnecting and reconnecting the slave made it work again

            guyr Guy Rozendorn added a comment -

            We had some issues in our lab, which forced us to re-install all of our slaves (84 and counting).
            We are still experiencing this issue

            It seems that after this happens, the slave remains connected to Jenkins. However, I can't tell what happens if you try to run another job on it, because we revert the slave VM from snapshot after every run (whether it is successful or not)

            dannystaple Danny Staple added a comment -

            OK, I've found something on this today. If you have very "chatty" jobs on the slaves which output a lot of console data, try to log/redirect that output to a file. Chatty jobs aren't necessarily the root cause, but they make the failure more likely.

            If a job is running but quiet, you can unplug a slave's network cable for a few seconds, put it back in, and things will pretty much continue as before. However, a slave running a chatty job will die with an I/O error almost immediately.

            If you can redirect to file, you may see a big reduction in these.
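A minimal sketch of that idea, assuming the build step can be wrapped in a small Python launcher: the command's output goes to a log file instead of the Jenkins console, and only a short heartbeat line is printed while it runs. The function name `run_quietly`, the log file name `build.log`, and the 10-second heartbeat interval are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_quietly(cmd, log_path, heartbeat_s=10.0):
    """Run cmd with stdout/stderr redirected to log_path.

    Prints one short heartbeat line per interval so the console stays
    nearly silent but not completely idle. Returns the exit code.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        while True:
            try:
                # wait() returns the exit code once the process finishes.
                return proc.wait(timeout=heartbeat_s)
            except subprocess.TimeoutExpired:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), "running", flush=True)


if __name__ == "__main__":
    # Hypothetical usage: python run_quietly.py make all
    sys.exit(run_quietly(sys.argv[1:], "build.log"))
```

The log file can then be archived as a build artifact, so the full output is still available without streaming it over the remoting channel.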

            guyr Guy Rozendorn added a comment -

            After updating all our jobs to yield output every 10 seconds, this occurs less frequently, but it still happens a few times a week.

            jglick Jesse Glick added a comment -

            Essentially a duplicate of JENKINS-1948.


            People

              Unassigned Unassigned
              dumghen Ghenadie Dumitru
              Votes: 38
              Watchers: 47
