Bug
Resolution: Duplicate
Critical
None
Below is the stacktrace.
It happened when I ran two jobs on a master. After running a while, both jobs crashed with this exception.
I think this might be caused by brief flapping of the network connection, but I didn't notice any disconnection.
Another cause may be the heavy load on Jenkins:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25942 hudson 15 0 6902m 5.8g 5720 S 0.3 74.3 401:22.30 java
Does Jenkins run its own garbage collector at some specified time?
We have to restart it every few days because it gets slower and slower until it hangs.
FATAL: Unable to delete script file /tmp/hudson8303731085225956739.sh
hudson.util.IOException2: remote file operation failed: /tmp/hudson8303731085225956739.sh at hudson.remoting.Channel@30e472f4:build@autom-1
at hudson.FilePath.act(FilePath.java:781)
at hudson.FilePath.act(FilePath.java:767)
at hudson.FilePath.delete(FilePath.java:1022)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
at hudson.model.Build$RunnerImpl.build(Build.java:178)
at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
at hudson.model.Run.run(Run.java:1404)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:230)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:499)
at hudson.remoting.Request.call(Request.java:110)
at hudson.remoting.Channel.call(Channel.java:681)
at hudson.FilePath.act(FilePath.java:774)
... 13 more
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.call(Request.java:149)
at hudson.remoting.Channel.call(Channel.java:681)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy29.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:859)
at hudson.Launcher$ProcStarter.join(Launcher.java:345)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
at hudson.model.Build$RunnerImpl.build(Build.java:178)
at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
at hudson.model.Run.run(Run.java:1404)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:230)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:273)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
- duplicates
JENKINS-1948 Intermittent slave disconnections with secondary symptoms (Resolved)
- is related to
JENKINS-5073 hudson.util.IOException2: Failed to join the process - on a Windows slave (Resolved)
JENKINS-19346 100% on one CPU (Resolved)
[JENKINS-12235] FATAL, Unable to delete script file, IOException2, remote file operation failed, unexpected termination of channel
We are hitting this frequently on Windows Jenkins slaves. Similar call stack attached.
We started getting this recently. It is difficult to search for, but I think it started around version 1.455.
- If there is no obvious fix, can the exception be caught so that it does not fail an otherwise successful build?
I think this would be an adequate workaround for most people. At the moment this issue is causing random builds to fail, which is a significant annoyance for the developers.
Thanks.
Rich.
I get this problem every so often on Windows machines. I run Jenkins ver. 1.463.
See attached stacktrace.txt
For us, the cause of this error was our build slaves (VMs) running out of memory and self-rebooting.
Disconnects with large stack traces are still occurring in the latest 1.471. So far I have encountered this on Windows, but historically we have also seen it on OSX and Linux. It looks slightly different in my latest case: I get a socket reset exception, but the failure is still first hit when deleting the script file after a long build (over 3 hours).
This is intermittent, but if it's a sign of a client problem (out of memory or whatever), a more useful error caught earlier on would be an improvement.
We have also been seeing this problem intermittently. For us it is not only Windows: our SUSE and Red Hat Linux slaves have been suffering from the same problem.
Can anything be done to prevent this problem from failing a build? The spurious failures are a distraction.
I encountered this problem recently. My job runs on a Linux slave and, for various reasons, contains many 'wait' steps that take a long time. I think that during those wait steps the master does not communicate with the slave, so the slave (ssh server) drops the connection to the master (ssh client), and then the problem occurs.
So I configured the ssh server to send messages to the ssh client every minute to keep the connection alive, and the problem is resolved.
We have been facing this issue for some time. Yesterday, we upgraded to 1.466 and the issue persists.
Jenkins master is running on Windows and the slaves are mainly Windows but a few are Linux. The issue randomly appears on any slave.
Thanks
Shobha
script file c:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\hudson920936561807305456.bat
hudson.util.IOException2: remote file operation failed: c:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\hudson920936561807305456.bat at hudson.remoting.Channel@5f1603a2: Slave123
at hudson.FilePath.act(FilePath.java:835)
at hudson.FilePath.act(FilePath.java:821)
at hudson.FilePath.delete(FilePath.java:1126)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:710)
at hudson.model.Build$RunnerImpl.build(Build.java:178)
at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:480)
at hudson.model.Run.run(Run.java:1438)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:239)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
We consistently get this error on a Linux build slave, on jobs that have little output and take a long time (typically > 2 hours). We are using ssh and I suspect the problem is due to no traffic on the ssh link for this long period. Using Jenkins 1.434.
We are able to work around the problem by adding:
ClientAliveInterval 60
to /etc/ssh/sshd_config on the Jenkins host
Jenkins 1.489 here and it happens too. The master and slave are RHEL5.8.
The task runs for 15 minutes in my case. Output is being produced continuously, and neither setting ClientAliveInterval and ClientAliveCountMax nor setting TCPKeepAlive helped.
It started to happen after I joined two "execute shell" steps into one. The numbers of sockets and processes are well within limits.
The slave appears to exit (crash?). In the slave log there is:
...
Evacuated stdout
Slave successfully connected and online
ERROR: Connection terminated
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
Probably it's somewhere in the JVM entrails? Now I'll try to play with different JVMs and settings, as it's a blocking issue for me.
Code changed in jenkins
User: Nicolas De Loof
Path:
core/src/main/java/hudson/tasks/CommandInterpreter.java
http://jenkins-ci.org/commit/jenkins/8e74242d8b961a78d5d498b55e1f3797f92bb8a1
Log:
JENKINS-12235 root cause is hidden by by script deletion failure
Integrated in jenkins_main_trunk #2141
JENKINS-12235 root cause is hidden by by script deletion failure (Revision 8e74242d8b961a78d5d498b55e1f3797f92bb8a1)
Result = SUCCESS
Nicolas De Loof : 8e74242d8b961a78d5d498b55e1f3797f92bb8a1
Files :
- core/src/main/java/hudson/tasks/CommandInterpreter.java
I get this on Linux -> Linux with Jenkins ver. 1.504
Different data centers though, so probably not the most stable network connection.
FATAL: Unable to delete script file /tmp/hudson9103641402954770242.sh
hudson.util.IOException2: remote file operation failed: /tmp/hudson9103641402954770242.sh at hudson.remoting.Channel@3ce3262f:django
at hudson.FilePath.act(FilePath.java:861)
at hudson.FilePath.act(FilePath.java:838)
at hudson.FilePath.delete(FilePath.java:1223)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:593)
at hudson.model.Run.execute(Run.java:1567)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:494)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.FilePath.act(FilePath.java:854)
... 13 more
Caused by: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:850)
at hudson.remoting.Channel$2.handle(Channel.java:435)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: Command close created at
at hudson.remoting.Command.<init>(Command.java:56)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:844)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:842)
at hudson.remoting.Channel.close(Channel.java:909)
at hudson.remoting.Channel.close(Channel.java:892)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:849)
... 2 more
FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy46.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
at hudson.Launcher$ProcStarter.join(Launcher.java:360)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:814)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:593)
at hudson.model.Run.execute(Run.java:1567)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:850)
at hudson.remoting.Channel$2.handle(Channel.java:435)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: hudson.remoting.Channel$OrderlyShutdown
... 3 more
Caused by: Command close created at
at hudson.remoting.Command.<init>(Command.java:56)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:844)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:842)
at hudson.remoting.Channel.close(Channel.java:909)
at hudson.remoting.Channel.close(Channel.java:892)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:849)
... 2 more
The file is still around:
# ls -lash /tmp/hudson9103641402954770242.sh
4.0K -rw-rw-r-- 1 jenkins jenkins 96 Mar 8 14:50 /tmp/hudson9103641402954770242.sh
We still get this from time to time with a Debian squeeze master running Jenkins 1.504 and an OS X 10.8 client. It would be nice to resolve this, or at least to handle this common error with less stack trace and more human-readable text. Alternatively, it would be nice to have a more fault-tolerant command for deleting the temporary file, one that either retries or schedules a cleanup of the temp file for when it can.
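To make the "retries" idea above concrete, here is a minimal sketch in Python. It is not how Jenkins deletes the script file (that code is Java, in CommandInterpreter); the function name `delete_with_retry` and its parameters are hypothetical and only illustrate a delete that retries, logs a short warning, and reports failure instead of raising.

```python
import os
import time


def delete_with_retry(path, attempts=3, delay=1.0):
    """Try to delete `path`; return True on success, False if all attempts fail.

    Hypothetical helper for illustration only, not part of Jenkins.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # The file is already gone, so there is nothing left to clean up.
            return True
        except OSError as exc:
            # One short, readable line instead of a full stack trace.
            print(f"warning: could not delete {path} "
                  f"(attempt {attempt}/{attempts}): {exc}")
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed; the caller decides whether this fails the build.
    return False
```

A caller could treat a `False` result as a warning and leave the build result untouched, which is the behaviour several commenters asked for.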
Have seen this issue for some time intermittently. It might make the jenkins slave feature unusable for many applications if we can't find a workaround.
Unfortunately, I experience this (very prominent!) problem as well. Here is some more information and a possible workaround.
Setup:
- Jenkins 1.512 on tomcat 6.0.36, in a VirtualBox Windows 7 Guest, JRE 1.7.0
- Slave connected via JNLP client in a VirtualBox Windows 7 Guest, JRE 1.7.0
The slave is started as follows, in order to see the log messages:
java -Xmx512m -jar slave.jar -jnlpUrl http://jenkins/computer/slave/slave-agent.jnlp
First off, setting various values for these properties (in catalina on Tomcat) did not seem to improve the behaviour:
-Dhudson.remoting.Launcher.pingTimeoutSec
-Dhudson.remoting.Launcher.pingIntervalSec
-Dhudson.slaves.ChannelPinger.pingInterval
I was getting the "channel already closed" exception quite frequently and mostly at the same spot during script execution. The job (between 12h and 16h) on the slave (via a Windows batch file) generates large amounts of documentation via doxygen and pipes the output into a logfile, so it uses quite some CPU and does not echo progress. Throttling the CPU so that the NIC won't suffer from the overload did not help, though. Also, I ran continuous pings to the slave (from the master and back), and ping requests failed only seldom (normal network tolerances).
To say this first: although Jenkins failed with the above-mentioned exception, the slave continued to perform its job "in the background", so if the exception came after 1h, I would see the updated documentation after 16h although Jenkins had already declared the job as failed.
For the chronology, these are the log excerpts:
In the live console on the jenkins WebUI i see (THE FIRST LINE IS THE LAST OUTPUT BY MY SCRIPT):
2013-05-03 18:36:50 - Processing: documentationA
FATAL: Unable to delete script file c:\temp\hudson3125329676016517230.bat
hudson.util.IOException2: remote file operation failed: c:\temp\hudson3125329676016517230.bat at hudson.remoting.Channel@1f12b9f:slave
at hudson.FilePath.act(FilePath.java:900)
at hudson.FilePath.act(FilePath.java:877)
at hudson.FilePath.delete(FilePath.java:1262)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
at hudson.model.Run.execute(Run.java:1575)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:494)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.FilePath.act(FilePath.java:893)
... 13 more
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at $Proxy52.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
at hudson.Launcher$ProcStarter.join(Launcher.java:360)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:802)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:584)
at hudson.model.Run.execute(Run.java:1575)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:237)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
In the JNLP client log on my slave I got:
Mai 03, 2013 7:16:43 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Mai 03, 2013 7:16:43 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Mai 03, 2013 7:16:56 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins/]
And on the Tomcat server:
May 03, 2013 7:16:42 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel slave
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
May 03, 2013 7:16:42 PM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
WARNING: Channel reader thread: slave for + slave terminated
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
May 03, 2013 7:16:53 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #78 from /10.0.0.2:49300
It seems that the channel gets closed when no data is going through the connection (hence my playing with the ping settings mentioned above). I can't say definitely how long it would stay open, but in the case shown above it was about 45 minutes without output. Therefore I modified the script to call doxygen in a thread and output a "." every 15 s. So far, no more closed channels. If you can't modify your script to generate continuous output, maybe pipe your command (in the batch file) to some program which outputs the output or triggers a continuous output. Also, I noticed that the dots generated by my script modification are not shown in the web UI until a newline is sent. Nevertheless, the channel did not get closed.
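For anyone who wants to try the same workaround without rewriting their build script, here is a minimal sketch of a wrapper in Python. It is not the commenter's actual script; the function name `run_with_heartbeat` and its parameters are hypothetical. It starts the long-running command, sends its output to a logfile, and prints a "." followed by a newline at a fixed interval until the command exits. The newline is deliberate, given the observation that dots only show up in the console once a newline is sent.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=15.0, log_path=None, out=sys.stdout):
    """Run `cmd` and write a heartbeat line to `out` every `interval` seconds.

    Hypothetical helper for illustration only. The command's own output goes
    to `log_path` (or is discarded if no path is given). Returns the exit code.
    """
    log = open(log_path, "wb") if log_path else subprocess.DEVNULL
    try:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        while True:
            try:
                # Wait up to one interval for the command to finish.
                return proc.wait(timeout=interval)
            except subprocess.TimeoutExpired:
                # Still running: emit a heartbeat and flush it immediately.
                out.write(".\n")
                out.flush()
    finally:
        if log_path:
            log.close()
```

In a build step this could be called as `run_with_heartbeat(["doxygen", "Doxyfile"], log_path="doxygen.log")`, where the doxygen arguments are only an example. Whether steady output actually keeps the channel open in a given setup is something to verify; the thread reports mixed results.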
I hope that this investigation delivers some clues to fix this problem and make distributed working with jenkins more stable!
I get this problem every day.
Jenkins 1.518 on tomcat 7.0.35, in Windows XP, JRE 1.7.0_13
slave PC : windows xp 32bit.
FATAL: Unable to delete script file C:\DOCUME~1\dg\LOCALS~1\Temp\hudson229281249267934971.bat
hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\dg\LOCALS~1\Temp\hudson229281249267934971.bat at hudson.remoting.Channel@7ccfde:PC_068_LX760
at hudson.FilePath.act(FilePath.java:901)
at hudson.FilePath.act(FilePath.java:878)
at hudson.FilePath.delete(FilePath.java:1263)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
at hudson.model.Run.execute(Run.java:1576)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:241)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:494)
at hudson.remoting.Request.call(Request.java:129)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.FilePath.act(FilePath.java:894)
... 13 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:672)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
at sun.proxy.$Proxy72.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
at hudson.Launcher$ProcStarter.join(Launcher.java:360)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
at hudson.model.Build$BuildExecution.build(Build.java:199)
at hudson.model.Build$BuildExecution.doRun(Build.java:160)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
at hudson.model.Run.execute(Run.java:1576)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:241)
Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:732)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Command.readFrom(Command.java:92)
at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
We are hitting this problem around once a day as well.
Jenkins: Jenkins ver. 1.517 on Windows Server 2008 R2, 8GB RAM, JRE 1.7.0_21
Slave PC: Windows 7, 4GB RAM, JRE 1.7.0_21
What's interesting is that the exception (when it does occur) always seems to happen on the exact same unit test that is executing. That test in particular spawns off a new process and then kills just that process and all child processes. Below is the stack trace:
16:54:43 FATAL: Unable to delete script file C:\Users\****\AppData\Local\Temp\hudson8255606542971992250.ps1
16:54:50 hudson.util.IOException2: remote file operation failed: C:\Users\****\AppData\Local\Temp\hudson8255606542971992250.ps1 at hudson.remoting.Channel@fde4a0:scheduler_tests
16:54:51 at hudson.FilePath.act(FilePath.java:901)
16:54:55 at hudson.FilePath.act(FilePath.java:878)
16:54:55 at hudson.FilePath.delete(FilePath.java:1263)
16:54:55 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
16:55:19 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
16:55:19 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
16:55:19 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
16:55:19 at hudson.model.Build$BuildExecution.build(Build.java:199)
16:55:19 at hudson.model.Build$BuildExecution.doRun(Build.java:160)
16:55:19 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
16:55:19 at hudson.model.Run.execute(Run.java:1576)
16:55:19 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
16:55:19 at hudson.model.ResourceController.execute(ResourceController.java:88)
16:55:19 at hudson.model.Executor.run(Executor.java:241)
16:55:19 Caused by: hudson.remoting.ChannelClosedException: channel is already closed
16:55:19 at hudson.remoting.Channel.send(Channel.java:494)
16:55:19 at hudson.remoting.Request.call(Request.java:129)
16:55:19 at hudson.remoting.Channel.call(Channel.java:672)
16:55:19 at hudson.FilePath.act(FilePath.java:894)
16:55:19 ... 13 more
16:55:19 Caused by: java.net.SocketException: Connection reset
16:55:19 at java.net.SocketInputStream.read(Unknown Source)
16:55:19 at java.net.SocketInputStream.read(Unknown Source)
16:55:19 at java.io.BufferedInputStream.fill(Unknown Source)
16:55:19 at java.io.BufferedInputStream.read(Unknown Source)
16:55:19 at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
16:55:19 at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
16:55:19 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
16:55:19 at java.io.ObjectInputStream.readObject0(Unknown Source)
16:55:19 at java.io.ObjectInputStream.readObject(Unknown Source)
16:55:19 at hudson.remoting.Command.readFrom(Command.java:92)
16:55:19 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
16:55:19 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
16:55:19 FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
16:55:19 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
16:55:19 at hudson.remoting.Request.call(Request.java:174)
16:55:19 at hudson.remoting.Channel.call(Channel.java:672)
16:55:19 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
16:55:19 at com.sun.proxy.$Proxy41.join(Unknown Source)
16:55:19 at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915)
16:55:19 at hudson.Launcher$ProcStarter.join(Launcher.java:360)
16:55:19 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
16:55:19 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
16:55:19 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
16:55:19 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
16:55:19 at hudson.model.Build$BuildExecution.build(Build.java:199)
16:55:19 at hudson.model.Build$BuildExecution.doRun(Build.java:160)
16:55:19 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
16:55:19 at hudson.model.Run.execute(Run.java:1576)
16:55:19 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
16:55:19 at hudson.model.ResourceController.execute(ResourceController.java:88)
16:55:19 at hudson.model.Executor.run(Executor.java:241)
16:55:19 Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
16:55:19 at hudson.remoting.Request.abort(Request.java:299)
16:55:19 at hudson.remoting.Channel.terminate(Channel.java:732)
16:55:19 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
16:55:19 Caused by: java.net.SocketException: Connection reset
16:55:19 at java.net.SocketInputStream.read(Unknown Source)
16:55:19 at java.net.SocketInputStream.read(Unknown Source)
16:55:19 at java.io.BufferedInputStream.fill(Unknown Source)
16:55:19 at java.io.BufferedInputStream.read(Unknown Source)
16:55:19 at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
16:55:19 at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
16:55:19 at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
16:55:19 at java.io.ObjectInputStream.readObject0(Unknown Source)
16:55:19 at java.io.ObjectInputStream.readObject(Unknown Source)
16:55:19 at hudson.remoting.Command.readFrom(Command.java:92)
16:55:19 at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
16:55:19 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Can you try my possible workaround, or give estimates of how long the individual steps take (and whether they produce no output during that time)? I think we have enough stacktraces; as long as new ones provide no more detailed information, it would be better to collect background information on what led to this error, such as:
- does it always fail at the same spot in the job?
- does that spot take a long time (and how long)?
- is output generated (and propagated back to the Jenkins master)?
- does the possible fix work (producing output in between to keep the channel open)?
If you want to provide stacktraces, output from the tomcat server and the jnlp client (both with timestamps) along with the output from the jenkins master could also help.
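For anyone who wants to try the "produce output in between" idea without touching the build scripts themselves, here is a minimal sketch of a wrapper. It is only an illustration, not something taken from Jenkins: the function name run_with_heartbeat, the 10 second default and the message format are all my own assumptions. It forwards the command's output, prints a timestamped line whenever the command has been silent for the given interval, and reports how long the step took.

```python
import subprocess
import sys
import threading
import time


def run_with_heartbeat(cmd, interval=10.0, out=sys.stdout):
    """Run cmd, forward its output, and print a heartbeat line whenever
    the command has produced no output for `interval` seconds.
    (Hypothetical helper; name and message format are assumptions.)"""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lock = threading.Lock()
    last_output = [time.monotonic()]
    done = threading.Event()

    def heartbeat():
        # Poll a few times per interval so a heartbeat is never very late.
        while not done.wait(interval / 4.0):
            with lock:
                if time.monotonic() - last_output[0] >= interval:
                    out.write(time.strftime("( %Y-%m-%d %H:%M:%S still running )\n"))
                    out.flush()
                    last_output[0] = time.monotonic()

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    start = time.monotonic()
    for line in proc.stdout:
        with lock:
            out.write(line)
            out.flush()
            last_output[0] = time.monotonic()
    code = proc.wait()
    done.set()
    worker.join()
    out.write("step took %.1f s, exit code %d\n" % (time.monotonic() - start, code))
    out.flush()
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python heartbeat.py <command> [args...]
    sys.exit(run_with_heartbeat(sys.argv[1:]))
```

The reported step duration also gives the timing estimate asked for above. Note that later comments in this thread report failures even with regular output, so treat this as a diagnostic aid rather than a fix.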
I was seeing this once or twice a day when our slaves were overloaded. Since I fixed the overload problem, I've only seen it once.
In addition to fixing the overload, on all our slaves, I made the following changes to /etc/ssh/sshd_config:
ClientAliveCountMax 99
ClientAliveInterval 60
On half of the slaves, I also set
TCPKeepAlive no
(It had been 'yes' on all the slaves.)
The only failure I've seen since these changes has been on a machine with
TCPKeepAlive yes
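To make the effect of these two sshd settings concrete: sshd sends a client-alive probe every ClientAliveInterval seconds and drops the connection after ClientAliveCountMax consecutive unanswered probes, so an unresponsive peer is tolerated for roughly interval × count seconds. The sketch below computes that for a config snippet. The function name is my own, and it assumes plain integer values (no "1m"-style time suffixes) and whitespace-separated keyword/value pairs.

```python
def client_alive_timeout(config_text):
    """Return the approximate number of seconds sshd tolerates an
    unresponsive client, or None if probing is disabled
    (ClientAliveInterval 0). Hypothetical helper for illustration."""
    # OpenSSH defaults, matching the default values quoted in this thread.
    settings = {"clientaliveinterval": 0, "clientalivecountmax": 3}
    seen = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        # sshd uses the first value it finds for each keyword.
        if key in settings and key not in seen:
            settings[key] = int(value)  # assumes plain seconds / counts
            seen.add(key)
    if settings["clientaliveinterval"] == 0:
        return None
    return settings["clientaliveinterval"] * settings["clientalivecountmax"]


print(client_alive_timeout("ClientAliveCountMax 99\nClientAliveInterval 60\n"))
```

With the values above this comes to 5940 seconds (about 99 minutes), so a slave that is merely overloaded for a few minutes should no longer be disconnected by sshd itself.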
I switched to connecting to the slave PC via OpenSSH,
but it still occurs.
Same environment as in my comment above:
Jenkins 1.518 on tomcat 7.0.35, in Windows XP, JRE 1.7.0_13
slave PC : windows xp 32bit.
FATAL: Unable to delete script file C:\DOCUME~1\dg\LOCALS~1\Temp\hudson8470529757775576764.bat hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\dg\LOCALS~1\Temp\hudson8470529757775576764.bat at hudson.remoting.Channel@1293e35:PC_067_LX760 at hudson.FilePath.act(FilePath.java:901) at hudson.FilePath.act(FilePath.java:878) at hudson.FilePath.delete(FilePath.java:1263) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1576) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:241) Caused by: hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:494) at hudson.remoting.Request.call(Request.java:129) at hudson.remoting.Channel.call(Channel.java:672) at hudson.FilePath.act(FilePath.java:894) ... 
13 more Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source) at java.io.ObjectInputStream.readObject0(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at hudson.remoting.Command.readFrom(Command.java:92) at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:672) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158) at sun.proxy.$Proxy70.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915) at hudson.Launcher$ProcStarter.join(Launcher.java:360) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1576) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:241) Caused by: hudson.remoting.RequestAbortedException: 
java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:732) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source) at java.io.ObjectInputStream.readObject0(Unknown Source) at java.io.ObjectInputStream.readObject(Unknown Source) at hudson.remoting.Command.readFrom(Command.java:92) at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
I also run into this quite frequently with a job that is long-running and sometimes doesn't print anything to stdout for several minutes
edit: I just noticed that I've already commented on this issue... it's late...
We just hit this in 1.509.2, i.e. the current Jenkins stable release, and it's causing about half our builds to fail at the moment, so our CI system is pretty much screwed. Raising to critical.
A small note:
this happened for us when the master was heavily overloaded (swapping). I reduced the number of executors on the master and just started a slave with more CPU/RAM to take care of the jobs.
I don't see our master swapping. And our master doesn't do much except for serve web pages and farm jobs out to the slaves. We do have many slaves though (on the order of 60 or so I guess).
This happens across all our slaves, windows, redhat, ubuntu.
We have a little bit more than 100 jobs, and we keep logs for the last 10 runs of each job.
Over the last 10 runs, we've seen this 14 times on Windows, 26 times on the other platforms.
Everything runs over SSH (cygwin on windows) with the default settings:
- TCPKeepAlive: yes
- ClientAliveCountMax: 3
- ClientAliveInterval: 0
It doesn't look related to the output - this fails randomly in different steps, some with no output for minutes, some with no output for only a few seconds
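One way to put numbers on "no output for minutes" versus "only a few seconds" is to measure the gaps in a timestamped console log. A rough sketch, assuming each console line starts with an HH:MM:SS timestamp (as in the timestamped trace posted earlier in this thread); the function name and the regular expression are my own assumptions:

```python
import re
from datetime import datetime

# Assumed line format: "HH:MM:SS <message>"
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s")


def longest_silence(lines):
    """Return (gap_seconds, line_before_gap) for the longest pause between
    two consecutive timestamped lines; (0.0, None) if there is no gap."""
    prev_time = None
    prev_line = None
    best = (0.0, None)
    for raw in lines:
        line = raw.rstrip("\n")
        match = STAMP.match(line)
        if not match:
            continue  # ignore lines without a timestamp
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap < 0:
                gap += 24 * 3600  # the log crossed midnight
            if gap > best[0]:
                best = (gap, prev_line)
        prev_time, prev_line = stamp, line
    return best
```

Running this over the console logs of failed and successful builds would show whether the failures actually line up with long silent periods.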
I added a print every ten seconds, still happens:
( 2013-29-27 17:29:07 running ) ( 2013-29-27 17:29:17 running ) ( 2013-29-27 17:29:27 running ) ( 2013-29-27 17:29:37 running ) ( 2013-29-27 17:29:47 running ) ( 2013-29-27 17:29:57 running ) ( 2013-30-27 17:30:07 running ) FATAL: Unable to delete script file C:\Users\Administrator\hudson7142602309142296785.py hudson.util.IOException2: remote file operation failed: C:\Users\Administrator\hudson7142602309142296785.py at hudson.remoting.Channel@6e47e0a6:host-ci38 at hudson.FilePath.act(FilePath.java:901) at hudson.FilePath.act(FilePath.java:878) at hudson.FilePath.delete(FilePath.java:1263) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60) at hudson.plugins.templateproject.ProxyBuilder.perform(ProxyBuilder.java:87) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1593) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:242) Caused by: hudson.remoting.ChannelClosedException: channel is already closed at hudson.remoting.Channel.send(Channel.java:524) at hudson.remoting.Request.call(Request.java:129) at hudson.remoting.Channel.call(Channel.java:722) at hudson.FilePath.act(FilePath.java:894) ... 
14 more Caused by: hudson.remoting.Channel$OrderlyShutdown at hudson.remoting.Channel$CloseCommand.execute(Channel.java:900) at hudson.remoting.Channel$2.handle(Channel.java:465) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60) Caused by: Command close created at at hudson.remoting.Command.<init>(Command.java:56) at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:894) at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:892) at hudson.remoting.Channel.close(Channel.java:975) at hudson.remoting.Channel.close(Channel.java:958) at hudson.remoting.Channel$CloseCommand.execute(Channel.java:899) ... 2 more FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:722) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:162) at sun.proxy.$Proxy38.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:915) at hudson.Launcher$ProcStarter.join(Launcher.java:360) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60) at hudson.plugins.templateproject.ProxyBuilder.perform(ProxyBuilder.java:87) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804) at hudson.model.Build$BuildExecution.build(Build.java:199) at hudson.model.Build$BuildExecution.doRun(Build.java:160) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586) at hudson.model.Run.execute(Run.java:1593) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at 
hudson.model.Executor.run(Executor.java:242) Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:782) at hudson.remoting.Channel$CloseCommand.execute(Channel.java:900) at hudson.remoting.Channel$2.handle(Channel.java:465) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60) Caused by: hudson.remoting.Channel$OrderlyShutdown ... 3 more Caused by: Command close created at at hudson.remoting.Command.<init>(Command.java:56) at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:894) at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:892) at hudson.remoting.Channel.close(Channel.java:975) at hudson.remoting.Channel.close(Channel.java:958) at hudson.remoting.Channel$CloseCommand.execute(Channel.java:899) ... 2 more
We started printing "tweet" every minute or so, still happened.
Since we took down the numbers of executors on the master (aws m1.large) to 2 and upsized the slave, I haven't seen another crash.
(not that long of a timespan though)
Although our master node has 10 executors (left for tied jobs only), this happens even when only a single job is running on the whole Jenkins instance, and it's running (and failing) on a different slave.
So as suggested earlier in this case, we switched from tcp keepalives to ssh keepalives i.e. set on the slaves (in /etc/ssh/sshd_config):
#to work-around jenkins slave connection dropouts
ClientAliveCountMax 10
ClientAliveInterval 60
and this appears to have fixed the problem for us.
We set the following on all our slaves:
ClientAliveCountMax 99
ClientAliveInterval 60
TCPKeepAlive no
rebooted them all, and still getting this exception
forever_xt, TCPKeepAlive yes is the default, which doesn't work either
As an update to this, we have a bug in a script of ours which redeployed our build slaves. But even after this (also with both TCP and SSH keepalives enabled) we're occasionally seeing this bug. One possible explanation is that it may be related to the load on the Jenkins master. That has been mentioned earlier in this case as a possible cause, and we noticed it too: updating from 1.489 to 1.509.2 caused significantly increased load on our Jenkins master. So we've given the master some more resources and tweaked the JVM options a bit to see if that improves things at all.
@Zhijun: out of curiosity, how is the load on your jenkins master? Is it at all swapping?
Looking at our master while Jenkins is idle (no job is running), Jenkins' java process takes 100% of one of the CPUs:
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
%Cpu(s): 51.4 us, 0.0 sy, 0.0 ni, 48.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8178392 total, 7231500 used, 946892 free, 381796 buffers
KiB Swap: 8386556 total, 20888 used, 8365668 free, 5538224 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12077 jenkins 20 0 5229m 873m 7428 S 99.7 10.9 11621:35 java
Does anyone here know how to debug this, or have references on how to do so?
after:
- applying the ssh settings mentioned above on all our slaves
- adding more RAM and CPUs to the master
- spreading our nightly runs over a longer timeframe so that as few jobs as possible run concurrently
we're not seeing this issue. However, when we start running jobs in parallel (globally in Jenkins, not on the slaves, which each have only 1 executor), we see it again.
I just witnessed it live on a slave today.
Some findings:
1. Once the slave started failing, subsequent (different) jobs failed too. (Tested 3 jobs, all of them failed with the same error.)
2. Just disconnecting and reconnecting the slave made it work again
We had some issues in our lab, which forced us to re-install all of our slaves (84 and counting).
We are still experiencing this issue
It seems that after this happens, the slave remains connected to Jenkins. However, I can't tell what happens if you try to run another job on it, because we revert the slave VM from snapshot after every run (whether it is successful or not)
Ok - I've found something on this today. If you have very "chatty" jobs on the slaves which output a lot of console data, try to log/redirect that output to a file. Chatty jobs aren't necessarily the root cause, but they make the failure more likely.
If a job is running, but quiet, you can unplug a slave network cable for a few seconds, put it back in and things will pretty much continue as before. However, a slave running a chatty job will die with an IO error almost immediately.
If you can redirect to file, you may see a big reduction in these.
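A minimal sketch of what such a redirect could look like as a wrapper around a build step, in case it is awkward to change the job's own scripts. Everything here (the function name run_quietly, the log path handling, printing the last 20 lines on failure) is my own assumption, not part of Jenkins:

```python
import subprocess
import sys
from collections import deque


def run_quietly(cmd, log_path, tail_lines=20, out=sys.stdout):
    """Run cmd with all of its output written to log_path instead of the
    console. Print one summary line, plus the end of the log on failure.
    (Hypothetical helper for illustration.)"""
    with open(log_path, "w") as log:
        code = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    out.write("exit code %d, full output in %s\n" % (code, log_path))
    if code != 0:
        # Keep only the last few lines so the console stays quiet.
        with open(log_path, errors="replace") as log:
            tail = deque(log, maxlen=tail_lines)
        out.writelines(tail)
    out.flush()
    return code


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python quiet.py <logfile> <command> [args...]
    sys.exit(run_quietly(sys.argv[2:], sys.argv[1]))
```

The log file can then be archived as a build artifact, so the full output is still available while only a few lines travel over the slave channel.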
After updating all our jobs to yield output every 10 seconds, this occurs less frequently, but it still happens a few times a week.
This issue is especially bothersome on Windows slaves connected via the JNLP agent.
On Windows slaves, the JNLP socket connection seems quite sensitive to connection problems, even when it is not used at 100%.
Maybe the solution for this is to use an SSH server on Windows?
Thanks