[JENKINS-3412] For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process

Type: Bug
Resolution: Duplicate
Priority: Major
Component/s: core
Labels: None
Environment: Platform: PC, OS: Linux

We have a somewhat special CI environment: after projects build, we execute
them remotely and use Hudson to monitor their progress. The remote execution of
these programs takes a while, and at certain points no output is sent back to the
master for long periods of time. During these long intervals with no output
(just over 2 hours), I occasionally see the job fail with the
following:

FATAL: command execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:269)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request$1.get(Request.java:188)
at hudson.remoting.Request$1.get(Request.java:157)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:261)
... 9 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:223)
at hudson.remoting.Channel.terminate(Channel.java:528)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
FATAL: Unable to delete script file /tmp/hudson24564.sh
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:544)
at hudson.FilePath.delete(FilePath.java:741)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:342)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:481)
at hudson.FilePath.act(FilePath.java:541)
... 10 more
FATAL: already closed
java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:342)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:481)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)

However, this is neither predictable nor reproducible, which makes me think it
corresponds to an external event such as GC, or even a network or OS event (e.g.
a TCP error or socket timeout). Anyway, I thought I would put it up here and see if
anyone else is getting this too.

I am using Hudson ver. 1.293. The master and slave are both RHEL 4.

An interesting development occurred when I upgraded recently and then set
hudson.util.ProcessTreeKiller.disable=true. The jobs were still failing but the
underlying process was eventually completing its job successfully (copying a
large MySQL DB if you must know). This is the reason I reported this. This hints
at a bug in hudson's remoting code.

--Chad

duplicates

JENKINS-5073 hudson.util.IOException2: Failed to join the process - on a Windows slave

Resolved

Kohsuke Kawaguchi added a comment - 2009-04-04 17:57

If I understand you correctly, Hudson starts a shell (on a slave) and runs your
script, which in turn runs ssh and starts a process on yet another machine?

The exception indicates that the link between the master and the slave was
terminated unexpectedly. How do your master and slave talk to each other?

Finally, I didn't follow your reasoning about ProcessTreeKiller and why that
hints at a bug in the remoting code.
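
For readers following the stack trace, here is a minimal sketch of how the innermost `java.io.EOFException` can arise: an `ObjectInputStream` is asked for the next object after the other end of the stream has gone away. It uses only the JDK; the class and method names (`EofOnClosedLink`, `readFromClosedLink`) are made up for illustration and are not Hudson code. An in-memory buffer stands in for the master/slave link, so this shows the shape of the failure, not its cause in this report.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class EofOnClosedLink {
    /**
     * Simulates a peer that opens an object stream and then disappears
     * without sending anything. Returns the name of the exception the
     * reader sees (hypothetical helper, not part of Hudson).
     */
    public static String readFromClosedLink() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        // The "remote side" writes only the stream header, then closes.
        new ObjectOutputStream(wire).close();

        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            in.readObject(); // no more bytes: the link has ended
            return "no exception";
        } catch (EOFException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readFromClosedLink());
    }
}
```

The reader fails in the same place as the trace above (`ObjectInputStream.readObject`), which is consistent with the stream ending underneath the reader thread rather than with a malformed message.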

chad_lyon added a comment - 2009-04-06 07:18

I apologize if I was vague. The job is just a shell execute but it must run on a
particular environment. Thus, it is tied to a slave and that slave is started
via ssh command from master.

The shell script starts by copying tables from a remote data store using mysql
client. One of those tables is very large and takes just over two hours to copy.
While it is copying there is obviously TCP activity between the slave and the
remote data store but the slave doesn't send any logging info back to the master
for the entire two-hour-plus period. Since upgrading Hudson from 1.278 to 1.293, the
connection seems to get dropped at some point during this two-hour period.

Before I turned off ProcessTreeKiller, the underlying mysql transfer was
terminating with the hudson job. However, now the command started by the
hudson job is completing on the slave but the slave reports failure.
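
As a side note on the switch mentioned above: `hudson.util.ProcessTreeKiller.disable=true` has the form of a JVM system property, which is normally supplied with `-D` on the `java` command line. The sketch below only shows how such a flag can be inspected from Java with the standard `Boolean.getBoolean` call. The class and method names (`KillerFlag`, `killerDisabled`) are hypothetical, and how Hudson itself reads the property is an assumption not confirmed anywhere in this thread.

```java
public class KillerFlag {
    // Property name exactly as quoted in this report.
    static final String FLAG = "hudson.util.ProcessTreeKiller.disable";

    /**
     * True when the running JVM has the property set to "true",
     * e.g. because it was started with -Dhudson.util.ProcessTreeKiller.disable=true.
     * (Hypothetical helper for illustration only.)
     */
    public static boolean killerDisabled() {
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + killerDisabled());
    }
}
```

With the killer disabled, processes started by a build are left alone when the build is aborted or fails, which matches the behaviour described above: the job is reported as failed while the mysql copy runs to completion.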

Krystian Nowak added a comment - 2009-05-11 01:51

adding myself as CC

lidiam added a comment - 2009-06-26 17:17

It seems I'm hitting the same problem with just 5 seconds of sleep time (the job
is executing a shell script that in turn calls ant):

check-resources-library:
[echo] Javascript Library 1_2 available = true
[echo] The file is checked at:
/export/home/j2eetest/hudson/workspace/JSF-core/glassfishv3/glassfish/domains/domain1/applications/guessNumber/resources/js/1_2/validator.js
[echo] Image Library 1_2 available = true
[echo] the file is checked at:
/export/home/j2eetest/hudson/workspace/JSF-core/glassfishv3/glassfish/domains/domain1/applications/guessNumber/resources/images/1_2/wave.med.gif
[echo] Sleeping for 5 seconds...
FATAL: command execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:297)
at hudson.Launcher$ProcStarter.join(Launcher.java:274)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:928)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:118)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request$1.get(Request.java:188)
at hudson.remoting.Request$1.get(Request.java:157)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:289)
... 10 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:223)
at hudson.remoting.Channel.terminate(Channel.java:558)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:776)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:757)
FATAL: Unable to delete script file /tmp/hudson8537360715477296990.sh
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:645)
at hudson.FilePath.act(FilePath.java:633)
at hudson.FilePath.delete(FilePath.java:863)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:928)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:118)
Caused by: java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:372)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:511)
at hudson.FilePath.act(FilePath.java:640)
... 11 more
FATAL: already closed
java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:372)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:511)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:730)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
at hudson.model.Run.run(Run.java:928)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:118)

This job executes fine on solaris but fails on linux RH5.

lidiam added a comment - 2009-06-26 18:45

adding myself to cc list

jamtur01 added a comment - 2009-07-09 19:41

I am having the same issue on CentOS 5.

....F...............FATAL: rake execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:297)
at hudson.plugins.rake.Rake.perform(Rake.java:101)
at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:56)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:271)
at hudson.model.Run.run(Run.java:938)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:118)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request$1.get(Request.java:188)
at hudson.remoting.Request$1.get(Request.java:157)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:289)
... 9 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:223)
at hudson.remoting.Channel.terminate(Channel.java:558)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:776)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:757)
FATAL: already closed
java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:372)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:511)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:730)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:276)
at hudson.model.Run.run(Run.java:938)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:118)

Kohsuke Kawaguchi added a comment - 2009-07-14 17:46

When this happens, the slave log might show some record of why the communication
with the slave JVM failed. Can you please check it?

sits added a comment - 2009-07-15 23:06

I hit this error on a slave as well. The slave log was basically empty, but Hudson's main log had the following, which seems
to correspond to the slave and the time of the failure:

16/07/2009 3:32:31 PM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record run
WARNING: Failed to monitor Worker 4 for Free Temp Space
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:548)
at hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:71)
at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:80)
at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:43)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:161)
Caused by: java.io.IOException: Unable to serialize 229391015936
at hudson.remoting.UserRequest.serialize(UserRequest.java:134)
at hudson.remoting.UserRequest.perform(UserRequest.java:100)
at hudson.remoting.UserRequest.perform(UserRequest.java:46)
at hudson.remoting.Request$2.run(Request.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:54)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.NotSerializableException: hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace
at java.io.ObjectOutputStream.writeObject0(Unknown Source)
at java.io.ObjectOutputStream.writeObject(Unknown Source)
at hudson.remoting.UserRequest._serialize(UserRequest.java:123)
at hudson.remoting.UserRequest.serialize(UserRequest.java:132)
... 10 more

This stack trace is reported here:

https://hudson.dev.java.net/issues/show_bug.cgi?id=3381, which has already been fixed, but only in 1.296, whereas we are running
1.295, so we are updating now. Hopefully that will also fix the issue reported here.
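
To illustrate the `java.io.NotSerializableException` at the bottom of that trace: Java object serialization refuses any object whose class does not implement `java.io.Serializable`. The sketch below reproduces that with two hypothetical stand-in classes (`PlainSpace`, `SerializableSpace`). They are not Hudson's `DiskSpace` class, and the actual change made for issue 3381 is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    // Hypothetical value object that forgot to implement Serializable.
    static class PlainSpace {
        final long size;
        PlainSpace(long size) { this.size = size; }
    }

    // Same data, but marked Serializable so it can cross an object stream.
    static class SerializableSpace implements Serializable {
        private static final long serialVersionUID = 1L;
        final long size;
        SerializableSpace(long size) { this.size = size; }
    }

    /** Returns "ok" if the value serializes, else the exception class name. */
    public static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new PlainSpace(229391015936L)));
        System.out.println(trySerialize(new SerializableSpace(229391015936L)));
    }
}
```

Note that this failure is raised while writing a response, so it is a different symptom from the `EOFException` on the reading side reported earlier in this issue.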

jmboulos added a comment - 2009-09-28 05:34

I am on Hudson 1.319 and am seeing a similar problem. This is not only for long
jobs anymore: it happens after 6 minutes for me. I am running Hudson on a
Fedora Core 6 Linux box, but am doing the builds on a Red Hat Enterprise Linux
server 5.1 slave. It happens intermittently, with no pattern. I leave it
running all weekend doing a build every 2 hours. Over a weekend of about 30 to
40 builds, it fails once with the following while in the middle of
compilation (then works fine on the next run):

FATAL: command execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:297)
at hudson.Launcher$ProcStarter.join(Launcher.java:275)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:83)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:471)
at hudson.model.Build$RunnerImpl.build(Build.java:157)
at hudson.model.Build$RunnerImpl.doRun(Build.java:113)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:345)
at hudson.model.Run.run(Run.java:1090)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:122)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request$1.get(Request.java:188)
at hudson.remoting.Request$1.get(Request.java:157)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:289)
... 12 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:223)
at hudson.remoting.Channel.terminate(Channel.java:561)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:819)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:800)
FATAL: Unable to delete script file /tmp/hudson5532835365757807889.sh
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:672)
at hudson.FilePath.act(FilePath.java:660)
at hudson.FilePath.delete(FilePath.java:904)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:93)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:471)
at hudson.model.Build$RunnerImpl.build(Build.java:157)
at hudson.model.Build$RunnerImpl.doRun(Build.java:113)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:345)
at hudson.model.Run.run(Run.java:1090)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:122)
Caused by: java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:375)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:514)
at hudson.FilePath.act(FilePath.java:667)
... 13 more
FATAL: already closed
java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:375)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:514)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:732)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:350)
at hudson.model.Run.run(Run.java:1090)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:122)

Kirill Evstigneev added a comment - 2009-10-28 06:40

The same effect occurs with Hudson 1.330 on Windows XP and Linux.


crbeng added a comment - 2009-10-28 18:07

Been having this problem ever since upgrading from 1.320 to 1.327. See:

http://www.nabble.com/Failed-to-join-process-v1.327-1.328-to25866005.html

Now using Hudson 1.329. The master is on Fedora Core 5. The slaves are on all kinds of
platforms: RHEL, Windows, HP-UX, Mac OS X, Solaris, etc.


crbeng added a comment - 2009-10-28 18:21

See https://hudson.dev.java.net/issues/show_bug.cgi?id=4656


tiainpa added a comment - 2010-01-04 01:35

Using Hudson 1.339 on Windows XP with 5 Windows XP slaves, I'm seeing this error happening constantly with longer test runs.

I'm pretty sure this happens because our test system sends nothing to the output for a long time. Would it be possible to make the timeout manually adjustable, in either the project or the node configuration in Hudson?

If someone knows whether this can be done with Java command-line options, I would appreciate that too.


tiainpa added a comment - 2010-01-20 06:16

I got some new information when I ran one of the slaves in headless mode instead of Java Web Start, and just before this issue was reported in the console output, I saw the following in the command prompt window of the slave:

20.1.2010 14:14:44 hudson.remoting.Engine$2 onDead
INFO: Ping failed. Terminating the socket.
20.1.2010 14:14:44 hudson.remoting.Channel$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
at java.io.ObjectInputStream.readObject0(Unknown Source)
at java.io.ObjectInputStream.readObject(Unknown Source)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:852)
20.1.2010 14:14:44 hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated

So I think Hudson's new ping mechanism decides that the connection is broken, and that produces the exception. I've been pinging the machines all day long with Windows XP's ping utility, and on rare occasions the ping times out when the timeout value is set to 1 second. Is there a way to manually adjust the timeout value in Hudson?

I don't know the exact timeout value that would work in my environment, but it seems it should be greater than 1 second.
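
For illustration, the suspected failure mode can be sketched as a ping with a timeout. This is a minimal Java sketch, not Hudson's actual ping code: the class `PingSketch`, the method `ping`, and the simulated reply delays are all hypothetical. It only shows how a reply that is merely slow gets reported as a dead connection when the timeout is shorter than the reply delay.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; this is not Hudson's ping implementation.
class PingSketch {

    /** Returns true if the simulated peer replies within timeoutMillis. */
    static boolean ping(ExecutorService pool, long replyDelayMillis, long timeoutMillis) {
        // The "remote" side is simulated by a task that sleeps before replying.
        Future<String> reply = pool.submit(() -> {
            Thread.sleep(replyDelayMillis);
            return "pong";
        });
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The peer may be healthy but slow; a short timeout cannot tell the difference.
            reply.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println("fast reply, 2 s timeout:  " + ping(pool, 10, 2000));
            System.out.println("slow reply, 50 ms timeout: " + ping(pool, 500, 50));
            System.out.println("slow reply, 5 s timeout:   " + ping(pool, 500, 5000));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the same slow peer is classified as dead or alive purely depending on the timeout, which is why an adjustable timeout would help in an environment with occasional latency spikes.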


Felix Drueke added a comment - 2010-03-10 02:34

I occasionally get the very same error (as described in the initial issue description) with Hudson 1.345, maybe once every 40 builds.
Our server is running on Solaris 9, and I just saw the error happen on a slave running Solaris 10.


jbauernberger added a comment - 2010-03-11 06:46

Hi,

We are getting the same result (with long-running jobs).
We have a mixture of Linux RHEL installations (versions 4 and 5).
The Hudson version is the latest.


mdonohue added a comment - 2010-03-12 13:00

For Hudson, 'latest' changes every week. Could you specify the version number explicitly?


Hans-Juergen Hafner added a comment - 2010-04-20 03:25

Hi,

We still have this problem, and not only for long-running jobs.
We are currently using Hudson 1.355 running on RHEL.


njancesk added a comment - 2010-04-22 08:28

I have the same issue running Hudson 1.355 with a slave on a Solaris 10 SPARC machine using Java 1.5.0, with jobs that take 4+ hours.

I don't have this issue with similar jobs on Solaris 10 x86, but there the job finishes in under 4 hours.


Jim McCaskey added a comment - 2010-06-17 14:05

FWIW: This seems to be happening with a Windows 2003 slave as well, using Hudson 1.362. I have seen it before; this is just the first time I have tried to track down a solution. Here is what the error looks like on this version of Hudson.

FATAL: command execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:312)
at hudson.Launcher$ProcStarter.join(Launcher.java:280)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:83)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
at hudson.model.Build$RunnerImpl.build(Build.java:174)
at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
at hudson.model.Run.run(Run.java:1253)
at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:124)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request$1.get(Request.java:218)
at hudson.remoting.Request$1.get(Request.java:172)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:304)
... 12 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:257)
at hudson.remoting.Channel.terminate(Channel.java:602)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:893)
Caused by: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.Channel$ReaderThread.run(Channel.java:875)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:869)
FATAL: Unable to delete script file C:\DOCUME~1\conman\LOCALS~1\Temp\hudson7729064622458259363.bat
hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\conman\LOCALS~1\Temp\hudson7729064622458259363.bat at hudson.remoting.Channel@1a8aa2c:cmhslave02-win32
at hudson.FilePath.act(FilePath.java:749)
at hudson.FilePath.act(FilePath.java:735)
at hudson.FilePath.delete(FilePath.java:990)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:93)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
at hudson.model.Build$RunnerImpl.build(Build.java:174)
at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
at hudson.model.Run.run(Run.java:1253)
at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:124)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:412)
at hudson.remoting.Request.call(Request.java:105)
at hudson.remoting.Channel.call(Channel.java:555)
at hudson.FilePath.act(FilePath.java:742)
... 13 more
FATAL: channel is already closed
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:412)
at hudson.remoting.Request.call(Request.java:105)
at hudson.remoting.Channel.call(Channel.java:555)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:744)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421)
at hudson.model.Run.run(Run.java:1253)
at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:124)


Shrinkhla21 added a comment - 2010-07-07 01:51

The best solution would be to make no ping requests while the slave is running a build.
At the very least, there should be some way to increase the timeout on the ping event.
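
As a rough sketch of the two suggestions above, assuming nothing about how Hudson is actually implemented: a timeout could be read from a JVM system property, and the ping could be skipped while a build is running. The class `PingPolicySketch`, the property name `example.ping.timeoutSeconds`, and the default of 240 seconds are all hypothetical and chosen only for illustration.

```java
// Hypothetical illustration only; the property name and default are assumptions,
// not real Hudson settings.
class PingPolicySketch {

    // Would be set on the command line, e.g.:
    //   java -Dexample.ping.timeoutSeconds=600 ...
    static final String TIMEOUT_PROPERTY = "example.ping.timeoutSeconds";

    /** Reads the timeout from the system property, falling back to 240 seconds. */
    static long timeoutSeconds() {
        return Long.getLong(TIMEOUT_PROPERTY, 240L);
    }

    /** First suggestion: do not ping at all while the slave is running a build. */
    static boolean shouldPing(boolean buildRunning) {
        return !buildRunning;
    }

    public static void main(String[] args) {
        System.out.println("timeout (s): " + timeoutSeconds());
        System.out.println("ping while building: " + shouldPing(true));
        System.out.println("ping while idle: " + shouldPing(false));
    }
}
```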


Tzuchien added a comment - 2010-07-27 03:03

I am also experiencing exactly the same problem (same call stack). Hudson 1.363 on RedHat/Tomcat, with Windows XP slaves. I have a matrix job, and each configuration job takes about 1.5 hours. Not all jobs fail; in my last build, 1 out of 25 configurations failed because of this problem.


SCM/JIRA link daemon added a comment - 2010-08-02 13:13

Code changed in hudson
User: kohsuke
Path:
trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java
trunk/hudson/main/remoting/src/main/java/hudson/remoting/ChannelClosedException.java
http://jenkins-ci.org/commit/33537
Log:
[JENKINS-5073 JENKINS-3412] improved the error diagnostics on ChannelClosedException by having it report who/how the connection was closed.


dogfood added a comment - 2010-08-02 16:11

Integrated in hudson_main_trunk #156
[JENKINS-5073 JENKINS-3412] improved the error diagnostics on ChannelClosedException by having it report who/how the connection was closed.

kohsuke :
Files :

/trunk/hudson/main/remoting/src/main/java/hudson/remoting/ChannelClosedException.java
/trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java


kalpanab added a comment - 2010-08-04 16:59

I integrated the above fix into our Hudson 1.362 installation, but I still don't see the root cause of why the Hudson slave connection was reset.


Kohsuke Kawaguchi added a comment - 2011-02-04 17:00

Can you please report the stack trace?


Kohsuke Kawaguchi added a comment - 2011-02-27 02:07

I'm marking this as a duplicate of JENKINS-5073.

Both issues are caused by a lost master/slave communication channel. When it happens while your build is waiting for a forked process to complete, you see this error in the build console.
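
The `java.io.EOFException` at the bottom of the stack traces in this issue is what `ObjectInputStream.readObject` throws when the underlying stream ends while it is waiting for the next object. A minimal Java sketch of just that behavior follows; it uses in-memory streams instead of a real master/slave socket, and the class `ChannelEofSketch` and its methods are hypothetical names, not part of Hudson's remoting code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is not Hudson's Channel implementation.
class ChannelEofSketch {

    /** Serializes the given strings, then ends the stream (the "lost channel"). */
    static byte[] writeCommands(String... commands) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (String command : commands) {
                out.writeObject(command);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Keeps reading objects, as a reader thread would, until the stream ends. */
    static String readAll(byte[] data) {
        int count = 0;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                in.readObject();
                count++;
            }
        } catch (EOFException e) {
            // The stream ended while the reader was waiting for the next object.
            return "EOFException after " + count + " object(s)";
        } catch (IOException | ClassNotFoundException e) {
            return "other failure: " + e;
        }
    }

    public static void main(String[] args) {
        // prints "EOFException after 2 object(s)"
        System.out.println(readAll(writeCommands("build", "ping")));
    }
}
```

The reader has no way to know why the stream ended, which is consistent with the error saying only that the channel terminated unexpectedly.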


Kohsuke Kawaguchi added a comment - 2011-02-27 02:07

Marking as a duplicate.


Assignee: Unassigned
Reporter: chad_lyon
Votes: 24
Watchers: 27
Created: 2009-04-02 13:05
Updated: 2011-02-27 02:07
Resolved: 2011-02-27 02:07
