Bug
Resolution: Duplicate
Major
None
Platform: PC, OS: Linux
We have a somewhat unusual CI environment: after projects build, we execute
them remotely and use Hudson to monitor their progress. The remote execution of
these programs takes a while, and at certain points no output is sent back to
the master for long periods of time. During these long intervals without output
(just over 2 hours), I am occasionally seeing the job fail with the following:
FATAL: command execution failed
hudson.util.IOException2: Failed to join the process
at hudson.Proc$RemoteProc.join(Proc.java:269)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.util.concurrent.ExecutionException:
hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request$1.get(Request.java:188)
at hudson.remoting.Request$1.get(Request.java:157)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:261)
... 9 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
at hudson.remoting.Request.abort(Request.java:223)
at hudson.remoting.Channel.terminate(Channel.java:528)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
FATAL: Unable to delete script file /tmp/hudson24564.sh
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:544)
at hudson.FilePath.delete(FilePath.java:741)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
at hudson.model.Build$RunnerImpl.build(Build.java:195)
at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
Caused by: java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:342)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:481)
at hudson.FilePath.act(FilePath.java:541)
... 10 more
FATAL: already closed
java.io.IOException: already closed
at hudson.remoting.Channel.send(Channel.java:342)
at hudson.remoting.Request.call(Request.java:104)
at hudson.remoting.Channel.call(Channel.java:481)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
at hudson.model.Run.run(Run.java:895)
at hudson.model.Build.run(Build.java:112)
at hudson.model.ResourceController.execute(ResourceController.java:93)
at hudson.model.Executor.run(Executor.java:119)
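The innermost cause in the first trace is a java.io.EOFException thrown from ObjectInputStream.readObject inside the channel's reader thread. The minimal sketch below uses plain JDK serialization, not Hudson's remoting classes (the class name EofDemo is hypothetical), to show when that exception appears: a reader waiting in readObject gets EOFException once the sending side's stream has ended. That is a different exception from a socket read timeout, which Java reports as java.net.SocketTimeoutException. So the trace suggests, without proving, that the channel's input was closed or dropped under the reader thread; the later "already closed" errors appear to follow from Channel.terminate having been called at that point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class EofDemo {
    /** Returns true if readObject() fails with EOFException after the sender's stream has ended. */
    static boolean readPastEndThrowsEof() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        // The "sender" writes one object and then closes its end of the stream.
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject("last command");
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            in.readObject();          // the one object that was actually sent
            try {
                in.readObject();      // nothing left: end of stream reached
                return false;
            } catch (EOFException e) {
                return true;          // same exception type as in the reader-thread trace
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("EOFException at end of stream: " + readPastEndThrowsEof());
    }
}
```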
However, this is not predictable or reproducible, which makes me think it
corresponds to an external event such as GC, or even a network or OS event
(e.g. a TCP error or socket timeout). Anyway, I thought I would post it here and
see if anyone else is getting this too.
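If the long silence itself is the trigger (for example, an idle TCP connection between master and slave being dropped, which is only a hypothesis), one cheap experiment is to keep output flowing during the quiet stretches. The sketch below is a hypothetical wrapper, HeartbeatRunner, not part of Hudson: it runs the real command with inherited stdout/stderr, prints a heartbeat line at a fixed period until the command exits, and returns the command's exit code so the build step still passes or fails as before. If the failures stop with the wrapper in place, that points at idle-connection handling; if they continue, the cause is probably elsewhere.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatRunner {
    /**
     * Runs the command with inherited stdout/stderr and prints a heartbeat line
     * every periodSeconds until it exits. Returns the command's exit code.
     */
    static int runWithHeartbeat(List<String> command, long periodSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        timer.scheduleAtFixedRate(() -> {
            long mins = TimeUnit.NANOSECONDS.toMinutes(System.nanoTime() - start);
            System.out.println("[heartbeat] still running after " + mins + " min");
            System.out.flush();
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        try {
            return p.waitFor();
        } finally {
            timer.shutdownNow(); // stop heartbeats once the command has finished
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java HeartbeatRunner <command> [args...]");
            System.exit(2);
        }
        // 600 s (10 min) is an arbitrary period, well under the ~2 h silences.
        System.exit(runWithHeartbeat(List.of(args), 600));
    }
}
```

In a build step this would be invoked as, e.g., `java HeartbeatRunner sh -c 'your long command'`; the command shown is a placeholder.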
I am using Hudson ver. 1.293; the master and slave are both RHEL 4.
An interesting development occurred when I recently upgraded and then set
hudson.util.ProcessTreeKiller.disable=true. The jobs still failed, but the
underlying process eventually completed its work successfully (copying a large
MySQL DB, if you must know). That is why I reported this: it hints at a bug in
Hudson's remoting code.
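Assuming hudson.util.ProcessTreeKiller.disable is read as a JVM system property, it only takes effect when passed with -D to the JVM that performs the check, e.g. `java -Dhudson.util.ProcessTreeKiller.disable=true -jar hudson.war` for the master. The sketch below shows the usual way such a flag is read with Boolean.getBoolean; the class ProcessTreeKillerFlag is hypothetical and only mirrors the pattern, it is not Hudson's source.

```java
public class ProcessTreeKillerFlag {
    // Property name taken from the report; how Hudson consumes it is assumed here.
    static final String PROPERTY = "hudson.util.ProcessTreeKiller.disable";

    static boolean killerEnabled() {
        // Boolean.getBoolean is true only if the property exists and equals "true", ignoring case.
        return !Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("process tree killer enabled: " + killerEnabled());
    }
}
```

Running `java -Dhudson.util.ProcessTreeKiller.disable=true ProcessTreeKillerFlag` reports the killer as disabled; without the -D flag it reports it as enabled.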
--Chad
Duplicates: JENKINS-5073 hudson.util.IOException2: Failed to join the process - on a Windows slave (Resolved)