[JENKINS-3412] For long-running jobs (>2 hours), job failing with hudson.util.IOException2: Failed to join the process

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: core
    • Platform: PC, OS: Linux

      We have a somewhat special CI environment: after projects build, we execute
      them remotely and use Hudson to monitor their progress. The remote execution of
      these programs takes a while, and at certain points no output is sent back to
      the master for long periods of time. During these long intervals with no output
      (just over 2 hours), I occasionally see the job fail with the following:

      FATAL: command execution failed
      hudson.util.IOException2: Failed to join the process
      at hudson.Proc$RemoteProc.join(Proc.java:269)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.model.Build$RunnerImpl.build(Build.java:195)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)
      Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
      at hudson.remoting.Request$1.get(Request.java:188)
      at hudson.remoting.Request$1.get(Request.java:157)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
      at hudson.Proc$RemoteProc.join(Proc.java:261)
      ... 9 more
      Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
      at hudson.remoting.Request.abort(Request.java:223)
      at hudson.remoting.Channel.terminate(Channel.java:528)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
      FATAL: Unable to delete script file /tmp/hudson24564.sh
      hudson.util.IOException2: remote file operation failed
      at hudson.FilePath.act(FilePath.java:544)
      at hudson.FilePath.delete(FilePath.java:741)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.model.Build$RunnerImpl.build(Build.java:195)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)
      Caused by: java.io.IOException: already closed
      at hudson.remoting.Channel.send(Channel.java:342)
      at hudson.remoting.Request.call(Request.java:104)
      at hudson.remoting.Channel.call(Channel.java:481)
      at hudson.FilePath.act(FilePath.java:541)
      ... 10 more
      FATAL: already closed
      java.io.IOException: already closed
      at hudson.remoting.Channel.send(Channel.java:342)
      at hudson.remoting.Request.call(Request.java:104)
      at hudson.remoting.Channel.call(Channel.java:481)
      at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)

      However, this is not predictable or reproducible, which makes me think it
      corresponds to an external event such as GC, or even a network or OS event (e.g.
      a TCP error or socket timeout). Anyway, I thought I would post it here and see if
      anyone else is seeing this too.

      I am using Hudson 1.293. The master and slave both run RHEL 4.

      An interesting development occurred after I recently upgraded and then set
      hudson.util.ProcessTreeKiller.disable=true. The jobs were still failing, but the
      underlying process was eventually completing successfully (copying a large
      MySQL DB, if you must know). That is why I reported this: it hints at a bug in
      Hudson's remoting code.
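For context, `hudson.util.ProcessTreeKiller.disable` is a JVM system property, given on the `java` command line as `-Dhudson.util.ProcessTreeKiller.disable=true`. The sketch below shows how a switch of that kind is typically evaluated in Java. It assumes the flag is read with `Boolean.getBoolean`, which is an assumption for illustration and not taken from Hudson's source; the class name `KillerSwitch` is hypothetical.

```java
// Hypothetical sketch: how a "-Dname=true" switch such as
// hudson.util.ProcessTreeKiller.disable is commonly read in Java.
// Assumption: the flag is read via Boolean.getBoolean (not confirmed from Hudson's code).
public class KillerSwitch {
    static final String PROPERTY = "hudson.util.ProcessTreeKiller.disable";

    // Boolean.getBoolean is true only if the property exists and equals "true"
    // (case-insensitive); an unset property or any other value yields false.
    static boolean processTreeKillerEnabled() {
        return !Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("before: enabled=" + processTreeKillerEnabled());
        // Same effect as starting the JVM with -Dhudson.util.ProcessTreeKiller.disable=true
        System.setProperty(PROPERTY, "true");
        System.out.println("after:  enabled=" + processTreeKillerEnabled());
    }
}
```

Under this assumption, a misspelled value such as `yes` silently leaves the killer enabled, which is worth ruling out when the switch appears to have no effect.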

      --Chad

          Kohsuke Kawaguchi added a comment -

          If I understand you correctly, Hudson starts a shell (on a slave) and runs your
          script, which in turn runs ssh and starts a process on yet another machine?

          The exception indicates that the link between the master and the slave was
          terminated unexpectedly. How do your master and slave talk to each other?

          Finally, I didn't follow your reasoning about ProcessTreeKiller and why it
          hints at a bug in the remoting code.
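The innermost cause in every trace in this issue is `java.io.EOFException` thrown from `ObjectInputStream.readObject` inside `Channel$ReaderThread`. The plain-JDK sketch below (not Hudson's remoting code; `EofDemo` and `readAfterPeerClosed` are hypothetical names) reproduces that signature: one object is written and the stream is closed, so the first `readObject` succeeds and the next one hits end-of-stream. It illustrates why the trace is read as "the other end of the link went away" rather than as a failure of the build step itself.

```java
import java.io.*;

// Hypothetical illustration, not Hudson code: a reader that keeps calling
// readObject() (as a channel reader thread does) gets EOFException as soon as
// the writing side has closed its end of the stream.
public class EofDemo {
    static String readAfterPeerClosed() {
        try {
            // The "peer" writes a single object and then closes the stream.
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject("last message before the link dropped");
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(wire.toByteArray()))) {
                Object first = in.readObject();      // succeeds
                try {
                    in.readObject();                 // no bytes left: end of stream
                    return "unexpected: second read succeeded";
                } catch (EOFException e) {
                    return "read '" + first + "', then EOFException";
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterPeerClosed());
    }
}
```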

          chad_lyon added a comment -

          I apologize if I was vague. The job is just a shell execute step, but it must
          run in a particular environment, so it is tied to a slave, and that slave is
          started from the master via an ssh command.

          The shell script starts by copying tables from a remote data store using the
          mysql client. One of those tables is very large and takes just over two hours to
          copy. While it is copying there is obviously TCP activity between the slave and
          the remote data store, but the slave doesn't send any logging info back to the
          master for the entire two-hour-plus period. Since upgrading Hudson from 1.278 to
          1.293, the connection seems to be getting dropped at some point during this
          two-hour period.

          Before I turned off ProcessTreeKiller, the underlying mysql transfer was
          terminated along with the Hudson job. Now the command started by the Hudson job
          completes on the slave, but the slave still reports failure.
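The description above (more than two hours with no output flowing from slave to master, after which the link drops) suggests a common workaround: keep the build log from going silent by printing a heartbeat line while the long step runs. The sketch below is a hypothetical Java illustration of that idea, not something proposed in this issue and not a Hudson feature; `HeartbeatRunner`, `runWithHeartbeat` and the interval are assumptions. Whether periodic output actually prevents the drop depends on what is closing the connection, which this report does not establish.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical workaround sketch: run a long, silent task while a scheduler
// emits a heartbeat line at a fixed interval, so the build log (which, for a
// slave build, is streamed back to the master) sees regular traffic.
public class HeartbeatRunner {
    static <T> T runWithHeartbeat(Callable<T> task, long intervalMillis,
                                  Consumer<String> log) throws Exception {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        ScheduledFuture<?> beat = ticker.scheduleAtFixedRate(
                () -> log.accept("still running after "
                        + TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start) + "s"),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            return task.call();          // the long-running work, e.g. a big table copy
        } finally {
            beat.cancel(false);          // stop heartbeats whether the task succeeded or failed
            ticker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the two-hour mysql copy: a task that stays silent for 3 seconds.
        String result = runWithHeartbeat(() -> {
            Thread.sleep(3_000);
            return "copy finished";
        }, 1_000, System.out::println);
        System.out.println(result);
    }
}
```

The heartbeat runs on its own thread so the task itself needs no changes; cancelling in `finally` keeps a failed task from leaving a ticker running.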


          Krystian Nowak added a comment -

          adding myself as CC

          lidiam added a comment -

          It seems I'm hitting the same problem with just 5 seconds of sleep time (the job
          executes a shell script that in turn calls Ant):

          check-resources-library:
          [echo] Javascript Library 1_2 available = true
          [echo] The file is checked at:
          /export/home/j2eetest/hudson/workspace/JSF-core/glassfishv3/glassfish/domains/domain1/applications/guessNumber/resources/js/1_2/validator.js
          [echo] Image Library 1_2 available = true
          [echo] the file is checked at:
          /export/home/j2eetest/hudson/workspace/JSF-core/glassfishv3/glassfish/domains/domain1/applications/guessNumber/resources/images/1_2/wave.med.gif
          [echo] Sleeping for 5 seconds...
          FATAL: command execution failed
          hudson.util.IOException2: Failed to join the process
          at hudson.Proc$RemoteProc.join(Proc.java:297)
          at hudson.Launcher$ProcStarter.join(Launcher.java:274)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.model.Build$RunnerImpl.build(Build.java:195)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
          at hudson.model.Run.run(Run.java:928)
          at hudson.model.Build.run(Build.java:112)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:118)
          Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request$1.get(Request.java:188)
          at hudson.remoting.Request$1.get(Request.java:157)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
          at hudson.Proc$RemoteProc.join(Proc.java:289)
          ... 10 more
          Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request.abort(Request.java:223)
          at hudson.remoting.Channel.terminate(Channel.java:558)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:776)
          Caused by: java.io.EOFException
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:757)
          FATAL: Unable to delete script file /tmp/hudson8537360715477296990.sh
          hudson.util.IOException2: remote file operation failed
          at hudson.FilePath.act(FilePath.java:645)
          at hudson.FilePath.act(FilePath.java:633)
          at hudson.FilePath.delete(FilePath.java:863)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.model.Build$RunnerImpl.build(Build.java:195)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
          at hudson.model.Run.run(Run.java:928)
          at hudson.model.Build.run(Build.java:112)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:118)
          Caused by: java.io.IOException: already closed
          at hudson.remoting.Channel.send(Channel.java:372)
          at hudson.remoting.Request.call(Request.java:104)
          at hudson.remoting.Channel.call(Channel.java:511)
          at hudson.FilePath.act(FilePath.java:640)
          ... 11 more
          FATAL: already closed
          java.io.IOException: already closed
          at hudson.remoting.Channel.send(Channel.java:372)
          at hudson.remoting.Request.call(Request.java:104)
          at hudson.remoting.Channel.call(Channel.java:511)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:730)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
          at hudson.model.Run.run(Run.java:928)
          at hudson.model.Build.run(Build.java:112)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:118)

          This job runs fine on Solaris but fails on Linux (RH5).


          lidiam added a comment -

          adding myself to cc list


          jamtur01 added a comment -

          I am having the same issue on CentOS 5.

          ....F...............
          FATAL: rake execution failed
          hudson.util.IOException2: Failed to join the process
          at hudson.Proc$RemoteProc.join(Proc.java:297)
          at hudson.plugins.rake.Rake.perform(Rake.java:101)
          at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:56)
          at hudson.model.Build$RunnerImpl.build(Build.java:195)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:271)
          at hudson.model.Run.run(Run.java:938)
          at hudson.model.Build.run(Build.java:112)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:118)
          Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request$1.get(Request.java:188)
          at hudson.remoting.Request$1.get(Request.java:157)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
          at hudson.Proc$RemoteProc.join(Proc.java:289)
          ... 9 more
          Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request.abort(Request.java:223)
          at hudson.remoting.Channel.terminate(Channel.java:558)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:776)
          Caused by: java.io.EOFException
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2570)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1314)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:368)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:757)
          FATAL: already closed
          java.io.IOException: already closed
          at hudson.remoting.Channel.send(Channel.java:372)
          at hudson.remoting.Request.call(Request.java:104)
          at hudson.remoting.Channel.call(Channel.java:511)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:730)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:276)
          at hudson.model.Run.run(Run.java:938)
          at hudson.model.Build.run(Build.java:112)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:118)


          Kohsuke Kawaguchi added a comment -

          When this happens, the slave log might show some record of why the communication
          with the slave JVM failed. Can you please check it?

          sits added a comment -

          I hit this error as well on the slave. The slave log was basically empty, but the
          main Hudson log had this entry, which seems to correspond to the slave and the time:

          16/07/2009 3:32:31 PM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record run
          WARNING: Failed to monitor Worker 4 for Free Temp Space
          hudson.util.IOException2: remote file operation failed
          at hudson.FilePath.act(FilePath.java:548)
          at hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:71)
          at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:80)
          at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:43)
          at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:161)
          Caused by: java.io.IOException: Unable to serialize 229391015936
          at hudson.remoting.UserRequest.serialize(UserRequest.java:134)
          at hudson.remoting.UserRequest.perform(UserRequest.java:100)
          at hudson.remoting.UserRequest.perform(UserRequest.java:46)
          at hudson.remoting.Request$2.run(Request.java:236)
          at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
          at java.util.concurrent.FutureTask.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at hudson.remoting.Engine$1$1.run(Engine.java:54)
          at java.lang.Thread.run(Unknown Source)
          Caused by: java.io.NotSerializableException: hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace
          at java.io.ObjectOutputStream.writeObject0(Unknown Source)
          at java.io.ObjectOutputStream.writeObject(Unknown Source)
          at hudson.remoting.UserRequest._serialize(UserRequest.java:123)
          at hudson.remoting.UserRequest.serialize(UserRequest.java:132)
          ... 10 more

          This stack trace is reported in
          https://hudson.dev.java.net/issues/show_bug.cgi?id=3381, which has already been
          fixed in 1.296. We are running 1.295, so we are upgrading now. Hopefully that will
          also fix the issue reported here.
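The nested cause in the log above, `java.io.NotSerializableException: hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace`, is standard Java serialization behaviour: the trace shows the result being written with `ObjectOutputStream.writeObject` in `UserRequest._serialize`, and any object in the graph whose class does not implement `java.io.Serializable` is rejected. The plain-JDK sketch below shows the failure and the usual remedy; `PlainDiskSpace` and `SerializableDiskSpace` are hypothetical stand-ins, not the actual Hudson classes or the actual 1.296 fix.

```java
import java.io.*;

// Hypothetical stand-ins for a monitor result sent back over a channel.
public class SerializeDemo {
    // Missing "implements Serializable": ObjectOutputStream rejects it.
    static class PlainDiskSpace {
        final long freeBytes;
        PlainDiskSpace(long freeBytes) { this.freeBytes = freeBytes; }
    }

    // Same data, marked Serializable, so it can be written to the stream.
    static class SerializableDiskSpace implements Serializable {
        private static final long serialVersionUID = 1L;
        final long freeBytes;
        SerializableDiskSpace(long freeBytes) { this.freeBytes = freeBytes; }
    }

    // Returns "ok" if the object serializes, otherwise a description of the failure.
    static String trySerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return "ok";
        } catch (NotSerializableException e) {
            return "NotSerializableException: " + e.getMessage(); // message is the class name
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        long free = 229391015936L; // value from the log above
        System.out.println(trySerialize(new PlainDiskSpace(free)));
        System.out.println(trySerialize(new SerializableDiskSpace(free)));
    }
}
```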


          jmboulos added a comment -

          I am on Hudson 1.319 and am seeing a similar problem. This is not only for long
          jobs anymore; it happens after 6 minutes for me. I am running Hudson on a
          Fedora Core 6 Linux box, but doing the builds on a Red Hat Enterprise Linux
          Server 5.1 slave. It happens intermittently, without a pattern. I leave it
          running all weekend, doing a build every 2 hours. Over a weekend of about 30-40
          builds, it fails once with the following in the middle of compilation (and then
          works fine on the next run):

          FATAL: command execution failed
          hudson.util.IOException2: Failed to join the process
          at hudson.Proc$RemoteProc.join(Proc.java:297)
          at hudson.Launcher$ProcStarter.join(Launcher.java:275)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:83)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:471)
          at hudson.model.Build$RunnerImpl.build(Build.java:157)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:113)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:345)
          at hudson.model.Run.run(Run.java:1090)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:122)
          Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request$1.get(Request.java:188)
          at hudson.remoting.Request$1.get(Request.java:157)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
          at hudson.Proc$RemoteProc.join(Proc.java:289)
          ... 12 more
          Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
          at hudson.remoting.Request.abort(Request.java:223)
          at hudson.remoting.Channel.terminate(Channel.java:561)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:819)
          Caused by: java.io.EOFException
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
          at java.io.ObjectInputStream.readObject0(Unknown Source)
          at java.io.ObjectInputStream.readObject(Unknown Source)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:800)
          FATAL: Unable to delete script file /tmp/hudson5532835365757807889.sh
          hudson.util.IOException2: remote file operation failed
          at hudson.FilePath.act(FilePath.java:672)
          at hudson.FilePath.act(FilePath.java:660)
          at hudson.FilePath.delete(FilePath.java:904)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:93)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:471)
          at hudson.model.Build$RunnerImpl.build(Build.java:157)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:113)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:345)
          at hudson.model.Run.run(Run.java:1090)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:122)
          Caused by: java.io.IOException: already closed
          at hudson.remoting.Channel.send(Channel.java:375)
          at hudson.remoting.Request.call(Request.java:104)
          at hudson.remoting.Channel.call(Channel.java:514)
          at hudson.FilePath.act(FilePath.java:667)
          ... 13 more
          FATAL: already closed
          java.io.IOException: already closed
          at hudson.remoting.Channel.send(Channel.java:375)
          at hudson.remoting.Request.call(Request.java:104)
          at hudson.remoting.Channel.call(Channel.java:514)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:732)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:350)
          at hudson.model.Run.run(Run.java:1090)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:93)
          at hudson.model.Executor.run(Executor.java:122)

          Kirill Evstigneev added a comment -

          The same effect with Hudson 1.330 on Windows XP and Linux.

          crbeng added a comment -

          I have been having this problem ever since upgrading from 1.320 to 1.327. See:

          http://www.nabble.com/Failed-to-join-process-v1.327-1.328-to25866005.html

          Now using Hudson 1.329. The master is on Fedora Core 5. Slaves are on all kinds of
          platforms: RHEL, Windows, HP-UX, Mac OS X, Solaris, etc.

          crbeng added a comment -

          See https://hudson.dev.java.net/issues/show_bug.cgi?id=4656

          tiainpa added a comment -

          Using Hudson 1.339 on Windows XP with 5 Windows XP slaves, I'm seeing this error constantly with longer test runs.

          I'm fairly sure this happens because our test system produces no output for long stretches of time. Would it be possible to make the timeout adjustable manually, either in the project or in the node configuration in Hudson?

          If someone knows whether this can be done with Java command-line options, I would appreciate that too.

          tiainpa added a comment -

          I got some new information by running one of the slaves in headless mode instead of through Java Web Start. Just before this issue was reported in the console output, the slave's command prompt window showed the following:

          20.1.2010 14:14:44 hudson.remoting.Engine$2 onDead
          INFO: Ping failed. Terminating the socket.
          20.1.2010 14:14:44 hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel channel
          java.net.SocketException: socket closed
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(Unknown Source)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
          at java.io.ObjectInputStream.readObject0(Unknown Source)
          at java.io.ObjectInputStream.readObject(Unknown Source)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:852)
          20.1.2010 14:14:44 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated

          So I think Hudson's new ping mechanism decides that the connection is broken, and that produces the exception. I've been pinging the machines all day with the Windows XP ping utility, and occasionally, though rarely, a ping times out when the timeout is set to 1 second. Is there a way to adjust this timeout value manually in Hudson?

          I don't know the exact timeout value that would work in my environment, but it seems it should be greater than 1 second.
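
          The trade-off described above (a short ping timeout versus occasional latency spikes on the network) can be illustrated with a small model. This is a hypothetical sketch, not Hudson's actual ping code: the class PingPolicy, the system property name example.pingTimeoutMs, and the idea of tolerating several consecutive missed pings are all assumptions made for illustration.

```java
import java.util.List;

/**
 * Hypothetical ping policy: decides when a master/slave channel should be
 * treated as dead. This is NOT Hudson's real ping implementation.
 */
final class PingPolicy {
    /** Hypothetical system property name, used only for illustration. */
    static final String TIMEOUT_PROPERTY = "example.pingTimeoutMs";

    private final long timeoutMs; // a ping slower than this counts as "missed"
    private final int maxMissed;  // consecutive misses tolerated before giving up

    PingPolicy(long timeoutMs, int maxMissed) {
        if (timeoutMs <= 0 || maxMissed < 1) {
            throw new IllegalArgumentException("timeout must be > 0 and maxMissed >= 1");
        }
        this.timeoutMs = timeoutMs;
        this.maxMissed = maxMissed;
    }

    /** Reads the timeout from the (hypothetical) system property, falling back to a default. */
    static PingPolicy fromSystemProperties(long defaultTimeoutMs, int maxMissed) {
        long timeout = Long.getLong(TIMEOUT_PROPERTY, defaultTimeoutMs);
        return new PingPolicy(timeout, maxMissed);
    }

    /**
     * Replays ping round-trip times (a negative value means no reply at all) and
     * returns the index of the ping at which the channel would be terminated,
     * or -1 if the channel survives the whole sequence.
     */
    int terminationIndex(List<Long> roundTripsMs) {
        int consecutiveMisses = 0;
        for (int i = 0; i < roundTripsMs.size(); i++) {
            long rtt = roundTripsMs.get(i);
            boolean missed = rtt < 0 || rtt > timeoutMs;
            consecutiveMisses = missed ? consecutiveMisses + 1 : 0;
            if (consecutiveMisses >= maxMissed) {
                return i;
            }
        }
        return -1;
    }
}
```

          With the sample round-trip times 20, 35, 1500, 40 and 25 ms and a 1-second timeout, a policy that gives up after a single miss terminates the channel at the third ping, while one that requires three consecutive misses, or one with a 2-second timeout (for example via -Dexample.pingTimeoutMs=2000 in this sketch), keeps it open.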

          Felix Drueke added a comment -

          I'm getting the very same error (as described in the initial issue description) occasionally with Hudson
          1.345 (maybe once every 40 builds).
          Our server runs on Solaris 9, and I just saw the error happen on a slave running Solaris 10.

          jbauernberger added a comment -

          Hi,

          we are getting the same result (with long-running jobs).
          We have a mixture of Linux RHEE installations (versions 4 and 5).
          The Hudson version is the latest.

          mdonohue added a comment -

          For Hudson, 'latest' changes every week. Could you specify the version number explicitly?

          Hans-Juergen Hafner added a comment -

          Hi,

          we still have this problem, and not only for long-running jobs.
          We are currently using Hudson 1.355 running on RHEE.

          njancesk added a comment -

          I have the same issue running Hudson 1.355 with a slave on a Solaris 10 SPARC machine using Java 1.5.0, with jobs that take 4+ hours.

          I don't have this issue with similar jobs on Solaris 10 x86, but there the jobs finish in under 4 hours.

          Jim McCaskey added a comment -

          FWIW: This seems to be happening with a Windows 2003 slave as well, using Hudson 1.362. I have seen it before; this is just the first time I have tried to track down a solution. Here is what the error looks like on this version of Hudson.

          FATAL: command execution failed
          hudson.util.IOException2: Failed to join the process
          at hudson.Proc$RemoteProc.join(Proc.java:312)
          at hudson.Launcher$ProcStarter.join(Launcher.java:280)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:83)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
          at hudson.model.Build$RunnerImpl.build(Build.java:174)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
          at hudson.model.Run.run(Run.java:1253)
          at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:124)
          Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          at hudson.remoting.Request$1.get(Request.java:218)
          at hudson.remoting.Request$1.get(Request.java:172)
          at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
          at hudson.Proc$RemoteProc.join(Proc.java:304)
          ... 12 more
          Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          at hudson.remoting.Request.abort(Request.java:257)
          at hudson.remoting.Channel.terminate(Channel.java:602)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:893)
          Caused by: java.io.IOException: Unexpected termination of the channel
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:875)
          Caused by: java.io.EOFException
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:869)
          FATAL: Unable to delete script file C:\DOCUME~1\conman\LOCALS~1\Temp\hudson7729064622458259363.bat
          hudson.util.IOException2: remote file operation failed: C:\DOCUME~1\conman\LOCALS~1\Temp\hudson7729064622458259363.bat at hudson.remoting.Channel@1a8aa2c:cmhslave02-win32
          at hudson.FilePath.act(FilePath.java:749)
          at hudson.FilePath.act(FilePath.java:735)
          at hudson.FilePath.delete(FilePath.java:990)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:93)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:601)
          at hudson.model.Build$RunnerImpl.build(Build.java:174)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:416)
          at hudson.model.Run.run(Run.java:1253)
          at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:124)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:412)
          at hudson.remoting.Request.call(Request.java:105)
          at hudson.remoting.Channel.call(Channel.java:555)
          at hudson.FilePath.act(FilePath.java:742)
          ... 13 more
          FATAL: channel is already closed
          hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:412)
          at hudson.remoting.Request.call(Request.java:105)
          at hudson.remoting.Channel.call(Channel.java:555)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:744)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:421)
          at hudson.model.Run.run(Run.java:1253)
          at hudson.matrix.MatrixRun.run(MatrixRun.java:130)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:124)

          Shrinkhla21 added a comment -

          The best solution would be to send no ping requests while the slave is running a build.
          At the very least, there should be some way to increase the ping timeout.
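
          The first suggestion above can be sketched as a pinger that stays silent while any build is running on the slave. Everything here is hypothetical (BusyAwarePinger and its methods are not part of Hudson), and time is passed in explicitly so the logic is easy to follow.

```java
/**
 * Hypothetical sketch (not Hudson's API): a pinger that skips liveness pings
 * while the slave is running at least one build.
 */
final class BusyAwarePinger {
    private final long intervalMs; // desired gap between pings while the slave is idle
    private long lastPingMs;       // time of the last ping, or of the last busy check
    private int runningBuilds;     // number of builds currently executing on the slave

    BusyAwarePinger(long intervalMs, long startMs) {
        this.intervalMs = intervalMs;
        this.lastPingMs = startMs;
    }

    synchronized void buildStarted() {
        runningBuilds++;
    }

    synchronized void buildFinished() {
        if (runningBuilds > 0) {
            runningBuilds--;
        }
    }

    /** Returns true if a ping should be sent at time nowMs. */
    synchronized boolean shouldPing(long nowMs) {
        if (runningBuilds > 0) {
            // Busy: suppress the ping and restart the idle interval from now.
            lastPingMs = nowMs;
            return false;
        }
        if (nowMs - lastPingMs >= intervalMs) {
            lastPingMs = nowMs;
            return true;
        }
        return false;
    }
}
```

          One consequence of this design is that a connection which really does die during a build is only noticed when the build next uses the channel, so raising the ping timeout (the second suggestion) may be the gentler change.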

          Tzuchien added a comment -

          I am also experiencing exactly the same problem (same call stack) with Hudson 1.363 on Red Hat/Tomcat and Windows XP slaves. I have a matrix job in which each configuration takes about 1.5 hours. Not all configurations fail: in my last build, 1 out of 25 configurations failed because of this problem.

          SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java
          trunk/hudson/main/remoting/src/main/java/hudson/remoting/ChannelClosedException.java
          http://jenkins-ci.org/commit/33537
          Log:
          [JENKINS-5073 JENKINS-3412] improved the error diagnostics on ChannelClosedException by having it report who/how the connection was closed.
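
          The log message describes making ChannelClosedException report who closed the connection and how. One common way to do that, sketched here under assumptions rather than copied from the actual change, is to capture a Throwable at the moment the channel is closed and attach it as the cause of every later "channel is already closed" error. DiagnosticChannel and its methods are hypothetical stand-ins, not Hudson's Channel class.

```java
import java.io.IOException;

/** Hypothetical stand-in for a remoting channel; not Hudson's Channel class. */
final class DiagnosticChannel {
    /** Thrown when sending on a closed channel; carries the original close event as its cause. */
    static final class ChannelClosedException extends IOException {
        ChannelClosedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private Throwable closedBy; // captured when the channel is first closed; null while open

    /** Closes the channel, recording why; the Throwable's stack trace shows who called close(). */
    synchronized void close(String reason) {
        if (closedBy == null) {
            closedBy = new Throwable("Channel closed: " + reason);
        }
    }

    synchronized void send(String command) throws IOException {
        if (closedBy != null) {
            // Instead of a bare "already closed", point at the original close event.
            throw new ChannelClosedException("channel is already closed", closedBy);
        }
        // A real channel would serialize and write the command here.
    }
}
```

          Because the recorded Throwable is created inside close(), its stack trace shows the code path that closed the channel, which is exactly the information missing from the bare "channel is already closed" traces reported earlier in this issue.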

          dogfood added a comment -

          Integrated in hudson_main_trunk #156
          [JENKINS-5073 JENKINS-3412] improved the error diagnostics on ChannelClosedException by having it report who/how the connection was closed.

          kohsuke :
          Files :

          • /trunk/hudson/main/remoting/src/main/java/hudson/remoting/ChannelClosedException.java
          • /trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java

          kalpanab added a comment -

          I integrated the above fix into our Hudson 1.362 installation, but I still cannot see the root cause of why the Hudson slave connection is reset.

          Kohsuke Kawaguchi added a comment -

          Can you please report the stack trace?

          Kohsuke Kawaguchi added a comment -

          I'm marking this as a duplicate of JENKINS-5073.

          Both issues are caused by a lost master/slave communication channel. When it happens while your build is waiting for a forked process to complete, you see this error in the build console.
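
          The causal chain described above matches the nesting in the stack traces in this issue: an EOFException in the channel's reader thread aborts every pending request with a RequestAbortedException, Future.get() wraps that in an ExecutionException, and the join in the build step reports it as "Failed to join the process". The following is a simplified, hypothetical model of that chain using CompletableFuture; LostChannelModel and its methods are illustrative names, not Hudson's actual classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical model of the failure chain seen in the build logs; not Hudson's real classes. */
final class LostChannelModel {
    /** Stand-in for hudson.remoting.RequestAbortedException. */
    static final class RequestAbortedException extends RuntimeException {
        RequestAbortedException(Throwable cause) {
            super(cause);
        }
    }

    private final List<CompletableFuture<Integer>> pending = new ArrayList<>();

    /** The master asks the slave to wait for a forked process; the exit code arrives later. */
    synchronized CompletableFuture<Integer> startRemoteJoin() {
        CompletableFuture<Integer> exitCode = new CompletableFuture<>();
        pending.add(exitCode);
        return exitCode;
    }

    /** The reader thread hit an I/O error: every outstanding request is aborted with that cause. */
    synchronized void terminate(IOException cause) {
        for (CompletableFuture<Integer> request : pending) {
            request.completeExceptionally(new RequestAbortedException(cause));
        }
        pending.clear();
    }

    /** What the build step does: block for the exit code, wrapping failures as in the console log. */
    static int join(CompletableFuture<Integer> exitCode) throws IOException, InterruptedException {
        try {
            return exitCode.get();
        } catch (ExecutionException e) {
            throw new IOException("Failed to join the process", e);
        }
    }
}
```

          Calling terminate(new java.io.EOFException()) while a build is blocked in join(...) yields an IOException whose cause chain is ExecutionException, then RequestAbortedException, then EOFException, the same shape as the console output reported in this issue.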

          Kohsuke Kawaguchi added a comment -

          Marking as a duplicate.

            Assignee: Unassigned
            Reporter: chad_lyon
            Votes: 24
            Watchers: 27