Jenkins / JENKINS-3412

Long-running jobs (>2 hours) fail with hudson.util.IOException2: Failed to join the process


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component/s: core
    • Labels: None
    • Environment: Platform: PC, OS: Linux

      We have a somewhat unusual CI environment: after projects build, we execute
      them remotely and use Hudson to monitor their progress. The remote execution of
      these programs takes a while, and at certain points no output is sent back to
      the master for long periods of time. During these long silent intervals (just
      over 2 hours) I occasionally see the job fail with the following:

      FATAL: command execution failed
      hudson.util.IOException2: Failed to join the process
      at hudson.Proc$RemoteProc.join(Proc.java:269)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.model.Build$RunnerImpl.build(Build.java:195)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)
      Caused by: java.util.concurrent.ExecutionException:
      hudson.remoting.RequestAbortedException: java.io.EOFException
      at hudson.remoting.Request$1.get(Request.java:188)
      at hudson.remoting.Request$1.get(Request.java:157)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
      at hudson.Proc$RemoteProc.join(Proc.java:261)
      ... 9 more
      Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
      at hudson.remoting.Request.abort(Request.java:223)
      at hudson.remoting.Channel.terminate(Channel.java:528)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
      FATAL: Unable to delete script file /tmp/hudson24564.sh
      hudson.util.IOException2: remote file operation failed
      at hudson.FilePath.act(FilePath.java:544)
      at hudson.FilePath.delete(FilePath.java:741)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.model.Build$RunnerImpl.build(Build.java:195)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)
      Caused by: java.io.IOException: already closed
      at hudson.remoting.Channel.send(Channel.java:342)
      at hudson.remoting.Request.call(Request.java:104)
      at hudson.remoting.Channel.call(Channel.java:481)
      at hudson.FilePath.act(FilePath.java:541)
      ... 10 more
      FATAL: already closed
      java.io.IOException: already closed
      at hudson.remoting.Channel.send(Channel.java:342)
      at hudson.remoting.Request.call(Request.java:104)
      at hudson.remoting.Channel.call(Channel.java:481)
      at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
      at hudson.model.Run.run(Run.java:895)
      at hudson.model.Build.run(Build.java:112)
      at hudson.model.ResourceController.execute(ResourceController.java:93)
      at hudson.model.Executor.run(Executor.java:119)
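      The three FATAL messages above appear to share one root cause: the channel's
      reader thread hits an EOFException, the channel terminates, the pending "join"
      request is aborted, and every later remote call (deleting the script file,
      killing the process) is rejected with "already closed". The toy model below is a
      minimal sketch of that cascade under stated assumptions; ToyChannel, ToyChannelDemo
      and their methods are hypothetical names, not Hudson's actual hudson.remoting.Channel.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Toy model (hypothetical, NOT hudson.remoting.Channel): shows how a single EOF
// on the reader side cascades into a failed join plus "already closed" errors.
class ToyChannel {
    private final List<CompletableFuture<String>> pending = new ArrayList<>();
    private boolean closed = false;

    // Analogue of a remote call: returns a future for the reply, or fails fast
    // once the channel has been terminated. The request payload is ignored here.
    synchronized CompletableFuture<String> call(String request) throws IOException {
        if (closed) {
            throw new IOException("already closed");
        }
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.add(reply);
        return reply;
    }

    // Analogue of the reader thread hitting EOF: mark the channel closed and
    // abort every request that is still waiting for a reply.
    synchronized void terminate(IOException cause) {
        closed = true;
        for (CompletableFuture<String> reply : pending) {
            reply.completeExceptionally(cause);
        }
        pending.clear();
    }
}

class ToyChannelDemo {
    public static void main(String[] args) throws Exception {
        ToyChannel ch = new ToyChannel();
        CompletableFuture<String> join = ch.call("join process"); // long wait, no output
        ch.terminate(new EOFException());                          // connection drops
        try {
            join.get();
        } catch (ExecutionException e) {
            // prints "join failed: java.io.EOFException"
            System.out.println("join failed: " + e.getCause());
        }
        try {
            ch.call("delete script file");
        } catch (IOException e) {
            // prints "cleanup failed: already closed"
            System.out.println("cleanup failed: " + e.getMessage());
        }
    }
}
```

      In this model, as in the log, the cleanup failures are symptoms: once the
      connection is gone, nothing sent over it afterwards can succeed.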

      However, this is not predictable or reproducible, which makes me think it
      corresponds to an external event such as GC, or even a network or OS event (e.g.
      a TCP error or socket timeout). Anyway, I thought I would post it here and see if
      anyone else is seeing this too.
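      If the socket-timeout guess is right, one common mitigation is to keep the
      connection from ever sitting idle. This is a hypothesis, not a confirmed
      diagnosis: on Linux the default TCP keepalive idle time
      (net.ipv4.tcp_keepalive_time) is 7200 seconds, which is close to the roughly
      2 hour silences described above, and stateful firewalls or NAT devices can also
      drop idle connections. Below is a minimal sketch of an application-level
      heartbeat; the Heartbeat class, its one-byte PING marker, and the assumption that
      the receiver silently ignores that byte are all hypothetical and not part of
      Hudson's remoting protocol.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat: periodically writes a 1-byte ping so the connection
// never stays idle long enough for an idle timeout to fire.
class Heartbeat implements AutoCloseable {
    static final int PING = 0; // hypothetical marker byte, assumed ignored by the receiver

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    Heartbeat(OutputStream out, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                synchronized (out) { // avoid interleaving with real traffic on the same stream
                    out.write(PING);
                    out.flush();
                }
            } catch (IOException e) {
                // A failed ping means the link is already gone; stop pinging.
                scheduler.shutdown();
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop the ping thread so the JVM can exit
    }
}
```

      At the TCP level, Socket.setKeepAlive(true) enables SO_KEEPALIVE instead, but
      the probe timing is then governed by operating-system settings rather than by
      the application.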

      I am using Hudson 1.293; the master and slave are both RHEL 4.

      An interesting development occurred after I recently upgraded and then set
      hudson.util.ProcessTreeKiller.disable=true: the jobs still failed, but the
      underlying process eventually completed its work successfully (copying a
      large MySQL DB, if you must know). This is why I am reporting it, since it
      hints at a bug in Hudson's remoting code.
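      hudson.util.ProcessTreeKiller.disable is set as a JVM system property. A minimal
      sketch of how such a boolean flag is typically read, assuming Hudson checks it
      with Boolean.getBoolean (an assumption, not confirmed by this report);
      KillerSwitch is a hypothetical class name. Passing
      -Dhudson.util.ProcessTreeKiller.disable=true on the java command line makes the
      check return true in that JVM. With the killer disabled, Hudson presumably no
      longer kills the build's child processes when the build ends, which would explain
      why the MySQL copy could run to completion even though the job was marked failed.

```java
// Hypothetical helper: reads a boolean JVM system property.
// Boolean.getBoolean(name) is true only if the property exists and equals
// "true" ignoring case; an unset property or any other value yields false.
class KillerSwitch {
    static final String DISABLE_PROPERTY = "hudson.util.ProcessTreeKiller.disable";

    static boolean processTreeKillerDisabled() {
        return Boolean.getBoolean(DISABLE_PROPERTY);
    }

    public static void main(String[] args) {
        // Run with: java -Dhudson.util.ProcessTreeKiller.disable=true KillerSwitch
        System.out.println("ProcessTreeKiller disabled: " + processTreeKillerDisabled());
    }
}
```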

      --Chad

      Assignee: Unassigned
      Reporter: chad_lyon
      Votes: 24
      Watchers: 27