Jenkins / JENKINS-12235

FATAL, Unable to delete script file, IOException2, remote file operation failed, unexpected termination of channel

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Component/s: core, remoting
    • Labels: None

    Description

      Below is the stacktrace.

      It happened when I ran two jobs on a master. After running for a while, both jobs crashed with this exception.
      I think this might be caused by brief, intermittent drops in network connectivity, but I didn't notice any disconnection.
      Another cause may be the heavy load on Jenkins:

      PID   USER   PR NI VIRT  RES  SHR  S %CPU %MEM TIME+     COMMAND
      25942 hudson 15 0  6902m 5.8g 5720 S 0.3  74.3 401:22.30 java

      Does Jenkins run its own garbage collector at some specified time?
      We have to restart every few days because it gets slower and slower until it hangs.

      FATAL: Unable to delete script file /tmp/hudson8303731085225956739.sh
      hudson.util.IOException2: remote file operation failed: /tmp/hudson8303731085225956739.sh at hudson.remoting.Channel@30e472f4:build@autom-1
      at hudson.FilePath.act(FilePath.java:781)
      at hudson.FilePath.act(FilePath.java:767)
      at hudson.FilePath.delete(FilePath.java:1022)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
      at hudson.model.Run.run(Run.java:1404)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:499)
      at hudson.remoting.Request.call(Request.java:110)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.FilePath.act(FilePath.java:774)
      ... 13 more
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy29.join(Unknown Source)
      at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:859)
      at hudson.Launcher$ProcStarter.join(Launcher.java:345)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:695)
      at hudson.model.Build$RunnerImpl.build(Build.java:178)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:461)
      at hudson.model.Run.run(Run.java:1404)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)

          Activity

            rb2k Marc Seeger added a comment -

            I just witnessed it live on a slave today.
            Some findings:

            1. Once the slave started failing, subsequent (different) jobs failed too. (I tested 3 jobs; all of them failed with the same error.)
            2. Just disconnecting and reconnecting the slave made it work again.

            guyr Guy Rozendorn added a comment -

            We had some issues in our lab, which forced us to re-install all of our slaves (84 and counting).
            We are still experiencing this issue.

            It seems that after this happens, the slave remains connected to Jenkins. However, I can't tell what happens if you try to run another job on it, because we revert the slave VM from snapshot after every run (whether it is successful or not)

            dannystaple Danny Staple added a comment -

            OK, I've found something on this today. If you have very "chatty" jobs on the slaves which output a lot of console data, try to log/redirect that output to a file. Chatty jobs aren't necessarily the root cause, but they make the failure more likely.

            If a job is running but quiet, you can unplug a slave's network cable for a few seconds, put it back in, and things will pretty much continue as before. However, a slave running a chatty job will die with an I/O error almost immediately.

            If you can redirect to a file, you may see a big reduction in these failures.
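
The redirect suggested above might look roughly like the following in an "Execute shell" build step. This is only a sketch under assumptions: `run_build` and `build.log` are hypothetical placeholders for the real build command and log location, and `WORKSPACE` is the workspace path Jenkins exports to build steps (the script falls back to the current directory when it is unset).

```shell
#!/bin/sh
# Sketch only: run_build and build.log are hypothetical placeholder names.
LOG_FILE="${WORKSPACE:-.}/build.log"

run_build() {
    # Stand-in for the real, noisy build command.
    i=0
    while [ "$i" -lt 1000 ]; do
        echo "verbose build output line $i"
        i=$((i + 1))
    done
}

# Send stdout and stderr to a file instead of the Jenkins console,
# so far less data travels over the master/slave channel.
run_build > "$LOG_FILE" 2>&1
status=$?

# Print only a short tail so the console still shows some context.
tail -n 20 "$LOG_FILE"

# Propagate a failure of the build command to Jenkins.
[ "$status" -eq 0 ] || exit "$status"
```

The full log stays on the slave, so it would need to be archived as a build artifact if it should be readable from the master.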

            guyr Guy Rozendorn added a comment -

            After updating all our jobs to yield output every 10 seconds, this occurs less frequently, but it still happens a few times a week.
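
One way a job could yield output at a fixed interval is a small wrapper loop around the long-running command. The sketch below is an assumption about how that might be done, not the setup actually used here: `long_quiet_task` and `HEARTBEAT_SECONDS` are hypothetical names, and the `sleep 3` merely stands in for real work. As noted above, this reduced the failures but did not eliminate them.

```shell
#!/bin/sh
# Sketch only: long_quiet_task and HEARTBEAT_SECONDS are hypothetical names.
HEARTBEAT_SECONDS="${HEARTBEAT_SECONDS:-10}"

long_quiet_task() {
    # Stand-in for a long command that prints nothing while it works.
    sleep 3
}

# Run the task in the background and remember its PID.
long_quiet_task &
task_pid=$!

# Poll once per second; print a line every HEARTBEAT_SECONDS seconds
# for as long as the task is still running.
elapsed=0
while kill -0 "$task_pid" 2>/dev/null; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ $((elapsed % HEARTBEAT_SECONDS)) -eq 0 ]; then
        echo "still running after ${elapsed}s"
    fi
done

# Collect the task's exit status and fail the build step if it failed.
wait "$task_pid"
status=$?
[ "$status" -eq 0 ] || exit "$status"
```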

            jglick Jesse Glick added a comment -

            Essentially a duplicate of JENKINS-1948.


            People

              Assignee: Unassigned
              Reporter: dumghen Ghenadie Dumitru
              Votes: 38
              Watchers: 47
