Jenkins / JENKINS-8883

Build fails because of slave error


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • ssh-slaves-plugin
    • None
    • Jenkins ver. 1.398

      Slave is a Red Hat 5.2 machine
      Slave workdir is /tmp/...

      ssh-slaves plugin 0.14

    Description

      Some builds randomly fail with this message:

      FATAL: Command execution failed.
      hudson.util.IOException2: Failed to join the process
      at hudson.Proc$RemoteProc.join(Proc.java:359)
      at hudson.Launcher$ProcStarter.join(Launcher.java:280)
      at hudson.tasks.Ant.perform(Ant.java:216)
      at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
      at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:624)
      at hudson.model.Build$RunnerImpl.build(Build.java:176)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:138)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:420)
      at hudson.model.Run.run(Run.java:1362)
      at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:145)
      Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request$1.get(Request.java:218)
      at hudson.remoting.Request$1.get(Request.java:172)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
      at hudson.Proc$RemoteProc.join(Proc.java:351)
      ... 11 more
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:257)
      at hudson.remoting.Channel.terminate(Channel.java:680)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:971)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:953)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:947)

      Here is the slave log:

      Slave successfully connected and online
      ERROR: Connection terminated
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:953)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:947)
      ERROR: [02/25/11 09:59:47] [SSH] Error deleting file.
      java.io.IOException: Sorry, this connection is closed.
      at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:637)
      at com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:582)
      at com.trilead.ssh2.Session.<init>(Session.java:40)
      at com.trilead.ssh2.Connection.openSession(Connection.java:1047)
      at com.trilead.ssh2.Connection.exec(Connection.java:1434)
      at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:597)
      at hudson.slaves.SlaveComputer$2.onClosed(SlaveComputer.java:320)
      at hudson.remoting.Channel.terminate(Channel.java:695)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:971)
      Caused by: java.net.SocketException: Connection reset
      at java.net.SocketInputStream.read(SocketInputStream.java:168)
      at com.trilead.ssh2.crypto.cipher.CipherInputStream.fill_buffer(CipherInputStream.java:41)
      at com.trilead.ssh2.crypto.cipher.CipherInputStream.internal_read(CipherInputStream.java:52)
      at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:79)
      at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
      at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
      at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:672)
      at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:470)
      at java.lang.Thread.run(Thread.java:662)
      [02/25/11 09:59:47] [SSH] Connection closed.
      ERROR: [02/25/11 09:59:47] slave agent was terminated
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:953)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:947)
      FATAL: channel is already closed
      hudson.remoting.ChannelClosedException: channel is already closed
      at hudson.remoting.Channel.send(Channel.java:466)
      at hudson.remoting.Request.call(Request.java:105)
      at hudson.remoting.Channel.call(Channel.java:629)
      at hudson.Launcher$RemoteLauncher.kill(Launcher.java:744)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:443)
      at hudson.model.Run.run(Run.java:1362)
      at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:145)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:953)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:947)
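The innermost cause in both traces is a `java.io.EOFException` thrown from `ObjectInputStream.readObject` while the channel's reader thread (`Channel$ReaderThread`) waits for the next object: the stream it reads from has simply ended. The stand-alone sketch below reproduces that exception with an in-memory byte stream standing in for the SSH channel. `ChannelEofDemo` and `readUntilEof` are hypothetical names and this is not Jenkins code; it only shows that a reader waiting in `readObject` gets a bare `EOFException` once the other side stops, which is why the trace itself says little about *why* the connection went away.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ChannelEofDemo {
    /**
     * Simulates a peer that sends objectsSent objects and then goes away.
     * Returns how many objects the reader received before hitting end-of-stream.
     */
    static int readUntilEof(int objectsSent) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            for (int i = 0; i < objectsSent; i++) {
                out.writeObject("command-" + i);
            }
        } // closing the writer here stands in for the remote side disappearing

        int received = 0;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            while (true) {
                in.readObject(); // blocks for the next object, like the channel reader
                received++;
            }
        } catch (EOFException e) {
            // Same exception type the reader thread reports as the root cause above
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readUntilEof(2)); // prints 2
    }
}
```

Running `main` prints `2`: both objects are read, and the third `readObject` call hits end-of-stream.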

          Activity

            ebann created the issue.

            Kohsuke Kawaguchi added a comment:

            "Connection reset" indicates that the underlying TCP/IP connection of SSH was abruptly terminated from the slave side; for example, killing sshd would cause this. This can also happen if a router in the middle decides to terminate idle connections. Does either of those ring a bell?

            Beyond that, it's hard to diagnose this problem. If we suspect that an intermediate router is shutting down the connection, maybe we can let you increase the ping frequency?
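Kohsuke's idle-router hypothesis can be made concrete with a small model. The sketch below is a hypothetical simulation, not Jenkins or SSH code: it assumes a middlebox that drops any connection idle for longer than `idleTimeoutMillis`, and a keepalive ("ping") sent every `pingIntervalMillis`. All names and numbers are illustrative assumptions.

```java
public class IdleTimeoutSimulation {
    // Simulation step: one tick per second of wall-clock time.
    static final long TICK_MILLIS = 1_000;

    /**
     * Returns true if a connection carrying no build output survives durationMillis.
     * pingIntervalMillis <= 0 means no keepalive traffic at all.
     * idleTimeoutMillis is how long a (hypothetical) router tolerates an idle connection.
     */
    static boolean survives(long durationMillis, long pingIntervalMillis, long idleTimeoutMillis) {
        long lastTraffic = 0;               // connection setup counts as traffic at t = 0
        long nextPing = pingIntervalMillis; // first keepalive one interval after setup
        for (long now = 0; now <= durationMillis; now += TICK_MILLIS) {
            if (pingIntervalMillis > 0 && now >= nextPing) {
                lastTraffic = now;          // a ping crossed the router, resetting its idle timer
                nextPing += pingIntervalMillis;
            }
            if (now - lastTraffic > idleTimeoutMillis) {
                return false;               // the router has dropped the connection
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long hour = 3_600_000, fiveMinutes = 300_000;
        System.out.println(survives(hour, 0, fiveMinutes));      // false: silent hour, no pings
        System.out.println(survives(hour, 60_000, fiveMinutes)); // true: ping every minute
        System.out.println(survives(120_000, 0, fiveMinutes));   // true: short silent job
    }
}
```

In this model, a ping interval shorter than the router's idle timeout keeps the connection alive, and a silent job shorter than the timeout never trips it at all; a fresh connection also starts with traffic at t = 0, so the model would not by itself explain drops on freshly started nodes.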
            ebann added a comment:

            No, neither of these rings a bell.

            If it were an automatic disconnect after idling, I think it would happen more often, or only on very long jobs.
            But this happens randomly: on any job, on any node, at any time.
            Furthermore, it sometimes happens on freshly started nodes too.

            Some weeks ago I set "disconnect node after idle > X" in the Jenkins configuration.
            But it did not change anything.

            I noticed "ERROR: [02/25/11 09:59:47] [SSH] Error deleting file." in the slave log.
            Is that normal?

            fgu added a comment (edited):

            We think we have the same problem with a Mac slave.

            We have worked around it by making builds verbose, but tasks that produce no output still hit the problem, and once those tasks finish, the slave becomes idle again. As happens to ebann, sometimes the connection is abruptly terminated just after it is opened.

            We have seen the java process end on the slave while the SSH connection stayed alive for a while, until SSH also ended later.

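fgu's observation that verbose builds avoid the failure, while silent tasks still hit it, is consistent with the idle-connection hypothesis, since the build's console output is sent from the slave to the master over the same channel. A minimal sketch of a workaround, assuming that periodic output is enough to keep the connection from being treated as idle: wrap a silent task so that a background thread prints a heartbeat line at a fixed interval. `Heartbeat` and `runWithHeartbeat` are hypothetical names, and this does nothing for connections that are dropped right after they are opened.

```java
import java.io.PrintStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Heartbeat {
    /**
     * Runs task on the calling thread while a background thread prints a heartbeat
     * line to out every intervalMillis. Returns the number of heartbeat lines printed.
     */
    static int runWithHeartbeat(Runnable task, long intervalMillis, PrintStream out) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger beats = new AtomicInteger();
        ScheduledFuture<?> ticker = scheduler.scheduleAtFixedRate(() -> {
            out.println("[heartbeat] task still running"); // output that would cross the channel
            beats.incrementAndGet();
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            task.run();
        } finally {
            ticker.cancel(false); // stop scheduling new heartbeats
            scheduler.shutdown();
            try {
                // Wait for an in-flight heartbeat so the returned count matches the output
                scheduler.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return beats.get();
    }

    public static void main(String[] args) {
        int beats = runWithHeartbeat(() -> {
            try {
                Thread.sleep(300); // stands in for a long build step that prints nothing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 50, System.out);
        System.out.println("heartbeats printed: " + beats);
    }
}
```

The interval would need to be shorter than whatever idle timeout is suspected; the exact count printed depends on scheduling, so the example does not fix it.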
            brianharris added a comment:

            This seems to be the same as JENKINS-6817; please close it as a duplicate.

            ebann added a comment:

            Yes, it looks like the same issue.

            Anyway, I'm not having this problem anymore.
            (But I have changed a LOT of things in my Hudson configuration/slaves since then, so it might still exist.)

            Oleg Nenashev added a comment:

            Marked the issue as a duplicate, based on the comments.

            Oleg Nenashev made changes:
              Link: added "This issue duplicates JENKINS-6817"
            Oleg Nenashev made changes:
              Resolution: set to Duplicate
              Status: changed from Open to Resolved
            R. Tyler Croy made changes:
              Workflow: changed from JNJira to JNJira + In-Review

            People

              Kohsuke Kawaguchi
              ebann
              Votes: 1
              Watchers: 2
