[JENKINS-1948] Intermittent slave disconnections with secondary symptoms

      Another intermittent problem.
      Master is Linux, target is FreeBSD 4.9.

      FATAL: Unable to delete script file /var/tmp/hudson60616.sh
      hudson.util.IOException2: remote file operation failed
      at hudson.FilePath.act(FilePath.java:313)
      at hudson.FilePath.delete(FilePath.java:510)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:70)
      at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:34)
      at hudson.model.Build$RunnerImpl.build(Build.java:130)
      at hudson.model.Build$RunnerImpl.doRun(Build.java:105)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:231)
      at hudson.model.Run.run(Run.java:756)
      at hudson.model.Build.run(Build.java:85)
      at hudson.model.ResourceController.execute(ResourceController.java:70)
      at hudson.model.Executor.run(Executor.java:82)
      Caused by: java.io.IOException: already closed
      at hudson.remoting.Channel.send(Channel.java:316)
      at hudson.remoting.Request.call(Request.java:81)
      at hudson.remoting.Channel.call(Channel.java:390)
      at hudson.FilePath.act(FilePath.java:310)
      ... 10 more
      Build was aborted
      FATAL: null
      java.lang.NullPointerException
      at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:78)
      at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:309)
      at hudson.model.AbstractBuild$AbstractRunner.performAllBuildStep(AbstractBuild.java:297)
      at hudson.model.Build$RunnerImpl.post2(Build.java:118)
      at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:282)
      at hudson.model.Run.run(Run.java:774)
      at hudson.model.Build.run(Build.java:85)
      at hudson.model.ResourceController.execute(ResourceController.java:70)
      at hudson.model.Executor.run(Executor.java:82)

          Balaji Krishnaswamy added a comment -

          Still an issue in Jenkins 1.467:

          FATAL: Unable to delete script file /var/folders/8j/q0m30zk95rv6_58lnchztz680000gn/T/hudson6575302717651863102.sh
          hudson.util.IOException2: remote file operation failed: /var/folders/8j/q0m30zk95rv6_58lnchztz680000gn/T/hudson6575302717651863102.sh at hudson.remoting.Channel@38526a51:ISTFrameworks-MacMini
          at hudson.FilePath.act(FilePath.java:838)
          at hudson.FilePath.act(FilePath.java:824)
          at hudson.FilePath.delete(FilePath.java:1129)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:711)
          at hudson.model.Build$RunnerImpl.build(Build.java:178)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:481)
          at hudson.model.Run.run(Run.java:1438)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:239)
          Caused by: hudson.remoting.ChannelClosedException: channel is already closed
          at hudson.remoting.Channel.send(Channel.java:475)
          at hudson.remoting.Request.call(Request.java:110)
          at hudson.remoting.Channel.call(Channel.java:646)
          at hudson.FilePath.act(FilePath.java:831)
          ... 13 more
          Caused by: java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:189)
          at java.net.SocketInputStream.read(SocketInputStream.java:121)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2266)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2559)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2569)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
          at hudson.remoting.Command.readFrom(Command.java:90)
          at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
          hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
          at hudson.remoting.Request.call(Request.java:149)
          at hudson.remoting.Channel.call(Channel.java:646)
          at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
          at $Proxy141.join(Unknown Source)
          at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:861)
          at hudson.Launcher$ProcStarter.join(Launcher.java:345)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:82)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
          at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:711)
          at hudson.model.Build$RunnerImpl.build(Build.java:178)
          at hudson.model.Build$RunnerImpl.doRun(Build.java:139)
          at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:481)
          at hudson.model.Run.run(Run.java:1438)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          at hudson.model.ResourceController.execute(ResourceController.java:88)
          at hudson.model.Executor.run(Executor.java:239)
          Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
          at hudson.remoting.Request.abort(Request.java:273)
          at hudson.remoting.Channel.terminate(Channel.java:702)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
          Caused by: java.net.SocketException: Connection reset
          at java.net.SocketInputStream.read(SocketInputStream.java:189)
          at java.net.SocketInputStream.read(SocketInputStream.java:121)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2266)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2559)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2569)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
          at hudson.remoting.Command.readFrom(Command.java:90)
          at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
          at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

          Brian Harris added a comment -

          For us, the cause of this error was our build slaves (VMs) running out of memory and self-rebooting.

          Erik Purins added a comment -

          Your VM reboots the operating system? Ours are not doing that. If the build slave ran out of memory, maybe it is buffering something it doesn't need to (say, holding long lines of output it could flush to disk), or it has a leak. Since something similar has been happening to us across operating systems, I can help troubleshoot and fix it. It's intermittent for us, but it makes our entire build system untrustworthy to our engineers, and they've started ignoring build e-mails.

          Soumen Banerjee added a comment -

          I can confirm the same issue on AIX and HP-UX, with intermittent failures. Any pointers for capturing debug information, or a workaround, would be appreciated.

          Erik Purins added a comment -

          This is still occurring with 1.504 (Debian Squeeze master, OS X 10.8 slave). If it's related to an unreliable connection and happens this frequently, maybe a better error message (instead of the entire remote-action call stack) or a more fault-tolerant retry system for remote actions could be implemented?
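One way to picture the retry system suggested above is a minimal Java sketch. It assumes a hypothetical helper named RetryingRemoteAction.callWithRetry, which is not a Jenkins or Remoting API. The helper retries an action that fails with IOException and doubles the delay between attempts. Retrying is only safe for idempotent actions. It also only helps if something re-establishes the connection between attempts, because retrying against a channel that is already closed would presumably keep failing.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Jenkins or Remoting): retries an action
 * that fails with IOException, doubling the delay after each failure.
 */
public final class RetryingRemoteAction {

    private RetryingRemoteAction() {}

    public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                // Only I/O failures (e.g. a dropped channel) are retried;
                // any other exception propagates immediately.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw new IOException("Remote action failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote action: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("channel is already closed");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The final IOException keeps the last failure as its cause, so the underlying error is still visible when all attempts are exhausted.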

          Jesse Glick added a comment -

          JENKINS-5073 massaged the reporting a bit; Jenkins should no longer give a misleading error about deleting the script file (which is merely an aftereffect of the slave connectivity issue).

          jminne added a comment -

          I'm also seeing this, but only on an OS X slave. Windows and Linux slaves are fine.

          Fatih Degirmenci added a comment - - edited

          There are different ways of fixing this issue; in our case, rebooting the slaves helps. Other people solved it by changing the SSH settings on their slaves, changing the Java version used to connect the slaves, etc. (as explained in JENKINS-12235 and other tickets on this issue).

          So it is crucial for everyone that this issue is solved.

          Of course, there are some things Jenkins cannot solve, such as problems with the slaves themselves. In that case, it would be good if Jenkins told us a bit more. The error message printed to the console could say what the "real" problem is, rather than just reporting that it was unable to delete the script file. That message misleads us, and we start troubleshooting unrelated things instead of simply checking the slave and rebooting it if necessary. So an indication of the underlying problem that prevented Jenkins from deleting the file (a connectivity issue, and so on) could improve things.
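The kind of message requested here can be sketched by walking the exception's cause chain down to its innermost entry. This is a minimal Java sketch. RootCause, of and describe are hypothetical names and are not part of Jenkins. Plain IOException stands in for hudson.util.IOException2 and ChannelClosedException.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper (not a Jenkins API): reports the innermost cause of a
 * failure instead of the outermost wrapper message.
 */
public final class RootCause {

    private RootCause() {}

    /** Follows getCause() to the innermost throwable, stopping if the chain loops. */
    public static Throwable of(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        seen.add(current);
        while (current.getCause() != null && seen.add(current.getCause())) {
            current = current.getCause();
        }
        return current;
    }

    /** Builds a one-line message naming the operation and the underlying cause. */
    public static String describe(String operation, Throwable t) {
        Throwable root = of(t);
        return operation + " failed; underlying cause: "
                + root.getClass().getName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Mirrors the shape of the 1.467 report: wrapper -> channel closed -> connection reset.
        Throwable chain = new IOException("remote file operation failed",
                new IOException("channel is already closed",
                        new SocketException("Connection reset")));
        System.out.println(describe("Deleting script file", chain));
    }
}
```

Applied to a chain shaped like the 1.467 report, the message names the SocketException rather than the script-file deletion. The identity set guards against cause cycles, which Throwable.initCause only prevents for direct self-reference.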

          Guy Rozendorn added a comment - - edited

          This week we changed all of our ~80 slaves from the SSHLauncher to the CommandLauncher, which launches strace -t -s 4096 ssh ..., with the following lines in .ssh/config:

          TCPKeepAlive yes
          ServerAliveInterval 10
          ServerAliveCountMax 10
          LogLevel DEBUG
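This assumes the OpenSSH semantics described in ssh_config(5). After ServerAliveInterval seconds with no data from the server, ssh sends a message requesting a response. Once ServerAliveCountMax of those messages go unanswered, it disconnects. Under those semantics, these settings let ssh itself give up on an unresponsive server after roughly:

```latex
t_{\text{disconnect}} \approx \texttt{ServerAliveInterval} \times \texttt{ServerAliveCountMax} = 10\,\mathrm{s} \times 10 = 100\,\mathrm{s}
```

By comparison, the example in ssh_config(5) uses ServerAliveInterval 15 with the default ServerAliveCountMax of 3, which gives about 45 seconds. TCPKeepAlive is a separate, TCP-level mechanism.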
          

          The point of running under strace is to find out which happens first: the connection being dropped, or the master deciding the slave is dead.

          In one of the job executions (which started at 00:00:56), we got this in the log:

          Started by timer
          Building remotely on host-ci66 in workspace /root/jenkins/workspace/mainline-bdist-develop
          
          Deleting project workspace... Checkout:mainline-bdist-develop / /root/jenkins/workspace/mainline-bdist-develop - hudson.remoting.Channel@27ecbe67:host-ci66
          Using strategy: Default
          Last Built Revision: Revision ff29df8b003dde47573ddfb0b463351baee6dea3 (origin/develop)
          FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Request.call(Request.java:174)
          	at hudson.remoting.Channel.call(Channel.java:722)
          	at hudson.FilePath.act(FilePath.java:894)
          	at hudson.FilePath.act(FilePath.java:878)
          	at hudson.plugins.git.GitSCM.determineRevisionToBuild(GitSCM.java:942)
          	at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1108)
          	at hudson.model.AbstractProject.checkout(AbstractProject.java:1369)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:676)
          	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
          	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:581)
          	at hudson.model.Run.execute(Run.java:1593)
          	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          	at hudson.model.ResourceController.execute(ResourceController.java:88)
          	at hudson.model.Executor.run(Executor.java:242)
          Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Request.abort(Request.java:299)
          	at hudson.remoting.Channel.terminate(Channel.java:782)
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
          Caused by: java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
          Caused by: java.io.EOFException
          	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2595)
          	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
          	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
          	at hudson.remoting.Command.readFrom(Command.java:92)
          	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
          	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          

          and the slave.log ends with this:

          00:00:56 select(7, [3 4], [3 5], NULL, {10, 0}) = 4 (in [3 4], out [3 5], left {9, 999997})
          00:00:56 read(4, "\10\0\245\231\255B,\304\304\7\23\3\0\0000\7\0\0d\0\0\0hudson/plugins/git/browser/FisheyeGitRepositoryBrowser$FisheyeGitRepositoryBrowserDescriptor$1.class\265U{O\23A\20\377m)\\\251\247\205\"\240@\364\264UK\21\17|\240\370\226\332\"Z5\341\225h\214\361\350-\355\312q[\357\256\5\376\364c\370-0\0214\232\370\1\374P\306\331k!&\0264&m\2567\2733\263\363\370\315\314\336\217\237_\277\3\230\304\223n\34\303\250ze\343H`L\303\3058\242\30\217\23\347R\f\246\242\23qR\274\34\303\25E\257j\270\246\230S\32\256k\270\301p\254n9\351U\341Z\316\262\345\3248C\262\370\326\252[\246c\271es!\360\204[\276\305\320\25T\204\237\236\320@\353s\5\341W\370\26\237\25\301<\257J_\4\322\333\232\361\344\206\317\275\207\334/y\242J\34\6}\316u\271\227s,\337\347>\303\353b\245f\373\3225\253N\255,\\\337,\213\300\\i\0343\0171\231\376'w*\306\333\302\25\301]\6?\323^W\177\0024\272\314\20\315I\233\340K\24\205\313\237\325\326W\270\267h\2558!\240\262\244\320\365\204\3327\231Q\5(\3\30\336\2645\330\364$!\323Y\252\360\322\32\303\251\314\350\236\263Z \34\263 \275u\252\272\260\255@H\227\24\31E\326\337HOHs\356y~\263\304\253MY|\177\343k\270M\373\5Y\363J\274 T:\306!\301\\R\366\10\230\274[rH\342\226\237\362\240\"m\35wpWG\22\3:z\320\253c\20\367\250iB\347.\17\314\245\371\242\342\335\327\361\0003\f\232\362\220\337\342:rx\250!\257\243\0\203\341\352\"\341h\320c\31u\225\211A\307\214\225Z`\210\300\260%\367\335\v\201\341H\271f8b\215\33\3736fA}\322\327\"Q\35\2170G\315\332\336\2320\244\16\256C\232R\3105\n\26\333[jx\314\360\252\235AQw(86M\342\326\35\302\177\241A\367\221\241\212\3332\ff\311s\30&2-\306\340\360\356\352\310\2509\351ouP\315\217\254rr\222i\212\233-@6\367\2524S[]\345\36\267\347\271es5\3601\272\266\354E\276\0310\214g\16Rk\341\354%\303\320\301\2012D$A\337\311=O\241\322[\261\\\333\341\277\265\10C\276E\n-\247\346/\200L\375_Aa\320\5\236\240\373?\322\323\243F\7 
Js\2448\364\37\304\t\272VN\322j\232\366\35D\23\331\261\35\260\354\305\35D\262\237\321\3611T\34\242w\27)\2\353\30\246\267\36\256\23\30\301)\242Q\234&7\221\320\314{t\206\206\307\263\331o\210\276\310~B\3443:w\321\225\324v\21\373\0\355\v\272\267\223\361/8\262]\314*\351\330.\216n\207G\206q\226\f\217\340LH\33N\7\310\34\360\216\242\364H\22\220\244\216\0246\302 \6H'\205x\250\257\302\31o\206\223\n\203\215\220\2154et.L\340<.\204\201\366!\203~Z\r\221\244\0177q\34\352{\330\370%\350\2237\35\355\376\5PK\3\4\n\0\0\10\10\0\245\231\255B\342o\20\6\230\5\0\0\314\f\0\0b\0\0\0hudson/plugins/git/browser/FisheyeGitRepositoryBrowser$FisheyeGitRepositoryBrowserDescriptor.class\265W{S\23W\24\377\255\0016\204E\20\37(\365\21-\266!\321\254\257\252m(-\362\20h\2H\20|\353&\271$\0276\273q\37H\332\332\367\373\365\1\374\24jg\"\326\231N\377\353L?T\247\347nB\0101\240\343\324\314\354}\236\347\357\236s\356\315?\377\376\361'\200\223\370-\200\3\30\363c\\\306\204\37\37\265!\216D\0\223\230\362c:\200K\230\221\221\f\240\25c\242\231\25\315e\321\314\265b?\346\3\270\202\253\1\\\303u?n\10\256\233\1\334\302\355Vj\356\210\221&#\25\300\36dD\303\2\304\262\340GVp\345dp\31\213\22\332.\317\304oO\17\316\316\216\314LJ\330\27_\324\2265\325u\270\256Z,\313V\324i\315q\230e\304$\264\364s\203;\3\22|\241\2769\tMCf\206I\350\210s\203M\272\371\24\263f\265\224N+]q3\255\351s\232\305\305\274\262\330\344\344\270-\341\350(\267s\254\310.rg\206\25L\233;\246U\274`\231\367lf\r3;m\361\2\255HP\306\r\203YC\272f\333\214\330n\305sn\3066\r\265\240\273Yn\330j\226;j\252\314\246n!\262\367\245\324\221o\333\263\314\31\346vA\327\212\223Z\236\354\335\25\352+C\241kFVM:\0267\262D\327f\260{\343\206\355hF\232\210\226Cq\323\312\252Kf\316v\227\230J\313\5\235\fJ\226\373\31v\327e\266\23\213\33\314Q\355\5uQ80\221\234\232\234J-\262\264\23\353{E\247\310\f\237\305\356J\350}\31\355\22\2BqY\247\204\275\233YCt#+iVp\270i\3302\226h\2361\207r,\275t\331\322%\234\10=\17G\325\1/\\FM+?\247\351<\243\t\21$\257yY\323]/\36\32\0\31X\340\206\10\22\242\220\221\227aH82\343\32\16\317\
2639ns\212\231i\315\242\223\240\330\0334\f\323\361\204\332\233\371|\311eV\261\312@\342\375\v|e$_p\212>J2I\302\225\377\347\244\362\24\362\272Z\216\234\224\10lR\265m\345\204hN\222\322\376\264^I\221\326$\317\32\232\343Z\344\177\177#n\21w\325\r;\235W\237;\346\376\360@l@ \2254]+\3
          ERROR: Connection terminated
          java.io.IOException: Unexpected termination of the channel
                  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
          Caused by: java.io.EOFException
                  at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2595)
                  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1315)
                  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
                  at hudson.remoting.Command.readFrom(Command.java:92)
                  at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:59)
                  at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
          

          As far as I can tell, when the Java process on the master node decided that the connection had terminated, the SSH connection was still up and running, with data flowing in both directions.

          Would it help if we dumped the stdout and stdin streams to files and attached them to this ticket?
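The innermost frames in the trace above show ObjectInputStream.peekByte throwing java.io.EOFException. This is what Java object serialization reports when its input stream simply ends. The minimal Java sketch below reproduces the same exception with no socket error involved. It uses an in-memory stream as a stand-in for the ssh process's stdout, which is an assumption: it does not use Remoting itself. EofDemo and readUntilEof are hypothetical names. The result is consistent with the master seeing end-of-stream on its side of the pipe rather than a TCP reset. By itself, however, it does not show which side closed the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Shows that ObjectInputStream.readObject() reports a cleanly ended stream as EOFException. */
public final class EofDemo {

    private EofDemo() {}

    /** Serializes the given commands, then reads until the stream ends; returns how many were read. */
    public static int readUntilEof(String... commands) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (String c : commands) {
                out.writeObject(c);
            }
        }
        int count = 0;
        // The in-memory stream stands in for the pipe from the ssh process (assumption).
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            while (true) {
                in.readObject();
                count++;
            }
        } catch (EOFException e) {
            // No further bytes: the writer's side is finished. No socket error occurred,
            // only end-of-stream, matching the "Caused by: java.io.EOFException" frames.
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read " + readUntilEof("ping", "join") + " commands before EOFException");
    }
}
```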

          Oleg Nenashev added a comment -

          The last report in this thread was in 2014. Since then, Jenkins core and Remoting have received many stability and diagnosability improvements. If anyone still sees the issue reported in this ticket, please reopen it with new logs and other diagnostic information.

            Oleg Nenashev (oleg_nenashev)
            bll (bll6969)
            Votes: 36
            Watchers: 33