-
Bug
-
Resolution: Not A Defect
-
Major
-
Jenkins 2.89.4 on Windows 7. ssh slaves, 14.04 LTS and 16.04 LTS.
This is a bit of an epic, sorry.
For a year or so we have had a system configured that after build triggers an ssh slave (14.04 LTS) in a test chamber to run hours of tests, an hour after a build, 12 hours overnight and 60 hours over the weekend. All stable and happy.
For the last couple of months we have been recreating the setup for a new product, pretty much a copy/paste of the existing system except this is on 16.04 LTS.
Here is the twist, adding the new system seems to have broken the old.
I have about logs of about 600 good jobs up to 16th Feb, after that about 2 in 50 have worked.
I cannot even work out which end is failing.
Over the last weekend I disconnected Jenkins from the slaves (I changed the addresses not just offlined). On the jenkins server I ran 2 command windows, in the windows I ran ssh connections to the slaves and executed the commands manually to run 60 hour tests on both.
Both stayed up for 60 hours and ran to conclusion.
Tried automated again this morning, broken in minutes.
I spotted this morning a new version of the ssh plugin, so I upgraded from 1.25.1 to 1.26, same results.
Any ideas...
The jenkins log looks like this
... **TEST PASSED** No failures were logged FATAL: command execution failed java.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source) at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source) at java.io.ObjectInputStream.readStreamHeader(Unknown Source) at java.io.ObjectInputStream.<init>(Unknown Source) at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63) Caused: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77) Caused: java.io.IOException: Backing channel 'elysvrmelchett' is disconnected. at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:212) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:281) at com.sun.proxy.$Proxy92.isAlive(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1138) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1130) at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109) at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744) at hudson.model.Build$BuildExecution.build(Build.java:206) at hudson.model.Build$BuildExecution.doRun(Build.java:163) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504) at hudson.model.Run.execute(Run.java:1724) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:97) at hudson.model.Executor.run(Executor.java:429) Build step 'Execute shell' marked build as failure Set build name. Workspace is not accessible ERROR: Step ‘Archive the artifacts’ failed: no workspace for FullTest_3400 #1598 ...
The slave log looks like this
[02/26/18 09:38:19] [SSH] Opening SSH connection to elysvrmelchett:22. [02/26/18 09:38:19] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection. [02/26/18 09:38:19] [SSH] Authentication successful. [02/26/18 09:38:20] [SSH] The remote user's environment is: BASH=/bin/bash BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath BASH_ALIASES=() BASH_ARGC=() BASH_ARGV=() BASH_CMDS=() BASH_EXECUTION_STRING=set BASH_LINENO=() BASH_SOURCE=() BASH_VERSINFO=([0]="4" [1]="3" [2]="11" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu") BASH_VERSION='4.3.11(1)-release' DIRSTACK=() EUID=1000 GROUPS=() HOME=/home/wifi HOSTNAME=melchett HOSTTYPE=x86_64 IFS=$' \t\n' LANG=en_GB.UTF-8 LANGUAGE=en_GB:en LOGNAME=wifi MACHTYPE=x86_64-pc-linux-gnu MAIL=/var/mail/wifi OPTERR=1 OPTIND=1 OSTYPE=linux-gnu PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games PIPESTATUS=([0]="0") PPID=3953 PS4='+ ' PWD=/home/wifi SHELL=/bin/bash SHELLOPTS=braceexpand:hashall:interactive-comments SHLVL=1 SSH_CLIENT='10.146.37.211 60750 22' SSH_CONNECTION='10.146.37.211 60750 10.146.37.236 22' TERM=dumb UID=1000 USER=wifi XDG_RUNTIME_DIR=/run/user/1000 XDG_SESSION_ID=97 _=']' [02/26/18 09:38:20] [SSH] Checking java version of java [02/26/18 09:38:20] [SSH] java -version returned 9.0.4. [02/26/18 09:38:20] [SSH] Starting sftp client. [02/26/18 09:38:20] [SSH] Copying latest slave.jar... [02/26/18 09:38:20] [SSH] Copied 745,674 bytes. Expanded the channel window size to 4MB [02/26/18 09:38:20] [SSH] Starting slave process: cd "/home/wifi/jenkins" && java -jar slave.jar <===[JENKINS REMOTING CAPACITY]===> channel started WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by hudson.remoting.RemoteClassLoader (file:/home/wifi/jenkins/slave.jar) to method java.lang.ClassLoader.getClassLoadingLock(java.lang.String) WARNING: Please consider reporting this to the maintainers of hudson.remoting.RemoteClassLoader WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release Remoting version: 3.14 This is a Unix agent Evacuated stdout Agent successfully connected and online Feb 26, 2018 9:39:21 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 153.4±140.3/sec; rate(5min) = 192.1±76.3/sec; rate(15min) = 197.8±45.4/sec; rate(total) = 157.9±181.7/sec; N = 11 Feb 26, 2018 9:39:21 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats WARNING: The 15 minute average rate is 197.8±45.4/sec. At the 95% confidence level this is above 100.0/sec. If this message is repeated often in the logs then very seriously consider setting system property 'hudson.remoting.RemoteInvocationHandler.Unexporter.retainOrigin' to 'false' to trade debug diagnostics for reduced memory pressure. Feb 26, 2018 9:40:21 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 56.4±112.7/sec; rate(5min) = 157.3±101.2/sec; rate(15min) = 185.0±65.5/sec; rate(total) = 75.5±148.4/sec; N = 23 Feb 26, 2018 9:41:21 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 21.7±73.4/sec; rate(5min) = 129.1±109.5/sec; rate(15min) = 173.2±77.8/sec; rate(total) = 50.1±125.3/sec; N = 35 Feb 26, 2018 9:42:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 78.4±150.9/sec; rate(5min) = 130.4±125.8/sec; rate(15min) = 171.1±88.8/sec; rate(total) = 73.8±148.0/sec; N = 47 Feb 26, 2018 9:43:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 32.1±102.7/sec; rate(5min) = 108.8±124.5/sec; rate(15min) = 161.0±95.0/sec; rate(total) = 60.0±136.2/sec; N = 58 Feb 26, 2018 9:44:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 77.6±152.9/sec; rate(5min) = 113.3±136.9/sec; rate(15min) = 159.6±103.1/sec; rate(total) = 73.9±148.6/sec; N = 70 Feb 26, 2018 9:45:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 29.3±99.8/sec; rate(5min) = 92.9±131.2/sec; rate(15min) = 149.4±107.0/sec; rate(total) = 63.2±139.7/sec; N = 82 Feb 26, 2018 9:46:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 10.8±62.2/sec; rate(5min) = 76.1±124.0/sec; rate(15min) = 139.7±109.8/sec; rate(total) = 55.2±132.2/sec; N = 94 Feb 26, 2018 9:47:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 4.3±39.7/sec; rate(5min) = 63.3±116.7/sec; rate(15min) = 131.4±111.5/sec; rate(total) = 49.4±126.2/sec; N = 105 Feb 26, 2018 9:48:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 1.6±24.1/sec; rate(5min) = 51.9±108.3/sec; rate(15min) = 123.0±112.6/sec; rate(total) = 44.3±120.5/sec; N = 117 Feb 26, 2018 9:49:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 0.8±14.7/sec; rate(5min) = 42.5±100.0/sec; rate(15min) = 115.1±113.0/sec; rate(total) = 40.3±115.4/sec; N = 129 Feb 26, 2018 9:50:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 0.3±8.9/sec; rate(5min) = 34.8±92.0/sec; rate(15min) = 107.7±112.9/sec; rate(total) = 36.8±111.0/sec; N = 141 Feb 26, 2018 9:51:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 0.1±5.4/sec; rate(5min) = 28.5±84.3/sec; rate(15min) = 100.7±112.3/sec; rate(total) = 33.9±107.0/sec; N = 153 Feb 26, 2018 9:52:22 AM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats INFO: rate(1min) = 68.6±154.0/sec; rate(5min) = 48.3±120.0/sec; rate(15min) = 103.8±119.9/sec; rate(total) = 42.1±119.5/sec; N = 164 ERROR: Connection terminated [8mha:////4EbjKsAjs0CIpq1Oi8fZUAaTul1pdwwddC4QELZISFRGAAAAVx+LCAAAAAAAAP9b85aBtbiIQSmjNKU4P08vOT+vOD8nVc8DzHWtSE4tKMnMz/PLL0mV3NWzufebKBsTA0NFEYMUmgZnCA1SyAABjCCFBQC2xNaiYAAAAA==[0mjava.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source) at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source) at java.io.ObjectInputStream.readStreamHeader(Unknown Source) at java.io.ObjectInputStream.<init>(Unknown Source) at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63) Caused: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77) ERROR: Socket connection to SSH server was lost [8mha:////4EbjKsAjs0CIpq1Oi8fZUAaTul1pdwwddC4QELZISFRGAAAAVx+LCAAAAAAAAP9b85aBtbiIQSmjNKU4P08vOT+vOD8nVc8DzHWtSE4tKMnMz/PLL0mV3NWzufebKBsTA0NFEYMUmgZnCA1SyAABjCCFBQC2xNaiYAAAAA==[0mjava.net.SocketException: Connection reset by peer: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at com.trilead.ssh2.crypto.cipher.CipherOutputStream.flush(CipherOutputStream.java:75) at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:193) at com.trilead.ssh2.transport.TransportConnection.sendMessage(TransportConnection.java:107) at com.trilead.ssh2.transport.TransportManager.sendMessage(TransportManager.java:690) at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:429) at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63) at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:90) at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:62) at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:45) at hudson.remoting.Channel.send(Channel.java:675) at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.cleanup(RemoteInvocationHandler.java:391) at hudson.remoting.RemoteInvocationHandler$PhantomReferenceImpl.access$900(RemoteInvocationHandler.java:350) at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:608) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110) at java.lang.Thread.run(Unknown Source) Slave JVM has not reported exit code before the socket was lost [02/26/18 09:54:25] [SSH] Connection closed.
I updated the problem box to the latest LTS release, same as the working box.
Much improved, but still fails 50% of the time.
I suspect one of the corporate forced updates that happened on the day it went wrong is the culprit.
http://support.microsoft.com/?kbid=4056894
http://support.microsoft.com/?kbid=4056568
http://support.microsoft.com/?kbid=4054176
http://support.microsoft.com/?kbid=4054998
Somehow something has made the jenkins ssh comms more susceptible to something.