Type: Bug
Resolution: Cannot Reproduce
Priority: Minor
Labels: None
Environment:
Jenkins server/slave OS: Ubuntu 14.04.5 LTS
Jenkins server/slave OpenJDK: 8u141-b15-3~14.04
Jenkins: 2.89.2
SSH Slaves plugin: 1.23
Related to: JENKINS-25858 and JENKINS-48810
Per suggestion from oleg_nenashev,
I'm opening a separate bug ticket for further investigation.
Jenkins Server log:
Dec 21, 2017 12:17:09 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel jenkins-smoke-slave03(192.168.100.94)
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Jenkins Slave log:
Dec 21, 2017 12:15:09 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 381.9±905.3/sec; rate(5min) = 363.6±923.4/sec; rate(15min) = 335.3±927.4/sec; rate(total) = 100.3±521.0/sec; N = 35,086
Dec 21, 2017 12:16:09 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 272.0±705.3/sec; rate(5min) = 324.8±863.5/sec; rate(15min) = 322.8±905.9/sec; rate(total) = 100.3±521.0/sec; N = 35,098
Dec 21, 2017 12:17:09 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 321.9±768.9/sec; rate(5min) = 333.2±865.8/sec; rate(15min) = 326.3±905.0/sec; rate(total) = 100.4±521.2/sec; N = 35,110
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
ERROR: Socket connection to SSH server was lost
java.io.IOException: Peer sent DISCONNECT message (reason code 2): Packet corrupt
at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:779)
at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
at java.lang.Thread.run(Thread.java:748)
Slave JVM has not reported exit code before the socket was lost
[12/21/17 12:17:09] [SSH] Connection closed.
This "Unexpected termination of the channel" has happened everyday (3 days in a roll) to any of slaves randomly since I updated the Jenkins core and all the plugins to the latest on Dec 19. 2017.
The previous Jenkins core and plugin were updated back on April 2017:
Jenkins Core: 2.46.2
SSH-slave puglin: 1.16
Because the random "Unexpected termination of the channel" was happening more often than usual,
on Dec 22, 2017 I downgraded the Jenkins core and the SSH Slaves plugin to:
Jenkins Core: 2.60.3 (whose remoting version should be the same as in 2.46.2, based on the changelog)
SSH Slaves plugin: 1.16
The issue has eased since the downgrade,
but the random "Unexpected termination of the channel" has still happened a couple of times so far.
Relates to: JENKINS-48810 "How to disable hudson.remoting.RemoteInvocationHandler$Unexporter reportStats?" (Resolved)
[JENKINS-48850] Random java.io.IOException: Unexpected termination of the channel
Just happened again 30 mins ago~
Jenkins Core: 2.60.3
SSH Slaves plugin: 1.16
On "01/08/18 15:07:02" and "01/08/18 15:39:29":
Server.log:
Jan 08, 2018 3:07:02 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel jenkins-smoke-slave03(192.168.100.94)
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Jan 08, 2018 3:07:45 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect jenkins-smoke-slave03(192.168.100.94)
[01/08/18 15:07:45] SSH Launch of jenkins-smoke-slave03(192.168.100.94) on 192.168.100.94 failed in 170 ms
Jan 08, 2018 3:09:07 PM org.jenkinsci.plugins.workflow.job.WorkflowRun finish
INFO: TinderBox/FortiOS/Build_Steps/5.6_Chroot_Build #64101 completed: SUCCESS
Jan 08, 2018 3:09:45 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect jenkins-smoke-slave03(192.168.100.94)
[01/08/18 15:10:11] SSH Launch of jenkins-smoke-slave03(192.168.100.94) on 192.168.100.94 completed in 25,748 ms
Jan 08, 2018 3:39:09 PM hudson.scm.SubversionSCM buildEnvironment
WARNING: no revision found corresponding to $svnurl; known: [https://scm-yvr.fortinet.com/svn/svnfos/FortiOS/trunk]
Jan 08, 2018 3:39:09 PM hudson.scm.SubversionSCM buildEnvironment
WARNING: no revision found corresponding to $svnurl; known: [https://scm-yvr.fortinet.com/svn/svnfos/FortiOS/trunk]
Jan 08, 2018 3:39:29 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel jenkins-smoke-slave03(192.168.100.94)
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Jan 08, 2018 3:40:45 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect jenkins-smoke-slave03(192.168.100.94)
Jan 08, 2018 3:41:02 PM hudson.model.Run execute
INFO: FortiRobot/Performance_Tests/Jenkins_Performance_Staging #1979 main build action completed: SUCCESS
[01/08/18 15:41:07] SSH Launch of jenkins-smoke-slave03(192.168.100.94) on 192.168.100.94 completed in 22,161 ms
Slave.log:
Jan 08, 2018 3:04:43 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 33.9±241.7/sec; rate(5min) = 102.7±461.3/sec; rate(15min) = 174.3±644.9/sec; rate(total) = 50.4±359.9/sec; N = 85,304
Jan 08, 2018 3:05:43 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 288.4±817.5/sec; rate(5min) = 141.6±560.3/sec; rate(15min) = 182.4±659.5/sec; rate(total) = 50.4±360.0/sec; N = 85,316
Jan 08, 2018 3:06:43 PM hudson.remoting.RemoteInvocationHandler$Unexporter reportStats
INFO: rate(1min) = 141.5±543.7/sec; rate(5min) = 131.2±522.5/sec; rate(15min) = 176.4±642.8/sec; rate(total) = 50.4±360.0/sec; N = 85,328
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
ERROR: Socket connection to SSH server was lost
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.fill_buffer(CipherInputStream.java:41)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.internal_read(CipherInputStream.java:52)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:79)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:706)
at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
at java.lang.Thread.run(Thread.java:748)
Slave JVM has not reported exit code before the socket was lost
[01/08/18 15:07:02] [SSH] Connection closed.
[01/08/18 15:10:07] [SSH] Checking java version of java
[01/08/18 15:10:07] [SSH] java -version returned 1.8.0_141.
[01/08/18 15:10:07] [SSH] Starting sftp client.
[01/08/18 15:10:07] [SSH] Copying latest slave.jar...
[01/08/18 15:10:08] [SSH] Copied 719,269 bytes.
Expanded the channel window size to 4MB
[01/08/18 15:10:08] [SSH] Starting slave process: cd "/home/devops/jenkins_slave_robot" && java -jar slave.jar
<===[JENKINS REMOTING CAPACITY]===>channel started
Slave.jar version: 3.7
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:349)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
ERROR: Socket connection to SSH server was lost
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.fill_buffer(CipherInputStream.java:41)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.internal_read(CipherInputStream.java:52)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.getBlock(CipherInputStream.java:79)
at com.trilead.ssh2.crypto.cipher.CipherInputStream.read(CipherInputStream.java:108)
at com.trilead.ssh2.transport.TransportConnection.receiveMessage(TransportConnection.java:232)
at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:706)
at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
at java.lang.Thread.run(Thread.java:748)
Slave JVM has not reported exit code before the socket was lost
[01/08/18 15:39:29] [SSH] Connection closed.
Hi
Since I experienced the same thing, I tested a lot of different things (kernel, libs, JDK versions, ..., rolling back to the previous version) with no luck.
Then I realized that my issues started with the various VMware patches for Spectre/Meltdown, and that most of the cases reported here for this kind of issue seem to have started around the same time.
Maybe it's worth digging in that direction.
Cheers,
faichelbaum
Are you using VMware on the Jenkins server side or the Jenkins slave side?
My Jenkins slaves are physical bare-metal Dell machines,
and only the Jenkins master is a VMware VM.
Do you have the numbers of the VMware patches you applied?
Same setup: server on a VMware VM and slaves on bare metal (Docker on bare metal, to be exact).
I need to double-check the patch numbers as I'm not the one who applied them, but I know for sure they were related to Spectre and Meltdown. As far as I know, the guy deployed all the available patches for those at that time.
I have another question.
If the interruption is initiated from the server side (the VM),
and I have multiple slaves connected at the same time,
why would only a single slave get disconnected instead of all of them?
This is why I can't be 100% sure whether the interruption is coming from the server side or the slave side.
I had the same question.
I started a slave via a Jenkins job and connected to it over SSH in parallel with the job being run, from the Linux host running Jenkins:
- Jenkins lost the connection
- the SSH session remained up with no latency or packet loss
Same thing with hping3, for instance.
My guess is that the issue is random and targets individual sockets, not the service as a whole.
faichelbaum, totoroliu: were either of you able to resolve this issue? I have been seeing it myself and wondering if there is any fix.
Also, faichelbaum, I notice that you assigned this issue to yourself; was that intentional?
No, it's not intentional.
Though I partly solved my case: I still have issues with "docker in docker" type slaves when pulling from any Docker registry, with a fixed 3-second delay between the last event in the Console Output and the IOException event.
For the rest, I found a corner case when running Jenkins in a VMware environment where, somehow, two settings need to be configured a specific way that was not needed before January:
- (this one I can explain, thanks to Jenkins' behaviour) you need to increase the RX ring buffer value: it seems the default (which is the lowest value on most distros) is not enough, and raising it helped, e.g. ethtool -G eth0 rx 4096
- (this one I don't understand the relation to) we needed to re-enable various checksum/offload settings that we disable in our fine-tuning on all our systems: ethtool -K eth0 tso on gso on rx on tx on
Both were set to off / a low value long before January, so it does not really make sense, and I consider this a workaround rather than a fix (especially as the first one tweaks the default system value). So far it has solved all my IOException issues (besides the docker-in-docker one).
faichelbaum thanks for your reply. I take it you made those changes on your master? In the case where I saw this the agents were VMware but the master was not.
I had to do this at both ends. Doing it at just one end reduced the number of IOException events, but did not eliminate them.
oleg_nenashev sorry, I thought you were the default assignee for remoting. I will unassign.
owenmehegan
No, I don't have any solution or workaround yet.
Right now, it still happens once in a while.
We improved our pipeline script to self-retry when we encounter the interruption.
Hi totoroliu, can you tell me how you detect this type of interruption from inside the pipeline? Can you catch this exception?
try {
    // do your stuff here
}
catch (Exception err) {
    if (err.toString().contains('channel is already closed')) {
        // retry your stuff here
    }
}
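For illustration, here is a slightly fuller sketch of the same pattern in a scripted pipeline. The runWithChannelRetry helper, the 'linux' label, and the shell step are hypothetical placeholders, not something from this ticket:

// Sketch only: retry a body a bounded number of times when the remoting channel drops.
def runWithChannelRetry(int maxAttempts, Closure body) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            body()
            return
        } catch (Exception err) {
            boolean channelError = err.toString().contains('channel is already closed') ||
                                   err.toString().contains('Unexpected termination of the channel')
            if (channelError && attempt < maxAttempts) {
                echo "Channel dropped on attempt ${attempt}, retrying..."
            } else {
                throw err
            }
        }
    }
}

runWithChannelRetry(3) {
    node('linux') {                 // re-acquire a node on each attempt
        sh './do-your-stuff.sh'     // placeholder for the real build steps
    }
}

Note that if the agent stays offline every retry will still fail, so this only papers over short-lived disconnections.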
Hi totoroliu,
I was trying to find a more elegant solution for our declarative pipeline. But I guess I will give this a try.
Thank you for your help.
Fede
Looks like I cannot retry a whole declarative pipeline
if I do
try {
    runMyPipeline config
} catch (Exception err) {
    if (err.toString().contains('channel is already closed')) {
        runMyPipeline config
    }
}
where (a trimmed version of) runMyPipeline.groovy is
def call(Map config) {
    // Default for environment variables
    // work out some config
    pipeline {
        agent { label node_label }
        parameters {
            string(name: 'LIBRARY_BRANCH_NAME',
                   description: 'For testing AL_jenkins_pipeline_library',
                   defaultValue: params.LIBRARY_BRANCH_NAME ? params.LIBRARY_BRANCH_NAME : 'master')
        }
        environment {
            // In the environment we work out and set some values that are going to be used in all pipeline steps
            CREATE_BUILD_ARTIFACTS = "${create_build_artifacts}" // Note this becomes a string with true or false.
        }
        options {
            timeout(time: global_timeout, unit: 'MINUTES')
            timestamps()
            ansiColor('xterm')
        }
        stages {
            stage('Setup') {
                steps {
                    checkout scm
                    // Work out some global values that can only be discovered when we have acquired a node and the source code.
                    // print some info about the configuration.
                    printParams params
                    printConfig config
                    log text: "Running on: " + env.NODE_NAME
                    addShortText text: env.NODE_NAME, background: 'yellow', border: 0
                    createSummary icon: "info.gif", text: "Run on agent: " + env.NODE_NAME
                }
            }
            stage('Get Upstream Artifacts') {
                steps {
                    copyUpstreamArtifacts upstreamArtifactInfo: upstream_artifact_dependency_info
                }
            }
            stage('Build') {
                steps {
                    rezMultiBuild buildConfig: build_config, disableUpstreamArtifacts: disable_upstream_artifacts, rezForceLocal: rez_force_local, shouldFailFast: should_fail_fast
                }
                post {
                    success {
                        log level: "debug", text: "Mark this build as OK to reuse artifacts. But not to skip tests"
                    }
                }
            }
            stage('Test') {
                steps {
                    rezMultiTest buildConfig: build_config, rezForceLocal: rez_force_local, shouldFailFast: should_fail_fast
                }
                post {
                    success {
                        log level: "debug", text: "Mark this as OK to skip test when rebuilding the same build with the same sha"
                    }
                    always {
                        junit testDataPublishers: [[$class: 'ClaimTestDataPublisher']], allowEmptyResults: true, testResults: '**/*_nosetests.xml'
                    }
                }
            }
            stage('Sonar Analysis') {
                steps {
                    sonarStage prepareSonarReport: prepare_sonar_reports_folder, packagesToBuild: packages_to_build, sonarVerbose: sonar_verbose
                }
            }
            stage('Build Downstream Jobs') {
                steps {
                    buildDownstreamJobs downstreamJobConfig: downstream_job_config, downstreamDependencies: downstream_dependencies, shouldFailFast: should_fail_fast, fallbackBranch: fallback_branch
                }
            }
            stage('Results') {
                steps {
                    reportStatusToGitHub status: 'SUCCESS'
                    junit testDataPublishers: [[$class: 'ClaimTestDataPublisher']], allowEmptyResults: true, testResults: '**/*_nosetests.xml'
                }
            }
        }
        post {
            failure {
                notifyResult resultState: 'FAILURE', hipChatCredentialId: hipchat_credential_id, hipChatRoom: hipchat_room, sendEmail: send_email
                reportStatusToGitHub status: 'FAILURE'
            }
            always {
                cleanWs notFailBuild: true
            }
        }
    }
}
Then I get
java.lang.IllegalStateException: Only one pipeline { ... } block can be executed in a single run.
I'm not sure how to handle this. Nothing elegant comes to mind: the disconnection can occur at any stage of the pipeline, and even if the thrown error is available in the post phase, I would have to retry all the steps in the body and the post phases, which I guess would not be a valid pipeline.
Does someone have any idea how to handle this?
I guess the workaround is not a good fit for our case. I will keep poking at the remoting/SSH slaves plugin to see if I can find the cause of the channel closing down.
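For what it's worth, one partial option (a sketch only, not verified against this failure mode, and it only helps if the agent reconnects or another agent with the same label is available) is to retry at the stage level with the built-in retry step instead of trying to re-run the whole pipeline { ... } block:

// Sketch only: per-stage retry inside a declarative pipeline.
// The 'linux' label and the make command are placeholders.
pipeline {
    agent { label 'linux' }
    stages {
        stage('Build') {
            steps {
                retry(3) {          // built-in step: re-runs the enclosed block if it throws
                    sh 'make all'   // placeholder for the real build step
                }
            }
        }
    }
}

This still cannot restart stages that already completed, which is part of why a clean fix is hard here.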
Unfortunately, I don't have any ideas or suggestions on this one. The logs don't provide any helpful information. It looks like the agent ones pretty much lack anything useful.
Commonly Remoting issues involve something in the networking or system environment terminating the connection from outside the process. The trick can be to determine what is doing that. People report a number of different causes, or at least changes that seem to make the problem go away. It can involve issues from the system, network, environment, or hosting / containerization system. I've heard that VM systems are sometimes particularly susceptible.
On the other hand, many people are very successful with long-running jobs and connections remain very stable.
Hi Jeff,
I agree with you that the logs are not helpful at all. I tried following the remoting debugging strategies to find out what was happening, but I could not find the root cause.
Also, I put a layer on top and started monitoring the machines (network/CPU/RAM/I/O) to see if I could correlate anything with this unexpected termination, but again I could not find anything.
In my experience, the only thing that alleviated this situation was to run things serially; it was when my multi-build stage ran in parallel that I started to see this issue more frequently.
I'm giving the Kafka agent plugin a go, and that seems to be an improvement (my understanding is that the plugin is a rewritten version of the original remoting engine).
Hello everyone,
I want to add my $0.02 to say that this really is a problem. I have seen issues like these transient errors closed as not reproducible; please don't do that with this one. I am working in an environment where the network is quite unreliable, and I see these errors almost every day. I provide a (redacted) stack trace below for an instance where the connection was lost between the master and slave during a completely innocuous part of the build. I conclude that this must be down to the unreliability of the network.
I think that Jenkins has to cope with this directly, ideally by doing some sort of retry. The build scripts I work on already have to cope with this and do so by simple retries, e.g. retry up to three times with delays of 1, 3, and 5 seconds between retries (a sketch of that pattern follows the stack trace below). Looking at the stack trace, it seems to me that this won't be an easy fix.
FATAL: command execution failed
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(Unknown Source)
at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
at java.io.ObjectInputStream.<init>(Unknown Source)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'my slave name' is disconnected.
at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
at com.sun.proxy.$Proxy94.isAlive(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1143)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1135)
at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1815)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
FATAL: Unable to delete script file /tmp/jenkins9156834294959325199.sh
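For illustration, here is a pipeline-flavoured sketch of the simple retry-with-increasing-delays approach described above; the 1, 3, 5 second delays come from the comment, while the helper name and the shell command are placeholders:

// Sketch only: retry a flaky step with increasing delays between attempts.
def retryWithDelays(List<Integer> delaysSeconds, Closure body) {
    for (int i = 0; ; i++) {
        try {
            body()
            return
        } catch (Exception err) {
            if (i >= delaysSeconds.size()) {
                throw err // attempts exhausted, propagate the failure
            }
            echo "Attempt ${i + 1} failed (${err}), retrying in ${delaysSeconds[i]}s"
            sleep time: delaysSeconds[i], unit: 'SECONDS'
        }
    }
}

node {
    retryWithDelays([1, 3, 5]) {
        sh './flaky-step.sh' // placeholder for the real command
    }
}

As noted above, this only mirrors what the build scripts already do; it does not address the underlying channel termination.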
fnaum, I've noticed a number of different comments or cases that suggest there might be more connection problems with more parallel builds. It would be nice to be able to pin that down to something but so far I've seen nothing indicating the source.
I'm really interested in hearing your results with the remoting-kafka-plugin. It reuses many of the capabilities and concepts of the primary Remoting module but routes the communication over Kafka channels instead of over Jenkins Remoting specific ones. It's clear there are some reliability issues with the standard channels. The logic around disconnects, reconnects, and retries is not as strong as it needs to be. I doubt it's worth putting a lot of energy into strengthening the existing channel implementations, unless we can isolate specific situations. The remoting-kafka-plugin or something else similar seems a lot more promising for these reliability efforts.
It would be great to see more results from the remoting-kafka-plugin and how it holds up in these production environments. If we could see it make noticeable improvements in a number of these cases, that would give confidence for shifting efforts that way.
marlowa, a lot of people have good success keeping channels alive over many builds or long builds. Certainly there are also a number of cases where people have reliability problems for a wide variety of reasons. Sometimes they're able to stabilize or strengthen their environment and these problems disappear. Most of the times they don't provide enough information for anyone who isn't local to diagnose anything. There isn't much reason to keep multiple, duplicate tickets open that all lack information or continued response.
With your acknowledged network unreliability, you may also want to give the remoting-kafka-plugin a try. I'd suggest running a test build environment or trying it with a few jobs. One of the reasons for creating the new plugin was a hope for improved reliability. Your unreliable network may be a good test case for that.
My network is pretty reliable so I can't reproduce any of these reports or give a good workout to the remoting-kafka-plugin.
There haven't been updates to this for a while, and there is insufficient information and diagnostics to make any progress on this report. If anyone tries out the remoting-kafka-plugin to see whether it provides improvements, that would be good information. Otherwise, we may just have to attribute this to unreliable networks and close it as Cannot Reproduce.
It's been two months since any information was provided that might give hints on the cause or reproduction, and we're no closer to having any verification that this is actually a code issue rather than an environment issue, nor whether the remoting-kafka-plugin helps. I'm going to close it out as Cannot Reproduce. If someone is able to provide additional information, please do and we can re-open it.
Hello, we also hit this issue, and I think I can bring some explanation (though not a full one); in our case it is related to slave startup routines
(related issue: JENKINS-38487).
The agent went offline after several seconds:
XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=512m -Dgroovy.use.classvalue=true -jar remoting.jar -workDir /data/pentaho/jenkins/test-head01
Jan 24, 2019 3:16:49 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /data/xxxxxxxxxxxxxxxx/remoting as a remoting work directory
Both error and output logs will be printed to /data/xxxxxxxxxxxxxxx/remoting
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.27
This is a Unix agent
Evacuated stdout
just before slave ph-slave-01 gets online ...
executing prepare script ...
setting up slave ph-slave-01 ...
slave setup done.
Jan 24, 2019 3:16:51 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$2; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
ERROR: null
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask.report(FutureTask.java:121)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[01/24/19 03:17:08] Launch failed - cleaning up connection
[01/24/19 03:17:08] [SSH] Connection closed.
ERROR: Connection terminated
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
I've found that some jobs even managed to start, but were then abruptly terminated.
I found a possible explanation in the master log:
the mail watcher could not send an email about the slave having started; some mailer configuration had been changed recently (and after that change the slave could not survive startup).
Jan 24, 2019 3:16:45 AM org.jenkinsci.plugins.mailwatcher.MailWatcherNotification log
INFO: mail-watcher-plugin: unable to notify
javax.mail.MessagingException: Could not connect to SMTP host: xxxxxxxxx, port: 465; nested exception is:
java.net.SocketTimeoutException: connect timed out
at com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1934)
at com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:638)
at javax.mail.Service.connect(Service.java:295)
at javax.mail.Service.connect(Service.java:176)
at javax.mail.Service.connect(Service.java:125)
at javax.mail.Transport.send0(Transport.java:194)
at javax.mail.Transport.send(Transport.java:124)
at org.jenkinsci.plugins.mailwatcher.MailWatcherMailer.send(MailWatcherMailer.java:135)
at org.jenkinsci.plugins.mailwatcher.MailWatcherMailer.send(MailWatcherMailer.java:128)
at org.jenkinsci.plugins.mailwatcher.MailWatcherNotification.send(MailWatcherNotification.java:156)
at org.jenkinsci.plugins.mailwatcher.WatcherComputerListener$Notification$Builder.send(WatcherComputerListener.java:181)
at org.jenkinsci.plugins.mailwatcher.WatcherComputerListener.onOnline(WatcherComputerListener.java:101)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:693)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1034)
at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:868)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:833)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
So Jenkins could not send the notification mail.
The mail server used to listen on both ports 25 and 465; after the change it listens only on port 25/tcp.
When I fixed that mailer configuration on the master, all the errors went away and the slave could start.
<===[JENKINS REMOTING CAPACITY]===>channel started
Remoting version: 3.27
This is a Unix agent
Evacuated stdout
just before slave ph-slave-01 gets online ...
executing prepare script ...
setting up slave ph-slave-01 ...
slave setup done.
Jan 24, 2019 3:19:44 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.envinject.EnvInjectComputerListener$2; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
[StartupTrigger] - Scanning jobs for node ph-slave-01
Agent successfully connected and online
Note that "Agent successfully connected and online" now appears.
I hope this gives you some information about how to reproduce some of these cases.
My reading is that if the master cannot complete the notification/startup procedure cleanly for the slave start event, it breaks the connection with the slave.
Thanks for providing that information, dshvedchenko. Hopefully it will be useful for others who encounter connection failures. I'm glad you were able to track it down and resolve it.
After the downgrade,
there was one random SSH disconnection again on "Jan 04, 2018 @ 12:39:45 PM":
Server Log:
Slave Log: