Bug
Resolution: Fixed
Critical
Master: Debian Wheezy (Jenkins installed by apt)
Slave: Windows 7 (x64)
When slave.jar on the slave is updated to the version shipped with Jenkins 1.560, the slave shows the message "connected" but the master does not show the slave as online.
Reverting to the slave.jar from 1.533 or 1.559 makes it work as usual.
Tried both a direct connection and a connection via an Apache reverse proxy.
is related to:
- JENKINS-18781 Configurable channel timeout for slaves (Open)
- JENKINS-34808 Allow user to adjust socket timeout (Resolved)
[JENKINS-22722] Master doesn't show connected slave
We have the same issue with every Windows version (XP, Vista, Win7, Win8, Win8.1), from local virtual machine slaves and even from Azure.
The first time, the slaves can connect, but after they disconnect they cannot connect again (this is when the bug occurs).
Only restarting the Jenkins server helps.
Reverting to 1.559 works for me too.
I only tried direct connections with JNLP agents.
I am having the same issue with an Ubuntu master running Jenkins 1.618 and macOS and Ubuntu slaves...
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main createEngine
INFORMATION: Setting up slave: MaCe-Desktop2
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFORMATION: Jenkins agent is running in headless mode.
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main$CuiListener status
INFORMATION: Locating server among [http://..../jenkins/]
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main$CuiListener status
INFORMATION: Handshaking
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main$CuiListener status
INFORMATION: Connecting to ciaa1.uns.edu.ar:32768
Jul 05, 2015 10:23:22 PM hudson.remoting.jnlp.Main$CuiListener status
INFORMATION: Trying protocol: JNLP2-connect
Jul 05, 2015 10:23:23 PM hudson.remoting.jnlp.Main$CuiListener status
INFORMATION: Connected
but the master does not show the connection.
If the master is restarted, the slave detects it and logs SCHWERWIEGEND (SEVERE): I/O error in channel channel. So the connection was established.
After the restart the master detects the slave without a problem.
Same here. The master is running on an ARM SoC with Debian Wheezy and the slave is on Windows 7, started as headless Java from the command line.
Jenkins version is 2.0 rc-1.
I think this bug is quite serious: it prevents me from actually using many plugins, since I need to either stay on 1.559 or look for other CI software.
Here is the slave's log from cmd:
C:\jenkins2_slave>java -jar "C:\jenkins2_slave\slave.jar" -jnlpUrl http://local.pc:8082/computer/windows_slave/slave-agent.jnlp -secret ef9469b438646d065d61d86577dc3ebfd7b0bc6ee2f8dc563661a8d209062002
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: windows_slave
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://local.pc:8082/]
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: Unknown protocol:Protocol:JNLP3-connect
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Dub 28, 2016 1:35:31 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Dub 28, 2016 2:20:19 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Dub 28, 2016 2:20:29 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://local.pc:8082/]
Dub 28, 2016 2:20:29 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dub 28, 2016 2:20:29 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 2:20:29 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Dub 28, 2016 2:20:30 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: Unknown protocol:Protocol:JNLP3-connect
Dub 28, 2016 2:20:30 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 2:20:30 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Dub 28, 2016 2:20:30 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Dub 28, 2016 2:20:55 ODP. hudson.util.ProcessTree get
WARNING: Failed to load winp. Reverting to the default
java.lang.UnsatisfiedLinkError: Native Library C:\Users\pc\.jenkins\cache\jars\4A\winp.x64.22D9AB310A3FA2D96B6E03A836A47724.dll already loaded in another classloader
    at java.lang.ClassLoader.loadLibrary0(Unknown Source)
    at java.lang.ClassLoader.loadLibrary(Unknown Source)
    at java.lang.Runtime.load0(Unknown Source)
    at java.lang.System.load(Unknown Source)
    at org.jvnet.winp.Native.loadDll(Native.java:190)
    at org.jvnet.winp.Native.load(Native.java:122)
    at org.jvnet.winp.Native.<clinit>(Native.java:56)
    at org.jvnet.winp.WinProcess.enableDebugPrivilege(WinProcess.java:212)
    at hudson.util.ProcessTree$Windows.<clinit>(ProcessTree.java:494)
    at hudson.util.ProcessTree.get(ProcessTree.java:345)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:965)
    at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:956)
    at hudson.remoting.UserRequest.perform(UserRequest.java:120)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:332)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1$1.run(Engine.java:85)
    at java.lang.Thread.run(Unknown Source)
Dub 28, 2016 2:44:28 ODP. hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Dub 28, 2016 2:44:28 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Dub 28, 2016 2:44:38 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://local.pc:8082/]
Dub 28, 2016 2:44:38 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dub 28, 2016 2:44:38 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 2:44:38 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Dub 28, 2016 2:44:39 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: Unknown protocol:Protocol:JNLP3-connect
Dub 28, 2016 2:44:39 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 2:44:39 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Dub 28, 2016 2:44:39 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Dub 28, 2016 3:14:39 ODP. hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
Dub 28, 2016 3:14:39 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://local.pc:8082/]
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP3-connect
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: Unknown protocol:Protocol:JNLP3-connect
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to local.pc:39916
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Dub 28, 2016 3:14:49 ODP. hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
And here are some of the last relevant lines from master's log:
Accepted connection #9 from /xxx.xxx.xxx.xxx:6338
Apr 28, 2016 2:44:02 PM WARNING hudson.TcpSlaveAgentListener$ConnectionHandler error
Connection #9 is aborted: Unknown protocol:Protocol:JNLP3-connect
Apr 28, 2016 2:44:02 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted connection #10 from /xxx.xxx.xxx.xxx:6339
Apr 28, 2016 2:44:02 PM INFO jenkins.slaves.DefaultJnlpSlaveReceiver handle
Disconnecting windows_slave as we are reconnected from the current peer
Apr 28, 2016 2:45:35 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
....
Accepted connection #15 from /xxx.xxx.xxx.xxx:6351
Apr 28, 2016 3:14:13 PM WARNING hudson.TcpSlaveAgentListener$ConnectionHandler error
Connection #15 is aborted: Unknown protocol:Protocol:JNLP3-connect
Apr 28, 2016 3:14:13 PM INFO hudson.TcpSlaveAgentListener$ConnectionHandler run
Accepted connection #16 from /xxx.xxx.xxx.xxx:6352
So there is at least one scenario that leads to the issue.
1) On the slave side we get just a socket read timeout, which usually means that the slave did not receive any command from the master. Normally the ping threads on the master and the monitoring jobs should reset this timeout.
2) When this happens, it terminates the SynchronousCommandTransport$ReaderThread on the slave side. Once that thread dies, the slave stops answering any command coming from the Jenkins master, which causes the connection timeout on the master side.
3) The slave reconnects successfully.
4) But then the connection timeout happens on the master side.
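Step 1 can be reproduced in isolation. The sketch below is not Jenkins code: it opens a loopback socket pair, sets a short read timeout on one side, and lets the read time out while the peer is idle. The class and method names are hypothetical, and the 200 ms timeout is an arbitrary value chosen for the demo. It shows that a read timeout by itself does not break the connection: the same socket still receives data afterwards.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Jenkins or remoting.
class IdleReadTimeoutDemo {

    /** Returns true if an idle read timed out and the same socket still delivered a later byte. */
    static boolean timeoutLeavesSocketUsable() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket agentSide = new Socket(loopback, server.getLocalPort());
             Socket masterSide = server.accept()) {

            agentSide.setSoTimeout(200); // arbitrary short read timeout for the demo

            boolean timedOut = false;
            try {
                agentSide.getInputStream().read(); // peer sends nothing, so this read times out
            } catch (SocketTimeoutException e) {
                timedOut = true; // only the read gave up; the socket itself stays open
            }

            // The "master" side finally sends something on the same connection.
            masterSide.getOutputStream().write(42);
            masterSide.getOutputStream().flush();
            int received = agentSide.getInputStream().read();

            return timedOut && received == 42;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(timeoutLeavesSocketUsable());
    }
}
```

If the reader thread exits on the first timeout, as described in step 2, nothing is left to consume the later data even though the connection is still open.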
We recently had a conversation with teilo about this issue. The decision is that SocketTimeoutException should not lead to the full termination of the receiver thread on the slave side. So the SynchronousCommandTransport$ReaderThread should be patched.
SEVERE h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel channel
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
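To illustrate the idea of a reader that survives read timeouts, here is a minimal sketch. It is not the actual remoting patch (see the linked commit for that) and the real change may differ. TolerantReaderSketch, readUntilEof, FlakyStream and the maxConsecutiveTimeouts limit are hypothetical names introduced only for this example. The loop treats SocketTimeoutException as "no data yet" and keeps reading, while any other IOException still ends the loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the code from the remoting fix.
class TolerantReaderSketch {

    /** Reads until end of stream, retrying after read timeouts up to a consecutive limit. */
    static byte[] readUntilEof(InputStream in, int maxConsecutiveTimeouts) throws IOException {
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        int consecutiveTimeouts = 0;
        while (true) {
            try {
                int b = in.read();
                if (b < 0) {
                    return received.toByteArray(); // orderly end of stream
                }
                received.write(b);
                consecutiveTimeouts = 0; // data arrived, so the link is alive
            } catch (SocketTimeoutException e) {
                // Idle link: keep the reader alive instead of terminating it.
                if (++consecutiveTimeouts > maxConsecutiveTimeouts) {
                    throw e; // assumed safety limit so the sketch cannot spin forever
                }
            }
            // Any other IOException (for example "Connection reset") propagates to the caller.
        }
    }

    /** Test double that throws a given number of read timeouts before delivering its payload. */
    static final class FlakyStream extends InputStream {
        private final byte[] payload;
        private int timeoutsLeft;
        private int position;

        FlakyStream(byte[] payload, int timeouts) {
            this.payload = payload;
            this.timeoutsLeft = timeouts;
        }

        @Override
        public int read() throws IOException {
            if (timeoutsLeft > 0) {
                timeoutsLeft--;
                throw new SocketTimeoutException("Read timed out");
            }
            return position < payload.length ? (payload[position++] & 0xff) : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ping".getBytes(StandardCharsets.US_ASCII);
        byte[] out = readUntilEof(new FlakyStream(data, 3), 5);
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints "ping"
    }
}
```

The consecutive-timeout limit is a design choice of this sketch only. A real reader would more likely rely on the ping mechanism to decide when a silent channel is dead.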
Code changed in jenkins
User: Oleg Nenashev
Path:
src/main/java/hudson/remoting/SynchronousCommandTransport.java
http://jenkins-ci.org/commit/remoting/25373d73e7369e9e8e21e2308f7051d5d1e9a16c
Log:
JENKINS-22722 - Make AsynchronousCommandTransport tolerant against Socket timeouts (#86)
JENKINS-22722- Make AsynchronousCommandTransport tolerant against Socket timeouts
JENKINS-22722- Fix the formatting
Code changed in jenkins
User: Oleg Nenashev
Path:
pom.xml
http://jenkins-ci.org/commit/jenkins/d9f12b0e614d9598221c571001aa43c018b21e25
Log:
Update remoting to 2.60
Changes summary:
Fixed issues:
- JENKINS-22722 (https://issues.jenkins-ci.org/browse/JENKINS-22722) - Make the channel reader tolerant against Socket timeouts. (https://github.com/jenkinsci/remoting/pull/80)
- JENKINS-32326 (https://issues.jenkins-ci.org/browse/JENKINS-32326) - Support no_proxy environment variable. (https://github.com/jenkinsci/remoting/pull/84)
- JENKINS-35190 (https://issues.jenkins-ci.org/browse/JENKINS-35190) - Do not invoke PingFailureAnalyzer for agent=>master ping failures. (https://github.com/jenkinsci/remoting/pull/85)
- JENKINS-31256 (https://issues.jenkins-ci.org/browse/JENKINS-31256) - <code>hudson.Remoting.Engine#waitForServerToBack</code> now uses credentials for connection. (https://github.com/jenkinsci/remoting/pull/87)
- JENKINS-35494 (https://issues.jenkins-ci.org/browse/JENKINS-35494) - Fix issues in file management in <code>hudson.remoting.Launcher</code> (main executable class). (https://github.com/jenkinsci/remoting/pull/88)
Enhancements:
- Ensure a message is logged if remoting fails to override the default <code>ClassFilter</code>. (https://github.com/jenkinsci/remoting/pull/80)
Code changed in jenkins
User: Oleg Nenashev
Path:
pom.xml
http://jenkins-ci.org/commit/jenkins/c718516adfddeb10cbf616ce37c619cc6bbafd53
Log:
Update remoting to 2.60 (#2403)
Changes summary:
Fixed issues:
- JENKINS-22722 (https://issues.jenkins-ci.org/browse/JENKINS-22722) - Make the channel reader tolerant against Socket timeouts. (https://github.com/jenkinsci/remoting/pull/80)
- JENKINS-32326 (https://issues.jenkins-ci.org/browse/JENKINS-32326) - Support no_proxy environment variable. (https://github.com/jenkinsci/remoting/pull/84)
- JENKINS-35190 (https://issues.jenkins-ci.org/browse/JENKINS-35190) - Do not invoke PingFailureAnalyzer for agent=>master ping failures. (https://github.com/jenkinsci/remoting/pull/85)
- JENKINS-31256 (https://issues.jenkins-ci.org/browse/JENKINS-31256) - <code>hudson.Remoting.Engine#waitForServerToBack</code> now uses credentials for connection. (https://github.com/jenkinsci/remoting/pull/87)
- JENKINS-35494 (https://issues.jenkins-ci.org/browse/JENKINS-35494) - Fix issues in file management in <code>hudson.remoting.Launcher</code> (main executable class). (https://github.com/jenkinsci/remoting/pull/88)
Enhancements:
- Ensure a message is logged if remoting fails to override the default <code>ClassFilter</code>. (https://github.com/jenkinsci/remoting/pull/80)
Code changed in jenkins
User: Oleg Nenashev
Path:
CHANGELOG.md
http://jenkins-ci.org/commit/remoting/3df4ce626eca74bd45009d0033839d2c02db7722
Log:
Changelog: Fix the link to the JENKINS-22722 fix PR in the changelog
We experienced the exact same issue, with the same log output that Michaël posted in his comment. A master restart would bring things back to normal until we restarted the slave; after that, subsequent connections would fail until the master was restarted again.
Reverting to 1.559 fixed the issue. Our Jenkins machine is running on Arch Linux.