-
Bug
-
Resolution: Fixed
-
Critical
-
jenkins 1.559 and above
I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.
I've traced this back to the changes to fix JENKINS-19055. Specifically those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change it immediately tries to connect to the master and get a connection error because the master is restarting. This causes the slave to immediately terminate.
Jenkins 1.575 gives the following slave log output when restarting the master
Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status INFO: Terminated Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661 Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine INFO: Setting up slave: bishop Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init> INFO: Jenkins agent is running in headless mode. Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status INFO: Locating server among [http://jenkins.example/] Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable at hudson.remoting.Engine.run(Engine.java:213)
Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message that performs a slave restart.
Prior to JENKINS-19055 being integrated the slave called waitForServerToBack() repeatedly until the master came back online. For example
25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status INFO: Terminated 25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack INFO: Failed to connect to the master. Will retry again java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385) at java.net.Socket.connect(Socket.java:546) at sun.net.NetworkClient.doConnect(NetworkClient.java:173) at sun.net.www.http.HttpClient.openServer(HttpClient.java:409) at sun.net.www.http.HttpClient.openServer(HttpClient.java:530) at sun.net.www.http.HttpClient.<init>(HttpClient.java:240) at sun.net.www.http.HttpClient.New(HttpClient.java:321) at sun.net.www.http.HttpClient.New(HttpClient.java:338) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801) at hudson.remoting.Engine.waitForServerToBack(Engine.java:371) at hudson.remoting.Engine.run(Engine.java:278) ... 25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack INFO: Master isn't ready to talk to us. Will retry again: response code=503 25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack INFO: Master isn't ready to talk to us. Will retry again: response code=503 25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status INFO: Locating server among [http://jenkins.example/] 25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status INFO: Connecting to jenkins.example:42715 25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status INFO: Handshaking 25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status INFO: Connected
The connection/retry logic is contained in remoting Engine.java
https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java
When connecting to the slave an error causes the connection to terminate (around line 232)
if(firstError!=null) { events.error(firstError); return; }
prior to JENKINS-19055 hooking into onDisconnect() a re-connection would not be attempted until waitForServerToBack() had ensured that the master had recovered.
events.onDisconnect();
// try to connect back to the server every 10 secs.
waitForServerToBack();
A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
- is blocking
-
JENKINS-19055 In case of connection loss, slave JVM should restart itself if it can
- Resolved
- is related to
-
JENKINS-25490 Slave agent run with javaws breaks with "Unable to launch the application" when master is stopped
- Resolved
-
JENKINS-25895 GUI-mode JNLP slave does not reconnect after Jenkins restart (UnixSlaveRestarter broken)
- Resolved