
jnlp slaves fail to reconnect when master is restarted

      I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.

      I've traced this back to the changes made to fix JENKINS-19055. Specifically, those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change, the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change, the restarted slave immediately tries to connect to the master and gets a connection error because the master is still restarting. This causes the slave to terminate immediately.

      Jenkins 1.575 gives the following slave log output when the master is restarted:

      Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
      INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: bishop
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
      java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
              at hudson.remoting.Engine.run(Engine.java:213)
      

      Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message, which shows the slave restart being performed.

      Before JENKINS-19055 was integrated, the slave called waitForServerToBack() repeatedly until the master came back online. For example:

      25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
      INFO: Failed to connect to the master. Will retry again
      java.net.ConnectException: Connection refused
              at java.net.PlainSocketImpl.socketConnect(Native Method)
              at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
              at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
              at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
              at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
              at java.net.Socket.connect(Socket.java:546)
              at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
              at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
              at sun.net.www.http.HttpClient.New(HttpClient.java:321)
              at sun.net.www.http.HttpClient.New(HttpClient.java:338)
              at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
              at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
              at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
              at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
              at hudson.remoting.Engine.run(Engine.java:278)
      ...
      25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins.example:42715
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      

      The connection/retry logic is contained in remoting Engine.java
      https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

      When the slave connects to the master, an error causes the connection attempt to terminate (around line 232):

      if(firstError!=null) {
        events.error(firstError);
        return;
      }
      

      Prior to JENKINS-19055 hooking into onDisconnect(), a reconnection would not be attempted until waitForServerToBack() had ensured that the master had recovered.

      events.onDisconnect();
      // try to connect back to the server every 10 secs.
      waitForServerToBack();
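
      As an illustration of the retry behaviour visible in the pre-JENKINS-19055 log above, here is a minimal sketch of a polling loop. It is not the actual Engine.java code: the class name, the probe parameter, the use of -1 for "connection refused", and the choice of HTTP 200 as the "ready" signal are all assumptions made for this example.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch only; not the real hudson.remoting.Engine implementation.
final class WaitForServerSketch {
    /**
     * Calls the probe until it reports HTTP 200, sleeping between attempts.
     * The probe is assumed to return an HTTP status code, or -1 when the
     * connection is refused. Returns the number of attempts that were needed.
     */
    static int waitForServer(IntSupplier probe, long retryDelayMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = probe.getAsInt();
            if (status == 200) {
                return attempt; // master answered normally, safe to reconnect
            }
            // Connection refused (-1) or 503: the master is still restarting.
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for master", e);
            }
        }
        throw new IllegalStateException("master did not come back after " + maxAttempts + " attempts");
    }
}
```

      With a 10-second delay this would mirror the spacing of the retry messages in the log; the attempt limit is only there to keep the sketch from looping forever.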
      

      A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
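
      To make the ordering problem concrete, the following is a small simulation of the two call orders. It is a sketch under stated assumptions, not the change that was eventually made: FakeMaster, run() and the "terminated"/"connected" outcomes are hypothetical names invented for this example, and the slave restart is reduced to a single connection attempt.

```java
// Hypothetical simulation of the call order; none of these names exist in Jenkins.
final class ReconnectOrderSketch {
    /** Fake master that starts answering after a fixed number of failed probes. */
    static final class FakeMaster {
        private int probesUntilUp;

        FakeMaster(int probesUntilUp) {
            this.probesUntilUp = probesUntilUp;
        }

        /** One probe; returns true once the master is back. */
        boolean probe() {
            if (probesUntilUp > 0) {
                probesUntilUp--;
                return false;
            }
            return true;
        }

        boolean isUp() {
            return probesUntilUp == 0;
        }
    }

    /**
     * Simulates a master restart seen from the slave.
     * waitBeforeRestart == false: restart the slave immediately on disconnect.
     * waitBeforeRestart == true:  wait for the master first, then restart.
     */
    static String run(boolean waitBeforeRestart, int probesUntilUp) {
        FakeMaster master = new FakeMaster(probesUntilUp);
        if (waitBeforeRestart) {
            while (!master.probe()) {
                // stand-in for the periodic retry; a real loop would sleep here
            }
        }
        // The restarted slave makes one connection attempt and gives up on error.
        return master.isUp() ? "connected" : "terminated";
    }

    public static void main(String[] args) {
        System.out.println("restart first: " + run(false, 3));
        System.out.println("wait first:    " + run(true, 3));
    }
}
```

      In this model, restarting first fails whenever the master is still down at disconnect time, while waiting first always ends with a successful connection.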

          [JENKINS-24272] jnlp slaves fail to reconnect when master is restarted

          Henrik Skupin added a comment -

          What is left to do here? As long as this fix is not in a release or even the current 1.580.x LTS, is there a way to work around it? We have around 70 slaves and having to reconnect them all manually is terrible. Thanks.

          Richard Mortimer added a comment -

          Looks like my pull request https://github.com/jenkinsci/jenkins/pull/1393 is still waiting to be merged. I'll give it a prod.

          Christian Bremer added a comment - $100 is up for grabs for this issue on freedomsponsors.org: https://freedomsponsors.org/issue/592/jnlp-slaves-fail-to-reconnect-when-master-is-restarted

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3852
          [FIXES JENKINS-24272] jnlp slaves fail to reconnect when master is restarted (Revision 48e19c58f9e2caa998d0942417d58679f5ce47f0)

          Result = SUCCESS
          oldelvet : 48e19c58f9e2caa998d0942417d58679f5ce47f0
          Files :

          • core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java

          Jesse Glick added a comment -

          Use FIXED rather than FIXES to get the JIRA link daemon to auto-close issues.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Richard Mortimer
          Path:
          core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java
          http://jenkins-ci.org/commit/jenkins/ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f
          Log:
          [FIXES JENKINS-24272] jnlp slaves fail to reconnect when master is restarted

          During master restart only attempt to reconnect the slave after the master has
          finished restarting.

          (cherry picked from commit 48e19c58f9e2caa998d0942417d58679f5ce47f0)

          dogfood added a comment -

          Integrated in jenkins_main_trunk #4292
          [FIXES JENKINS-24272] jnlp slaves fail to reconnect when master is restarted (Revision ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f)

          Result = UNSTABLE
          ogondza : ea6baf9a103b841ec99dd1c2a9aac85fe6d8d29f
          Files :

          • core/src/main/java/jenkins/slaves/restarter/JnlpSlaveRestarterInstaller.java

          Roman Pickl added a comment (edited) -

          We still see this on Jenkins ver. 2.19.4 on a Mac slave with java version "1.8.0_92"
          Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
          Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

          Reconnecting seems to work, though, when the jar is launched via the CLI.

          Daniel Beck added a comment -

          Please file a new issue and provide more information about your environment. Two years later, it's likely to be an unrelated issue.

          li mengmeng added a comment -

          My Jenkins version is 2.164.3 with JDK 12, and this problem also occurs. I switched back to JDK 8 and the problem was solved.

            oldelvet Richard Mortimer