Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-24272

jnlp slaves fail to reconnect when master is restarted

      I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.

      I've traced this back to the changes to fix JENKINS-19055. Specifically those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change it immediately tries to connect to the master and get a connection error because the master is restarting. This causes the slave to immediately terminate.

      Jenkins 1.575 gives the following slave log output when restarting the master

      Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
      INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
      INFO: Setting up slave: bishop
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
      java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
              at hudson.remoting.Engine.run(Engine.java:213)
      

      Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message that performs a slave restart.

      Prior to JENKINS-19055 being integrated the slave called waitForServerToBack() repeatedly until the master came back online. For example

      25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
      INFO: Failed to connect to the master. Will retry again
      java.net.ConnectException: Connection refused
              at java.net.PlainSocketImpl.socketConnect(Native Method)
              at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
              at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
              at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
              at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
              at java.net.Socket.connect(Socket.java:546)
              at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
              at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
              at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
              at sun.net.www.http.HttpClient.New(HttpClient.java:321)
              at sun.net.www.http.HttpClient.New(HttpClient.java:338)
              at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
              at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
              at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
              at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
              at hudson.remoting.Engine.run(Engine.java:278)
      ...
      25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
      INFO: Master isn't ready to talk to us. Will retry again: response code=503
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server among [http://jenkins.example/]
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to jenkins.example:42715
      25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      

      The connection/retry logic is contained in remoting Engine.java
      https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

      When connecting to the slave an error causes the connection to terminate (around line 232)

      if(firstError!=null) {
        events.error(firstError);
        return;
      }
      

      prior to JENKINS-19055 hooking into onDisconnect() a re-connection would not be attempted until waitForServerToBack() had ensured that the master had recovered.

      events.onDisconnect();
      // try to connect back to the server every 10 secs.
      waitForServerToBack();
      

      A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.

          [JENKINS-24272] jnlp slaves fail to reconnect when master is restarted

          Richard Mortimer created issue -
          Daniel Beck made changes -
          Labels New: remoting
          Daniel Beck made changes -
          Description Original: I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.

          I've traced this back to the changes to fix JENKINS-19055. Specifically those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change it immediately tries to connect to the master and get a connection error because the master is restarting. This causes the slave to immediately terminate.

          Jenkins 1.575 gives the following slave log output when restarting the master

          {noformat}
          Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: bishop
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://jenkins.example/]
          Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
          java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
                  at hudson.remoting.Engine.run(Engine.java:213)
          {noformat}

          Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message that performs a slave restart.

          Prior to JENKINS-19055 being integrated the slave called waitForServerToBack() repeatedly until the master came back online. For example

          {noformat}
          25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
          INFO: Failed to connect to the master. Will retry again
          java.net.ConnectException: Connection refused
                  at java.net.PlainSocketImpl.socketConnect(Native Method)
                  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
                  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
                  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
                  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
                  at java.net.Socket.connect(Socket.java:546)
                  at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
                  at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:240)
                  at sun.net.www.http.HttpClient.New(HttpClient.java:321)
                  at sun.net.www.http.HttpClient.New(HttpClient.java:338)
                  at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
                  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
                  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
                  at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
                  at hudson.remoting.Engine.run(Engine.java:278)
          ...
          25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
          INFO: Master isn't ready to talk to us. Will retry again: response code=503
          25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
          INFO: Master isn't ready to talk to us. Will retry again: response code=503
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://jenkins.example/]
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins.example:42715
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected

          The connection/retry logic is contained in remoting Engine.java
          https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

          When connecting to the slave an error causes the connection to terminate (around line 232)

          {code}
          if(firstError!=null) {
            events.error(firstError);
            return;
          }
          {code}

          prior to JENKINS-19055 hooking into onDisconnect() a re-connection would not be attempted until waitForServerToBack() had ensured that the master had recovered.

          {code}
          events.onDisconnect();
          // try to connect back to the server every 10 secs.
          waitForServerToBack();
          {code}

          A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
          New: I have noticed that whenever I restart my Jenkins master my jnlp slaves are not reconnecting and require a manual slave restart to bring them back online.

          I've traced this back to the changes to fix JENKINS-19055. Specifically those changes cause the slave JVM to be restarted when the master disconnects. Prior to this change the remoting engine would wait for the server to restart before attempting to reconnect to the master. With the change it immediately tries to connect to the master and get a connection error because the master is restarting. This causes the slave to immediately terminate.

          Jenkins 1.575 gives the following slave log output when restarting the master

          {noformat}
          Aug 12, 2014 3:55:15 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Aug 12, 2014 3:55:15 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onDisconnect
          INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@32a9f661
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up slave: bishop
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Aug 12, 2014 3:55:17 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://jenkins.example/]
          Aug 12, 2014 3:55:18 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
          java.lang.Exception: http://jenkins.example/tcpSlaveAgentListener/ is invalid: 503 Service Temporarily Unavailable
                  at hudson.remoting.Engine.run(Engine.java:213)
          {noformat}

          Notice the "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller" onDisconnect log message that performs a slave restart.

          Prior to JENKINS-19055 being integrated the slave called waitForServerToBack() repeatedly until the master came back online. For example

          {noformat}
          25-Mar-2014 10:52:16 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          25-Mar-2014 10:52:26 hudson.remoting.Engine waitForServerToBack
          INFO: Failed to connect to the master. Will retry again
          java.net.ConnectException: Connection refused
                  at java.net.PlainSocketImpl.socketConnect(Native Method)
                  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
                  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
                  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
                  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
                  at java.net.Socket.connect(Socket.java:546)
                  at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
                  at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:240)
                  at sun.net.www.http.HttpClient.New(HttpClient.java:321)
                  at sun.net.www.http.HttpClient.New(HttpClient.java:338)
                  at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
                  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
                  at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
                  at hudson.remoting.Engine.waitForServerToBack(Engine.java:371)
                  at hudson.remoting.Engine.run(Engine.java:278)
          ...
          25-Mar-2014 10:54:11 hudson.remoting.Engine waitForServerToBack
          INFO: Master isn't ready to talk to us. Will retry again: response code=503
          25-Mar-2014 10:54:21 hudson.remoting.Engine waitForServerToBack
          INFO: Master isn't ready to talk to us. Will retry again: response code=503
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Locating server among [http://jenkins.example/]
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins.example:42715
          25-Mar-2014 10:54:31 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          25-Mar-2014 10:54:32 hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          {noformat}

          The connection/retry logic is contained in remoting Engine.java
          https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java

          When connecting to the slave an error causes the connection to terminate (around line 232)

          {code}
          if(firstError!=null) {
            events.error(firstError);
            return;
          }
          {code}

          prior to JENKINS-19055 hooking into onDisconnect() a re-connection would not be attempted until waitForServerToBack() had ensured that the master had recovered.

          {code}
          events.onDisconnect();
          // try to connect back to the server every 10 secs.
          waitForServerToBack();
          {code}

          A quick and dirty fix would likely be to swap the onDisconnect and waitForServerToBack calls around.
          Richard Mortimer made changes -
          Assignee Original: Kohsuke Kawaguchi [ kohsuke ] New: Richard Mortimer [ oldelvet ]
          Richard Mortimer made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Daniel Beck made changes -
          Link New: This issue is related to JENKINS-25490 [ JENKINS-25490 ]
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-25895 [ JENKINS-25895 ]
          Jesse Glick made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
          Jesse Glick made changes -
          Link New: This issue is blocking JENKINS-19055 [ JENKINS-19055 ]
          Jesse Glick made changes -
          Labels Original: remoting New: lts-candidate regression remoting
          Jesse Glick made changes -
          Labels Original: lts-candidate regression remoting New: jnlp lts-candidate regression remoting slave

            oldelvet Richard Mortimer
            oldelvet Richard Mortimer
            Votes:
            4 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: