Jenkins Swarm slave remains offline after master restarts

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor
    • swarm-plugin
    • None
    • swarm-plugin 2.0
      jenkins 1.627

      I'm currently using the swarm plugin to connect all my slaves to the master. However, whenever the Jenkins service on the master is restarted, the slave remains offline. It only comes back online when I restart the Swarm client process on the slave.

      Nov 11, 2015 10:06:32 AM org.apache.commons.httpclient.HttpMethodBase getResponseBody
      WARNING: Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
      Attempting to connect to https://suct2v420.it.mgt:8443/ 98ecac62-d76a-4734-9f9f-9350ee5b4e7d with ID c3c7e53b
      Could not obtain CSRF crumb. Response code: 404
      javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching suct2v420.it.mgt found
      at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
      at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1904)
      at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:279)
      at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:273)
      at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1446)
      at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:209)
      at sun.security.ssl.Handshaker.processLoop(Handshaker.java:901)
      at sun.security.ssl.Handshaker.process_record(Handshaker.java:837)
      at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1023)
      at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1332)
      at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1359)
      at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1343)
      at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
      at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
      at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
      at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:269)
      at hudson.plugins.swarm.SwarmClient.connect(SwarmClient.java:229)
      at hudson.plugins.swarm.Client.run(Client.java:106)
      at hudson.plugins.swarm.Client.main(Client.java:69)
      Caused by: java.security.cert.CertificateException: No name matching suct2v420.it.mgt found
      at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:208)
      at sun.security.util.HostnameChecker.match(HostnameChecker.java:93)
      at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:347)
      at sun.security.ssl.AbstractTrustManagerWrapper.checkAdditionalTrust(SSLContextImpl.java:919)
      at sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:886)
      at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1428)
      ... 14 more
      Failed to establish JNLP connection to https://suct2v420.it.mgt:8443/
      Retrying in 10 seconds

          [JENKINS-31514] Jenkins Swarm slave remains offline after master restarts

          Alex Gray added a comment -

          You can use a service like "supervisor" that will automatically start the swarm jar on the slave if it ever goes down. That is what we use. We have the process retry X times with a sleep of Y seconds in between each attempt. That way, we can restart our master for maintenance, and when it is back online the agents will magically re-connect.
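
The retry behaviour described above (X attempts with a sleep of Y seconds in between) can be sketched in a few lines of Python. This is only an illustration of the idea, not how supervisor itself is implemented or configured. The function name `run_with_retries` and its parameters are hypothetical, and treating exit code 0 as "stop retrying" is an assumption.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    max_attempts: int,
    delay_seconds: float,
    run: Callable[[Sequence[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` until it exits with 0 or `max_attempts` launches were made.

    Returns the exit code of the last attempt. `run` and `sleep` are
    injectable so the loop can be exercised without starting a real process.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    exit_code = 1
    for attempt in range(1, max_attempts + 1):
        # Blocks until the process exits (for a Swarm client: until it goes down).
        exit_code = run(command)
        if exit_code == 0:
            # Assumption: a clean exit means nobody wants the client restarted.
            break
        if attempt < max_attempts:
            # Wait Y seconds before the next launch attempt.
            sleep(delay_seconds)
    return exit_code
```

Called as `run_with_retries(["java", "-jar", "swarm-client.jar", "-master", "https://suct2v420.it.mgt:8443/"], max_attempts=30, delay_seconds=10)`, it would relaunch the client up to 30 times. The jar path and the numbers are placeholders. A process supervisor adds logging and start-on-boot on top of this loop.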

          Oleg Nenashev added a comment -

          KK does not maintain this plugin anymore. Moving to Unassigned to set expectations.

          Basil Crow added a comment -

          This should be working nowadays. You need to use the -deleteExistingClients option so that the Swarm Client can connect after the restart. See PipelineJobTest#buildShellScriptAfterRestart for a working example from a unit test.
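
As a rough illustration of the suggestion above, the following Python sketch builds a Swarm client command line that includes -deleteExistingClients. The helper name `swarm_client_command` and the jar path are hypothetical. The master URL is the one from the log in the description. The `-master` option name is an assumption that may differ between Swarm client versions, so check the client's `-help` output for the version in use.

```python
from typing import List


def swarm_client_command(jar_path: str, master_url: str) -> List[str]:
    """Build the argument list for launching a Swarm client.

    -deleteExistingClients is the option named in the comment above.
    """
    return [
        "java",
        "-jar",
        jar_path,                   # hypothetical location of the client jar
        "-master",                  # assumption: option name varies by client version
        master_url,
        "-deleteExistingClients",
    ]


command = swarm_client_command("swarm-client.jar", "https://suct2v420.it.mgt:8443/")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.call` or copied into a service definition. It does not address the certificate hostname mismatch shown in the stack trace, which is a separate problem.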

            Assignee: Unassigned
            Reporter: Choon Ming Goh (choonming)
            Votes: 0
            Watchers: 4