  1. Jenkins
  2. JENKINS-52283

Jenkins Slaves Not Communicating with Master After Restart


      Description

      Running Jenkins 2.122 on a Kubernetes cluster with Helm chart 0.9.0.

      The Kubernetes plugin version is 1.9.2.

      When the Jenkins master restarts and the jobs that were in progress resume, they time out while the slaves try to reconnect to the master.

      ```
      Resuming build at Fri Jun 29 16:23:11 UTC 2018 after Jenkins restart
      Waiting to resume part of ...
      ```

      When I look at the logs for the slaves, I see the following error.

      ```
      Jun 29, 2018 4:23:21 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
          at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
          at hudson.remoting.Engine.innerRun(Engine.java:662)
          at hudson.remoting.Engine.run(Engine.java:469)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          ... 4 more
      ```

            Activity

            asafpelegcodes Asaf Peleg added a comment -

            We originally saw this a lot when our cluster had memory issues and the node running the Jenkins master kept getting restarted.

            We mitigated the master restarts by increasing its memory via the Helm chart, which seems to have helped. We also changed the strategy to PERFORMANCE OPTIMIZED, which has helped as well.

            felipecassiors Felipe Santos added a comment -

            I am facing the exact same issue with the same stack trace. I suspect that the Jenkins agent gives up connecting to the master because the master's restart takes too long, but I haven't found a way to configure this either.

            felipecassiors Felipe Santos added a comment -

            I'm reopening this as I have an environment in which to reproduce it. I believe I only need to increase the Jenkins JNLP timeout, but I can't find a way to do it.

            felipecassiors Felipe Santos added a comment -

            This seems to be pretty much the same issue. Although the linked issue was closed as resolved, it was not actually fixed (only the logs were improved).

            felipecassiors Felipe Santos added a comment -

            A deeper investigation was done in https://github.com/falldamagestudio/UE-Jenkins-Images/issues/5. Its author points out that the agent fails during the reconnect process rather than timing out, which I believe makes sense.

            Any help here would be much appreciated. I don't know how to fix this myself, but I'm looking into it.
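
The difference between "timing out" and "failing during reconnect" can be illustrated with a toy model. The sketch below is not Jenkins remoting code: `Engine`, `ConnectionLost`, `make_flaky_connect`, `broken_listener`, and `max_attempts` are all hypothetical names invented for illustration. It only mirrors the shape of the stack trace in the description, where `Engine.innerRun` calls a listener's `onReconnect` and that call throws. If that is what happens, a larger retry budget or timeout would not help, because the loop ends on the first disconnect.

```python
class ConnectionLost(Exception):
    """Raised by the hypothetical transport when the master goes away."""


class Engine:
    """Toy stand-in for an agent connection loop (not Jenkins remoting code)."""

    def __init__(self, connect, listeners, max_attempts):
        self.connect = connect            # raises ConnectionLost on disconnect
        self.listeners = listeners        # callbacks run before each reconnect
        self.max_attempts = max_attempts  # hypothetical retry budget

    def run(self):
        attempts = 0
        while attempts < self.max_attempts:
            attempts += 1
            try:
                self.connect()
                return "finished"
            except ConnectionLost:
                # Listeners are notified before the next attempt. An exception
                # raised here is not caught, so it ends the loop immediately,
                # no matter how many attempts remain.
                for listener in self.listeners:
                    listener()
        return "gave up"


def make_flaky_connect(failures):
    """Build a connect() that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def connect():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionLost()

    return connect


def broken_listener():
    # Stands in for a callback that needs a class it can no longer load.
    raise RuntimeError("class not available")
```

With no listeners, `Engine(make_flaky_connect(1), [], 100).run()` reconnects and finishes, and `Engine(make_flaky_connect(5), [], 3).run()` exhausts its budget and gives up, which is the timeout-style failure. With `broken_listener` registered, the first disconnect raises `RuntimeError` even with a budget of 100 attempts.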

            felipecassiors Felipe Santos added a comment -

            I will create a follow-up issue for this, as I have found a way to reproduce it easily.


              People

              Assignee:
              csanchez Carlos Sanchez
              Reporter:
              asafpelegcodes Asaf Peleg