
Jenkins Slaves Not Communicated w/ Master After restart

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: kubernetes-plugin
    • None
    • 2.338

      Running Jenkins 2.122 on a Kubernetes cluster with Helm chart 0.9.0.

      The Kubernetes plugin version is 1.9.2.

      When the Jenkins master restarts and the jobs that were in progress resume, they time out while the slave tries to connect to the master.

      ```
      Resuming build at Fri Jun 29 16:23:11 UTC 2018 after Jenkins restart
      Waiting to resume part of ...
      ```

      When I look at the logs for the slaves, I see the following error.

      ```
      Jun 29, 2018 4:23:21 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
          at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
          at hudson.remoting.Engine.innerRun(Engine.java:662)
          at hudson.remoting.Engine.run(Engine.java:469)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          ... 4 more
      ```
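To check whether another agent log shows the same failure, one option is to search it for the class-loading error above. The following is only a minimal sketch: the function name `has_restarter_class_error` and the constant `MISSING_CLASS` are made up for illustration, and the pattern is derived solely from the stack trace quoted in this report, so it may not match differently formatted logs.

```python
import re

# Class named in the stack trace above (taken from this report's log).
MISSING_CLASS = "jenkins.slaves.restarter.JnlpSlaveRestarterInstaller"

# The JVM prints the name with '/' in NoClassDefFoundError and with '.'
# in ClassNotFoundException, so accept either separator.
_CLASS_PATTERN = r"[./]".join(re.escape(part) for part in MISSING_CLASS.split("."))
_ERROR_PATTERN = re.compile(
    r"(?:NoClassDefFoundError|ClassNotFoundException): " + _CLASS_PATTERN
)


def has_restarter_class_error(log_text: str) -> bool:
    """Return True if the log text contains the class-loading error from this report."""
    return _ERROR_PATTERN.search(log_text) is not None
```

The function could be given the text of an agent pod's log, for example as captured with `kubectl logs`. A match only indicates the same symptom, not necessarily the same root cause.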

          [JENKINS-52283] Jenkins Slaves Not Communicated w/ Master After restart

          Asaf Peleg added a comment -

          We were originally seeing this a lot when we had memory issues with our cluster and the node that the Jenkins master was running on kept getting restarted.

          We mitigated the master restarts by increasing its memory via the Helm chart, and this seems to have helped. We also changed the strategy to PERFORMANCE OPTIMIZED, which has helped as well.

          Felipe Santos added a comment -

          I am facing the exact same issue with the same stack trace. I suspect that the Jenkins agent gives up trying to connect to the master because the master's restart takes too long, but I haven't found a way to configure this.

          Felipe Santos added a comment -

          I'm reopening this as I have an environment in which to reproduce it. I believe I would only need to increase the Jenkins JNLP timeout, but I can't find a way to do it.

          Felipe Santos added a comment -

          This seems to be pretty much the same issue. Although the linked one was closed as resolved, it was not actually fixed (only the logs were improved).

          Felipe Santos added a comment -

          A deeper investigation was done in https://github.com/falldamagestudio/UE-Jenkins-Images/issues/5. Its author points out that the agent is failing during the reconnect process rather than timing out, which makes sense to me.

          Any help here would be much appreciated. I don't know how to fix this myself, but I'm looking into it.

          Felipe Santos added a comment -

          I will create a follow-up issue for this, as I have found an easy way to reproduce it.

          Basil Crow added a comment -

          Duplicates JENKINS-66446, which was fixed in jenkinsci/jenkins#6315 and jenkinsci/jenkins#6329 toward 2.338.

            Assignee: Unassigned
            Reporter: Asaf Peleg (asafpelegcodes)