Jenkins / JENKINS-52283

Jenkins Slaves Not Communicating w/ Master After Restart


Details

    • 2.338

    Description

      Running Jenkins 2.122 on Kubernetes Cluster with Helm Chart 0.9.0

      Kubernetes plugin version is 1.9.2

      When the Jenkins master restarts and the jobs that were in progress resume, they time out because the slaves fail to reconnect to the master.

      ```
      Resuming build at Fri Jun 29 16:23:11 UTC 2018 after Jenkins restart
      Waiting to resume part of ...
      ```

      When I look at the logs for the slaves, I see the following error.

      ```
      Jun 29, 2018 4:23:21 PM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
          at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
          at hudson.remoting.Engine.innerRun(Engine.java:662)
          at hudson.remoting.Engine.run(Engine.java:469)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          ... 4 more
      ```
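
As a rough triage aid for agent logs like the one above, the following sketch pulls out the exception chain (outermost exception first, root cause last). It is only an illustration: the function name `exception_chain` and the embedded `SAMPLE` excerpt are assumptions made for this example, and the pattern only recognizes throwables whose class name ends in `Error` or `Exception`.

```python
import re

# Matches a line that starts with a fully qualified Java throwable name,
# optionally prefixed by "Caused by: ". Stack frames ("at ...") and the
# "SEVERE:" line do not match because they do not start with a package name.
_THROWABLE = re.compile(
    r"^[ \t]*(?:Caused by: )?"
    r"((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Error|Exception)): (.*)$",
    re.MULTILINE,
)


def exception_chain(log_text):
    """Return [(exception_class, message), ...] from outermost to root cause."""
    return [(m.group(1), m.group(2).strip()) for m in _THROWABLE.finditer(log_text)]


# Shortened excerpt of the trace from this issue (hypothetical constant name).
SAMPLE = """\
SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
\tat hudson.remoting.Engine.innerRun(Engine.java:662)
Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
\tat java.net.URLClassLoader.findClass(URLClassLoader.java:381)
"""

if __name__ == "__main__":
    for cls, msg in exception_chain(SAMPLE):
        print(cls, "->", msg)
```

In this trace the last entry of the chain is the `ClassNotFoundException` raised while `RemoteClassLoader` was looking up `JnlpSlaveRestarterInstaller`, which is the frame to look at first.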

          Activity

            asafpelegcodes Asaf Peleg added a comment -

            We originally saw this a lot when our cluster had memory issues and the node running the Jenkins master kept getting restarted.

            We mitigated the master restarts by increasing its memory via the Helm chart, which seems to have helped. We also changed the strategy to PERFORMANCE_OPTIMIZED, which has helped as well.

            felipecassiors Felipe Santos added a comment -

            I am facing the exact same issue with the same stack trace. I suspect that the Jenkins agent gives up trying to connect to the master because the master's restart takes too long, but I haven't found a way to configure this.

            felipecassiors Felipe Santos added a comment -

            I'm reopening as I have an environment in which to reproduce this. I believe I would only need to increase the Jenkins JNLP timeout, but I can't find a way to do it.

            felipecassiors Felipe Santos added a comment -

            This seems to be pretty much the same issue. Although the linked one was closed as resolved, it was not actually fixed (only the logs were improved).

            felipecassiors Felipe Santos added a comment -

            A deeper investigation was made in https://github.com/falldamagestudio/UE-Jenkins-Images/issues/5. Its author points out that the agent is failing during the reconnect process rather than timing out, which I believe makes sense.

            Any help here would be much appreciated. I don't know how to fix this by myself, but I'm looking into it.

            felipecassiors Felipe Santos added a comment -

            I will create a follow-up issue for this, as I have found a way to reproduce it easily.

            basil Basil Crow added a comment -

            Duplicates JENKINS-66446, which was fixed in jenkinsci/jenkins#6315 and jenkinsci/jenkins#6329 toward 2.338.
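
To check whether a given installation is at or past the release mentioned above, a minimal sketch follows. The helper names `parse_version` and `includes_fix` are hypothetical, and the comparison is purely numeric against 2.338: it assumes plain dotted version strings and does not know whether the fix was backported to any LTS line.

```python
def parse_version(version):
    """Split a dotted Jenkins version such as "2.122" into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


# Release named in the comment above as the target of the fix.
FIX_VERSION = parse_version("2.338")


def includes_fix(version):
    """Return True if `version` is numerically at or after 2.338.

    Assumption: a plain tuple comparison is good enough. LTS backports,
    if any, are not taken into account.
    """
    return parse_version(version) >= FIX_VERSION


if __name__ == "__main__":
    # 2.122 is the version from the original report.
    for v in ("2.122", "2.337", "2.338", "2.346.1"):
        print(v, includes_fix(v))
```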


            People

              Assignee: Unassigned
              Reporter: Asaf Peleg (asafpelegcodes)