• Bug
    • Resolution: Fixed
    • Major
    • kubernetes-plugin, remoting
    • None
    • Jenkins v2.89.2
      Kubernetes Plugin v1.3.3
    • Remoting 3.28

      While provisioning slaves from a private Kubernetes instance, we've found that a lot of slaves terminate with the following (or similar) stack trace on the slave's side:

       

      INFO: Setting up slave: kube1-medium-r9zf4
      Apr 10, 2018 11:02:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Apr 10, 2018 11:02:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/<user>/workDir/remoting as a remoting work directory
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server ...
      Apr 10, 2018 11:02:06 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful <...>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to <Jenkins Master>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: <...>
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Apr 10, 2018 11:02:14 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Apr 10, 2018 11:02:14 AM hudson.remoting.UserRequest perform
      WARNING: LinkageError while performing UserRequest:jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2@3e708317
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:71)
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:53)
              at hudson.remoting.UserRequest.perform(UserRequest.java:207)
              at hudson.remoting.UserRequest.perform(UserRequest.java:53)
              at hudson.remoting.Request$2.run(Request.java:358)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at hudson.remoting.Engine$1$1.run(Engine.java:98)
              at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
              at java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              ... 11 more
      
      

      The class reported as missing isn't consistently the same. I've seen `FilePathFilter`, `LaunchConfiguration`, `StringBuilderWriter`, and some others reported as well. Sometimes there are also exceptions related to `JarCacheSupport` not being able to resolve jars (I don't have the exact stack trace at hand; I will post it if I find it again).
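Because the missing class varies from run to run, it can help to tally which classes are reported across collected agent logs. The following is a minimal sketch, not part of Jenkins or Remoting: the function name `tally_missing_classes` and the `sample` text are hypothetical, and it assumes the logs contain `java.lang.NoClassDefFoundError: <class>` lines in the format shown in the stack trace above.

```python
import re
from collections import Counter

# Matches the top-level error line, e.g.
# "java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1".
# Only NoClassDefFoundError is matched so that the accompanying
# "Caused by: ClassNotFoundException" line does not count the same failure twice.
MISSING_CLASS = re.compile(r"java\.lang\.NoClassDefFoundError: (\S+)")


def tally_missing_classes(log_text: str) -> Counter:
    """Count how often each class is reported missing in an agent log (hypothetical helper)."""
    counts = Counter()
    for match in MISSING_CLASS.finditer(log_text):
        # The JVM prints the name with '/' separators here; normalise to dotted form.
        counts[match.group(1).replace("/", ".")] += 1
    return counts


# Sample text copied from the stack trace in this report.
sample = """\
WARNING: LinkageError while performing UserRequest
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
"""

for class_name, count in tally_missing_classes(sample).most_common():
    print(count, class_name)
```

Running this over the concatenated logs of many failed agents would show whether a small set of classes dominates or whether the failures are spread evenly, which may hint at whether a particular jar is involved.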

      On the master's side, these exceptions generally manifest as `ChannelClosedException`s, or as odd failures in pipeline branches with no exception reported.

      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      remote file operation failed: /home/<user>/workspace/<job_name> at hudson.remoting.Channel@6639429c:JNLP4-connect connection from <some-host-name>/<ip-address>:60326: hudson.remoting.ChannelClosedException: Remote call on JNLP4-connect connection from <some-host-name>/<ip-address>:60326 failed. The channel is closing down or has closed down
      

      I haven't been able to consistently reproduce the error, but it occurs often enough to cause major pain to users (especially since we extensively use pipelines with a large number of parallel nodes, and a failure in any one of the nodes causes the entire pipeline to fail).
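As a rough illustration of why an intermittent agent failure hurts wide pipelines, the sketch below computes the chance that at least one of several parallel nodes fails. It assumes node failures are independent, which may not hold in practice, and the 1% per-node failure rate is a made-up figure for illustration, not a measurement from this environment. The function name is hypothetical.

```python
def pipeline_failure_probability(per_node_failure: float, parallel_nodes: int) -> float:
    """Probability that at least one of `parallel_nodes` nodes fails.

    Assumes each node fails independently with probability `per_node_failure`.
    """
    if not 0.0 <= per_node_failure <= 1.0:
        raise ValueError("per_node_failure must be between 0 and 1")
    if parallel_nodes < 0:
        raise ValueError("parallel_nodes must be non-negative")
    # The pipeline succeeds only if every node succeeds.
    return 1.0 - (1.0 - per_node_failure) ** parallel_nodes


# Hypothetical 1% per-node failure rate at different pipeline widths.
for nodes in (1, 10, 50):
    print(nodes, round(pipeline_failure_probability(0.01, nodes), 3))
```

Under these assumptions, even a small per-node failure rate compounds quickly as the number of parallel nodes grows.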

          [JENKINS-50730] NoClassDefFound errors in Cloud Slaves

          Karthik Duddu created issue -
          Karthik Duddu made changes -
          Assignee Original: Carlos Sanchez [ csanchez ] New: Oleg Nenashev [ oleg_nenashev ]
          Oleg Nenashev made changes -
          Assignee Original: Oleg Nenashev [ oleg_nenashev ] New: Jeff Thompson [ jthompson ]

          Oleg Nenashev added a comment -

          karthikduddu Sorry, I am not going to review it soon. jthompson is now responsible for triaging Remoting-related issues


          Emmanuel Costa added a comment - - edited

          We are observing the same bug on our production system:

          2018-04-16 16:16:34.028:INFO:osjs.Server:main: Started @1199ms
          16:16:34.028 INFO - Selenium Server is up and running
          Apr 16, 2018 4:16:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
            Agent address: jenkins.XXXX.XXX
            Agent port:    8888
            Identity:      62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins.XXX.XX:8888
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: 62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected

          Apr 16, 2018 4:29:51 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Apr 16, 2018 4:30:01 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
              at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
              at hudson.remoting.Engine.innerRun(Engine.java:643)
              at hudson.remoting.Engine.run(Engine.java:451)
          Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
              at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:157)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
              ... 4 more


          Luke Hopkins added a comment -

          We are having similar issues. On the slave we see:

          INFO: Connected
          Apr 27, 2018 3:58:55 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Apr 27, 2018 3:59:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
              at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
              at hudson.remoting.Engine.innerRun(Engine.java:662)
              at hudson.remoting.Engine.run(Engine.java:469)
          Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
              at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          

          We also see these errors in the job logs:

          java.io.IOException: remote file operation failed
          
          Cannot contact minion-srzt-svc-template-master-qgsj1-kj8cr: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException


          Karthik Duddu added a comment -

          oleg_nenashev jthompson I'd be happy to help in resolving the issue, but I'm not very familiar with the remoting codebase, and I'm not too sure what I'm supposed to be looking out for - I've rebuilt Jenkins/remoting with extremely verbose logging for `JarLoaderImpl` and Checksum classes, but that hasn't really helped as I got inundated with information. Also, one of the biggest problems with this issue is that it isn't consistently reproducible (at least in our case), which makes it difficult to test out too many theories.

          Do you guys have any initial ideas or starting points that we can work off of?

          CloudBees Inc. made changes -
          Remote Link New: This issue links to "CloudBees Internal FNDN-235 (Web Link)" [ 20775 ]

          Jeff Thompson added a comment -

          karthikduddu, manu86, luke_hopkins: Do you continue to see this issue?

          From the provided information, I don't have enough to figure out what is going on, particularly without any steps to reproduce and with the reported variability.

          I see a couple of other similar reports, JENKINS-50458 and JENKINS-52283, but certainly no indication that it is a widespread problem. There might be some similarities with Cloud or particularly Kubernetes environments.

          In some cases the causes appear to be environment or version related. Getting the correct Remoting, Jenkins, or Java versions seems to have resolved it in some cases. In one case it appears to have been due to memory issues.


          Vicky Chijwani added a comment -

          jthompson - I'm replying on behalf of Karthik as he is no longer working here. We've stopped seeing this issue, but haven't made any changes to Jenkins on our side. Java version = 8, Jenkins core = v2.89.2, agent version = 3.14 - we haven't upgraded any of these since April. It was also not a memory issue on the master, as there was plenty of free memory there. My suspicion is that it's related to something else in our environment (Kubernetes, networking, etc).

          We haven't tried agent v3.19 as reported in one of the linked issues - maybe that will help. But then again, it only seems to have helped in one case, so I'm not sure. If this issue comes up again I'll report back here, but for the moment it is gone.

            jthompson Jeff Thompson
            karthikduddu Karthik Duddu
            Votes: 3
            Watchers: 10
