• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component/s: kubernetes-plugin, remoting
    • Labels: None
    • Environment: Jenkins v2.89.2
      Kubernetes Plugin v1.3.3
    • Released As: Remoting 3.28

      While provisioning slaves from a private Kubernetes instance, we've found that a lot of slaves terminate with the following (or similar) stack trace on the slave's side:

       

      INFO: Setting up slave: kube1-medium-r9zf4
      Apr 10, 2018 11:02:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
      INFO: Jenkins agent is running in headless mode.
      Apr 10, 2018 11:02:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/<user>/workDir/remoting as a remoting work directory
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Locating server ...
      Apr 10, 2018 11:02:06 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful <...>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to <Jenkins Master>
      Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: <...>
      Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Apr 10, 2018 11:02:14 AM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      Apr 10, 2018 11:02:14 AM hudson.remoting.UserRequest perform
      WARNING: LinkageError while performing UserRequest:jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2@3e708317
      java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:71)
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:53)
              at hudson.remoting.UserRequest.perform(UserRequest.java:207)
              at hudson.remoting.UserRequest.perform(UserRequest.java:53)
              at hudson.remoting.Request$2.run(Request.java:358)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at hudson.remoting.Engine$1$1.run(Engine.java:98)
              at java.lang.Thread.run(Unknown Source)
      Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
              at java.net.URLClassLoader.findClass(Unknown Source)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              at java.lang.ClassLoader.loadClass(Unknown Source)
              ... 11 more
      
      

      The class reported as not found isn't consistently the same. I've seen `FilePathFilter`, `LaunchConfiguration`, `StringBuilderWriter`, and some others reported as well. Sometimes there are also exceptions related to `JarCacheSupport` not being able to resolve jars (I don't have the exact stack trace at hand - will post it if I find it again).

      On the master's side, these failures generally manifest as `ChannelClosedException`s, or as weird exception-less failures in pipeline branches:

      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
      
      remote file operation failed: /home/<user>/workspace/<job_name> at hudson.remoting.Channel@6639429c:JNLP4-connect connection from <some-host-name>/<ip-address>:60326: hudson.remoting.ChannelClosedException: Remote call on JNLP4-connect connection from <some-host-name>/<ip-address>:60326 failed. The channel is closing down or has closed down
      

      I haven't been able to consistently reproduce the error, but it does manifest enough to be causing major pain to users (especially since we extensively use pipelines with a large number of parallel nodes, and a failure in any one of the nodes causes the entire pipeline to fail).
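      For illustration only - the labels, scripts, and retry count below are placeholders, not our actual jobs - this is roughly the shape of pipeline we run. Wrapping each parallel branch's node block in a retry step can at least keep a single agent disconnect from failing the whole run, although it does nothing about the underlying disconnects:

      // Hypothetical sketch of the pattern, not a real job definition.
      // 'kube-medium' is a placeholder agent label served by the Kubernetes plugin.
      def branches = [:]
      ['unit', 'integration', 'e2e'].each { name ->
          branches[name] = {
              retry(3) {                    // re-run the branch if the agent channel drops
                  node('kube-medium') {     // each attempt provisions a fresh pod
                      checkout scm
                      sh "./run-tests.sh ${name}"
                  }
              }
          }
      }
      parallel branches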

          [JENKINS-50730] NoClassDefFound errors in Cloud Slaves

          Oleg Nenashev added a comment -

          karthikduddu Sorry, I am not going to review it soon. jthompson is now responsible for triaging Remoting-related issues.


          Emmanuel Costa added a comment - edited

          We are observing the same bug on our production system:

          2018-04-16 16:16:34.028:INFO:osjs.Server:main: Started @1199ms
          16:16:34.028 INFO - Selenium Server is up and running
          Apr 16, 2018 4:16:34 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Agent discovery successful
            Agent address: jenkins.XXXX.XXX
            Agent port:    8888
            Identity:      62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Handshaking
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connecting to jenkins.XXX.XX:8888
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: 62:69:42:d7:6c:31:25:9d:ab:7e:97:4f:de:36:2a:b1
          Apr 16, 2018 4:16:34 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected

          Apr 16, 2018 4:29:51 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Apr 16, 2018 4:30:01 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                  at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
                  at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
                  at hudson.remoting.Engine.innerRun(Engine.java:643)
                  at hudson.remoting.Engine.run(Engine.java:451)
          Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
                  at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:157)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
                  ... 4 more


          Luke Hopkins added a comment -

          We are having similar issues. On the slave we see:

          INFO: Connected
          Apr 27, 2018 3:58:55 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Apr 27, 2018 3:59:05 PM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
              at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
              at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
              at hudson.remoting.Engine.innerRun(Engine.java:662)
              at hudson.remoting.Engine.run(Engine.java:469)
          Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
              at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          

          We also see these errors in the job logs:

          java.io.IOException: remote file operation failed
          
          Cannot contact minion-srzt-svc-template-master-qgsj1-kj8cr: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException


          Karthik Duddu added a comment -

          oleg_nenashev jthompson I'd be happy to help resolve the issue, but I'm not very familiar with the remoting codebase and I'm not sure what I should be looking for. I've rebuilt Jenkins/remoting with extremely verbose logging for the `JarLoaderImpl` and `Checksum` classes, but that hasn't really helped, as I got inundated with information. One of the biggest problems with this issue is that it isn't consistently reproducible (at least in our case), which makes it difficult to test many theories.

          Do you guys have any initial ideas or starting points that we can work off of?
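          In case it helps anyone else dig in without rebuilding remoting: the agent JVM honors a plain java.util.logging configuration file, so something like the following should surface the hudson.remoting loggers at FINEST. The file name and levels are just examples, not anything Jenkins-specific:

          # agent-logging.properties (name is arbitrary)
          handlers = java.util.logging.ConsoleHandler
          java.util.logging.ConsoleHandler.level = FINEST
          .level = INFO
          # everything under hudson.remoting (jar caching, RemoteClassLoader, ...) at FINEST
          hudson.remoting.level = FINEST

          java -Djava.util.logging.config.file=agent-logging.properties \
               -jar agent.jar -jnlpUrl https://<master>/computer/<node>/slave-agent.jnlp \
               -secret <secret> -workDir /home/<user>/workDir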


          Jeff Thompson added a comment -

          karthikduddu, manu86, luke_hopkins: Do you continue to see this issue?

          From the provided information, I don't have enough to figure out what is going on, particularly without any steps to reproduce and given the reported variability.

          I see a couple of other similar reports, JENKINS-50458 and JENKINS-52283, but certainly no indication that it is a widespread problem. There might be some similarities with cloud, or particularly Kubernetes, environments.

          In some cases the causes appear to be environment or version related. Getting the correct Remoting, Jenkins, or Java versions seems to have resolved it in some cases. In one case it appears to have been due to memory issues.


          Vicky Chijwani added a comment -

          jthompson - I'm replying on behalf of Karthik, as he is no longer working here. We've stopped seeing this issue, but haven't made any changes to Jenkins on our side. Java version = 8, Jenkins core = v2.89.2, agent version = 3.14 - we haven't upgraded any of these since April. It was also not a memory issue on the master, as there was plenty of free memory there. My suspicion is that it's related to something else in our environment (Kubernetes, networking, etc.).

          We haven't tried agent v3.19 as reported in one of the linked issues - maybe that will help. But then again, it only seems to have helped in one case, so I'm not sure. If this issue comes up again I'll report back here, but for the moment it is gone.


          Jeff Thompson added a comment -

          vchijwani, thanks for the reply. I'm going to leave this open for a couple more days and see if anyone can provide further details, otherwise I'll mark it as Cannot Reproduce and close it.


          Alon Lavi added a comment -

          jthompson, please don't close this issue. We're having the same problem. I still haven't figured out a way to reproduce it, but it's really bothersome.


          Jeff Thompson added a comment -

          alonlavi, there are different opinions on how issue reports like this should be handled. I tend to follow the approach that if it cannot be sufficiently described and reproduced then it should be closed as cannot reproduce and then re-opened if someone obtains better information. Particularly when a significant number of the reporters have seen the problem go away from environmental or version changes.

          But, let's keep this one open for a while longer and see if we get any better information.


          Jeff Thompson added a comment -

          I saw this ClassNotFoundException yesterday. It was one of the last things in a string of exceptions and messages in my agent logs on my Windows 10 system when it went to sleep. I couldn't see that it caused any failures or problems. Things started up without any problem when the system woke up.

          This might not have any relation to other instances when people are seeing this exception. However given other reports, this might not be causing any problems but may be following on from other problems. Perhaps the real problem is elsewhere (environmental or configuration as noted by many, including my experience) and this particular message is a distraction.


          Jeff Thompson added a comment -

          Since the original reporter no longer observes this issue and no one has provided additional information to describe, diagnose, or reproduce this issue in quite some time I am going to close this report down as Cannot Reproduce. If anyone can provide further reports and information that could move this forward we can re-open it at that time.


          Thomas COAN added a comment - edited

          I have been facing the same problem for a couple of days now.

          The slave agent is launched with the following command:

          java -jar agent.jar -jnlpUrl https://MY_MASTER_HOST:MY_MASTER_PORT/computer/slave-number1/slave-agent.jnlp -secret ***** -workDir "/var/lib/jenkins"

          It was working fine for months, but I don't know what has changed (I don't remember any updates or modifications on the master or slave).

          The slave works for several minutes/hours and then dies with the following error:

          INFO: Connecting to MY_MASTER_HOST:MY_MASTER_PORT
          Oct 24, 2018 10:36:35 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Trying protocol: JNLP4-connect
          Oct 24, 2018 10:36:35 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Remote identity confirmed: 47:2d:42:ae:61:c6:79:b1:69:79:27:d7:26:b8:15:ec
          Oct 24, 2018 10:36:36 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          Oct 24, 2018 10:51:36 AM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Oct 24, 2018 10:51:46 AM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
          at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:97)
          at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:49)
          at hudson.remoting.Engine.innerRun(Engine.java:662)
          at hudson.remoting.Engine.run(Engine.java:469)
          Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
          at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
          ... 4 more

          Config:

          • java : 1.8.0_181
          • os : ubuntu 16.04.4 LTS (Xenial Xerus)
          • jenkins master version : 2.138.2

           


          Jeff Thompson added a comment -

          As far as I've been able to tell so far, this is not a cause of failures or even a direct symptom. It occurs after an earlier connection failure and just indicates that a reconnection attempt has failed.

          The stack traces all begin with a failure in onReconnect(). This indicates the connection has terminated and Remoting is attempting to reconnect. Unfortunately, the reconnect attempt fails and this message results.

          Connection failures and disconnects can happen for many reasons, commonly associated with system, network, or environment issues. Particularly if nothing has changed and this suddenly starts occurring it is likely to be one of these external causes.

          It might be possible to improve the retry logic but without further information on what is occurring it is difficult to know what changes might improve it.

          One thing I could easily do is to change the log messaging and downgrade the severity of this particular message.
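          As an illustration of that idea only - this is a standalone sketch, not remoting's real classes or the eventual change - guarding the reconnect callback and logging the LinkageError at a lower level would look something like:

          // Standalone sketch (Groovy); names here are made up for the example.
          import java.util.logging.Level
          import java.util.logging.Logger

          Logger log = Logger.getLogger('reconnect.demo')

          // Stand-in for the JnlpSlaveRestarterInstaller callback, which needs classes
          // that can no longer be loaded once the channel to the master is gone.
          def onReconnect = {
              throw new NoClassDefFoundError('jenkins/slaves/restarter/JnlpSlaveRestarterInstaller')
          }

          try {
              onReconnect()
          } catch (LinkageError e) {
              // Downgraded from SEVERE: the interesting event is the disconnect itself,
              // not this secondary class-loading failure.
              log.log(Level.INFO, 'Reconnect listener could not run; channel already closed', e)
          }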


          Thomas COAN added a comment -

          Thanks Jeff,

          Yes, changing the log message would be great.

           

          => As a workaround, I have changed the way the slave connects, now using the SSH agent method instead of Java Web Start (a rough scripted sketch of that setup follows below).

          There have been no connection failures from the slave to the master for a couple of days now.

          Thomas 
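          For reference, the same workaround can be scripted from the Jenkins script console. Everything below is an assumption or placeholder (node name, host, credentials id, and in particular the three-argument SSHLauncher constructor from the ssh-slaves plugin), so treat it as a sketch rather than a recipe:

          // Sketch only: assumes the ssh-slaves plugin and an SSH credential already stored in Jenkins.
          import hudson.model.Node
          import hudson.plugins.sshslaves.SSHLauncher
          import hudson.slaves.DumbSlave
          import jenkins.model.Jenkins

          // Host, port and credentials id are placeholders for whatever the slave actually uses.
          def launcher = new SSHLauncher('slave-host.example.com', 22, 'my-ssh-credential-id')
          def agent = new DumbSlave('slave-number1', '/var/lib/jenkins', launcher)
          agent.numExecutors = 2
          agent.mode = Node.Mode.NORMAL
          Jenkins.instance.addNode(agent)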


          Jeff Thompson added a comment -

          tcoan, I've got a PR in review to tweak the messaging: https://github.com/jenkinsci/remoting/pull/295. If you have any comments on that proposal, please share.

          That's great news on the improved reliability using ssh agent. I wish I knew why that was the case.


          Thomas COAN added a comment -

          jthompson, I have reviewed the PR, it seems good to me. Thanks Jeff.

          As a remark, I read in the PR comments that this case should be rare, but for some weeks it was happening 5-10 times per day on my side. A web search suggests that other people are facing the same issue as well.

          With the new message, we will be able to investigate outside Jenkins. The previous message seemed to indicate it was an internal error.

           


          Jeff Thompson added a comment -

          tcoan, if you have a GitHub account, you would be welcome to add your comment directly to the PR. At this point, though, I hope to merge that today so no need to bother. 

          I think the question of rarity was about occurrences on the order of minutes or less. If it were happening that frequently, these log messages could still be annoying. At 5-10 per day, the disconnects are very annoying, but the log messages may be helpful.


          Jeff Thompson added a comment -

          As discussed in the comments, we decided to improve the log messages on reconnect to reduce the log level and improve the clarity. This won't do anything to help reduce the original problem but we've received insufficient information to reproduce those situations.

          This will be picked up by a Jenkins weekly build soon.


            Assignee: Jeff Thompson (jthompson)
            Reporter: Karthik Duddu (karthikduddu)
            Votes: 3
            Watchers: 10
