• Bug
    • Resolution: Fixed
    • Critical
    • core, ssh-slaves-plugin
    • None
    • Jenkins 2.89.3
      Mesos-Plugin 0.15.0
      SSH Slaves 1.25
    • ssh-slaves-1.30.0

      After upgrading to SSH Slaves plugin 1.25 from 1.23 we started seeing these errors in the logs. Slaves started to report clock differences of 10 seconds and ping times upwards of 17 seconds.

      We spin up our slaves using the mesos-plugin and mesosphere's docker-in-docker jenkins containers: https://github.com/mesosphere/dcos-jenkins-dind-agent

      The master would then get pegged at 100% CPU usage and all builds would start to fail.

      ===================== Errors

      WARNING: Failed to monitor mesos-jenkins-ad54e4db86ab493cbfcbf40fc586031d-mesos for Free Temp Space
       java.util.concurrent.ExecutionException: java.lang.Error: Failed to deserialize the Callable object.
       at hudson.remoting.Channel$2.adapt(Channel.java:943)
       at hudson.remoting.Channel$2.adapt(Channel.java:938)
       at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
       at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
       at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
       Caused by: java.lang.Error: Failed to deserialize the Callable object.
       at hudson.remoting.UserRequest.perform(UserRequest.java:192)
       at hudson.remoting.UserRequest.perform(UserRequest.java:54)
       at hudson.remoting.Request$2.run(Request.java:360)
       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at hudson.remoting.Engine$1$1.run(Engine.java:98)
       at java.lang.Thread.run(Thread.java:748)
       at ......remote call to JNLP4-connect connection from ip-10-89-142-141.us-west-2.compute.internal/10.89.142.141:48042(Native Method)
       at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1654)
       at hudson.remoting.UserResponse.retrieve(UserRequest.java:311)
       at hudson.remoting.Channel$2.adapt(Channel.java:941)
       ... 4 more
       Caused by: hudson.remoting.RemotingSystemException: java.lang.InterruptedException
       at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:273)
       at com.sun.proxy.$Proxy6.fetch(Unknown Source)
       at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:301)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       at java.lang.Class.getDeclaredMethods0(Native Method)
       at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
       at java.lang.Class.getDeclaredMethod(Class.java:2128)
       at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
       at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
       at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
       at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
       at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
       at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:598)
       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
       at hudson.remoting.UserRequest.deserialize(UserRequest.java:275)
       at hudson.remoting.UserRequest.perform(UserRequest.java:186)
       at hudson.remoting.UserRequest.perform(UserRequest.java:54)
       at hudson.remoting.Request$2.run(Request.java:360)
       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at hudson.remoting.Engine$1$1.run(Engine.java:98)
       at java.lang.Thread.run(Thread.java:748)
       Caused by: java.lang.InterruptedException
       at java.lang.Object.wait(Native Method)
       at hudson.remoting.Request.call(Request.java:169)
       at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:260)
       ... 38 more
      
      Jan 22, 2018 7:41:37 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
       WARNING: Failed to monitor mesos-jenkins-9fe9204c80af4f5c98e67cf012e24a39-mesos for Free Disk Space
       java.util.concurrent.ExecutionException: java.io.InvalidClassException: hudson.FilePath; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = -7135276226716035594
       at hudson.remoting.Channel$2.adapt(Channel.java:943)
       at hudson.remoting.Channel$2.adapt(Channel.java:938)
       at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
       at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
       at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
       Caused by: java.io.InvalidClassException: hudson.FilePath; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = -7135276226716035594
       at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
       at hudson.remoting.UserRequest.deserialize(UserRequest.java:275)
       at hudson.remoting.UserRequest.perform(UserRequest.java:186)
       at hudson.remoting.UserRequest.perform(UserRequest.java:54)
       at hudson.remoting.Request$2.run(Request.java:360)
       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at hudson.remoting.Engine$1$1.run(Engine.java:98)
       at java.lang.Thread.run(Thread.java:748)
       at ......remote call to JNLP4-connect connection from ip-10-89-142-71.us-west-2.compute.internal/10.89.142.71:35556(Native Method)
       at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1654)
       at hudson.remoting.UserResponse.retrieve(UserRequest.java:311)
       at hudson.remoting.Channel$2.adapt(Channel.java:941)
       ... 4 more
      
      Jan 22, 2018 7:41:37 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
       WARNING: Failed to monitor mesos-jenkins-9fe9204c80af4f5c98e67cf012e24a39-mesos for Free Temp Space
       java.util.concurrent.ExecutionException: java.io.InvalidClassException: hudson.FilePath; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = -7135276226716035594
       at hudson.remoting.Channel$2.adapt(Channel.java:943)
       at hudson.remoting.Channel$2.adapt(Channel.java:938)
       at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
       at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:96)
       at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)
       Caused by: java.io.InvalidClassException: hudson.FilePath; local class incompatible: stream classdesc serialVersionUID = 1, local class serialVersionUID = -7135276226716035594
       at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
       at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
       at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
       at hudson.remoting.UserRequest.deserialize(UserRequest.java:275)
       at hudson.remoting.UserRequest.perform(UserRequest.java:186)
       at hudson.remoting.UserRequest.perform(UserRequest.java:54)
       at hudson.remoting.Request$2.run(Request.java:360)
       at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at hudson.remoting.Engine$1$1.run(Engine.java:98)
       at java.lang.Thread.run(Thread.java:748)
       at ......remote call to JNLP4-connect connection from ip-10-89-142-71.us-west-2.compute.internal/10.89.142.71:35556(Native Method)
       at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1654)
       at hudson.remoting.UserResponse.retrieve(UserRequest.java:311)
       at hudson.remoting.Channel$2.adapt(Channel.java:941)
       ... 4 more
      
      Jan 22, 2018 7:41:38 PM org.jenkinsci.plugins.mesos.MesosSlave getRootPath
       WARNING: IO exception while absolutizing slave root path: java.io.IOException: remote file operation failed: jenkins at hudson.remoting.Channel@1e42b1f:JNLP4-connect connection from ip-xxxxxx.us-west-2.compute.internal/xxxxxx:48042: java.io.IOException: Remote call on JNLP4-connect connection from ip-xxxxxx.us-west-2.compute.internal/1xxxx:48042 failed
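The `InvalidClassException` entries above report that the two ends of the channel disagree on the `serialVersionUID` of `hudson.FilePath`, which typically points to different versions of the same class being loaded on each side. A minimal sketch for printing the `serialVersionUID` that a given JVM sees for a class, so the value can be compared between master and agent. `SamplePayload` is a hypothetical stand-in and not part of Jenkins; checking a real class such as `hudson.FilePath` assumes the relevant jar is on the classpath.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a serializable class such as hudson.FilePath; not part of Jenkins.
class SamplePayload implements Serializable {
    private static final long serialVersionUID = 1L;
    String path = "/tmp";
}

class SerialVersionCheck {
    // Returns the serialVersionUID this JVM uses for the class,
    // whether it is declared explicitly or computed from the class structure.
    static long localSerialVersionUid(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup returns null for classes that are not serializable
            throw new IllegalArgumentException(type.getName() + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, or fall back to the sample class.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : SamplePayload.class;
        System.out.println(type.getName() + " serialVersionUID = " + localSerialVersionUid(type));
    }
}
```

Running the same check on both sides and getting different numbers would be consistent with the mismatch shown in the log.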
      

          [JENKINS-49118] SSH Slaves 1.25 Breaks

          Oleg Nenashev added a comment -

          Please provide your agent configuration. Do you use a custom remoting version on the remote side?

          Kevin R. added a comment -

          oleg_nenashev I don't think so. It's remoting version 3.14 and I believe that the container is downloading the slave.jar directly from the master

          Oleg Nenashev added a comment -

          There are similar reports like https://groups.google.com/forum/#!topic/jenkinsci-users/JJ4GlS0UCQY in old Jenkins versions.
          Not sure why it happens in this case though.

          Have you updated anything except SSH Slaves?

          Kevin R. added a comment - - edited

          The slaves themselves have not changed at all. Do you think that it could be something with the network that's messing things up? Any tips on where I can start or how I can debug that? From the UI perspective, out of maybe 100 builds, the first 50 run normally after a reboot of the master but then quickly degrade and the mentioned symptoms start to ramp up so I wasn't really sure where to start. The infrastructure that all of this was running on has not changed in the past 2 months (no updates at all, only Jenkins was changed afaik).

          Basically, on our instance with Jenkins 2.89.3, I upgraded all our plugins to latest. I've just attached a list of our plugins (left SSH slaves @ 1.23 - problem started @ 1.25).

          Oleg Nenashev added a comment -

          Bulk issue update: The plugin connectivity is still unstable from what I see in this and other reports. Probably the recent patches in 1.24-1.25 caused some extra instability by getting rid of interlocks between agent connection and termination logic. Apparently it impacts some reconnection scenarios due to the race conditions.

          Unfortunately I do not have capacity to work on the plugin in the medium term, so for now I am unassigning issues from myself. ifernandezcalvo was very kind to take ownership of the plugin and to handle some of the workload in it. He will probably have some capacity to review the backlog I was unable to triage.

          Ivan Fernandez Calvo added a comment -

          Which JDK version do you use on the Jenkins instance and on the agents? It is recommended to use versions that are as close as possible and share the same major version.
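The JDK question can be checked by reading the `java.version` system property on each side and comparing the major versions. A minimal sketch, assuming the two version strings have already been collected by some other means; the class and method names here are illustrative and not part of Jenkins.

```java
class JdkVersionCheck {
    // Extracts the major version from a java.version string.
    // Handles the legacy "1.8.0_151" scheme and the newer "11.0.2" scheme.
    static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Legacy scheme: "1.8" means Java 8.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True when both version strings belong to the same major Java release.
    static boolean sameMajor(String masterVersion, String agentVersion) {
        return majorVersion(masterVersion) == majorVersion(agentVersion);
    }

    public static void main(String[] args) {
        // Prints the version of the JVM this snippet runs in.
        String local = System.getProperty("java.version");
        System.out.println("java.version = " + local + " (major " + majorVersion(local) + ")");
    }
}
```

For example, `sameMajor("1.8.0_151", "1.8.0_202")` is true, while `sameMajor("1.8.0_151", "11.0.2")` is false.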

          Ivan Fernandez Calvo added a comment -

          These changes could be related:
          https://github.com/jenkinsci/ssh-slaves-plugin/commit/618b3b366753dd2c607f4e34ec1d62e3c9996580
          https://github.com/jenkinsci/ssh-slaves-plugin/commit/5eea3a0f5a79cb1e1c155e04d1f54cd2252b5e38

          Also see this issue: https://issues.jenkins-ci.org/browse/JENKINS-52739. Did you see an abnormal number of threads on the Jenkins instance?
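One way to judge whether a thread count is abnormal is to group live threads by name and look for a group that keeps growing. A minimal sketch of that grouping; it only inspects the JVM it runs in, so for a running Jenkins instance the same idea would have to be applied to a thread dump (for example one taken with the JDK's `jstack` tool). `ThreadCensus` is an illustrative name, not an existing utility.

```java
import java.util.Map;
import java.util.TreeMap;

class ThreadCensus {
    // Groups the live threads of this JVM by the first word of their name.
    static Map<String, Integer> countByNamePrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(prefixOf(thread.getName()), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the first whitespace-delimited word of a thread name.
    static String prefixOf(String name) {
        return name.trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        // Print one line per thread-name prefix with its count.
        countByNamePrefix().forEach((prefix, count) ->
                System.out.println(prefix + ": " + count));
    }
}
```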

          Ivan Fernandez Calvo added a comment -

          Did you upgrade to 1.28.1 and disable credential tracking? See https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall

          Ivan Fernandez Calvo added a comment -

          Recently we detected disconnections that are related to the https://wiki.jenkins.io/display/JENKINS/Slave+To+Master+Access+Control setting. Here we do not have the agent logs, but if they show a serialization warning you should try disabling this feature and report a bug against the plugin that contains the class that fails to serialize. See https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#selenium-grid-agents-failed-to-connect

          The warning would look something like this one, with a different class:

          Apr 03, 2019 9:46:01 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
          WARNING: Attempt to (de-)serialize anonymous class hudson.plugins.selenium.configuration.DirectJsonInputConfiguration$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
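The warning above concerns anonymous classes, which the compiler names by position (`Outer$1`, `Outer$2`, and so on) and which are therefore fragile to serialize across JVMs. A minimal sketch of the difference between an anonymous class and a named static nested class, using only standard Java reflection. `AnonymousClassDemo` and its members are hypothetical, and whether remoting performs its check this way is an assumption.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;

class AnonymousClassDemo {
    // Named static nested class: stable class name, no hidden reference to an outer instance.
    static class NamedTask implements Callable<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public String call() {
            return "named";
        }
    }

    // Anonymous class: the compiler assigns a positional name such as AnonymousClassDemo$1.
    static Callable<String> anonymousTask() {
        return new Callable<String>() {
            @Override
            public String call() {
                return "anonymous";
            }
        };
    }

    // Illustrative check (assumption: similar in spirit to what triggers the warning).
    static boolean isAnonymous(Object candidate) {
        return candidate.getClass().isAnonymousClass();
    }

    public static void main(String[] args) {
        System.out.println(anonymousTask().getClass().getName() + " anonymous? " + isAnonymous(anonymousTask()));
        System.out.println(NamedTask.class.getName() + " anonymous? " + isAnonymous(new NamedTask()));
    }
}
```

Replacing an anonymous class with a named one, as `NamedTask` does here, is the kind of change the affected plugin would need; it is not something that can be fixed from the Jenkins configuration.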

          Ivan Fernandez Calvo added a comment -

          It is probably related to the way the timeout was managed; this changes in the latest snapshot version. I have also found JENKINS-59764 on docker-plugin, which could be related as well.

            ifernandezcalvo Ivan Fernandez Calvo
            chr0n1x Kevin R.
            Votes: 0
            Watchers: 3