While provisioning slaves from a private Kubernetes instance, we've found that many of them terminate with the following (or a similar) stack trace on the slave's side:
INFO: Setting up slave: kube1-medium-r9zf4
Apr 10, 2018 11:02:05 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Apr 10, 2018 11:02:05 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/<user>/workDir/remoting as a remoting work directory
Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server ...
Apr 10, 2018 11:02:06 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, CLI2-connect, JNLP-connect, Ping, CLI-connect, JNLP2-connect]
Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful <...>
Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to <Jenkins Master>
Apr 10, 2018 11:02:06 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: <...>
Apr 10, 2018 11:02:07 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Apr 10, 2018 11:02:14 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Apr 10, 2018 11:02:14 AM hudson.remoting.UserRequest perform
WARNING: LinkageError while performing UserRequest:jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2@3e708317
java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:71)
at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call(JnlpSlaveRestarterInstaller.java:53)
at hudson.remoting.UserRequest.perform(UserRequest.java:207)
at hudson.remoting.UserRequest.perform(UserRequest.java:53)
at hudson.remoting.Request$2.run(Request.java:358)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at hudson.remoting.Engine$1$1.run(Engine.java:98)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
at java.net.URLClassLoader.findClass(Unknown Source)
at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:159)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 11 more
The class reported as missing isn't consistently the same. I've also seen `FilePathFilter`, `LaunchConfiguration`, `StringBuilderWriter`, and some others. Sometimes there are also exceptions related to `JarCacheSupport` not being able to resolve jars (I don't have the exact stack trace at hand; I'll post it if I find it again).
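To get a clearer picture of which classes go missing and how often, a small script along these lines could tally them across saved agent logs. This is only a sketch: the function names `missing_classes` and `tally_missing_classes` are made up for illustration, and the regular expression assumes the exception lines look like the `NoClassDefFoundError` and `ClassNotFoundException` lines in the trace above.

```python
import re
from collections import Counter

# Assumed line shapes, taken from the agent trace above:
#   java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller$2$1
#   Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1
MISSING_CLASS = re.compile(
    r"java\.lang\.(?:NoClassDefFoundError|ClassNotFoundException):\s*([\w./$]+)"
)


def missing_classes(log_text):
    """Return the set of class names reported missing in one agent log."""
    found = set()
    for match in MISSING_CLASS.finditer(log_text):
        # NoClassDefFoundError uses '/' separators; normalize to dotted names
        # so both exception types count as the same class.
        found.add(match.group(1).replace("/", "."))
    return found


def tally_missing_classes(logs):
    """Count how many of the given logs report each missing class."""
    counts = Counter()
    for log_text in logs:
        counts.update(missing_classes(log_text))
    return counts
```

Calling `tally_missing_classes` with a list of log file contents returns a `Counter` keyed by class name, which should make it easier to see whether a few classes dominate or the failures are spread evenly.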
On the master's side, these exceptions generally manifest as `ChannelClosedException`s, or as odd exception-less failures in pipeline branches:
ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
ERROR: Issue with creating launcher for agent kube1-medium-r9zf4. The agent has not been fully initialized yet
remote file operation failed: /home/<user>/workspace/<job_name> at hudson.remoting.Channel@6639429c:JNLP4-connect connection from <some-host-name>/<ip-address>:60326: hudson.remoting.ChannelClosedException: Remote call on JNLP4-connect connection from <some-host-name>/<ip-address>:60326 failed. The channel is closing down or has closed down
I haven't been able to reproduce the error consistently, but it occurs often enough to cause major pain for users. We make extensive use of pipelines with a large number of parallel nodes, and a failure in any one of the nodes causes the entire pipeline to fail.