Type: Bug
Resolution: Fixed
Priority: Major
Labels: None
Jenkins 2.332.3, OpenJDK 11.0.15, running on Ubuntu 20.04
SSH Slaves Plugin 1.814.vc82988f54b_10 (tested with 1.33.0 as well)
Anka Build Plugin 2.7.0
Released As: 1.821.vd834f8a_c390e
The error observed is that agents simply hang while starting. This happens with about 5% of the VMs started in this manner.
The Anka Build plugin is used, and the VM it spins up is 100% functional.
Investigating the thread dump shows a deadlock between the launch and
teardownConnection methods in SSHLauncher.
I have attached the stack traces of both threads as files.
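For anyone who wants to collect the same kind of evidence from inside the JVM rather than from an external thread dump, here is a minimal sketch using the standard `java.lang.management` API. The class name `BlockedThreadReport` and its two methods are hypothetical helpers written for this illustration; they are not part of Jenkins or the SSH plugin. Note that `findDeadlockedThreads()` only reports cycles of threads waiting to acquire monitors or ownable synchronizers, so a thread sitting in a timed `Object.wait()` would not be flagged by it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Jenkins code: summarizes lock contention in the current JVM. */
class BlockedThreadReport {

    /** Returns one line per BLOCKED thread, naming the lock it wants and the thread that owns it. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // true, true: also collect locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " blocked on " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** True only if the JVM itself detects a cycle of threads waiting on each other's locks. */
    static boolean jvmReportsDeadlock() {
        return ManagementFactory.getThreadMXBean().findDeadlockedThreads() != null;
    }

    public static void main(String[] args) {
        blockedThreads().forEach(System.out::println);
        System.out.println("JVM-detected deadlock: " + jvmReportsDeadlock());
    }
}
```

On a Jenkins controller the equivalent calls could presumably be made from the script console, but that is an assumption and has not been tried here.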
The launch method seems to be hanging while executing this:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.15/Native Method)
- waiting on <no object reference available>
at hudson.remoting.Request.call(Request.java:177)
- waiting to re-lock in wait() <0x00000005f9721350> (a hudson.remoting.UserRequest)
at hudson.remoting.Channel.call(Channel.java:999)
at hudson.FilePath.act(FilePath.java:1194)
at hudson.FilePath.act(FilePath.java:1183)
at hudson.FilePath.exists(FilePath.java:1748)
at jenkins.branch.WorkspaceLocatorImpl.load(WorkspaceLocatorImpl.java:254)
at jenkins.branch.WorkspaceLocatorImpl.access$500(WorkspaceLocatorImpl.java:86)
at jenkins.branch.WorkspaceLocatorImpl$Collector.onOnline(WorkspaceLocatorImpl.java:601)
- locked <0x00000005f97214e0> (a java.lang.String)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:727)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:437)
at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:645)
at hudson.plugins.sshslaves.SSHLauncher.lambda$launch$0(SSHLauncher.java:458)
at hudson.plugins.sshslaves.SSHLauncher$$Lambda$393/0x0000000840c2c040.call(Unknown Source)
at java.util.concurrent.FutureTask.run(java.base@11.0.15/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.15/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.15/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.15/Thread.java:829)
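The trace above shows the launch thread in a timed wait inside `Request.call` while still holding a monitor on a `java.lang.String` (taken in `WorkspaceLocatorImpl$Collector.onOnline`). The following is a minimal sketch of that general pattern, not the plugin's code: a thread that waits for a reply while holding a lock stalls any other thread that needs the same lock. The class `LockHeldWhileWaitingSketch` and all of its members are hypothetical, and it is an assumption that the teardown thread contends for a lock held by the launch thread, since its trace is only in the attached files.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a launcher; none of this is SSHLauncher code. */
class LockHeldWhileWaitingSketch {
    private final Object monitor = new Object();
    // Never counted down: simulates a remote reply that does not arrive.
    private final CountDownLatch remoteReply = new CountDownLatch(1);
    private final CountDownLatch launchHoldsMonitor = new CountDownLatch(1);

    /** Takes the monitor, then waits for the "remote" reply while still holding it. */
    void launch() {
        synchronized (monitor) {
            launchHoldsMonitor.countDown();
            try {
                remoteReply.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // give up and release the monitor
            }
        }
    }

    /** Needs the same monitor, so it cannot proceed while launch() is waiting. */
    void teardown() {
        synchronized (monitor) {
            // connection cleanup would go here
        }
    }

    /** Runs both methods on separate threads and returns the state observed for the teardown thread. */
    static Thread.State observeTeardownState() {
        LockHeldWhileWaitingSketch sketch = new LockHeldWhileWaitingSketch();
        Thread launch = new Thread(sketch::launch, "launch-sketch");
        Thread teardown = new Thread(sketch::teardown, "teardown-sketch");
        launch.setDaemon(true);
        teardown.setDaemon(true);
        try {
            launch.start();
            sketch.launchHoldsMonitor.await(); // launch now owns the monitor
            teardown.start();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (teardown.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = teardown.getState();
            launch.interrupt(); // break the wait so both threads can finish
            launch.join();
            teardown.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("teardown thread state: " + observeTeardownState());
    }
}
```

In this sketch the teardown thread stays `BLOCKED` until the launch thread is interrupted. In the sketch nothing ever delivers the reply; if in a real system the reply depended on the blocked thread making progress, neither side could continue.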
[JENKINS-68656] SSH Slaves Plugin Deadlock while spinning up a new agent
Component/s | New: anka-build-plugin [ 23042 ]
Environment | Original: Jenkins 2.332.3; SSH Slaves Plugin 1.814.vc82988f54b_10 (tested with 1.33.0 as well); Anka Build Plugin 2.7.0
Environment | New: Jenkins 2.332.3, OpenJDK 11.0.15, running on Ubuntu 20.04; SSH Slaves Plugin 1.814.vc82988f54b_10 (tested with 1.33.0 as well); Anka Build Plugin 2.7.0
Does it happen with SSH agents that are not launched with the Anka plugin?
Do you have the logs of one of those agents, to see at which stage the connection is failing?