Type: Bug
Resolution: Fixed
Priority: Major
The fix for JENKINS-19465 appears to be incomplete in some cases (e.g. when there is a lock conflict with Trilead SSH). We need a more robust fix that prevents the problem entirely.
Proposals:
- Offload tearDown hooks to a separate executor pool, merging similar requests
- Ideal: offload all agent listeners to a separate hook. This likely cannot work that way because of the listener implementations
Example of the lock I see:
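A minimal sketch of the first proposal, assuming teardown requests can be keyed by node name. The class `CoalescingTearDownQueue` and its methods are hypothetical and are not part of Jenkins core or the SSH agents plugin; the sketch only illustrates running teardowns on a dedicated pool and merging a request with one that is already queued or running for the same key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs teardown hooks on their own pool and merges duplicates. */
class CoalescingTearDownQueue {
    // Dedicated pool, so teardowns blocked on a lock cannot exhaust a shared pool.
    private final ExecutorService pool;
    // Keys (assumed here to be node names) with a teardown queued or running.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    CoalescingTearDownQueue(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Returns true if the teardown was scheduled, false if it was merged into a pending one. */
    boolean submit(String key, Runnable tearDown) {
        if (!pending.add(key)) {
            return false; // a teardown for this key is already queued or running
        }
        try {
            pool.execute(() -> {
                try {
                    tearDown.run();
                } finally {
                    pending.remove(key); // allow later teardowns for this key
                }
            });
        } catch (RejectedExecutionException e) {
            pending.remove(key); // pool was shut down; do not leave a stale entry
            throw e;
        }
        return true;
    }

    /** Stops accepting work and waits for queued teardowns; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, at most one thread per key can be blocked at a time, instead of hundreds of pool threads waiting on the same monitor. Note the trade-off: a request that arrives while a teardown for the same key is still running is dropped rather than re-queued, which may or may not be acceptable for the real hooks.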
"SSHLauncher.launch for 'myagent' node [#1]" #2565 prio=5 os_prio=0 tid=0x00007f080c1b1000 nid=0x35c runnable [0x00007f07b2c5c000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.net.SocketInputStream.read(SocketInputStream.java:224)
    at com.trilead.ssh2.transport.ClientServerHello.readLineRN(ClientServerHello.java:31)
    at com.trilead.ssh2.transport.ClientServerHello.<init>(ClientServerHello.java:68)
    at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:487)
    at com.trilead.ssh2.Connection.connect(Connection.java:774)
    - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
    at com.trilead.ssh2.Connection.connect(Connection.java:703)
    - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
    at com.trilead.ssh2.Connection.connect(Connection.java:617)
    - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
    at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1302)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:814)
    at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:803)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
...

Hundreds of threads:

"Computer.threadPoolForRemoting [#104]" #1768 daemon prio=5 os_prio=0 tid=0x00007f07e02db800 nid=0x7d46 waiting for monitor entry [0x00007f07c24f5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.trilead.ssh2.Connection.close(Connection.java:573)
    - waiting to lock <0x0000000594003de0> (a com.trilead.ssh2.Connection)
    at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:897)
    at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1445)
    - locked <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
    at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1371)
    at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:633)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x000000058bc977c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
.....

"Computer.threadPoolForRemoting [#98]" #1714 daemon prio=5 os_prio=0 tid=0x00007f08002df800 nid=0x7ce4 waiting for monitor entry [0x00007f07c0546000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:799)
    - waiting to lock <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
    at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:262)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
is related to
- JENKINS-19465 Slave hangs while being launched (Resolved)
- JENKINS-48611 SSH threads become blocked when trying to close the connection (Resolved)