SSH Slaves 1.23 can create lots of threads waiting for SSHLauncher lock in tearDownConnection



      The fix for JENKINS-19465 seems to be incomplete in some cases (e.g. when there is a lock conflict with Trilead SSH). We need a better fix that prevents the problem entirely.

      Proposals:

      • Offload tearDown hooks to a separate executor pool and merge similar requests
      • Ideal: offload all agent listeners to a separate hook. This is unlikely to work because of how the listeners are implemented
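
      The first proposal could be sketched roughly as follows. This is only an illustration, not code from the SSH Slaves plugin: the class name CoalescingTearDownExecutor, its methods, and the use of the agent name as the merge key are all hypothetical. It also assumes that a second tear-down for an agent is redundant while one is still queued or running.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the SSH Slaves plugin): runs tear-down
 * work on its own small pool and merges duplicate requests per key.
 */
class CoalescingTearDownExecutor {
    // Dedicated pool, so blocked tear-downs cannot exhaust a shared pool.
    private final ExecutorService pool;
    // Keys (assumed here: agent names) with a tear-down queued or running.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    CoalescingTearDownExecutor(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /**
     * Schedules a tear-down for the key unless one is already pending.
     * Returns true if the task was scheduled, false if it was merged away.
     */
    boolean submit(String key, Runnable tearDown) {
        if (!pending.add(key)) {
            return false; // a similar request is already queued or running
        }
        try {
            pool.execute(() -> {
                try {
                    tearDown.run();
                } finally {
                    pending.remove(key); // allow later tear-downs for this key
                }
            });
        } catch (RuntimeException e) {
            // e.g. RejectedExecutionException after shutdown()
            pending.remove(key);
            throw e;
        }
        return true;
    }

    void shutdown() {
        pool.shutdown();
    }

    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        return pool.awaitTermination(timeout, unit);
    }
}
```

      With something like this, a tear-down that is stuck waiting for the com.trilead.ssh2.Connection monitor would occupy at most one thread per agent in the dedicated pool, instead of hundreds of Computer.threadPoolForRemoting threads. It does not remove the underlying lock conflict.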

      Example of the lock I see:

      "SSHLauncher.launch for 'myagent' node [#1]" #2565 prio=5 os_prio=0 tid=0x00007f080c1b1000 nid=0x35c runnable [0x00007f07b2c5c000]
         java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
          at java.net.SocketInputStream.read(SocketInputStream.java:171)
          at java.net.SocketInputStream.read(SocketInputStream.java:141)
          at java.net.SocketInputStream.read(SocketInputStream.java:224)
          at com.trilead.ssh2.transport.ClientServerHello.readLineRN(ClientServerHello.java:31)
          at com.trilead.ssh2.transport.ClientServerHello.<init>(ClientServerHello.java:68)
          at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:487)
          at com.trilead.ssh2.Connection.connect(Connection.java:774)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:703)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:617)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1302)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:814)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:803)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
      
      ...
      
      Hundreds of threads are blocked like this:
      
      "Computer.threadPoolForRemoting [#104]" #1768 daemon prio=5 os_prio=0 tid=0x00007f07e02db800 nid=0x7d46 waiting for monitor entry [0x00007f07c24f5000]
         java.lang.Thread.State: BLOCKED (on object monitor)
          at com.trilead.ssh2.Connection.close(Connection.java:573)
          - waiting to lock <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:897)
          at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1445)
          - locked <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
          at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1371)
          at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:633)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
          - <0x000000058bc977c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      .....
      
      "Computer.threadPoolForRemoting [#98]" #1714 daemon prio=5 os_prio=0 tid=0x00007f08002df800 nid=0x7ce4 waiting for monitor entry [0x00007f07c0546000]
         java.lang.Thread.State: BLOCKED (on object monitor)
          at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:799)
          - waiting to lock <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:262)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
      
      

            Assignee:
            Oleg Nenashev
            Reporter:
            Oleg Nenashev
            Archiver:
            Jenkins Service Account
