Jenkins / JENKINS-48613

SSH Slaves 1.23 can create lots of threads waiting for SSHLauncher lock in tearDownConnection

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: ssh-slaves-plugin

      The fix for JENKINS-19465 seems to be incomplete in some cases (e.g. when there is a lock conflict with Trilead SSH). We need a better fix that prevents the problem entirely.

      Proposals:

      • Offload tearDown hooks to a separate executor pool and merge similar requests
      • Ideal: offload all agent listeners to a separate hook. This likely cannot work because of how the listeners are implemented
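      The first proposal could look roughly like this. It is a minimal sketch under stated assumptions, not plugin code: `TearDownCoalescer`, `requestTearDown` and keying requests by node name are hypothetical, and the pool size of 2 is arbitrary. The calling thread (e.g. one from Computer.threadPoolForRemoting) only enqueues work and never waits on a monitor; a second request for the same node while one is still queued or running gets the same future instead of a new thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/**
 * Hypothetical helper (not part of ssh-slaves-plugin): runs teardown work on a
 * dedicated pool and merges requests for the same node while one is pending.
 */
class TearDownCoalescer {
    // Dedicated small pool: slow teardowns run here instead of on the caller's thread.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    // At most one queued-or-running teardown per node name.
    private final ConcurrentMap<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    /** Schedules a teardown, or returns the future of the one already pending for this node. */
    CompletableFuture<Void> requestTearDown(String nodeName, Runnable tearDown) {
        CompletableFuture<Void> fresh = new CompletableFuture<>();
        CompletableFuture<Void> existing = pending.putIfAbsent(nodeName, fresh);
        if (existing != null) {
            return existing; // merged into the request that is already pending
        }
        try {
            pool.execute(() -> {
                Throwable failure = null;
                try {
                    tearDown.run();
                } catch (Throwable t) {
                    failure = t;
                } finally {
                    // Remove before completing, so a disconnect arriving after this
                    // point schedules a new teardown instead of reusing a finished one.
                    pending.remove(nodeName, fresh);
                }
                if (failure == null) {
                    fresh.complete(null);
                } else {
                    fresh.completeExceptionally(failure);
                }
            });
        } catch (RejectedExecutionException e) {
            // Pool already shut down: do not leave a stale entry behind.
            pending.remove(nodeName, fresh);
            fresh.completeExceptionally(e);
        }
        return fresh;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

      Callers that only need the teardown to happen can ignore the returned future; callers that must wait can `join()` it, and every merged caller then waits on the same single teardown rather than on a lock.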

      Example of the lock contention I see:

      "SSHLauncher.launch for 'myagent' node [#1]" #2565 prio=5 os_prio=0 tid=0x00007f080c1b1000 nid=0x35c runnable [0x00007f07b2c5c000]
         java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
          at java.net.SocketInputStream.read(SocketInputStream.java:171)
          at java.net.SocketInputStream.read(SocketInputStream.java:141)
          at java.net.SocketInputStream.read(SocketInputStream.java:224)
          at com.trilead.ssh2.transport.ClientServerHello.readLineRN(ClientServerHello.java:31)
          at com.trilead.ssh2.transport.ClientServerHello.<init>(ClientServerHello.java:68)
          at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:487)
          at com.trilead.ssh2.Connection.connect(Connection.java:774)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:703)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:617)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1302)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:814)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:803)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
      
      ...
      
      Hundreds of threads are blocked like this:
      
      "Computer.threadPoolForRemoting [#104]" #1768 daemon prio=5 os_prio=0 tid=0x00007f07e02db800 nid=0x7d46 waiting for monitor entry [0x00007f07c24f5000]
         java.lang.Thread.State: BLOCKED (on object monitor)
          at com.trilead.ssh2.Connection.close(Connection.java:573)
          - waiting to lock <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:897)
          at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1445)
          - locked <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
          at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1371)
          at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:633)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)

         Locked ownable synchronizers:
          - <0x000000058bc977c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      .....
      
      "Computer.threadPoolForRemoting [#98]" #1714 daemon prio=5 os_prio=0 tid=0x00007f08002df800 nid=0x7ce4 waiting for monitor entry [0x00007f07c0546000]
         java.lang.Thread.State: BLOCKED (on object monitor)
          at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:799)
          - waiting to lock <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
          at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:262)
          at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
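      These dumps form a chain: the launch thread holds the `Connection` monitor (0x...3de0) while blocked reading the server hello, `tearDownConnection()` holds the `SSHLauncher` monitor (0x...1468) while waiting for the `Connection` monitor in `close()`, and every further `afterDisconnect()` or `launch()` call waits for the `SSHLauncher` monitor. The sketch below reproduces only that monitor usage. `FakeConnection`, `FakeLauncher` and `LockPileUpDemo` are hypothetical stand-ins, and a `CountDownLatch` replaces the socket read; the point is that a single stuck connect is enough to block any number of callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for com.trilead.ssh2.Connection: only its monitor usage. */
class FakeConnection {
    private final CountDownLatch serverHello = new CountDownLatch(1);

    // Like Connection.connect(): holds this monitor while blocked on the (simulated) socket read.
    synchronized void connect() {
        try {
            serverHello.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Like Connection.close(): needs the same monitor, so it waits for connect() to return.
    synchronized void close() {
    }

    // Simulates the server finally sending its hello line.
    void serverResponds() {
        serverHello.countDown();
    }
}

/** Hypothetical stand-in for SSHLauncher: both methods synchronize on the launcher. */
class FakeLauncher {
    final FakeConnection connection = new FakeConnection();

    // Like tearDownConnection(): takes the launcher monitor, then closes the connection.
    synchronized void tearDownConnection() {
        connection.close();
    }

    // Like launch(): needs the launcher monitor too.
    synchronized void launch() {
    }
}

class LockPileUpDemo {
    /** Returns how many teardown/launch threads were BLOCKED at once (needs tearDowns >= 1). */
    static int blockedWhileConnectHangs(int tearDowns, int launches) {
        if (tearDowns < 1) {
            throw new IllegalArgumentException("one teardown is needed to hold the launcher monitor");
        }
        FakeLauncher launcher = new FakeLauncher();
        Thread connector = new Thread(launcher.connection::connect, "connect");
        connector.start();
        awaitState(connector, Thread.State.WAITING); // now holds the connection monitor

        List<Thread> waiters = new ArrayList<>();
        for (int i = 0; i < tearDowns; i++) {
            waiters.add(new Thread(launcher::tearDownConnection, "afterDisconnect-" + i));
        }
        for (int i = 0; i < launches; i++) {
            waiters.add(new Thread(launcher::launch, "launch-" + i));
        }
        // The first teardown blocks on the connection monitor, all others on the launcher monitor.
        for (Thread t : waiters) {
            t.start();
            awaitState(t, Thread.State.BLOCKED);
        }
        int blocked = 0;
        for (Thread t : waiters) {
            if (t.getState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        launcher.connection.serverResponds(); // connect() returns and the whole chain drains
        try {
            connector.join();
            for (Thread t : waiters) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    // Polls until the thread reaches the expected state, failing after 5 seconds.
    private static void awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != expected) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException(t.getName() + " never reached " + expected);
            }
            Thread.yield();
        }
    }
}
```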
      
      

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/hudson/plugins/sshslaves/SSHLauncher.java
          http://jenkins-ci.org/commit/ssh-slaves-plugin/5eea3a0f5a79cb1e1c155e04d1f54cd2252b5e38
          Log:
          JENKINS-48613 - Hotfix: Prevent piling up of tearDownConnection() calls if they take long (#78)
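          The hotfix title describes keeping slow tearDownConnection() calls from piling up. One way to get that effect, shown here as a hypothetical sketch and not as the code in commit 5eea3a0, is a non-blocking guard: a caller that finds a teardown already in progress returns immediately instead of queueing on the launcher monitor. `GuardedTearDown` and its `Runnable` constructor parameter are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: skip a teardown when another thread is already tearing
 * the same connection down, so slow teardowns cannot accumulate waiting threads.
 */
class GuardedTearDown {
    private final AtomicBoolean tearingDown = new AtomicBoolean(false);
    private final Runnable closeConnection; // e.g. a potentially slow Connection.close()

    GuardedTearDown(Runnable closeConnection) {
        this.closeConnection = closeConnection;
    }

    /** Returns true if this call performed the teardown, false if it was skipped. */
    boolean tearDownConnection() {
        if (!tearingDown.compareAndSet(false, true)) {
            return false; // another thread is already doing it; do not wait for it
        }
        try {
            closeConnection.run();
            return true;
        } finally {
            tearingDown.set(false); // allow the next disconnect to tear down again
        }
    }
}
```

          The trade-off is that a skipped caller does not learn when the connection is actually closed; if that matters, callers need a shared future to wait on rather than a boolean.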

          Oleg Nenashev added a comment -

          It has been released in 1.24

          Shanmugasundaram Gopalsamy added a comment -

          Hi,

          We are running SSH Slaves plugin version 1.24 and have noticed several BLOCKED threads related to SSHLauncher.tearDownConnection.

          Please find the thread dumps below.

          "Computer.threadPoolForRemoting 71" #530 daemon prio=5 os_prio=0 tid=0x00007f96f442c000 nid=0x3669 waiting on condition [0x00007f96befdf000]
             java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0x0000000655180750> (a java.util.concurrent.FutureTask)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
              at java.util.concurrent.FutureTask.get(FutureTask.java:191)
              at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)
              at java.util.concurrent.Executors$DelegatedExecutorService.invokeAll(Executors.java:688)
              at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:868)
              - locked <0x0000000655180810> (a hudson.plugins.sshslaves.SSHLauncher)
              at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)

          "Computer.threadPoolForRemoting 6147" #44628 daemon prio=5 os_prio=0 tid=0x00007f96e8980800 nid=0xf4da waiting for monitor entry [0x00007f96bc385000]
             java.lang.Thread.State: BLOCKED (on object monitor)
              at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1392)
              - waiting to lock <0x0000000655180810> (a hudson.plugins.sshslaves.SSHLauncher)
              at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1388)
              at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:656)
              at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
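          Dumps like these can also be summarized programmatically. A minimal sketch, assuming you can run code inside the affected JVM: `BlockedThreadCensus` (a hypothetical name) uses only the standard `java.lang.management` API to count BLOCKED threads by the class of the monitor each is waiting to enter.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical diagnostic: counts BLOCKED threads in the current JVM, grouped by
 * the class name of the monitor each one is waiting to enter.
 */
class BlockedThreadCensus {
    static Map<String, Integer> blockedByMonitorClass() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: ownership of monitors and synchronizers is not needed for this count.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            LockInfo lock = info.getLockInfo(); // the monitor this thread is blocked on
            if (lock != null) {
                counts.merge(lock.getClassName(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

          In a JVM showing the pattern above, the result would be expected to contain a large count under hudson.plugins.sshslaves.SSHLauncher.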


            Assignee: Oleg Nenashev
            Reporter: Oleg Nenashev
            Votes: 0
            Watchers: 3