Jenkins / JENKINS-48613

SSH Slaves 1.23 can create lots of threads waiting for SSHLauncher lock in tearDownConnection

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: ssh-slaves-plugin
    • None

    Description

      The fix for JENKINS-19465 seems to be incomplete in some cases (e.g. when there is a lock conflict with Trilead SSH). We need a better fix that prevents the thread pile-up entirely.

      Proposals:

      • Offload tearDown hooks to a separate executor pool and merge similar requests
      • Ideal: offload all agent listeners to a separate hook. This likely cannot work because of how the listeners are implemented
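
      The first proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual fix: the class CoalescingTearDown, the method requestTearDown, and the use of the agent name as the merge key are all hypothetical and do not exist in ssh-slaves-plugin.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/**
 * Hypothetical helper (not part of ssh-slaves-plugin): runs teardown work on a
 * dedicated pool and merges duplicate requests for the same agent.
 */
final class CoalescingTearDown {
    // Dedicated pool, so blocked teardowns do not occupy Computer.threadPoolForRemoting threads.
    private final ExecutorService pool;
    // Agents whose teardown is currently queued or running.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    CoalescingTearDown(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * @return true if the teardown was submitted, false if it was merged
     *         into a request that is already queued or running for this agent
     */
    boolean requestTearDown(String agentName, Runnable tearDown) {
        if (!pending.add(agentName)) {
            return false; // similar request already pending: merge (drop) this one
        }
        try {
            pool.execute(() -> {
                try {
                    tearDown.run();
                } finally {
                    pending.remove(agentName); // allow future teardowns for this agent
                }
            });
        } catch (RuntimeException e) {
            // e.g. RejectedExecutionException: do not leave the agent marked as pending
            pending.remove(agentName);
            throw e;
        }
        return true;
    }
}
```

      With this shape, at most one thread per agent can be blocked in a teardown at any time, regardless of how many disconnect events arrive. One caveat of the sketch: a request that arrives while a teardown is still running is dropped rather than re-queued, which may or may not be acceptable for the real listener semantics.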

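      Lock example I see. The launch thread holds the com.trilead.ssh2.Connection monitor while it is blocked in a socket read: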

      "SSHLauncher.launch for 'myagent' node [#1]" #2565 prio=5 os_prio=0 tid=0x00007f080c1b1000 nid=0x35c runnable [0x00007f07b2c5c000]
         java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
          at java.net.SocketInputStream.read(SocketInputStream.java:171)
          at java.net.SocketInputStream.read(SocketInputStream.java:141)
          at java.net.SocketInputStream.read(SocketInputStream.java:224)
          at com.trilead.ssh2.transport.ClientServerHello.readLineRN(ClientServerHello.java:31)
          at com.trilead.ssh2.transport.ClientServerHello.<init>(ClientServerHello.java:68)
          at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:487)
          at com.trilead.ssh2.Connection.connect(Connection.java:774)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:703)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at com.trilead.ssh2.Connection.connect(Connection.java:617)
          - locked <0x0000000594003de0> (a com.trilead.ssh2.Connection)
          at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1302)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:814)
          at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:803)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:748)
      
      ...
      
      Hundreds of threads are blocked as a result:
      
      "Computer.threadPoolForRemoting [#104]" #1768 daemon prio=5 os_prio=0 tid=0x00007f07e02db800 nid=0x7d46 waiting for monitor entry [0x00007f07c24f5000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at com.trilead.ssh2.Connection.close(Connection.java:573)
      - waiting to lock <0x0000000594003de0> (a com.trilead.ssh2.Connection)
      at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:897)
      at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1445)
      - locked <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
      at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1371)
      at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:633)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:748)
      
      Locked ownable synchronizers:
      - <0x000000058bc977c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      .....
      
      "Computer.threadPoolForRemoting [#98]" #1714 daemon prio=5 os_prio=0 tid=0x00007f08002df800 nid=0x7ce4 waiting for monitor entry [0x00007f07c0546000]
      java.lang.Thread.State: BLOCKED (on object monitor)
      at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:799)
      - waiting to lock <0x0000000608aa1468> (a hudson.plugins.sshslaves.SSHLauncher)
      at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:262)
      at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:748)
      
      

            People

              oleg_nenashev Oleg Nenashev
              oleg_nenashev Oleg Nenashev
              Votes: 0
              Watchers: 3
