  1. Jenkins
  2. JENKINS-52739

Threads leaked when machine unavailable

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: ssh-slaves-plugin
    • Labels:
      None
    • Environment:
      ssh-slaves 1.25.1
    • Released As:
      ssh-slaves-1.27

      Description

      The number of dangling threads created by the plugin can grow high enough (about 3K) to bring the instance down. In this case, the cloud provider was creating machines to be launched by SSHLauncher with no timeout configured, and SSH on those machines was broken. The workaround is to configure a timeout or to fix the machine(s) being connected to, but the plugin should handle the situation even when neither is done.
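
To illustrate the kind of bound the description asks for, here is a minimal sketch, not the plugin's actual fix. The dumps below show `SSHLauncher.launch` waiting in the untimed `invokeAll`; the timed overload of `ExecutorService.invokeAll` cancels any task that has not finished when the timeout expires. The class `BoundedLaunchSketch`, the method `launchWithTimeout`, and the hanging task are all hypothetical stand-ins, and the sketch assumes the blocked call responds to thread interruption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names come from the ssh-slaves plugin.
class BoundedLaunchSketch {

    /**
     * Runs a connect task and waits at most the given timeout for it.
     * Returns true only if the task finished in time and reported success.
     */
    public static boolean launchWithTimeout(Callable<Boolean> connect, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // The timed invokeAll cancels tasks that are still running at the deadline.
            List<Future<Boolean>> results =
                    executor.invokeAll(Collections.singletonList(connect), timeout, unit);
            Future<Boolean> result = results.get(0);
            if (result.isCancelled()) {
                return false; // timed out, the task was cancelled
            }
            try {
                return result.get(); // already done, so this does not block
            } catch (ExecutionException e) {
                return false; // the connect attempt itself failed
            }
        } finally {
            // Interrupt anything still running so the worker thread is not leaked.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a connection attempt that never completes.
        Callable<Boolean> hangs = () -> {
            new CountDownLatch(1).await();
            return true;
        };
        System.out.println(launchWithTimeout(hangs, 200, TimeUnit.MILLISECONDS));      // false
        System.out.println(launchWithTimeout(() -> true, 200, TimeUnit.MILLISECONDS)); // true
    }
}
```

Cancellation here relies on interruption, so a connect call that ignores interrupts would also need a timeout of its own at the connection level.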

      1048 connecting threads:

      "SSHLauncher.launch for 'slave1.acme.com' node [#1]" #6545 prio=5 os_prio=0 tid=0x00007fda9c11c9e0 nid=0x325e in Object.wait() [0x00007fd97076b000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:502)
      	at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:99)
      	- locked <0x000000070f2ba8a8> (a java.lang.Object)
      	at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:237)
      	at com.trilead.ssh2.Connection.connect(Connection.java:786)
      	- locked <0x000000070f2b5f30> (a com.trilead.ssh2.Connection)
      	at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1313)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:821)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:810)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- <0x000000070f2adc40> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      

      1048 launching threads:

      "Computer.threadPoolForRemoting [#655]" #6087 daemon prio=5 tid=140574416092208 nid=12172
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for <0x70ff8e538> (a java.util.concurrent.FutureTask)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:191)
      	at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:244)
      	at java.util.concurrent.Executors$DelegatedExecutorService.invokeAll(Executors.java:688)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:868)
      	- locked <0x70ff8e220> (a hudson.plugins.sshslaves.SSHLauncher)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:285)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- <0x70eb61160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      

      1103 teardown threads:

      "Computer.threadPoolForRemoting [#678]" #7651 daemon prio=5 tid=140572870923088 nid=14204
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1396)
      	- waiting to lock <0x70ff8e220> (a hudson.plugins.sshslaves.SSHLauncher)
      	at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1392)
      	at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:656)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
      	- <0x70ff49748> (a java.util.concurrent.ThreadPoolExecutor$Worker)
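
The teardown dump above shows threads blocked on the `SSHLauncher` monitor that a stuck `launch` still holds. As a hedged illustration of one way to avoid unbounded blocking (not what the plugin does, and not the released fix), the sketch below replaces the intrinsic monitor with a `ReentrantLock` and uses the timed `tryLock`, so a teardown gives up instead of piling up. `TeardownLockSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: an explicit lock stands in for the launcher's monitor.
class TeardownLockSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Stand-in for a launch that holds the lock until it is told to finish. */
    public void launch(CountDownLatch started, CountDownLatch release) throws InterruptedException {
        lock.lock();
        try {
            started.countDown(); // signal that the lock is now held
            release.await();     // simulate a connect that is stuck
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if teardown ran, false if the lock was not acquired in time. */
    public boolean tearDown(long timeout, TimeUnit unit) throws InterruptedException {
        if (!lock.tryLock(timeout, unit)) {
            return false; // give up rather than block forever
        }
        try {
            return true; // connection cleanup would go here
        } finally {
            lock.unlock();
        }
    }

    /** Runs one teardown while launch holds the lock, and one after it is released. */
    public static boolean[] demo() throws InterruptedException {
        TeardownLockSketch launcher = new TeardownLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread launchThread = new Thread(() -> {
            try {
                launcher.launch(started, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        launchThread.start();
        started.await();
        boolean whileHeld = launcher.tearDown(100, TimeUnit.MILLISECONDS);
        release.countDown();
        launchThread.join();
        boolean afterRelease = launcher.tearDown(100, TimeUnit.MILLISECONDS);
        return new boolean[] { whileHeld, afterRelease };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // false true
    }
}
```

A timed lock only limits how long each teardown waits. The stuck launch still has to be bounded separately, otherwise the lock is never released.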
      

            Activity

            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            Another good reason to set default timeouts in the plugin. I was thinking about this a few days ago, and I will prioritize it.

            hariprashadh Hariprashadh Sellamuthu added a comment -

            Hi Ivan Fernandez Calvo

            I am also facing this issue. The thread count increases to 4K and the slaves go offline due to OOM. We desperately need this fix.

            I see the code change is already merged into master. May I know when this fix will be released?

            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            I have plans to release it on Sunday, Aug 26th.


              People

              Assignee:
              ifernandezcalvo Ivan Fernandez Calvo
              Reporter:
              olivergondza Oliver Gondža
              Votes:
              1
              Watchers:
              5
