[JENKINS-50984] SSHLauncher/Fingerprint Thread Locking Stopping Dynamic Slave Launch

      Curious if someone can help me unpack this. We recently upgraded Jenkins. We use the Docker plugin to dynamically provision slaves, and we're now running into a situation where the slaves never finish provisioning (the SSH connection is never established). A thread dump shows a very large number of BLOCKED threads in the SSHLauncher teardown and in fingerprinting. Here are the dumps:

      "Computer.threadPoolForRemoting [#1020]" daemon prio=5 BLOCKED
      	hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1407)
      	hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1403)
      	com.nirima.jenkins.plugins.docker.launcher.DockerComputerLauncher.afterDisconnect(DockerComputerLauncher.java:71)
      	hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:665)
      	jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	java.lang.Thread.run(Thread.java:748)
      
      "Computer.threadPoolForRemoting [#1019]" daemon prio=5 BLOCKED
      	hudson.model.Fingerprint.save(Fingerprint.java:1238)
      	hudson.BulkChange.commit(BulkChange.java:98)
      	com.cloudbees.plugins.credentials.CredentialsProvider.trackAll(CredentialsProvider.java:1533)
      	com.cloudbees.plugins.credentials.CredentialsProvider.track(CredentialsProvider.java:1478)
      	hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:866)
      	com.nirima.jenkins.plugins.docker.launcher.DockerComputerLauncher.launch(DockerComputerLauncher.java:66)
      	hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:288)
      	jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	java.lang.Thread.run(Thread.java:748)
      

      The thread dump has many of these (when things get bad, there are hundreds). We're currently planning a Docker plugin upgrade and a move away from the SSH Launcher, but I'm looking for ideas as to why this may be happening.
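
With hundreds of entries, it can help to group the BLOCKED threads by their top stack frame to see which methods they pile up on. The following is a minimal sketch, not part of Jenkins: the class name `BlockedThreadSummary` and its method are hypothetical, and it assumes a text dump in the same layout as the excerpts above (a quoted thread-name header line ending in the thread state, followed by one frame per line).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading a text thread dump; not a Jenkins API.
public class BlockedThreadSummary {

    /** Counts BLOCKED threads in a text thread dump, keyed by the method in their top stack frame. */
    public static Map<String, Integer> countBlockedByTopFrame(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean expectTopFrame = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // Header line: "<thread name>" daemon prio=5 <STATE>
                expectTopFrame = line.endsWith("BLOCKED");
            } else if (expectTopFrame && !line.isEmpty()) {
                // First frame after a BLOCKED header; drop "(File.java:123)" so counts group by method.
                int paren = line.indexOf('(');
                String method = paren >= 0 ? line.substring(0, paren) : line;
                counts.merge(method, 1, Integer::sum);
                expectTopFrame = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two headers and top frames copied from the dump in this issue.
        String dump = String.join("\n",
            "\"Computer.threadPoolForRemoting [#1020]\" daemon prio=5 BLOCKED",
            "\thudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:1407)",
            "",
            "\"Computer.threadPoolForRemoting [#1019]\" daemon prio=5 BLOCKED",
            "\thudson.model.Fingerprint.save(Fingerprint.java:1238)");
        countBlockedByTopFrame(dump)
            .forEach((method, n) -> System.out.println(n + "  " + method));
    }
}
```

Running this over a full dump gives one line per distinct top frame, which makes it easier to tell whether the blocked threads cluster on `SSHLauncher.tearDownConnection` and `Fingerprint.save` or are spread more widely.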

          Maxfield Stewart added a comment:

          I believe this is related to this issue: https://issues.jenkins-ci.org/browse/JENKINS-49235

          The SSH Slaves plugin started storing fingerprints, and I think that causes a race condition when the containers shut down and start up, though I'm not entirely sure. It might just be that the Docker plugin version I'm using (0.15) doesn't handle this shutdown/spin-up well and the newer ones do.

          (The newer ones have very different expectations of SSH slaves, which would require us to completely refactor our container environments, so we'd actually move off SSH connectors.)

          I'm curious whether this problem goes away if I just revert the SSH Slaves plugin.

          Maxfield Stewart added a comment:

          We're testing a revert to SSH Slaves plugin v1.20 today to see if this problem goes away. I'll post back here on how well it works, though the problem is somewhat intermittent and it may take a few days to confirm.

          I'm currently thinking there's a race condition between the slave being shut down (and its workspace removed) and the SSH Slaves plugin's attempt to create a fingerprint of the credential: either the workspace or the remoting channel is closed before Fingerprint.save can finish, and it ends up blocked waiting for a resource that is now gone.
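
For context on reading the dump: in a JVM, the BLOCKED state means a thread is waiting to enter a monitor (a `synchronized` block or method) that another thread currently holds. The sketch below only illustrates how that state arises when one thread does slow work under a shared lock. It does not reproduce the Jenkins behaviour, and the class name, the `LOCK` object, and both threads are hypothetical stand-ins rather than anything from the plugins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustration only: hypothetical names, not Jenkins or plugin code.
public class MonitorContentionDemo {
    private static final Object LOCK = new Object();

    /** Starts a lock holder and a waiter, then returns the waiter's observed thread state. */
    public static Thread.State observeWaiterState() {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                lockHeld.countDown();
                try {
                    release.await(); // stands in for slow work done while holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                // Needs the same monitor; cannot enter until the holder leaves.
            }
        }, "waiter");

        try {
            holder.start();
            lockHeld.await();   // make sure the holder owns the monitor first
            waiter.start();

            // Poll (with a timeout) until the waiter is parked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (waiter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(1);
            }
            Thread.State observed = waiter.getState();

            release.countDown(); // let the holder finish so both threads can exit
            holder.join();
            waiter.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

If the holder never released the lock, every later thread needing the same monitor would stay BLOCKED, which is one way a dump could accumulate hundreds of such entries. Whether that is what happens inside `Fingerprint.save` or `SSHLauncher.tearDownConnection` is not established by this ticket.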

          Maxfield Stewart added a comment:

          After running for a day rolled back to SSH Slaves plugin v1.20, I can confirm that things were much more stable: we saw zero blocked threads today, and all nodes provisioned and connected correctly. So I'm pretty confident that the v1.21 plugin and its credentials fingerprinting changes caused the issue.

          Ivan Fernandez Calvo added a comment:

          There is a PR that allows disabling the credentials tracking: https://github.com/jenkinsci/ssh-slaves-plugin/pull/94

            ifernandezcalvo Ivan Fernandez Calvo
            maxfields2000 Maxfield Stewart