
Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • ssh-slaves-1.30.0

      Launching node/agent fails with

      ERROR: null
      java.util.concurrent.CancellationException

      We have a large number of jobs in the queue, which get assigned to slaves created by the Docker plugin. Even if we try creating a slave and launching the agent, it fails.
      Note: The slave image adheres to all the requirements and works well when there is no huge queue.

      Executor Status
      SSHLauncher{host='9.47.78.144', port=32870, credentialsId='slave-test', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [09/27/18 02:53:32] [SSH] Opening SSH connection to 9.47.78.144:32870.
      [09/27/18 02:53:32] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
      [09/27/18 02:53:32] [SSH] Authentication successful.
      [09/27/18 02:53:32] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="1" [4]="release" [5]="s390x-ibm-linux-gnu")
      BASH_VERSION='4.2.46(1)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/home/test
      HOSTNAME=01695f4aae73
      HOSTTYPE=s390x
      IFS=$' \t\n'
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=test
      MACHTYPE=s390x-ibm-linux-gnu
      MAIL=/var/mail/test
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=13
      PS4='+ '
      PWD=/home/test
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='9.42.27.56 44378 22'
      SSH_CONNECTION='9.42.27.56 44378 172.17.0.2 22'
      TERM=dumb
      UID=1000
      USER=test
      _=sudo
      [09/27/18 02:53:32] [SSH] Checking java version of /home/test/jdk/bin/java
      Couldn't figure out the Java version of /home/test/jdk/bin/java
      bash: /home/test/jdk/bin/java: No such file or directory
      
      [09/27/18 02:53:33] [SSH] Checking java version of java
      [09/27/18 02:53:34] [SSH] java -version returned 1.8.0_151.
      [09/27/18 02:53:34] [SSH] Starting sftp client.
      [09/27/18 02:53:34] [SSH] Copying latest remoting.jar...
      [09/27/18 02:53:36] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [09/27/18 02:53:36] [SSH] Starting agent process: cd "/home/test" && java  -jar remoting.jar -workDir /home/test
      Sep 27, 2018 6:54:09 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/test/remoting as a remoting work directory
      Both error and output logs will be printed to /home/test/remoting
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
      	at hudson.slaves.DelegatingComputerLauncher.launch(DelegatingComputerLauncher.java:64)
      	at io.jenkins.docker.connector.DockerComputerConnector$1.launch(DockerComputerConnector.java:117)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [09/27/18 02:57:02] Launch failed - cleaning up connection
      Slave JVM has not reported exit code. Is it still running?
      [09/27/18 02:57:02] [SSH] Connection closed.
      
      
      

          [JENKINS-53810] Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

          Durgadas Kamath created issue -
          Oleg Nenashev made changes -
          Component/s New: ssh-slaves-plugin [ 15578 ]

          Oleg Nenashev added a comment -

          Also CC ifernandezcalvo. It looks rather like an SSH Slaves plugin issue


          Ivan Fernandez Calvo added a comment - I am 90% sure that this is related to JENKINS-49235 ( https://issues.jenkins-ci.org/browse/JENKINS-49235 ). There is a workaround: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall


          Durgadas Kamath added a comment - ifernandezcalvo I tried the above workaround but that didn't solve the problem.

          Ivan Fernandez Calvo added a comment -

          So you set the property `-Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false` in the JVM parameters of your Jenkins instance and the issue persists. In that case I need the following info to try to understand and replicate what is happening: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug

          What I see in the log is that the agent tries to connect and after 4 min it is killed (probably by the pingThread), but the connection never seems to end. You said that this happens when you have a huge queue, so we will probably need a thread dump of the instance, taken while the issue is happening, to see which threads are blocked.

          https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
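Note on the timing in the log above: the connection is opened at 02:53:32 and the launch fails at 02:57:02, which is exactly 210 seconds and matches `launchTimeoutSeconds=210` in the launcher configuration. One possible reading is that the launch hit its timeout and the pending `Future` was cancelled. The Java sketch below reproduces that pattern with plain JDK classes. It is not the plugin's code: the class `LaunchTimeoutSketch` and its methods are hypothetical names, and whether `SSHLauncher.launch` applies its timeout this way is an assumption.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the ssh-slaves plugin.
public class LaunchTimeoutSketch {

    /** Seconds between two HH:mm:ss timestamps on the same day. */
    static long secondsBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    /**
     * Runs a task that needs taskMillis under a limit of timeoutMillis
     * and reports how its Future ended.
     */
    static String runWithTimeout(long taskMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> slowLaunch = () -> {
                Thread.sleep(taskMillis); // stands in for a launch that hangs
                return Boolean.TRUE;
            };
            // invokeAll with a timeout cancels every task that has not finished in time.
            List<Future<Boolean>> results = pool.invokeAll(
                    Collections.singletonList(slowLaunch), timeoutMillis, TimeUnit.MILLISECONDS);
            results.get(0).get(); // throws CancellationException if the task was cancelled
            return "completed";
        } catch (CancellationException e) {
            // Same exception type as in the report; FutureTask creates it without a message.
            return "cancelled";
        } catch (ExecutionException e) {
            return "failed";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondsBetween("02:53:32", "02:57:02")); // 210
        System.out.println(runWithTimeout(2000, 100)); // task outlives the limit
        System.out.println(runWithTimeout(10, 2000));  // task finishes in time
    }
}
```

If this reading is right, the `CancellationException` is a symptom of the launch running out of time rather than the root cause, which is why a thread dump taken during those 210 seconds is the useful next piece of evidence.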
          yong wu made changes -
          Attachment New: image-2018-10-09-19-18-08-873.png [ 44733 ]

          yong wu added a comment -

          I also ran into a similar problem while adding a node (SLES 12.3). The Java version on the slave was up to 1.8; I am not sure whether this is related to the SSH Slaves plugin or not...
          Jeff Thompson made changes -
          Assignee Original: Jeff Thompson [ jthompson ] New: Ivan Fernandez Calvo [ ifernandezcalvo ]

          Ivan Fernandez Calvo added a comment -

          xman_pires Could you grab a thread dump while the agent is stuck trying to start? According to the log, you have about a minute to get it.

          https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
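For readers unfamiliar with what a thread dump contains, the following Java sketch builds one for the JVM it runs in, using the standard `java.lang.management` API. The class name `ThreadDumpSketch` is hypothetical, and the sketch only illustrates the kind of data being requested (thread name, state, stack frames). To capture the dump of the Jenkins instance itself, use one of the methods described on the wiki page linked above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration only; dumps the threads of the current JVM.
public class ThreadDumpSketch {

    /** Returns name, state and stack frames of every live thread as text. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Threads shown in state `BLOCKED` or `WAITING` with frames inside the launcher code would be the ones to look at first.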

            Assignee: Ivan Fernandez Calvo (ifernandezcalvo)
            Reporter: Durgadas Kamath (durgadas)
            Votes: 4
            Watchers: 15
