Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-54897

Unexpected error in launching a agent after restart

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Major Major
    • ssh-slaves-plugin
    • None
    • Jenkins ver. 2.138.2
      ssh-slaves-plugin ver: 1.29.1
      Linux RHEL 7.5
    • ssh-slaves-1.30.2

      Launch agent has issues after restarting Jenkins. If you try to launch several times, it will eventually work.

       

      ssh rules: no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty

       

      Node config.xml

      <?xml version='1.1' encoding='UTF-8'?>
      <slave>
        <name>cac-slave-01</name>
        <description></description>
        <remoteFS>/app/var/lib/jenkins/slave_node</remoteFS>
        <numExecutors>5</numExecutors>
        <mode>NORMAL</mode>
        <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>
        <launcher class="hudson.plugins.sshslaves.SSHLauncher" plugin="ssh-slaves@1.29.1">
          <host>jenkins-slave.example.com</host>
          <port>22</port>
          <credentialsId>jenkins-master-ssh-key</credentialsId>
          <launchTimeoutSeconds>210</launchTimeoutSeconds>
          <maxNumRetries>10</maxNumRetries>
          <retryWaitTime>15</retryWaitTime>
          <sshHostKeyVerificationStrategy class="hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy"/>
        </launcher>
        <label></label>
        <nodeProperties/>
      </slave>

       

      Error log 1:

      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:46:51] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      ERROR: null
      java.util.concurrent.CancellationException
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:902)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:50:21] Launch failed - cleaning up connection
      [11/27/18 16:50:21] SSH Launch of cac-slave-01 on jenkins-slave.example.com failed in 210,000 ms

      Error log 2:

      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:18:29] [SSH] Authentication successful.
      [11/27/18 16:18:29] [SSH] Authentication successful.
      [11/27/18 16:18:29] [SSH] The remote user's environment is:
      [11/27/18 16:18:29] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15365
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 36584 22'
      SSH_CONNECTION='165.115.33.181 36584 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3293
      _=/etc/bashrc
      [11/27/18 16:18:29] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15366
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 36878 22'
      SSH_CONNECTION='165.115.33.181 36878 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3292
      _=/etc/bashrc
      [11/27/18 16:18:29] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:18:29] [SSH] Checking java version of java
      [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:18:29] [SSH] Starting sftp client.
      [11/27/18 16:18:29] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:18:29] [SSH] Starting sftp client.
      [11/27/18 16:18:29] [SSH] Copying latest remoting.jar...
      [11/27/18 16:18:29] [SSH] Copying latest remoting.jar...
      [11/27/18 16:18:29] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      [11/27/18 16:18:29] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:18:29] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      Nov 27, 2018 4:18:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      <===[JENKINS REMOTING CAPACITY]===>channel started
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Remoting version: 3.25
      Remoting version: 3.25
      This is a Unix agent
      This is a Unix agent
      Evacuated stdout
      Evacuated stdout
      ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
      java.lang.IllegalStateException: Already connected
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:678)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
      	at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032)
      	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:18:33] Launch failed - cleaning up connection
      [11/27/18 16:18:33] [SSH] Connection closed.
      Connection terminated
      channel stopped
      Connection terminated
      ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
      java.lang.NullPointerException
      	at org.jenkinsci.modules.systemd_slave_installer.SlaveInstallerFactoryImpl.createIfApplicable(SlaveInstallerFactoryImpl.java:33)
      	at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createIfApplicable(SlaveInstallerFactory.java:29)
      	at org.jenkinsci.modules.slave_installer.SlaveInstallerFactory.createFor(SlaveInstallerFactory.java:46)
      	at org.jenkinsci.modules.slave_installer.impl.ComputerListenerImpl.onOnline(ComputerListenerImpl.java:30)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:693)
      	at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:432)
      	at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:1032)
      	at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:128)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:866)
      	at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:831)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      [11/27/18 16:18:33] Launch failed - cleaning up connection
      [11/27/18 16:18:33] [SSH] Connection closed.
      SSHLauncher{host='jenkins-slave.example.com', port=22, credentialsId='jenkins-master-ssh-key', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.KnownHostsFileKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
      [11/27/18 16:19:03] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
      [11/27/18 16:19:03] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
      [11/27/18 16:19:03] [SSH] Authentication successful.
      [11/27/18 16:19:03] [SSH] The remote user's environment is:
      BASH=/usr/bin/bash
      BASHOPTS=cmdhist:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="2" [2]="46" [3]="2" [4]="release" [5]="x86_64-redhat-linux-gnu")
      BASH_VERSION='4.2.46(2)-release'
      DIRSTACK=()
      EUID=1000
      GROUPS=()
      HOME=/var/lib/jenkins_slave
      HOSTNAME=jenkins-slave.example.com
      HOSTTYPE=x86_64
      ID=1000
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LESSOPEN='||/usr/bin/lesspipe.sh %s'
      LOGNAME=jenkins_slave_local
      MACHTYPE=x86_64-redhat-linux-gnu
      MAIL=/var/mail/jenkins_slave_local
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin
      PIPESTATUS=([0]="0")
      PPID=15645
      PS4='+ '
      PWD=/var/lib/jenkins_slave
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='165.115.33.181 37012 22'
      SSH_CONNECTION='165.115.33.181 37012 165.115.33.182 22'
      TERM=dumb
      TMOUT=1800
      UID=1000
      USER=jenkins_slave_local
      XDG_RUNTIME_DIR=/run/user/1000
      XDG_SESSION_ID=3294
      _=/etc/bashrc
      [11/27/18 16:19:03] [SSH] Checking java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      Couldn't figure out the Java version of /app/var/lib/jenkins/slave_node/jdk/bin/java
      bash: /app/var/lib/jenkins/slave_node/jdk/bin/java: No such file or directory[11/27/18 16:19:03] [SSH] Checking java version of java
      [11/27/18 16:19:03] [SSH] java -version returned 1.8.0_181.
      [11/27/18 16:19:03] [SSH] Starting sftp client.
      [11/27/18 16:19:03] [SSH] Copying latest remoting.jar...
      [11/27/18 16:19:03] [SSH] Copied 776,265 bytes.
      Expanded the channel window size to 4MB
      [11/27/18 16:19:03] [SSH] Starting agent process: cd "/app/var/lib/jenkins/slave_node" && java  -jar remoting.jar -workDir /app/var/lib/jenkins/slave_node
      Nov 27, 2018 4:19:03 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /app/var/lib/jenkins/slave_node/remoting as a remoting work directory
      Both error and output logs will be printed to /app/var/lib/jenkins/slave_node/remoting
      <===[JENKINS REMOTING CAPACITY]===>channel started
      Remoting version: 3.25
      This is a Unix agent
      Evacuated stdout
      Agent successfully connected and online

          [JENKINS-54897] Unexpected error in launching a agent after restart

          Did you see if the plugin retry to connect automatically? if should be because has the default configuration to make it `launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15`, after 15 seconds of the fail it must retry to connect

          The duplicate messages bothered me, it is like two processes were running at the same time, Which JDK do you use?

          [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22.
          [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
          [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed.
          [11/27/18 16:18:29] [SSH] Authentication successful.
          [11/27/18 16:18:29] [SSH] Authentication successful.
          [11/27/18 16:18:29] [SSH] The remote user's environment is:
          [11/27/18 16:18:29] [SSH] The remote user's environment is:
          

          Ivan Fernandez Calvo added a comment - Did you see if the plugin retry to connect automatically? if should be because has the default configuration to make it `launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15`, after 15 seconds of the fail it must retry to connect The duplicate messages bothered me, it is like two processes were running at the same time, Which JDK do you use? [11/27/18 16:18:08] [SSH] Opening SSH connection to jenkins-slave.example.com:22. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] SSH host key matches key in Known Hosts file. Connection will be allowed. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] Authentication successful. [11/27/18 16:18:29] [SSH] The remote user's environment is: [11/27/18 16:18:29] [SSH] The remote user's environment is:

          openjdk version "1.8.0_181"
          OpenJDK Runtime Environment (build 1.8.0_181-b13)
          OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
          

          I think this is because I clicked Launch Agent and at the same time it was configured to re-try. I guess there are no semaphores to handle that

          Konstantinos Psimoulis added a comment - openjdk version "1.8.0_181" OpenJDK Runtime Environment (build 1.8.0_181-b13) OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode) I think this is because I clicked Launch Agent and at the same time it was configured to re-try. I guess there are no semaphores to handle that

          Ivan Fernandez Calvo added a comment - - edited

          yep, I'm starting to understand what happens in every `CancellationException`, I am working on a consistent way to replicate it on a test to make a proper fix.

          Ivan Fernandez Calvo added a comment - - edited yep, I'm starting to understand what happens in every `CancellationException`, I am working on a consistent way to replicate it on a test to make a proper fix.

          Bob Ashforth added a comment -

          Can you recommend any workarounds? This is crippling a major lift-and-shift project for us.

          Bob Ashforth added a comment - Can you recommend any workarounds? This is crippling a major lift-and-shift project for us.

          On the logs attached the retries are not working so the SSHLauncher should be blocked by other thread, it is possibly related to the credentials tracking issue JENKINS-49235 if you get a thread dump when the issue happens you can compare it to this troubleshooting threads-stuck-at-credentialsprovidertrackall

          Ivan Fernandez Calvo added a comment - On the logs attached the retries are not working so the SSHLauncher should be blocked by other thread, it is possibly related to the credentials tracking issue JENKINS-49235 if you get a thread dump when the issue happens you can compare it to this troubleshooting threads-stuck-at-credentialsprovidertrackall

            ifernandezcalvo Ivan Fernandez Calvo
            kpsimoulis Konstantinos Psimoulis
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: