Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-45755

Unable to launch SSH Slave since 2.68 when HOME is not writable on Master

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved (View Workflow)
    • Critical
    • Resolution: Fixed
    • Single master and single slave setup, both running Debian and using SSH slave where master user is not the same as the slave user. Jenkins 2.71 / Remoting 3.10

    Description

      Last working for me Jenkins version 2.67

      Versions not working 2.68 - 2.78

      Issue: SSH Slave will not launch

      Some part of the release in 2.68 seems to have changed how the environment is setup for slaves. The user my slave connects as has a home directory of: /home/SPALDING/jenkinsbuildserver however I can see in the error message that jarCache is attempting to look in master user's home directory which is /usr/share/tomcat8, which doesn't exist on the slave.

      I figured I could at least work around this issue if I set the new "workDir" parameter, but the launch will still fail. Even if I create the /usr/share/tomcat8/.jenkins/cache/jars directory and ensure it is writable by the slave user,  it still fails to launch.

      There appears to be 2 issues:

      • The slave is not using the correct location for the default jar cache because it seems to be using the home directory of the master user.
      • Even when the 'workDir' parameter is specified, Jenkins is still trying to validate the default location.
      [07/24/17 10:42:11] [SSH] Opening SSH connection to SLAVEHOST:22.
      [07/24/17 10:42:11] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
      [07/24/17 10:42:11] [SSH] Authentication successful.
      [07/24/17 10:42:11] [SSH] The remote users environment is:
      BASH=/bin/bash
      BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="3" [2]="30" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
      BASH_VERSION='4.3.30(1)-release'
      DIRSTACK=()
      EUID=10169
      GROUPS=()
      HOME=/home/SPALDING/jenkinsbuildserver
      HOSTNAME=SLAVEHOST
      HOSTTYPE=x86_64
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LOGNAME=jenkinsbuildserver
      MACHTYPE=x86_64-pc-linux-gnu
      MAIL=/var/mail/jenkinsbuildserver
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
      PIPESTATUS=([0]="0")
      PPID=21458
      PS4='+ '
      PWD=/home/SPALDING/jenkinsbuildserver
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='10.10.1.179 57938 22'
      SSH_CONNECTION='10.10.1.179 57938 10.10.0.251 22'
      TERM=dumb
      UID=10169
      USER=jenkinsbuildserver
      _=']'
      [07/24/17 10:42:11] [SSH] Checking java version of java
      [07/24/17 10:42:11] [SSH] java -version returned 1.8.0_131.
      [07/24/17 10:42:11] [SSH] Starting sftp client.
      [07/24/17 10:42:11] [SSH] Copying latest slave.jar...
      [07/24/17 10:42:11] [SSH] Copied 730,299 bytes.
      Expanded the channel window size to 4MB
      [07/24/17 10:42:11] [SSH] Starting slave process: cd "/home/SPALDING/jenkinsbuildserver/jenkins-agent" && java  -jar slave.jar -workDir /home/SPALDING/jenkinsbuildserver/jenkins-agent/work -failIfWorkDirIsMissing
      Jul 24, 2017 10:42:12 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/SPALDING/jenkinsbuildserver/jenkins-agent/work/remoting as a remoting work directory
      Both error and output logs will be printed to /home/SPALDING/jenkinsbuildserver/jenkins-agent/work/remoting
      <===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
      java.lang.RuntimeException: Root directory not writable: /usr/share/tomcat8/.jenkins/cache/jars
              at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:57)
              at hudson.remoting.JarCache.getDefault(JarCache.java:32)
              at hudson.remoting.Channel.<init>(Channel.java:505)
              at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:323)
              at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:389)
              at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1070)
              at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:144)
              at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:817)
              at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:792)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)
      [07/24/17 10:42:12] Launch failed - cleaning up connection
      [07/24/17 10:42:12] [SSH] Connection closed.
      
      
      

      Reproducer using Docker

      Launch Jenkins using the following

      mkdir jenkins_home
      sudo chown 2000 jenkins_home
      sudo chmod 777 jenkins_home
      docker run -ti -v $(pwd)/jenkins_home:/var/jenkins_home  -u 2000 --rm -p 8080:8080 -p 50000:50000 jenkinsci/jenkins:2.73
      

      Create a jnlp agent, then try to connect. You will then get the following sequence:

      master

      Sep 12, 2017 1:06:16 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
      INFO: Accepted JNLP4-connect connection #1 from /172.17.0.1:38552
      Sep 12, 2017 1:06:16 PM org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer onRecv
      WARNING: [JNLP4-connect connection from 172.17.0.1/172.17.0.1:38552]
      java.lang.RuntimeException: Root directory not writable: ?/.jenkins/cache/jars
      	at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:57)
      	at hudson.remoting.JarCache.getDefault(JarCache.java:32)
      	at hudson.remoting.Channel.<init>(Channel.java:505)
      	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:339)
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:149)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:207)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
      	at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
      	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:160)
      	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      agent

      INFOS: Locating server among [http://localhost:8080/]
      sept. 12, 2017 1:06:16 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFOS: Remoting server accepts the following protocols: [JNLP4-connect, JNLP-connect, Ping, JNLP2-connect]
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Agent discovery successful
        Agent address: localhost
        Agent port:    50000
        Identity:      80:e1:c5:f6:d5:96:cf:1d:6a:58:45:48:2b:fe:67:76
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Handshaking
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Connecting to localhost:50000
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Trying protocol: JNLP4-connect
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Remote identity confirmed: 80:e1:c5:f6:d5:96:cf:1d:6a:58:45:48:2b:fe:67:76
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Connected
      sept. 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFOS: Terminated
      

      Attachments

        Issue Links

          Activity

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            pom.xml
            http://jenkins-ci.org/commit/jenkins/e8b2f5a59c0b075f62bea9b6a45ef35a1c2ca1bb
            Log:
            Merge pull request #3025 from oleg-nenashev/remoting/3.12

            [JENKINS-45755, JENKINS-46140] - Update Remoting to 3.12

            Compare: https://github.com/jenkinsci/jenkins/compare/2343909f0240...e8b2f5a59c0b

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: pom.xml http://jenkins-ci.org/commit/jenkins/e8b2f5a59c0b075f62bea9b6a45ef35a1c2ca1bb Log: Merge pull request #3025 from oleg-nenashev/remoting/3.12 [JENKINS-45755, JENKINS-46140] - Update Remoting to 3.12 Compare: https://github.com/jenkinsci/jenkins/compare/2343909f0240...e8b2f5a59c0b
            jmccormick Jesse McCormick added a comment - - edited

            vlatombe - Tomcat was started with -Dcatalina.home=/usr/share/tomcat8. And yes, the directory is not writable by the tomcat8 user. This has caused other problems for Jenkins before. For instance, when the SSH client starts, it is also looking in /usr/share/tomcat8/, which is not the user's home directory (for the .ssh directory access). Is it possible we are using catalina.home instead of the actual user's home directory?

            Full parameters:

            /usr/lib/jvm/default-java/bin/java 
            -Djava.util.logging.config.file=/var/lib/tomcat8/conf/logging.properties 
            -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
            -Djava.awt.headless=true
            -Xmx1280m 
            -XX:+UseConcMarkSweepGC 
            -DJENKINS_HOME=/var/lib/tomcat8/webapps/jenkins/
            -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
            -Djdk.tls.ephemeralDHKeySize=2048
            -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
            -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar
            -Dcatalina.base=/var/lib/tomcat8
            -Dcatalina.home=/usr/share/tomcat8
            -Djava.io.tmpdir=/tmp/tomcat8-tomcat8-tmp 
            org.apache.catalina.startup.Bootstrap start

             

            jmccormick Jesse McCormick added a comment - - edited vlatombe - Tomcat was started with -Dcatalina.home=/usr/share/tomcat8. And yes, the directory is not writable by the tomcat8 user. This has caused other problems for Jenkins before. For instance, when the SSH client starts, it is also looking in /usr/share/tomcat8/, which is not the user's home directory (for the .ssh directory access). Is it possible we are using catalina.home instead of the actual user's home directory? Full parameters: /usr/lib/jvm/ default -java/bin/java -Djava.util.logging.config.file=/ var /lib/tomcat8/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.awt.headless= true -Xmx1280m -XX:+UseConcMarkSweepGC -DJENKINS_HOME=/ var /lib/tomcat8/webapps/jenkins/ -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH= true -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar -Dcatalina.base=/ var /lib/tomcat8 -Dcatalina.home=/usr/share/tomcat8 -Djava.io.tmpdir=/tmp/tomcat8-tomcat8-tmp org.apache.catalina.startup.Bootstrap start  

            Thanks for the fix! I can confirm all is well in the Jenkins 2.79 release.

            jmccormick Jesse McCormick added a comment - Thanks for the fix! I can confirm all is well in the Jenkins 2.79 release.
            danielbeck Daniel Beck added a comment -

            Fixed in 2.79.

            danielbeck Daniel Beck added a comment - Fixed in 2.79.

            Code changed in jenkins
            User: Oleg Nenashev
            Path:
            src/main/java/hudson/remoting/Channel.java
            src/main/java/hudson/remoting/ChannelBuilder.java
            src/main/java/hudson/remoting/Engine.java
            src/main/java/hudson/remoting/FileSystemJarCache.java
            src/main/java/hudson/remoting/JarCache.java
            src/main/java/hudson/remoting/Launcher.java
            src/main/java/hudson/remoting/ResourceImageBoth.java
            src/main/java/hudson/remoting/ResourceImageInJar.java
            src/main/java/org/jenkinsci/remoting/nio/NioChannelBuilder.java
            src/test/java/hudson/remoting/PrefetchingTest.java
            http://jenkins-ci.org/commit/remoting/8f550052f803bcb9ad708881b3534e0227d91150
            Log:
            JENKINS-45755 - Make JARCache nullable in Channel

            scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: src/main/java/hudson/remoting/Channel.java src/main/java/hudson/remoting/ChannelBuilder.java src/main/java/hudson/remoting/Engine.java src/main/java/hudson/remoting/FileSystemJarCache.java src/main/java/hudson/remoting/JarCache.java src/main/java/hudson/remoting/Launcher.java src/main/java/hudson/remoting/ResourceImageBoth.java src/main/java/hudson/remoting/ResourceImageInJar.java src/main/java/org/jenkinsci/remoting/nio/NioChannelBuilder.java src/test/java/hudson/remoting/PrefetchingTest.java http://jenkins-ci.org/commit/remoting/8f550052f803bcb9ad708881b3534e0227d91150 Log: JENKINS-45755 - Make JARCache nullable in Channel

            People

              oleg_nenashev Oleg Nenashev
              jmccormick Jesse McCormick
              Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: