  2. JENKINS-45755

Unable to launch SSH Slave since 2.68 when HOME is not writable on Master

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Environment: Single master and single slave setup, both running Debian and using SSH slave where master user is not the same as the slave user. Jenkins 2.71 / Remoting 3.10

      Last working for me Jenkins version 2.67

      Versions not working 2.68 - 2.78

      Issue: SSH Slave will not launch

      Something in the 2.68 release seems to have changed how the environment is set up for slaves. The user my slave connects as has the home directory /home/SPALDING/jenkinsbuildserver. However, the error message shows that the jar cache is being looked up in the master user's home directory, /usr/share/tomcat8, which doesn't exist on the slave.

      I figured I could at least work around this issue by setting the new "workDir" parameter, but the launch still fails. Even if I create the /usr/share/tomcat8/.jenkins/cache/jars directory and ensure it is writable by the slave user, it still fails to launch.

      There appear to be two issues:

      • The slave is not using the correct location for the default jar cache, because it seems to be using the home directory of the master user.
      • Even when the 'workDir' parameter is specified, Jenkins still tries to validate the default location.
      [07/24/17 10:42:11] [SSH] Opening SSH connection to SLAVEHOST:22.
      [07/24/17 10:42:11] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
      [07/24/17 10:42:11] [SSH] Authentication successful.
      [07/24/17 10:42:11] [SSH] The remote users environment is:
      BASH=/bin/bash
      BASHOPTS=cmdhist:complete_fullquote:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
      BASH_ALIASES=()
      BASH_ARGC=()
      BASH_ARGV=()
      BASH_CMDS=()
      BASH_EXECUTION_STRING=set
      BASH_LINENO=()
      BASH_SOURCE=()
      BASH_VERSINFO=([0]="4" [1]="3" [2]="30" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
      BASH_VERSION='4.3.30(1)-release'
      DIRSTACK=()
      EUID=10169
      GROUPS=()
      HOME=/home/SPALDING/jenkinsbuildserver
      HOSTNAME=SLAVEHOST
      HOSTTYPE=x86_64
      IFS=$' \t\n'
      LANG=en_US.UTF-8
      LOGNAME=jenkinsbuildserver
      MACHTYPE=x86_64-pc-linux-gnu
      MAIL=/var/mail/jenkinsbuildserver
      OPTERR=1
      OPTIND=1
      OSTYPE=linux-gnu
      PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
      PIPESTATUS=([0]="0")
      PPID=21458
      PS4='+ '
      PWD=/home/SPALDING/jenkinsbuildserver
      SHELL=/bin/bash
      SHELLOPTS=braceexpand:hashall:interactive-comments
      SHLVL=1
      SSH_CLIENT='10.10.1.179 57938 22'
      SSH_CONNECTION='10.10.1.179 57938 10.10.0.251 22'
      TERM=dumb
      UID=10169
      USER=jenkinsbuildserver
      _=']'
      [07/24/17 10:42:11] [SSH] Checking java version of java
      [07/24/17 10:42:11] [SSH] java -version returned 1.8.0_131.
      [07/24/17 10:42:11] [SSH] Starting sftp client.
      [07/24/17 10:42:11] [SSH] Copying latest slave.jar...
      [07/24/17 10:42:11] [SSH] Copied 730,299 bytes.
      Expanded the channel window size to 4MB
      [07/24/17 10:42:11] [SSH] Starting slave process: cd "/home/SPALDING/jenkinsbuildserver/jenkins-agent" && java  -jar slave.jar -workDir /home/SPALDING/jenkinsbuildserver/jenkins-agent/work -failIfWorkDirIsMissing
      Jul 24, 2017 10:42:12 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
      INFO: Using /home/SPALDING/jenkinsbuildserver/jenkins-agent/work/remoting as a remoting work directory
      Both error and output logs will be printed to /home/SPALDING/jenkinsbuildserver/jenkins-agent/work/remoting
      <===[JENKINS REMOTING CAPACITY]===>ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
      java.lang.RuntimeException: Root directory not writable: /usr/share/tomcat8/.jenkins/cache/jars
              at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:57)
              at hudson.remoting.JarCache.getDefault(JarCache.java:32)
              at hudson.remoting.Channel.<init>(Channel.java:505)
              at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:323)
              at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:389)
              at hudson.plugins.sshslaves.SSHLauncher.startSlave(SSHLauncher.java:1070)
              at hudson.plugins.sshslaves.SSHLauncher.access$500(SSHLauncher.java:144)
              at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:817)
              at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:792)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748)
      [07/24/17 10:42:12] Launch failed - cleaning up connection
      [07/24/17 10:42:12] [SSH] Connection closed.
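
The failing path in the log has the shape `<home>/.jenkins/cache/jars`, and the exception is raised while the channel is being built on the master side. A minimal sketch of that kind of check follows, assuming the default location is derived from the JVM's `user.home` property. The class and method names are hypothetical and are not Remoting's actual code; the sketch only inspects the filesystem and creates nothing.

```java
import java.io.File;

/** Illustrative sketch only; the names here are hypothetical, not Remoting's API. */
final class JarCacheLocationSketch {

    /** Builds the cache path in the shape seen in the log: <home>/.jenkins/cache/jars. */
    static File defaultCacheDir(String userHome) {
        return new File(userHome, ".jenkins/cache/jars");
    }

    /**
     * Reports whether the directory exists and is writable, or could be created
     * because its nearest existing ancestor is a writable directory.
     * Nothing is created on disk.
     */
    static boolean isUsable(File dir) {
        File probe = dir.getAbsoluteFile();
        while (probe != null && !probe.exists()) {
            probe = probe.getParentFile();
        }
        return probe != null && probe.isDirectory() && probe.canWrite();
    }

    public static void main(String[] args) {
        // Pass a home directory as the first argument, or fall back to this JVM's user.home.
        String home = args.length > 0 ? args[0] : System.getProperty("user.home");
        File cache = defaultCacheDir(home);
        System.out.println(cache + (isUsable(cache) ? " is usable" : " is NOT usable"));
    }
}
```

Running it as the user that runs the master, with that user's home directory as the argument, gives a quick indication of whether a check of this kind would pass on that machine.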
      
      
      

      Reproducer using Docker

      Launch Jenkins using the following

      mkdir jenkins_home
      sudo chown 2000 jenkins_home
      sudo chmod 777 jenkins_home
      docker run -ti -v "$(pwd)/jenkins_home:/var/jenkins_home" -u 2000 --rm -p 8080:8080 -p 50000:50000 jenkinsci/jenkins:2.73
      

      Create a jnlp agent, then try to connect. You will then get the following sequence:

      master

      Sep 12, 2017 1:06:16 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
      INFO: Accepted JNLP4-connect connection #1 from /172.17.0.1:38552
      Sep 12, 2017 1:06:16 PM org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer onRecv
      WARNING: [JNLP4-connect connection from 172.17.0.1/172.17.0.1:38552]
      java.lang.RuntimeException: Root directory not writable: ?/.jenkins/cache/jars
      	at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:57)
      	at hudson.remoting.JarCache.getDefault(JarCache.java:32)
      	at hudson.remoting.Channel.<init>(Channel.java:505)
      	at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:339)
      	at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onRead(ChannelApplicationLayer.java:149)
      	at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecv(ApplicationLayer.java:207)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processRead(SSLEngineFilterLayer.java:369)
      	at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecv(SSLEngineFilterLayer.java:117)
      	at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecv(ProtocolStack.java:669)
      	at org.jenkinsci.remoting.protocol.NetworkLayer.onRead(NetworkLayer.java:136)
      	at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:160)
      	at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      agent

      INFO: Locating server among [http://localhost:8080/]
      Sep 12, 2017 1:06:16 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
      INFO: Remoting server accepts the following protocols: [JNLP4-connect, JNLP-connect, Ping, JNLP2-connect]
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Agent discovery successful
        Agent address: localhost
        Agent port:    50000
        Identity:      80:e1:c5:f6:d5:96:cf:1d:6a:58:45:48:2b:fe:67:76
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Handshaking
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connecting to localhost:50000
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Trying protocol: JNLP4-connect
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Remote identity confirmed: 80:e1:c5:f6:d5:96:cf:1d:6a:58:45:48:2b:fe:67:76
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Connected
      Sep 12, 2017 1:06:16 PM hudson.remoting.jnlp.Main$CuiListener status
      INFO: Terminated
      


          Daniel Beck added a comment - edited

          Legacy remoting-based CLI connections are also affected:

          Sep 14, 2017 4:11:14 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
          SEVERE: A thread (TCP agent connection handler #6 with /192.168.99.1:65128/107) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
          java.lang.RuntimeException: Root directory not writable: /.jenkins/cache/jars
              at hudson.remoting.FileSystemJarCache.<init>(FileSystemJarCache.java:57)
              at hudson.remoting.JarCache.getDefault(JarCache.java:32)
              at hudson.remoting.Channel.<init>(Channel.java:520)
              at hudson.remoting.ChannelBuilder.build(ChannelBuilder.java:323)
              at hudson.cli.CliProtocol$Handler.runCli(CliProtocol.java:106)
              at hudson.cli.CliProtocol2$Handler2.run(CliProtocol2.java:101)
              at hudson.cli.CliProtocol2.handle(CliProtocol2.java:57)
              at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:235)


          Oleg Nenashev added a comment -

          Yes. There are many valid use cases where the home directory should not be writable at all, so I disagree with the priority assessment by jglick. IMHO it is still a high-priority regression that we need to fix, at least in .2.


          Jesse McCormick added a comment - edited

          In my specific case, the home directory for each user is writable, just not by the other user. Here is more detail on the setup:

          Master Node:

          • Master User: tomcat8
          • Master Home: /var/lib/tomcat8 (writable by user tomcat8)
          • Slave User: jenkinsbuildserver
          • Slave Home: Does not exist on master

          Slave Node:

          • Master User: Does not exist on slave
          • Master Home: Does not exist on slave
          • Slave User: jenkinsbuildserver
          • Slave Home: /home/SPALDING/jenkinsbuildserver (writable by user jenkinsbuildserver)


          Vincent Latombe added a comment -

          Hi jmccormick, from your report it looks like the home directory of your tomcat8 user on the machine running the master is /usr/share/tomcat8/, which is not writable. If that is not the user's home, maybe the system property -Duser.home=/usr/share/tomcat8 is specified when launching your Tomcat instance; this would have the same effect.
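
To see which home directory a JVM has actually resolved, a small probe like the one below can be run as the same user, and with the same `-D` options, as the master. This is a hedged sketch: the class name `UserHomeProbe` and its `describe` helper are made up for illustration. The `?` case is included because the Docker reproducer in the description logs the path `?/.jenkins/cache/jars`, which suggests the JVM could not resolve a home directory for UID 2000.

```java
import java.io.File;

/** Hypothetical diagnostic helper; not part of Jenkins or Remoting. */
final class UserHomeProbe {

    /** Classifies a user.home value as unresolved, missing, read-only, or writable. */
    static String describe(String userHome) {
        if (userHome == null || userHome.equals("?")) {
            return "unresolved (no home directory known for this UID)";
        }
        File home = new File(userHome);
        if (!home.isDirectory()) {
            return "missing: " + home;
        }
        return home.canWrite() ? "writable: " + home : "read-only: " + home;
    }

    public static void main(String[] args) {
        // user.home reflects the account database unless overridden with -Duser.home=...
        System.out.println("user.name = " + System.getProperty("user.name"));
        System.out.println("user.home = " + describe(System.getProperty("user.home")));
    }
}
```

A result other than "writable" for the master's JVM would be consistent with the explanation given in the comment above.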

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/018f9875ca58230afc4eb52ac66b3195f00128ef
          Log:
          [JENKINS-45755, JENKINS-46140] - Update Remoting to 3.12

          https://github.com/jenkinsci/remoting/blob/master/CHANGELOG.md#312

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/e8b2f5a59c0b075f62bea9b6a45ef35a1c2ca1bb
          Log:
          Merge pull request #3025 from oleg-nenashev/remoting/3.12

          [JENKINS-45755, JENKINS-46140] - Update Remoting to 3.12

          Compare: https://github.com/jenkinsci/jenkins/compare/2343909f0240...e8b2f5a59c0b

          Jesse McCormick added a comment - edited

          vlatombe - Tomcat was started with -Dcatalina.home=/usr/share/tomcat8. And yes, the directory is not writable by the tomcat8 user. This has caused other problems for Jenkins before. For instance, when the SSH client starts, it is also looking in /usr/share/tomcat8/, which is not the user's home directory (for the .ssh directory access). Is it possible we are using catalina.home instead of the actual user's home directory?

          Full parameters:

          /usr/lib/jvm/default-java/bin/java 
          -Djava.util.logging.config.file=/var/lib/tomcat8/conf/logging.properties 
          -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
          -Djava.awt.headless=true
          -Xmx1280m 
          -XX:+UseConcMarkSweepGC 
          -DJENKINS_HOME=/var/lib/tomcat8/webapps/jenkins/
          -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
          -Djdk.tls.ephemeralDHKeySize=2048
          -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
          -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar
          -Dcatalina.base=/var/lib/tomcat8
          -Dcatalina.home=/usr/share/tomcat8
          -Djava.io.tmpdir=/tmp/tomcat8-tomcat8-tmp 
          org.apache.catalina.startup.Bootstrap start


          Jesse McCormick added a comment -

          Thanks for the fix! I can confirm all is well in the Jenkins 2.79 release.

          Daniel Beck added a comment -

          Fixed in 2.79.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          src/main/java/hudson/remoting/Channel.java
          src/main/java/hudson/remoting/ChannelBuilder.java
          src/main/java/hudson/remoting/Engine.java
          src/main/java/hudson/remoting/FileSystemJarCache.java
          src/main/java/hudson/remoting/JarCache.java
          src/main/java/hudson/remoting/Launcher.java
          src/main/java/hudson/remoting/ResourceImageBoth.java
          src/main/java/hudson/remoting/ResourceImageInJar.java
          src/main/java/org/jenkinsci/remoting/nio/NioChannelBuilder.java
          src/test/java/hudson/remoting/PrefetchingTest.java
          http://jenkins-ci.org/commit/remoting/8f550052f803bcb9ad708881b3534e0227d91150
          Log:
          JENKINS-45755 - Make JARCache nullable in Channel
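
The commit message says only that the jar cache became nullable in `Channel`. One plausible reading is that an unusable cache location no longer aborts channel setup and the channel simply runs without a cache. The sketch below illustrates that general pattern under this assumption; `DirCache`, `tryCreate`, and `openChannel` are hypothetical names and this is not the actual patch.

```java
import java.io.File;

/** Illustrates a "nullable cache" fallback; all names are hypothetical, not Remoting's code. */
final class NullableJarCacheSketch {

    /** Stand-in for a file-backed cache that insists on an existing, writable root. */
    static final class DirCache {
        final File root;

        DirCache(File root) {
            if (!root.isDirectory() || !root.canWrite()) {
                throw new IllegalArgumentException("Root directory not writable: " + root);
            }
            this.root = root;
        }
    }

    /** Returns a cache, or null when the location is unusable, instead of propagating the failure. */
    static DirCache tryCreate(File root) {
        try {
            return new DirCache(root);
        } catch (IllegalArgumentException e) {
            System.err.println("Jar caching disabled: " + e.getMessage());
            return null;
        }
    }

    /** Callers must tolerate a null cache rather than assume one always exists. */
    static String openChannel(DirCache cache) {
        return cache == null
                ? "channel opened without jar cache"
                : "channel opened with jar cache at " + cache.root;
    }

    public static void main(String[] args) {
        // Pass a candidate cache directory as the first argument, or fall back to the temp dir.
        File candidate = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println(openChannel(tryCreate(candidate)));
    }
}
```

The trade-off in this pattern is that a misconfigured cache location costs some performance (no jar caching) instead of preventing the connection altogether.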

            Assignee: Oleg Nenashev (oleg_nenashev)
            Reporter: Jesse McCormick (jmccormick)