JENKINS-59903: durable-task v1.31 breaks sh steps in pipeline when running in a Docker container

    • Fix Version: 1.33

      A pipeline like this:

      pipeline {
          agent {
              docker {
                  label 'docker'
                  image 'busybox'
              }
          }
          stages {
              stage("Test sh script in container") {
                  steps {
                    sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                  }
              }
          }
      }
      

      Fails with this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      process apparently never started in /...
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      $ docker rm -f 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Adding the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true parameter gives this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"/var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64\": stat /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: no such file or directory": unknown
      process apparently never started in /...
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      $ docker rm -f 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
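
      For reference, a minimal sketch of enabling that diagnostics flag, assuming the controller is started directly from jenkins.war (adapt for your service wrapper or container setup):

      # Sketch: pass the diagnostics property to the controller JVM at startup.
      # The jenkins.war path and any other startup options are placeholders.
      java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true \
          -jar jenkins.war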
      

      Tested on three different Jenkins masters with similar, but not identical, configurations.

      Reverting to Durable Task Plugin v. 1.30 "solves" the problem.
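
      For anyone needing the same stop-gap, a sketch of pinning the older release, assuming a controller image built from the official jenkins/jenkins Docker image and its bundled install-plugins.sh script (installations managed through the plugin manager UI can downgrade there instead):

      # Sketch: pin the previous durable-task release while building a controller
      # image based on the official jenkins/jenkins image (script path per that image).
      /usr/local/bin/install-plugins.sh durable-task:1.30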


          Harald Albers added a comment -

          carroll 1.32 does not resolve the issue for me.

          When running a sh step remotely on a dockerized agent as described above, I still get java.nio.file.AccessDeniedException: /caches, see details above.


          Jesper Andersson added a comment -

          albers How are you running your container?

          I'm guessing wildly here, but to me it looks like your node config is setting "Remote root directory" to /. And I'm also guessing that you are running the container as a specific user, e.g. '-u jenkins:jenkins', probably mounting the workspace with something like '-v /home/jenkins/workspace:/workspace', and then starting the agent inside the container.

          With such a setup the Jenkins agent will probably not have enough permissions to create '/cache', which the plugin perhaps is still trying to do even when it's set not to use the new wrapper.

          Try adding e.g. '-v /home/jenkins/cache:/cache' (adapted to your config), or pre-creating a /cache folder in your image that is owned by 'jenkins:jenkins' (the user you run the container as).
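
          A sketch of the kind of invocation meant above; the image name, user, and host paths are placeholders based on this comment and should be adapted to your setup:

          # Sketch: start the agent container with the workspace and cache
          # directories mounted from writable host locations.
          # Image name, user, and host paths are placeholders.
          docker run -d \
              -u jenkins:jenkins \
              -v /home/jenkins/workspace:/workspace \
              -v /home/jenkins/cache:/cache \
              my-jenkins-agent-image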


          Harald Albers added a comment -

          njesper Your questions pointed me to a solution, thanks a lot.

          But first the answers:

          The Docker image of the agent runs as the user jenkins. The swarm client plugin sets the "Remote root directory" to "/" when connecting to the master and dynamically creating an agent. The image has an existing /workspace directory that is writable for the user jenkins. The user jenkins obviously does not have sufficient permissions to create a directory in /.

          The swarm client can be configured to use a specific root directory. If I set it to a directory where the user jenkins has write permission, the build will successfully create a directory caches alongside the workspace directory.

          Another solution would be to pre-create the /caches directory in the image as well.

          I'm fine with this solution.

          But the bottom line is that we need documentation that the user who performs the build must have sufficient permissions to create directories in the build root, or that specific directories need to exist with appropriate permissions.
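
          For anyone else hitting this with the swarm client, a sketch of that configuration; the master URL, agent name, and directory are placeholders, and the relevant option is the client's -fsroot flag (check your swarm-client version's help output for the exact name):

          # Sketch: point the swarm client at a root directory the jenkins user can
          # write to, so the "caches" directory can be created next to the workspace.
          java -jar swarm-client.jar \
              -master https://jenkins.example.com \
              -name docker-agent \
              -fsroot /home/jenkins/agent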


          Carroll Chiou added a comment -

          I apologize: what 1.31 did was disable the binary wrapper by default, but it did not resolve the caching issue because the plugin is still trying to create the cache dir. I am in the process of merging my current fix (https://github.com/jenkinsci/durable-task-plugin/pull/114) into master.

          albers once the fix gets through, those users who do not have permissions to create directories in the build root will have caching disabled.


          Carroll Chiou added a comment -

          So version 1.33 has now been released. This includes the fix for disabling cache when there are insufficient permissions to access the cache dir. The binary is still disabled by default.
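
          If you want to opt back in to the binary wrapper for testing, my understanding is that the plugin exposes a controller-side system property for that; the property name below is taken from the plugin's documentation and should be verified against your installed version:

          # Sketch: explicitly re-enable the binary wrapper, which 1.33 leaves disabled
          # by default. Verify the property name against the durable-task plugin docs.
          java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_BINARY_WRAPPER=true \
              -jar jenkins.war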


          Harald Albers added a comment -

          carroll 1.33 works for my use case (build root in /, user not having permission to create the /caches directory).


          Don L added a comment -

          I've found this also reproduces when using build agents in Kubernetes, not just Docker. The problem here is that Kubernetes launches two containers into a pod with a shared mount: a JNLP slave container, in which Jenkins does have permission to write the cache directory, and a build container where the code actually runs (in my case kubectl, but it could be any container without a Jenkins user), which does not necessarily have the same permission. The plugin runs its test inside the JNLP container, enables the wrapper, and then exhibits the same hanging behavior when commands are run in the kubectl container.

          Tests were run on the latest release (v1.33) of durable-task.

          Logs with LAUNCH_DIAGNOSTICS set:

          sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt: Permission denied
          sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp: Permission denied
          touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
          mv: cannot stat '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp': No such file or directory
          touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
          [ last line repeated ~100 times ]                                                                                          
          process apparently never started in /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47  

          In the JNLP container:

          bash-4.4$ cd /home/jenkins/agent/caches
          bash-4.4$ ls -l
          total 0
          drwxr-xr-x    2 jenkins  jenkins          6 Mar  6 15:47 durable-task 

          In the kubectl container:

          I have no name!@<REDACTED>:/home/jenkins/agent/caches$ ls -l
          total 0
          drwxr-xr-x 2 1000 1000 6 Mar  6 15:47 durable-task
          I have no name!@<REDACTED>:/home/jenkins/agent/caches$ id
          uid=1001 gid=0(root) groups=0(root) 

          I've had some success today working around this by adding a security context to my pods, forcing them to run as Jenkins's UID (which for me is 1000; YMMV depending on how Jenkins is running), e.g.:

          kind: Pod
          metadata:
            name: kubectl
          spec:
            containers:
            - command:
              - cat
              image: bitnami/kubectl:1.14
              imagePullPolicy: Always
              name: kubectl
              tty: true
            securityContext:
              runAsUser: 1000


          Devin Nusbaum added a comment -

          don_code Please open a new issue instead of commenting here. In durable-task 1.33, the caches directory is not actually used by default, so I think you can ignore it. The problem in your case looks like permissions on the control directory for the script, and I think that you would run into the same problems on durable-task 1.30 or older, so I would check for similar bugs reported against Durable Task Plugin and/or Kubernetes Plugin.


          Komal Bardia added a comment -

          After upgrading to version 1.33, all my Jenkins jobs failed with this error. Please suggest a solution. Before the upgrade there were no issues.

          [Pipeline] sh
          process apparently never started in XYZ/durable-xyz
          (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
          [Pipeline] }


          Devin Nusbaum added a comment - edited

          komalb08 The code that caused the issue described in this ticket is now disabled by default. Please open a new ticket and fully describe the issue you are seeing, and run Jenkins with the system property mentioned in the error message to get more details on the specific problem.


            Assignee: Carroll Chiou (carroll)
            Reporter: Jesper Andersson (njesper)
            Votes: 34
            Watchers: 48
