JENKINS-59903

durable-task v1.31 breaks sh steps in pipeline when running in a Docker container

Details

    • 1.33

    Description

      A pipeline like this:

      pipeline {
          agent {
              docker {
                  label 'docker'
                  image 'busybox'
              }
          }
          stages {
              stage("Test sh script in container") {
                  steps {
                    sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                  }
              }
          }
      }
      

      Fails with this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      process apparently never started in /...
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      $ docker rm -f 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Adding the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true parameter gives this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"/var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64\": stat /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: no such file or directory": unknown
      process apparently never started in /...
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      $ docker rm -f 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
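      The diagnostic line in this log shows that the wrapper binary could not be found at its host-side path from inside the container. A minimal sketch for confirming that by hand, assuming the container is still running and its ID is known. The function name path_visible_in_container is hypothetical and is not part of Jenkins or Docker:

```shell
#!/bin/sh
# Hypothetical helper (not part of Jenkins or Docker): report whether a
# host-side path can be seen from inside a running container.
# Usage: path_visible_in_container <container-id> <absolute-path>
path_visible_in_container() {
    cid=$1
    target=$2
    # `docker exec` runs a command inside the container and returns its exit
    # status. This assumes the image ships a `test` utility (busybox does).
    # A stopped or unknown container also ends up in the "missing" branch.
    if docker exec "$cid" test -e "$target"; then
        echo "visible: $target"
    else
        echo "missing: $target"
        return 1
    fi
}

# Example, with the path taken from the log and a placeholder container ID:
# path_visible_in_container <container-id> /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64
```

      If the path is reported as missing, the sh step cannot start the wrapper inside the container, which matches the "process apparently never started" message.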
      

      Tested on three different Jenkins masters with similar, but not identical, configurations.

      Reverting to Durable Task Plugin v. 1.30 "solves" the problem.

          Activity

            albers Harald Albers added a comment -

            carroll 1.33 works for my use case (build root in /, user without permission to create the /caches directory)

            don_code Don L added a comment -

            I've found this also reproduces when using build agents in Kubernetes, not just Docker. The problem here is that Kubernetes launches two containers into a pod with a shared mount: a JNLP slave container, which Jenkins does have permission to write the cache directory in, and a build container (in my case kubectl, but could be any container without a Jenkins user) where it does not necessarily have the same permission, in which code actually runs. The plugin runs its test inside the JNLP container, enables the wrapper, and then exhibits the same hanging behavior when commands are run in the kubectl container.

            Tests were run on the latest version (v1.33) of durable-task.

            Logs with LAUNCH_DIAGNOSTICS set:

            sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt: Permission denied
            sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp: Permission denied
            touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
            mv: cannot stat '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp': No such file or directory
            touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
            [ last line repeated ~100 times ]                                                                                          
            process apparently never started in /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47  

            In the JNLP container:

            bash-4.4$ cd /home/jenkins/agent/caches
            bash-4.4$ ls -l
            total 0
            drwxr-xr-x    2 jenkins  jenkins          6 Mar  6 15:47 durable-task 

            In the kubectl container:

            I have no name!@<REDACTED>:/home/jenkins/agent/caches$ ls -l
            total 0
            drwxr-xr-x 2 1000 1000 6 Mar  6 15:47 durable-task
            I have no name!@<REDACTED>:/home/jenkins/agent/caches$ id
            uid=1001 gid=0(root) groups=0(root)

            I've had some success today working around this by adding a security context to my pods, forcing a run as Jenkins's UID (which for me is 1000 - YMMV depending on how Jenkins is running), e.g.:

            kind: Pod                                                                                                                  
            metadata:                                                                                                                  
              name: kubectl                                                                                                            
            spec:                                                                                                                      
              containers:                                                                                                              
              - command:                                                                                                               
                - cat                                                                                                                  
                image: bitnami/kubectl:1.14                                     
                imagePullPolicy: Always                                                                                                
                name: kubectl                                                                                                          
                tty: true                                                                                                              
              securityContext:                                                                                                         
                runAsUser: 1000 
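
            The listings in this comment show the mismatch: the shared directory is owned by UID 1000 with mode drwxr-xr-x, while the build container runs as UID 1001. A small sketch for checking this from a shell in either container before choosing a runAsUser value. The helper names are hypothetical, and the check only compares the numeric owner with the current UID (it ignores group and ACL-based access):

```shell
#!/bin/sh
# Hypothetical helpers for comparing a directory's numeric owner with the
# UID of the current process.

# Print the numeric UID that owns the given directory.
owner_uid() {
    # `ls -n` prints numeric user and group IDs; `-d` lists the directory itself.
    ls -nd "$1" | awk '{ print $3 }'
}

# Print "match" when the current user owns the directory, "mismatch" otherwise.
check_owner() {
    dir_uid=$(owner_uid "$1")
    my_uid=$(id -u)
    if [ "$dir_uid" = "$my_uid" ]; then
        echo "match: uid $my_uid owns $1"
    else
        echo "mismatch: $1 is owned by uid $dir_uid, current uid is $my_uid"
        return 1
    fi
}

# Example, using the directory from the listing:
# check_owner /home/jenkins/agent/caches/durable-task
```

            Run in the kubectl container from the listing, this would report owner 1000 against current UID 1001, which is the gap that runAsUser: 1000 closes.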
            dnusbaum Devin Nusbaum added a comment -

            don_code Please open a new issue instead of commenting here. In durable-task 1.33, the caches directory is not actually used by default, so I think you can ignore it. The problem in your case looks like permissions on the control directory for the script, and I think that you would run into the same problems on durable-task 1.30 or older, so I would check for similar bugs reported against Durable Task Plugin and/or Kubernetes Plugin.

            komalb08 Komal Bardia added a comment -

            After upgrading to version 1.33, all my Jenkins jobs fail with the error below. Please suggest a solution. Before the upgrade there were no issues.

            [Pipeline] sh
            process apparently never started in XYZ/durable-xyz
            (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
            [Pipeline] }

            dnusbaum Devin Nusbaum added a comment - - edited

            komalb08 The code that caused the issue described in this ticket is now disabled by default. Please open a new ticket and fully describe the issue you are seeing, and run Jenkins with the system property mentioned in the error message to get more details on the specific problem.
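
            For reference, a sketch of how that system property is typically passed when Jenkins is started directly from the WAR file. The jenkins.war location and the helper name are assumptions; installations that run Jenkins as a service or in a container set JVM options elsewhere.

```shell
#!/bin/sh
# Property name copied from the error message in the build log.
DIAG_PROP="org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS"

# Hypothetical helper: print the launch command instead of running it, so it
# can be reviewed before Jenkins is restarted.
# The WAR location defaults to ./jenkins.war, which is an assumption.
jenkins_diag_command() {
    war=${1:-jenkins.war}
    # JVM system properties (-D...) must come before -jar.
    echo "java -D${DIAG_PROP}=true -jar ${war}"
}

jenkins_diag_command
```

            The error message suggests enabling the property only temporarily, so it makes sense to remove it again once the extra diagnostics have been collected.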


            People

              carroll Carroll Chiou
              njesper Jesper Andersson