Jenkins / JENKINS-63414

Global Docker agent breaks nested agent usage

      pipeline {
          agent none
          stages {
              stage('parent stage') {
                  agent {
                      docker {
                          image 'ubuntu:bionic'
                      }
                  }
                  stages {
                      stage('inherited agent') {
                          steps {
                              sh 'uname -a'
                          }
                      }
                      stage('explicit agent') {
                          agent {
                              node {
                                  label 'master'
                              }
                          }
                          steps {
                              sh 'uname -a'
                          }
                      }
                  }
              }
          }
      }
      

      The above pipeline results in the following output:

      Started by user unknown or anonymous
      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on Jenkins in /var/lib/jenkins/workspace/docker-durable-bug
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (parent stage)
      [Pipeline] getContext
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . ubuntu:bionic
      .
      [Pipeline] withDockerContainer
      Jenkins does not seem to be running inside a container
      $ docker run -t -d -u 982:982 -w /var/lib/jenkins/workspace/docker-durable-bug -v /var/lib/jenkins/workspace/docker-durable-bug:/var/lib/jenkins/workspace/docker-durable-bug:rw,z -v /var/lib/jenkins/workspace/docker-durable-bug@tmp:/var/lib/jenkins/workspace/docker-durable-bug@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** ubuntu:bionic cat
      $ docker top c84a4a643ed58929a86d80300821f04249f3e882de21c190043ac475b43eb3f6 -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (inherited agent)
      [Pipeline] sh
      + uname -a
      Linux c84a4a643ed5 5.8.1-arch1-1 #1 SMP PREEMPT Wed, 12 Aug 2020 18:50:43 +0000 x86_64 x86_64 x86_64 GNU/Linux
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] stage
      [Pipeline] { (explicit agent)
      [Pipeline] node
      Running on Jenkins in /var/lib/jenkins/workspace/docker-durable-bug@2
      [Pipeline] {
      [Pipeline] sh
      process apparently never started in /var/lib/jenkins/workspace/docker-durable-bug@2@tmp/durable-4a02313c
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 c84a4a643ed58929a86d80300821f04249f3e882de21c190043ac475b43eb3f6
      $ docker rm -f c84a4a643ed58929a86d80300821f04249f3e882de21c190043ac475b43eb3f6
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      If I use a regular agent, rather than a docker one, there's no problem.
      The above example reproduces the issue in a clean environment; it was prepared explicitly for that purpose.
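
      One workaround (my suggestion as editor, not part of this report, and not an official fix) is to avoid nesting entirely: give each stage its own top-level agent so the docker agent never wraps the node agent. A sketch based on the example above:

      ```groovy
      pipeline {
          agent none
          stages {
              stage('docker stage') {
                  agent {
                      docker {
                          image 'ubuntu:bionic'
                      }
                  }
                  steps {
                      sh 'uname -a' // runs inside the container
                  }
              }
              stage('plain stage') {
                  agent {
                      node {
                          label 'master'
                      }
                  }
                  steps {
                      sh 'uname -a' // runs directly on the node, outside any docker context
                  }
              }
          }
      }
      ```

      The trade-off is that the stages no longer share a workspace or an agent allocation, so anything produced in the docker stage must be passed along explicitly (for example via stash/unstash).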


          Ruben Sancho Ramos added a comment -

          I'm having the same issue; this is my pipeline:

          pipeline {
            agent {
              kubernetes {
                yamlFile 'agent_definition.yml'
                idleMinutes 5
              }
            }
            stages {
              stage('List Git Repo'){
                steps {
                  sh 'echo hola'
                  container('awsclislave') {
                    sh '''
                      . ./aws_auth.sh
                      aws s3 ls
                      aws sts get-caller-identity
                    '''
                  }
                }
              }
            }
          }

          It works fine on the first "sh 'echo hola'", but not on the second block. If I enable
          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true I get this:

          [Pipeline] // stage
          [Pipeline] withEnv
          [Pipeline] {
          [Pipeline] stage
          [Pipeline] { (List Git Repo)
          [Pipeline] sh
          + echo hola
          hola
          [Pipeline] container
          [Pipeline] {
          [Pipeline] sh
          sh: 1: cd: can't cd to /home/jenkins/agent/workspace/_jenkins_triggering_tests_master
          sh: 1: cannot create /home/jenkins/agent/workspace/_jenkins_triggering_tests_master@tmp/durable-9413a0fc/jenkins-log.txt: Directory nonexistent
          sh: 1: cannot create /home/jenkins/agent/workspace/_jenkins_triggering_tests_master@tmp/durable-9413a0fc/jenkins-result.txt.tmp: Directory nonexistent
          mv: cannot stat '/home/jenkins/agent/workspace/_jenkins_triggering_tests_master@tmp/durable-9413a0fc/jenkins-result.txt.tmp': No such file or directory
          process apparently never started in /home/jenkins/agent/workspace/_jenkins_triggering_tests_master@tmp/durable-9413a0fc
          [Pipeline] }
          [Pipeline] // container
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // withEnv
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] }
          [Pipeline] // podTemplate
          [Pipeline] End of Pipeline
          

          I logged into the running container and saw that the files are there, but in a different location:

           

          $ cd /home/jenkins/workspace
          $ pwd
          /home/jenkins/workspace
          $ ls
          _jenkins_triggering_tests_master  _jenkins_triggering_tests_master@tmp workspaces.txt
          
          

          The path should be /home/jenkins/workspace/ instead of /home/jenkins/agent/workspace/.
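
          This path mismatch explains the "Directory nonexistent" errors mechanically: the durable-task wrapper writes its bookkeeping files into a control directory under <workspace>@tmp, and if the container resolves the workspace under a different root than the agent, every redirect into that directory fails. A minimal shell sketch of the mismatch (the paths and directory names are illustrative stand-ins, not the plugin's actual code):

          ```shell
          # Two different workspace roots, simulating agent vs. container views.
          AGENT_ROOT=$(mktemp -d)        # root where the agent actually created the files
          CONTAINER_ROOT=$(mktemp -d -u) # root where the sh step looks; never created

          # The agent-side control directory exists...
          mkdir -p "$AGENT_ROOT/workspace/job@tmp/durable-1234"
          # ...but the step addresses it via the container-side root.
          CONTROL_DIR="$CONTAINER_ROOT/workspace/job@tmp/durable-1234"

          # Mirrors the failing operations from the log above:
          sh -c "cd '$CONTROL_DIR'" 2>/dev/null || echo "can't cd to $CONTROL_DIR"
          sh -c "echo x > '$CONTROL_DIR/jenkins-log.txt'" 2>/dev/null \
            || echo "cannot create jenkins-log.txt: Directory nonexistent"
          ```

          Because the wrapper can never write jenkins-log.txt or jenkins-result.txt, the step eventually gives up with "process apparently never started".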

           


          Ruben Sancho Ramos added a comment -

          Found the solution for my issue:
          https://superuser.com/questions/1459174/jenkins-pipeline-sh-step-hangs

          Changing the "workingDir" to /home/jenkins/agent fixed it!
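
          For reference, that fix can be expressed directly in the pod definition. This is a sketch only: the container name, image, and stage are hypothetical placeholders standing in for the contents of agent_definition.yml, and the key point is that workingDir must match the remoting root the agent actually uses (/home/jenkins/agent in recent kubernetes-plugin defaults):

          ```groovy
          pipeline {
              agent {
                  kubernetes {
                      // Inline equivalent of a yamlFile; workingDir aligned with
                      // the agent's remoting root so durable-task paths resolve.
                      yaml '''
          apiVersion: v1
          kind: Pod
          spec:
            containers:
            - name: awsclislave
              image: amazon/aws-cli   # hypothetical image
              command: ['sleep']
              args: ['infinity']
              workingDir: /home/jenkins/agent
          '''
                  }
              }
              stages {
                  stage('check') {
                      steps {
                          container('awsclislave') {
                              sh 'pwd' // should now resolve under /home/jenkins/agent/workspace
                          }
                      }
                  }
              }
          }
          ```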


          Hitesh kumar added a comment -

          Hi smirky,

          Jenkins version: 2.249.1

          Durable Task: 1.35

          We run builds in a Kubernetes farm and our builds are dockerised. After upgrading Jenkins and the durable-task plugin we saw multiple issues where "sh" initialisation inside the container breaks. We applied the solution suggested above and set workingDir: "/home/jenkins/agent", and builds succeed after making the change.

          However, some of the builds are still failing randomly with the same error:

          [2021-05-10T15:16:33.046Z] [Pipeline] sh
          [2021-05-10T15:22:08.073Z] process apparently never started in /home/jenkins/agent/workspace/CORE-CommitStage@tmp/durable-f6a728e7
          [2021-05-10T15:22:08.087Z] [Pipeline] }

          We had also already enabled, as per the suggestions:
          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true

          The issue is not persistent, but jobs still fail randomly. We are looking for a permanent fix.


          Serdar added a comment -

          Hi There, 

          Jenkins version: Jenkins 2.277.2

          Durable task: 1.35

          Node: SSH agent

          Agent root directory: /home/<user>/jenkins

          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true

          We run parallel builds on docker to verify our codebase on multiple platforms, but we see random failures as listed below.

          I reviewed the durable-task source code, and it seems it fails because the durable-2b3ef7ef control directory does not exist.

           

          [2021-05-24T18:14:59.321Z] process apparently never started in /home/<user>/jenkins/workspace/lumpdk-coverage_PR-4785@tmp/durable-2b3ef7ef
          [2021-05-24T18:15:01.657Z] sh: 1: cannot create /home/<user>/jenkins/workspace/lumpdk-coverage_PR-4785@tmp/durable-2b3ef7ef/jenkins-result.txt.tmp: Directory nonexistent
          [2021-05-24T18:15:01.659Z] mv: cannot stat '/home/<user>/jenkins/workspace/lumpdk-coverage_PR-4785@tmp/durable-2b3ef7ef/jenkins-result.txt.tmp': No such file or directory
          script returned exit code -2

           


          Mark Waite added a comment - - edited

          I see the same behavior as is described by smirky. Defining the outer agent as a docker agent seems to affect the inner agent, even if the inner agent is unrelated to docker.

          I'm not sure if Pipeline ever officially supported replacing an agent definition in a nested stage when an outer stage has defined an agent. In this case, it appears that the outer agent definition is affecting the replacement agent in the nested stage. I've never replaced an outer agent definition with an inner agent definition and have never seen it referenced in any of the Jenkins documentation. That doesn't mean it is not valid, just that I've never seen it.

          My environment only has docker available on agents with the label 'docker'. The job definition below incorrectly attempts to invoke the docker command for the nested agent that uses the label 'windows', with Durable Task plugin 1.39 and Docker Pipeline plugin 1.26. None of my 'windows' agents have the docker command or the 'docker' label.

          pipeline {
              agent none
              stages {
                  stage('parent stage') {
                      agent {
                          docker {
                              image 'ubuntu:bionic'
                              label 'docker'
                          }
                      }
                      stages {
                          stage('inherited agent') {
                              steps {
                                  sh 'hostname;uname -a' // run inside a docker container
                              }
                          }
                          stage('explicit agent') {
                              agent {
                                  node {
                                      label 'windows'
                                  }
                              }
                              steps {
                                  bat 'echo %PATH%' // run without docker, yet docker command is called
                              }
                          }
                      }
                  }
              }
          }
          

          If I avoid the nested agent definition, the job behaves correctly.

          If the outer agent definition is a simple label (not docker), the job behaves correctly. I assume that is because there is less initialization code for a labeled agent, while there is special initialization code for a docker agent. That's just me making an assumption.

          I don't have access to a kubernetes cluster, but I assume the same type of failure would happen if the outer agent were a kubernetes agent and the nested agent were a non-kubernetes agent.


          Steven added a comment -

          Hi markewaite. I'm facing the same issue using the Docker plugin. I have also tested the scenario using the Kubernetes plugin with a global build pod for the pipeline. When switching to another agent that supports neither Docker nor Kubernetes pods, it works correctly.


          Bogomil Vasilev added a comment -

          Not sure if someone is working on this. It's a very unfortunate issue, which forces us to use many inconvenient workarounds, such as splitting parts of the Jenkinsfile into separate Jenkinsfiles so that they can escape the agent context and run outside it. While this works, it's hard to maintain and adds significant complexity to the process.

          I'm really looking forward to seeing this resolved. Honestly, I can't believe that only a few people are reporting it, as it seems to be very common functionality.


          Kevin Broselge added a comment - - edited

          We seem to have the same or a similar issue here.
          Our pipeline looks like this:

           

          pipeline {
              agent none
              stages {
                  stage('nested node test') {
                      agent {
                           label 'Docker_windows'
                       }
                      steps {
                          bat(
                              label: 'WORKING',
                              script: 'whoami'
                          )
                          script {
                              node('Docker_windows') {
                                   bat(
                                       label: 'WORKING',
                                       script: 'whoami'
                                   )
                               }
                          }
                      }
                  }
                  stage('docker nested node test') {
                      agent {
                          docker {
                               label 'Docker_windows'
                               image 'anyImage'
                           }
                      }
                      steps {
                          bat(
                              label: 'WORKING',
                              script: 'whoami'
                          )
                          script {
                              node('Docker_windows') {
                                   bat(
                                       label: 'HANGS FOREVER!',
                                       script: 'whoami'
                                   )
                               }
                          }
                      }
                  }
              }
          }

           

          This really is a dealbreaker when we are using functions from shared libraries/pipelines.
          Since this issue is from 2020, isn't there any hope that it will get fixed?
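
          For this shape of pipeline, one possible restructuring (again a sketch, not a confirmed fix) is to keep the extra node allocation out of the docker stage entirely, so the node() step never runs inside the withDockerContainer context that the second stage above sets up:

          ```groovy
          pipeline {
              agent none
              stages {
                  stage('docker stage') {
                      agent {
                          docker {
                              label 'Docker_windows'
                              image 'anyImage'
                          }
                      }
                      steps {
                          bat 'whoami' // inside the container
                      }
                  }
                  stage('extra node') {
                      agent {
                          label 'Docker_windows'
                      }
                      steps {
                          bat 'whoami' // plain node allocation, outside any docker context
                      }
                  }
              }
          }
          ```

          For shared-library code that must allocate its own node, the equivalent is to call that library function from a stage whose agent is a plain label rather than a docker agent.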


          Mark Waite added a comment -

          This really is a dealbreaker when we are using functions from sharedlibraries/pipelines.
          Hence this issue is from 2020 isn't there any hope that this will get fixed?

          I don't expect it to be fixed. I think that nested agent definitions are hints that a Pipeline needs to be simplified.


            Assignee: Unassigned
            Reporter: Bogomil Vasilev (smirky)
            Votes: 5
            Watchers: 11