Jenkins / JENKINS-34289

docker.image.inside fails unexpectedly with Jenkinsfile

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component: docker-workflow-plugin
    • Labels: None
    • Environment: Jenkins 1.651.1
      CloudBees Docker Pipeline 1.4
      Pipeline 2.0
      Docker 1.11.0
      RHEL 7.2

      When building with a simple Jenkinsfile, the build fails at some point for no obvious reason.

      An example Jenkinsfile:

      def img = 'centos:7';
      
      node('docker') {
        stage "pulling";
        sh "docker pull ${img}"; // workaround for JENKINS-34288
      
        checkout scm;
      
        docker.image(img).inside {
          sh 'for i in $(seq 30); do sleep 1; echo $i; done';
          sh 'ls -alh --color';
        }
      }
      

      Partial output:

      [Pipeline] Run build steps inside a Docker container : Start
      $ docker run -t -d -u 995:993 -w /var/lib/jenkins/workspace/tron/docwhat-test-jenkinsfile/master -v /var/lib/jenkins/workspace/tron/docwhat-test-jenkinsfile/master:/var/lib/jenkins/workspace/tron/docwhat-test-jenkinsfile/master:rw -v /var/lib/jenkins/workspace/tron/docwhat-test-jenkinsfile/master@tmp:/var/lib/jenkins/workspace/tron/docwhat-test-jenkinsfile/master@tmp:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** centos:7 cat
      [Pipeline] withDockerContainer {
      [Pipeline] sh
      [master] Running shell script
      ++ seq 30
      + for i in '$(seq 30)'
      + sleep 1
      [Pipeline] } //withDockerContainer
      $ docker stop 7fcbfd6ab39cf05257a43a774bd20b670bc39674a2047777fe603ee1a3162b10
      $ docker rm -f 7fcbfd6ab39cf05257a43a774bd20b670bc39674a2047777fe603ee1a3162b10
      [Pipeline] Run build steps inside a Docker container : End
      [Pipeline] } //node
      [Pipeline] Allocate node : End
      [Pipeline] End of Pipeline
      


          Jesse Glick added a comment -

          From a quick glance this sounds like the (poorly stated) problem docker-workflow PR 25 purports to address. Is there a simple way to reproduce the problem from scratch?


          Christian Höltje added a comment - edited

          How about this?

          node {
              def img = docker.image('busybox');
              img.pull();
              img.inside {
                  sh 'for i in $(seq 30); do sleep 1; echo $i; done';
              }
          }
          


          Christian Höltje added a comment -

          I'm pretty sure that the shell step isn't even executed inside the container and that the AbortException is being thrown by something else. If I change the step to run sleep 1 || true it still aborts with -1.

          Christian Höltje added a comment -

          Interesting.

          At the moment I don't have slaves; I'm using the master to build stuff... Jenkins is running as the non-root user jenkins, with DOCKER_HOST=tcp://127.0.0.1:2375 configured on the master node.

          If I stand up a clean Jenkins running as root, my example code works. If I stand up a clean Jenkins running as a non-root user, my example code fails on the first sleep 1.
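          For reference, a tiny diagnostic sketch (my own, not part of Jenkins or the plugin) that prints the two things that differed between the working and failing setups above: whether the process runs as root, and whether DOCKER_HOST is set:

          ```shell
          #!/bin/sh
          # Hypothetical diagnostic: report the effective user and DOCKER_HOST,
          # since both distinguished the working (root) from the failing
          # (non-root jenkins user) setups in this report.
          if [ "$(id -u)" = 0 ]; then
            echo "running as: root"
          else
            echo "running as: non-root ($(id -un))"
          fi
          echo "DOCKER_HOST: ${DOCKER_HOST:-unset}"
          ```

          Running this as the same user that runs the Jenkins (or slave) process shows whether it matches the environment the failing builds see.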

          Christian Höltje added a comment -

          Here's my "set up a test Jenkins" script, in case I'm doing something stupid. I run it as root and as a new user jbugs.

          I pull the plugins from the existing "live-ish" Jenkins to speed things up, and the init.groovy.d just sets the UpdateCenter URL to "https://updates.jenkins-ci.org/stable-1.651/update-center.json"

          #!/bin/bash
          
          set -euo pipefail
          set -x
          
          if [ "$(id -u)" = 0 ]; then
            jenkins_port=5000
          else
            jenkins_port=5010
          fi
          
          cd
          
          rm -rf .jenkins
          mkdir -p .jenkins/plugins
          
          cp -avr /var/lib/jenkins/init.groovy.d/ .jenkins/init.groovy.d/
          cp -avr /var/lib/jenkins/plugins/*.jpi .jenkins/plugins/
          
          cat <<CONFIGXML > .jenkins/config.xml
          <?xml version='1.0' encoding='UTF-8'?>
          <hudson>
            <version>1.0</version>
            <nodeProperties>
              <hudson.slaves.EnvironmentVariablesNodeProperty>
                <envVars serialization="custom">
                  <unserializable-parents/>
                  <tree-map>
                    <default>
                      <comparator class="hudson.util.CaseInsensitiveComparator"/>
                    </default>
                    <int>1</int>
                    <string>DOCKER_HOST</string>
                    <string>tcp://127.0.0.1:2375</string>
                  </tree-map>
                </envVars>
              </hudson.slaves.EnvironmentVariablesNodeProperty>
            </nodeProperties>
            <globalNodeProperties/>
          </hudson>
          CONFIGXML
          
          exec java -jar /var/lib/jenkins/jenkins.war --httpPort="$jenkins_port"
          


          Christian Höltje added a comment -

          If it helps, I was able to use auditd to see that execve() was called for the docker exec of the command in question, and that execve() returned 0 (which is not the exit code; it just means the syscall worked). So the command does seem to be run by Jenkins; it just isn't getting the exit code back correctly (and gets -1 instead).

          Looking around, could this be related to JENKINS-25727? If I'm reading the code correctly, this is falling afoul of this code: https://github.com/jenkinsci/durable-task-plugin/blob/66d80d2b9761ebdb4f0d3bb7b9edb82357e33399/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L172-L174

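          To make that -1 concrete, here is a minimal sketch of the result-file handshake that durable-task relies on (the file name and layout here are illustrative, not the plugin's actual ones): the wrapper records the user script's exit code in a file, and the controller polls for it; if the file never appears, the step is reported as exit code -1.

          ```shell
          #!/bin/sh
          # Illustrative sketch only -- the real polling lives in the durable-task
          # plugin's BourneShellScript.
          workdir=$(mktemp -d)

          # Wrapper side: run the user's command, then record its exit code.
          sh -c 'exit 7'
          echo $? > "$workdir/jenkins-result.txt"

          # Controller side: poll for the result file.
          if [ -f "$workdir/jenkins-result.txt" ]; then
            status=$(cat "$workdir/jenkins-result.txt")
          else
            # If the wrapper died before writing (e.g. the container was torn
            # down early), there is no status to read, and the step is
            # reported as exit code -1.
            status=-1
          fi
          echo "exit status: $status"
          rm -rf "$workdir"
          ```

          This matches the symptom above: even sleep 1 || true "fails" with -1, because the failure is the missing status file, not the command's own exit code.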

          Christian Höltje added a comment -

          So I got some slaves up and running, and the problem persists there. The slave.jar runs as the user "jenkins", not root. I suspect that if I ran it as root, it'd work like the master above.


          Christian Höltje added a comment -

          I just upgraded github-organization-folder from version 1.2 to 1.3 and, lo and behold, it works!

          W00T!

          I also upgraded github-api from 1.72.1 to 1.75, but I doubt that affected this.

          You guys are the greatest!


          Jesse Glick added a comment -

          No fix was made to address this problem, you just stopped running into it for some reason TBD.


          Christian Höltje added a comment -

          Agreed. But it was definitely something in the change between 1.72.1 and 1.75. I should have closed it myself. Sorry.


            Assignee: Jesse Glick (jglick)
            Reporter: Christian Höltje (docwhat)
            Votes: 3
            Watchers: 4