JENKINS-35370

Workflow shell step ERROR: script returned exit code -1

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: docker-workflow-plugin
    • Environment: jenkins 2.11, gerrit-trigger 2.21.1, docker-workflow 1.5, pipeline 2.2

      So I'm trying to use the workflow and gerrit-trigger plugins together, but I'm getting 'script returned exit code -1'. I've checked, and it's not a 'git fetch' network issue or anything like that. Any command I run through 'sh' that delays its output, like git fetch or git clone, gives me a -1 exit code. The Groovy script is the same as in the gerrit-trigger example from https://wiki.jenkins-ci.org/display/JENKINS/Gerrit+Trigger
      My log samples are below:

      Retriggered by user admin for Gerrit: https://url/gerrit/73010 in silent mode.
      [Pipeline] node
      Running on master in /var/jenkins_home/workspace/pipeline-docker-test
      [Pipeline] {
      [Pipeline] withDockerServer
      [Pipeline] {
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + docker inspect -f . dockers.local:5000/image:latest
      .
      [Pipeline] withDockerContainer
      $ docker run -t -d -u 1000:1000 -w /var/jenkins_home/workspace/pipeline-docker-test -v /var/jenkins_home/workspace/pipeline-docker-test:/var/jenkins_home/workspace/pipeline-docker-test:rw -v /var/jenkins_home/workspace/pipeline-docker-test@tmp:/var/jenkins_home/workspace/pipeline-docker-test@tmp:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** dockers.local:5000/image:latest cat
      [Pipeline] {
      [Pipeline] stage (cloning repo)
      Entering stage cloning repo
      Proceeding
      [Pipeline] git
       > git rev-parse --is-inside-work-tree # timeout=10
      Fetching changes from the remote Git repository
       > git config remote.origin.url ssh://url:29418/project # timeout=10
      Fetching upstream changes from ssh://url:29418/project
       > git --version # timeout=10
       > git -c core.askpass=true fetch --tags --progress ssh://url:29418/project +refs/heads/*:refs/remotes/origin/*
       > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
       > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
      Checking out Revision 82beb7889ae2fea7a35a59575bb849c881302eec (refs/remotes/origin/master)
       > git config core.sparsecheckout # timeout=10
       > git checkout -f 82beb7889ae2fea7a35a59575bb849c881302eec # timeout=10
       > git branch -a -v --no-abbrev # timeout=10
       > git branch -D master # timeout=10
       > git checkout -b master 82beb7889ae2fea7a35a59575bb849c881302eec
       > git rev-list 82beb7889ae2fea7a35a59575bb849c881302eec # timeout=10
      [Pipeline] stage (Checkout patchset)
      Entering stage Checkout patchset
      Proceeding
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + git fetch origin refs/changes/10/73010/1:change-73010-1
      [Pipeline] }
      $ docker stop d3499aacbbef9d2acbb0e9cdbac51c0ea620e91b9893d46ceda49d5344357c61
      $ docker rm -f d3499aacbbef9d2acbb0e9cdbac51c0ea620e91b9893d46ceda49d5344357c61
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // withDockerServer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      Example below. I made 3 'sh' steps:

      sh 'pwd'
      sh 'sleep 1'
      sh 'uname'
      

      and log:

      [Pipeline] {
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + pwd
      /var/jenkins_home/workspace/pipeline-docker-test
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + sleep 1
      [Pipeline] }
      $ docker stop 93ea34143c0a9c7e28b6e580d620f33bcc8f0e0081e7885ea63ef9ad0d1fc57e
      $ docker rm -f 93ea34143c0a9c7e28b6e580d620f33bcc8f0e0081e7885ea63ef9ad0d1fc57e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // withDockerServer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      
      


          zbigniew jasinski added a comment -

          any thoughts on this?

          zbigniew jasinski added a comment -

          I've run my configuration without using Docker and it works. So it seems to be a docker-workflow issue.

          Jesse Glick added a comment -

          Make sure you have updated all relevant plugins, especially durable-task, as there have been diagnostic improvements. Typically the problem is that the Docker daemon does not share a filesystem with the client, which is a prerequisite for Image.inside to work.

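          A quick, hedged way to verify that shared-filesystem prerequisite from the agent (the paths and image below are examples, not anything the plugin requires):

          # On the agent that runs the build: write a file into the workspace,
          # then check that a container started through the same daemon sees it.
          echo probe > /var/jenkins_home/workspace/probe.txt
          docker run --rm -v /var/jenkins_home/workspace:/ws alpine cat /ws/probe.txt
          # If "probe" is not printed, the daemon does not share the agent's
          # filesystem and Image.inside cannot work.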

          Jesse Glick added a comment -

          Not enough information here.


          Ben Mathews added a comment -

          I'm seeing a similar issue where the sh command inside a docker.image block always terminates after some delay. The point at which Jenkins errors out varies from run to run, which leads me to believe that Jenkins itself is causing the problem. I'm on the latest version of Jenkins and all plugins. Any tips on how to debug this?


          Jesse Glick added a comment -

          Typically it means workspace sharing between host & container is failing. If the agent itself is inside a container, make sure --volumes-from is included on the command line; if not, nothing will work.

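          For reference, a hedged illustration of that requirement; jenkins-agent is a hypothetical name for the container the agent itself runs in:

          # Wrong (workspace mounts are lost if the agent is itself a container):
          docker run -t -d my-build-image cat
          # Right: inherit the agent container's mounts so the workspace is shared.
          docker run -t -d --volumes-from jenkins-agent my-build-image cat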

          Ben Mathews added a comment -

          The workspace is getting mounted (the --volumes-from param is present) and is being interacted with. I've got a shell script that should take a couple of minutes to complete, but within a couple of seconds the script terminates and Jenkins returns the above-mentioned

          ERROR: script returned exit code -1
          Finished: FAILURE
          

          The Jenkins log has the call stack

          hudson.AbortException: script returned exit code -1
          	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:285)
          	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:234)
          	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          	at java.lang.Thread.run(Thread.java:745)
          

          Looking at the pipeline code in the call stack, it appears that Jenkins thinks my process is exiting. But (a) it never fails when I launch the process directly w/ "docker run ..." and (b) it fails at a different spot every time.

          I've replaced my script w/ "for i in `seq 1 20`; do echo $i;date;sleep 5;done" and it never fails. So, it is apparent that something in the interaction between jenkins and my script is failing.

          FWIW, the script is a series of "python setup.py develop" commands.


          Jesse Glick added a comment -

          benm Hmm, does not sound like a familiar issue. The fake exit status -1 means that Jenkins cannot find the PID of the controller sh script which tracks the output and exit code of your actual script (also sh, unless you specified a #!/bin/command). The typical reason for failing to find this PID when using Image.inside is that the container does not share the right mount with the agent, for example because --volumes-from was not passed when it should have been. But in your case it was, and you say other shell scripts work, so something trickier is happening. If you can narrow it down to a reproducible test case, that would of course help a lot. Otherwise you will need to inspect the process tree inside the container, to see whether the wrapper script is really still running, and inspect the …job@tmp sibling workspace that holds the control directory with the PID, output, and exit code.

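          A hedged sketch of that inspection, run while the sh step is (supposedly) executing. The control-directory layout below (pid, log, and result files under the workspace's @tmp sibling) matches durable-task's usual convention but may vary across plugin versions:

          # Inside the container: is the controller sh process still running?
          ps -ef
          # On the agent: look at the control directory next to the workspace.
          ls /var/jenkins_home/workspace/pipeline-docker-test@tmp/durable-*/
          cat /var/jenkins_home/workspace/pipeline-docker-test@tmp/durable-*/pid
          # Exit code, once the script has finished:
          cat /var/jenkins_home/workspace/pipeline-docker-test@tmp/durable-*/jenkins-result.txt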

          Ben Mathews added a comment -

          An update....

          I cut the problem down to this repro:

          stage('run unit tests') {
              node() {
                  docker.image("centos:6.7").inside() {
                      sh 'for i in `seq 1 50`; do echo $i;date;sleep 2;done'
                  }
              }
          }
          

          This fails both on a Jenkins running in Docker using https://hub.docker.com/r/axltxl/jenkins-dood/ and on a Jenkins installed on my Ubuntu laptop via apt-get.

          Before creating a ticket, I made one final test on our production jenkins server and it passed with flying colors four times consecutively. Obviously, something is strange with my box; not sure what, though. Any ideas would be appreciated. I'll keep poking around.

          $ uname -a
          Linux mathewslaptop 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          $ cat /etc/os-release 
          NAME="Ubuntu"
          VERSION="16.04.1 LTS (Xenial Xerus)"
          ID=ubuntu
          ID_LIKE=debian
          PRETTY_NAME="Ubuntu 16.04.1 LTS"
          VERSION_ID="16.04"
          HOME_URL="http://www.ubuntu.com/"
          SUPPORT_URL="http://help.ubuntu.com/"
          BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
          UBUNTU_CODENAME=xenial
          $ docker -v
          Docker version 1.12.1, build 23cf638
          


          Jesse Glick added a comment -

          Are you using a (remote) agent on the production server but not in the test environment, or vice-versa? Generally it should not matter, except that what counts is the functioning of the docker command on the agent where the build is running.


          Ben Mathews added a comment -

          The production server is running the builds on a CentOS 7 slave. My test environment is running Jenkins in a docker container that mounts /var/run/docker.sock from my laptop.


          Craig Rodrigues added a comment -

          benm did you eventually solve this problem? I'm seeing the same problem where sh scripts are failing with -1.

          I am also using:

          docker.image(....).inside {
          }

          As a further note, my Jenkins instance is also a docker container.

          Ben Mathews added a comment -

          Sorry, I should have followed up back in Dec. I've forgotten what happened with this. I stopped trying to do development locally, so I probably never resolved it.


          Craig Rodrigues added a comment -

          benm fair enough. I'm running into the same problem so was just curious.

          My understanding of the problem is that if your Jenkins server is running inside a Docker container, and then you try to start a docker container on the same server (so you are doing Docker container inside Docker container), then that doesn't work so well.

          The Durable Task plugin used by the pipeline sh step has some complicated logic for how it figures out the process ID (pid) of the shell script that has been executed, and this logic gets confused when you do Docker inside Docker and returns -1, even though your shell script is still running. I ran into the -1 problems in JENKINS-32264 (not in a Docker context).
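          For readers unfamiliar with that logic, here is a rough, hedged model of the mechanism; it is an illustration only, not the plugin's actual code. The user script is launched through a wrapper that records output and exit status in files, and Jenkins polls those files, using ps against the recorded PID to decide whether the wrapper died without reporting:

          # Illustrative model only; durable-task's real wrapper differs in detail.
          echo 'echo hello; sleep 5' > script.sh
          nohup sh -c 'echo $$ > pid; sh script.sh > jenkins-log.txt 2>&1; echo $? > jenkins-result.txt' &
          sleep 1
          # Poll loop, roughly what DurableTaskStep.Execution.check() amounts to:
          while [ ! -f jenkins-result.txt ]; do
              # Note: if ps itself is missing (as in debian:stretch), this check
              # fails and the step is wrongly reported dead with exit code -1.
              ps -p "$(cat pid)" > /dev/null 2>&1 || { echo 'script returned exit code -1'; break; }
              sleep 1
          done
          [ -f jenkins-result.txt ] && echo "exit code: $(cat jenkins-result.txt)"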

          Craig Rodrigues added a comment -

          I have been having the same problems as benm with the shell script returning -1 if it is run inside a docker.inside block in a pipeline.

          I took Ben's testcase here: https://issues.jenkins-ci.org/browse/JENKINS-35370?focusedCommentId=275803&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-275803

          and could reproduce the problem.

          The environment I used to reproduce the problem was:

          • host environment was a VM running Debian stretch
          • docker version:

          Client:
           Version:      1.13.1
           API version:  1.26
           Go version:   go1.7.5
           Git commit:   092cba372
           Built:        Wed Feb  8 06:44:30 2017
           OS/Arch:      linux/amd64

          Server:
           Version:      1.13.1
           API version:  1.26 (minimum version 1.12)
           Go version:   go1.7.5
           Git commit:   092cba372
           Built:        Wed Feb  8 06:44:30 2017
           OS/Arch:      linux/amd64

          So the host environment and the docker container are on the same machine, sharing the same file system.

          I'm also using:

          • Jenkins 2.51
          • Pipeline 2.5
          • Durable task plugin 1.13
          • Docker pipeline 1.10


          Steve Todorov added a comment -

          I'm seeing this as well. It only shows whenever there is a failing build, though.


          Mathias Rühle added a comment (edited) -

          I'm experiencing the same issue, and it seems to depend on the docker image being used.

          I have a simple pipeline setup:

          node('docker') {
              stage('test') {
                  docker.image('r-base:3.4.0').inside() {
                      sh(script: 'ping -c 2 jenkins.io')
                  }
              }
          }
          

          which fails in all of about 100 test runs.

          Log output:

          Started by user Mathias Rühle
          [Pipeline] node
          Running on slave-2 in /home/jenkins/workspace/pipeline-test
          [Pipeline] {
          [Pipeline] stage
          [Pipeline] { (test)
          [Pipeline] sh
          [pipeline-test] Running shell script
          + docker inspect -f . r-base:3.4.0
          .
          [Pipeline] withDockerContainer
          slave-2 does not seem to be running inside a container
          $ docker run -t -d -u 1007:1007 -w /home/jenkins/workspace/pipeline-test -v /home/jenkins/workspace/pipeline-test:/home/jenkins/workspace/pipeline-test:rw -v /home/jenkins/workspace/pipeline-test@tmp:/home/jenkins/workspace/pipeline-test@tmp:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** --entrypoint cat r-base:3.4.0
          [Pipeline] {
          [Pipeline] sh
          [pipeline-test] Running shell script
          + ping -c 2 jenkins.io
          PING jenkins.io (140.211.15.101): 56 data bytes
          64 bytes from 140.211.15.101: icmp_seq=0 ttl=44 time=163.801 ms
          [Pipeline] }
          $ docker stop --time=1 352151927bb3d4123f3bcfc467ab1a137b2829599a623807d22e18f9497cc742
          $ docker rm -f 352151927bb3d4123f3bcfc467ab1a137b2829599a623807d22e18f9497cc742
          [Pipeline] // withDockerContainer
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          ERROR: script returned exit code -1
          Finished: FAILURE
          

          When I change the image version to 3.1.2:

          node('docker') {
              stage('test') {
                  docker.image('r-base:3.1.2').inside() {
                      sh(script: 'ping -c 2 jenkins.io')
                  }
              }
          }
          

          it succeeds in all of 20 test runs.

          Log output:

          Started by user Mathias Rühle
          [Pipeline] node
          Running on slave-2 in /home/jenkins/workspace/pipeline-test
          [Pipeline] {
          [Pipeline] stage
          [Pipeline] { (test)
          [Pipeline] sh
          [pipeline-test] Running shell script
          + docker inspect -f . r-base:3.1.2
          .
          [Pipeline] withDockerContainer
          slave-2 does not seem to be running inside a container
          $ docker run -t -d -u 1007:1007 -w /home/jenkins/workspace/pipeline-test -v /home/jenkins/workspace/pipeline-test:/home/jenkins/workspace/pipeline-test:rw -v /home/jenkins/workspace/pipeline-test@tmp:/home/jenkins/workspace/pipeline-test@tmp:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** --entrypoint cat r-base:3.1.2
          [Pipeline] {
          [Pipeline] sh
          [pipeline-test] Running shell script
          + ping -c 2 jenkins.io
          PING jenkins.io (140.211.15.101): 56 data bytes
          64 bytes from 140.211.15.101: icmp_seq=0 ttl=44 time=163.756 ms
          64 bytes from 140.211.15.101: icmp_seq=1 ttl=44 time=163.624 ms
          --- jenkins.io ping statistics ---
          2 packets transmitted, 2 packets received, 0% packet loss
          round-trip min/avg/max/stddev = 163.624/163.690/163.756/0.066 ms
          [Pipeline] }
          $ docker stop --time=1 443c8c93c17cc07d08c85be69304100b306191772ff4d5770537d1074b9d3679
          $ docker rm -f 443c8c93c17cc07d08c85be69304100b306191772ff4d5770537d1074b9d3679
          [Pipeline] // withDockerContainer
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          Finished: SUCCESS
          

          Another working docker image for me is perl:5.24.1.

          I used 2 different setups. The first is on my development machine, running Jenkins via

          mvn hpi:run
          

          and having a docker container (customized image based on ubuntu:14.04) with an ssh daemon as the slave node. The slave node contains a statically linked docker executable (https://get.docker.com/builds/Linux/x86_64/docker-1.12.1.tgz) and the DOCKER_HOST environment variable is set to the docker network bridge IP 172.17.0.1 (the actual host machine). I would consider this a docker-in-docker setup.

          The second setup is a jenkins instance running inside docker using the image jenkins:2.46.2. The jenkins slave is a separate VM running ubuntu 14.04 with docker 1.12.1 installed. In my opinion this is not a docker-in-docker setup, because the jenkins slave is a real VM.

          I get the same results on both setups.

           


          Pierre Mauduit added a comment -

          Encountered the same bug (at least, that is what I suspect). Digging further and debugging the plugin in Eclipse, I wondered whether the docker plugin was faulty or whether the problem was in the durable-task plugin instead, and ended up debugging exitStatus() in the ShellController class (see the org.jenkinsci.plugins.durabletask.BourneShellScript class).

          Actually, I found out that this bug seemed to appear on our side after the recent Debian Stretch release, and surprisingly:

          $ docker run -it --rm debian:stretch ps
          docker: Error response from daemon: Container command 'ps' not found or does not exist..
          $ docker run -it --rm debian:jessie ps 
          PID TTY TIME CMD
          1 ? 00:00:00 ps
          

          The durable-task plugin assumes that the ps command is available to check whether the process is still alive, which is no longer the case in the new debian:stretch image. Note that if the sh command finishes before Jenkins needs to check whether the process is still alive, you won't encounter the issue.


          Following my previous comment: I think that even with the ps command available in the underlying docker container, the Java code in ShellController might fail to detect the process return code. I am getting some commands returning -2 as a status code when they are not supposed to, so I suspect I fall into the following condition:

          https://github.com/jenkinsci/durable-task-plugin/blob/master/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L207-L209

           


          Jesse Glick added a comment -

          pmauduit yes that is the well-known JENKINS-40101. For now these images are not supported for inside.


          Pierre Mauduit added a comment -

          jglick is there any docker image that can be used to circumvent the issue?

          Jesse Glick added a comment -

          Anything with ps.


          Pierre Mauduit added a comment -

          I can reproduce even with images having ps installed.

          Pierre Mauduit added a comment (edited) -

          Something really weird (at least, I cannot explain it): first clicking "scan repository now" before launching the actual build seems to solve my issue.

          [edit] actually, it is still failing after other build attempts.

          Pavel Georgiev added a comment (edited) -

          Hi,

          I am using docker.image.inside() with a centos:7.3.1611 image for building a maven project.

          It was working perfectly when my docker host was installed with Fedora 25.

          I changed the docker host OS from Fedora to CentOS 7 and I started seeing this issue.

          • The centos image has ps inside.
          • The Jenkins server is running as docker container
          • The pipeline is executed on a jenkins slave that runs on the same server

          I can reproduce the issue with the example code from above:

           

          node('build') {
              docker.image("builder-centos-7.3.1611-maven-3.5-jdk-8:2.2.0").inside() {
                  sh 'for i in `seq 1 50`; do echo $i;date;sleep 2;done'
              }
          }

          The image above is based on CentOS 7 but has the JDK and Maven installed.

           

          Any hints? My whole environment is down right now.....


          Pavel Georgiev added a comment -

          My issue was resolved by either rebooting the server or making sure that the home folder of the user running the jenkins slave is exactly the same as the folder inside the centos image.

          Hope that helps someone.

          p.s. The example code above now also passes; the command was executed 50 times without failure.

          Misha Yesiev added a comment -

          pgeorgiev, what do you mean by

          home folder of the user running the jenkins slave is exactly the same as the folder inside the centos image

          My setup is as follows:

          node('alpine-node') {
              docker.withServer('tcp://dockerhost:2375') {
                  docker.image('centos-slave').inside('--net=bridge') {
                      sh '''
                          for i in $(seq 3); do sleep 1; echo $i; done
                      '''
                  }
              }
          }
          

          alpine-node is run from the same dockerhost as the centos-slave.

          Should I map the home folder from dockerhost to the centos-slave?


          Jesse Glick added a comment -

          Do not do such one-off mapping.

          Probably this is a bug, possibly duplicate. The workaround is as always to just avoid inside. You can accomplish similar goals more portably and transparently using plain docker CLI commands.

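          A minimal sketch of that workaround in a scripted pipeline, assuming an image with the needed tools already installed; the image name and the make invocation are illustrative:

          node('docker') {
              // Start a long-lived container sharing the workspace, as inside() would.
              def cid = sh(returnStdout: true,
                           script: "docker run -t -d -v ${pwd()}:${pwd()} -w ${pwd()} centos:7 cat").trim()
              try {
                  // Run build commands via docker exec instead of a decorated sh step.
                  sh "docker exec ${cid} make test"
              } finally {
                  sh "docker rm -f ${cid}"
              }
          }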

          Misha Yesiev added a comment -

          jglick, thanks for your response.

          I managed to fix the issue by rebuilding the Durable Task plugin with the following change:
          https://github.com/jenkinsci/durable-task-plugin/pull/40


          Bruno Didot added a comment -

          emishas thanks for the tip. After compiling the plugin with PR #40, Durable Task is able to keep track of the mvn process in my use case, and the pipeline job executes properly.


          Misha Yesiev added a comment -

          jglick,

          The workaround is as always to just avoid inside. You can accomplish similar goals more portably and transparently using plain docker CLI commands.

          This does work when I just use 'sh' inside the .inside{} block; then it would be like:

          docker -H <dockerhost> exec <container> curl blablabla
          

          But how do I execute other DSLs (like archiveArtifacts) inside the container?
          Thanks!


          Jesse Glick added a comment -

          how do I execute other DSLs (like archiveArtifacts) inside the container?

          You cannot, but you can docker cp files outside, etc.

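          For example (hedged; the cid variable is carried over from the sketch above, and /build/out is a hypothetical output path):

          // Copy build output from the container into the workspace, then let
          // the DSL step operate on the agent side as usual.
          sh "docker cp ${cid}:/build/out ./out"
          archiveArtifacts artifacts: 'out/**'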

          Kevin Phillips added a comment -

          I am encountering this same problem as well. I see there is at least one pull request associated with this issue, emishas. Has the fix been released in an updated plugin somewhere?

          Kevin Phillips added a comment -

          Just to be clear, here are a few specifics of the behavior I'm seeing in our production environment, in case it helps isolate the problem further:

          • for some reason we only started experiencing this issue about 2 weeks ago. What's even stranger is the instigating factor seems to be a server reboot of our Jenkins master instance. Based on my review, the agents remained unchanged during this outage, and the master remained the same (same core version, same plugin versions, same OS packages, same pipeline DSL code, etc.) but for some reason the reboot has caused this problem to start happening.
          • For an example of the DSL code we're using, see this issue I created - and resolved as a duplicate of this one: JENKINS-46969
          • our environment is building a Docker image from a Dockerfile in the workspace and running the build operations inside a container launched from this image - which is slightly different than pulling an existing image from a Docker registry.
          • our Docker image is based off a RHEL 7.2 base image
          • I've confirmed our container has the 'ps' command line tool installed (there were some mentions that the Jenkins Docker APIs require this tool to be installed to work correctly)
          • Using the exact same docker image and pipeline code results in successful builds about 75% of the time, so the failures are intermittent, yet frequent enough to affect production work
          • We've reproduced the bug on 2 different Jenkins farms, one running core v2.43.3 and the other running v2.60.2. There are a variety of plugins installed on each of our farms, so if there are particular plugins that may play a part in this bug feel free to let me know and I'll compile a list of the versions of those plugins for review.
          • The agents attached to these masters are running one of the following host OSes: Centos 7.3, RHEL 7.3, RHEL 7.4
          • The agents are running one of the following versions of Docker: 17.03.1-ce, 17.06.0-ce

          As mentioned, these failures are affecting our production builds so any assistance with resolving them would be appreciated.


          Kevin Phillips added a comment -

          jglick I do apologize if this comment comes across as cynical, but telling people not to use the built-in docker APIs provided by the Jenkins Pipeline infrastructure doesn't seem very helpful to me. Cloudbees has made it very clear that they expect everyone to adopt the new Pipeline subsystem as the new standard for Jenkins automation, and part of that infrastructure is the API for orchestrating Docker containers. Suggesting that these APIs are unstable and should simply not be used seems to contradict that stance. Further, as emishas has already pointed out, trying to get other build steps / plugins to interact correctly with a docker container managed in this way is going to be fragile at best and impossible at worst. While this might be a reasonable workaround for the trivial case of running simple shell commands within the container, it does not seem to me a reasonable workaround for the general case.


          Kevin Phillips added a comment -

          Is there any way to generate verbose output from Jenkins and/or Docker to help debug this issue more effectively?

          For example, there are some mentions above that Jenkins may be doing some 'ps' operations to detect whether the scripts running within the container have finished execution or not. Is there any way to get details as to what commands are being run, what their command line options are at the time they are run, what their stdout/stderr messages are, what their returns codes are, etc.? Similarly, if there are Docker tools being used to orchestrate these operations, is there any way to see which commands are being issued and when, and what their inputs/outputs are at runtime?

          Based on my current evaluation, none of the system logs provide any feedback in this regard. I've enabled verbose logging for Jenkins and Docker, examined their logs on both the master and the agents, looked at the syslogs, etc., and none of them give any indication of how or why the containers are being closed. In fact, the only indication of any error happening at all is the message "ERROR: script returned exit code -1" in the build log, which is misleading at best. It appears that the script - as in, the one being run as part of the 'sh' build step - isn't actually returning that error code. Perhaps it's an error code produced by a Docker command run by the Jenkins API under the hood; I'm not sure. Either way, it appears to be of little to no help in debugging this problem.

          Any suggestions on how to gather more intel on the problem would be appreciated.

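          One standard way to get more detail (general Jenkins logging, not specific to this bug): add a custom log recorder for org.jenkinsci.plugins.durabletask under Manage Jenkins → System Log at level ALL, or raise the level from the script console; a hedged sketch:

          // Script console: make durable-task's polling decisions visible in the
          // Jenkins system log (a log recorder/handler must capture FINE levels).
          import java.util.logging.Level
          import java.util.logging.Logger
          Logger.getLogger("org.jenkinsci.plugins.durabletask").setLevel(Level.ALL)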

          Ioannis Iosifidis added a comment -

          I had the same issue in my build, where a Jenkins container runs a Jenkinsfile which in turn runs commands inside another container, based on the latest ubuntu image, with the withDockerContainer or docker.image.inside command. A script that would take a few minutes checking out some git repositories would prematurely end with ERROR: script returned exit code -1.

          There is some discussion on Stack Overflow where it is mentioned that this issue is related to ps not being installed in the container you want to run commands in. Indeed, manually running my container with docker exec -it mycontainer bash, I noticed that running ps failed with command not found. I am not sure what the exact connection is, but I can confirm that adding apt-get install -y procps to the Dockerfile of the container that I want to run commands inside solved the issue for me, at least as a temporary workaround.
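          A minimal sketch of that workaround, assuming a Debian/Ubuntu-based image (the base image below is illustrative):

          FROM ubuntu:16.04
          # durable-task uses ps to check that its wrapper process is still alive;
          # the procps package provides ps on images that ship without it.
          RUN apt-get update && apt-get install -y procps && rm -rf /var/lib/apt/lists/*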

          Jesse Glick added a comment -

          Possibly solved by JENKINS-47791, not sure.


          Cosmin Stroe added a comment (edited) -

          I'm seeing this issue in Jenkins ver. 2.63.  I am running a python script inside a docker container and getting:

          ...
          [Pipeline] stage
          [Pipeline] { (Publish Classes)
          [Pipeline] sh
          [***] Running shell script
          + ./post_classes.py
          ...
          ERROR: script returned exit code -1
          Finished: FAILURE

          I've tried to reproduce it with a simpler script, but I can't.  It happens only with certain builds.

           

          Update:

          The docker container I was building didn't contain ps (thank you iiosifidis for your message above). Adding ps fixed the issue.


          wei lan added a comment -

          I had the same issue, and I checked my docker container: the "ps" command runs in it, so I think it has no relation to the "ps" command being installed in the docker image. Any other solutions?


          Jesse Glick added a comment -

          There could be many reasons, not necessarily related to one another.


            Assignee: Unassigned
            Reporter: zbigniew jasinski (penszo)
            Votes: 6
            Watchers: 20