[JENKINS-35370] Workflow shell step ERROR: script returned exit code -1

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: docker-workflow-plugin
    • Labels: None
    • Environment: jenkins 2.11, gerrit-trigger 2.21.1, docker-workflow 1.5, pipeline 2.2

      I'm trying to use the workflow and gerrit-trigger plugins together, but I'm getting "script returned exit code -1". I've checked, and it's not a 'git fetch' network issue or anything like that. Any command I run through 'sh' that takes a while to produce output (git fetch, git clone, or anything else) gives me a -1 exit code. The Groovy script is the same as the gerrit-trigger example from https://wiki.jenkins-ci.org/display/JENKINS/Gerrit+Trigger
      My log samples below:

      Retriggered by user admin for Gerrit: https://url/gerrit/73010 in silent mode.
      [Pipeline] node
      Running on master in /var/jenkins_home/workspace/pipeline-docker-test
      [Pipeline] {
      [Pipeline] withDockerServer
      [Pipeline] {
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + docker inspect -f . dockers.local:5000/image:latest
      .
      [Pipeline] withDockerContainer
      $ docker run -t -d -u 1000:1000 -w /var/jenkins_home/workspace/pipeline-docker-test -v /var/jenkins_home/workspace/pipeline-docker-test:/var/jenkins_home/workspace/pipeline-docker-test:rw -v /var/jenkins_home/workspace/pipeline-docker-test@tmp:/var/jenkins_home/workspace/pipeline-docker-test@tmp:rw -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** dockers.local:5000/image:latest cat
      [Pipeline] {
      [Pipeline] stage (cloning repo)
      Entering stage cloning repo
      Proceeding
      [Pipeline] git
       > git rev-parse --is-inside-work-tree # timeout=10
      Fetching changes from the remote Git repository
       > git config remote.origin.url ssh://url:29418/project # timeout=10
      Fetching upstream changes from ssh://url:29418/project
       > git --version # timeout=10
       > git -c core.askpass=true fetch --tags --progress ssh://url:29418/project +refs/heads/*:refs/remotes/origin/*
       > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
       > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
      Checking out Revision 82beb7889ae2fea7a35a59575bb849c881302eec (refs/remotes/origin/master)
       > git config core.sparsecheckout # timeout=10
       > git checkout -f 82beb7889ae2fea7a35a59575bb849c881302eec # timeout=10
       > git branch -a -v --no-abbrev # timeout=10
       > git branch -D master # timeout=10
       > git checkout -b master 82beb7889ae2fea7a35a59575bb849c881302eec
       > git rev-list 82beb7889ae2fea7a35a59575bb849c881302eec # timeout=10
      [Pipeline] stage (Checkout patchset)
      Entering stage Checkout patchset
      Proceeding
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      + git fetch origin refs/changes/10/73010/1:change-73010-1
      [Pipeline] }
      $ docker stop d3499aacbbef9d2acbb0e9cdbac51c0ea620e91b9893d46ceda49d5344357c61
      $ docker rm -f d3499aacbbef9d2acbb0e9cdbac51c0ea620e91b9893d46ceda49d5344357c61
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // withDockerServer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      Example below. I made 3 'sh' steps:

      sh 'pwd'
      sh 'sleep 1'
      sh 'uname'
      

      and log:

      [Pipeline] {
      [Pipeline] sh
      [pipeline-docker-test] Running shell script
      
      + pwd
      /var/jenkins_home/workspace/pipeline-docker-test
      [Pipeline] sh
      
      [pipeline-docker-test] Running shell script
      
      + sleep 1
      
      [Pipeline] }
      $ docker stop 93ea34143c0a9c7e28b6e580d620f33bcc8f0e0081e7885ea63ef9ad0d1fc57e
      $ docker rm -f 93ea34143c0a9c7e28b6e580d620f33bcc8f0e0081e7885ea63ef9ad0d1fc57e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // withDockerServer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      
      


          Jesse Glick added a comment -

          how do I execute other DSLs (like archiveArtifacts) inside the container?

          You cannot, but you can docker-cp files outside, etc.


          Kevin Phillips added a comment -

          I am encountering this same problem as well. I see there is at least one pull request associated with this issue, emishas. Has the fix been released in an updated plugin somewhere?

          Kevin Phillips added a comment -

          Just to be clear, here are a few specifics of the behavior I'm seeing in our production environment, in case they help isolate the problem further:

          • We only started experiencing this issue about 2 weeks ago. Stranger still, the trigger seems to have been a reboot of our Jenkins master instance. Based on my review, the agents were unchanged during the outage and the master was unchanged as well (same core version, same plugin versions, same OS packages, same pipeline DSL code, etc.), yet the problem started after the reboot.
          • For an example of the DSL code we're using, see JENKINS-46969, an issue I created and then resolved as a duplicate of this one.
          • Our environment builds a Docker image from a Dockerfile in the workspace and runs the build operations inside a container launched from that image, which is slightly different from pulling an existing image from a Docker registry.
          • Our Docker image is based on a RHEL 7.2 base image.
          • I've confirmed our container has the 'ps' command line tool installed (there were some mentions that the Jenkins Docker APIs require this tool to work correctly).
          • The exact same Docker image and pipeline code produce successful builds about 75% of the time, so the failures are intermittent, yet frequent enough to affect production work.
          • We've reproduced the bug on 2 different Jenkins farms, one running core v2.43.3 and the other running v2.60.2. A variety of plugins are installed on each farm, so if particular plugins may play a part in this bug, let me know and I'll compile a list of their versions for review.
          • The agents attached to these masters run one of the following host OSes: CentOS 7.3, RHEL 7.3, RHEL 7.4.
          • The agents run one of the following versions of Docker: 17.03.1-ce, 17.06.0-ce.

          As mentioned, these failures are affecting our production builds, so any assistance with resolving them would be appreciated.

          Kevin Phillips added a comment -

          jglick, I apologize if this comment comes across as cynical, but telling people to just not use the built-in Docker APIs provided by the Jenkins Pipeline infrastructure doesn't seem very helpful to me. CloudBees seems to have made it very clear that they expect everyone to adopt the new Pipeline subsystem as the standard for Jenkins automation, and that infrastructure includes APIs for orchestrating Docker containers. Suggesting that these APIs are unstable and should simply not be used seems to contradict that stance. Further, as emishas has already pointed out, trying to get other build steps and plugins to interact correctly with a Docker container managed in this way is going to be fragile at best and impossible at worst. While this might be a reasonable workaround for the trivial case of running simple shell commands within the container, it does not seem to me to be a reasonable workaround for the general case.

          Kevin Phillips added a comment -

          Is there any way to generate verbose output from Jenkins and/or Docker to help debug this issue more effectively?

          For example, there are some mentions above that Jenkins may be doing some 'ps' operations to detect whether the scripts running within the container have finished execution. Is there any way to get details on what commands are being run, what their command line options are at the time they are run, what their stdout/stderr messages are, what their return codes are, etc.? Similarly, if Docker tools are being used to orchestrate these operations, is there any way to see which commands are being issued and when, and what their inputs and outputs are at runtime?

          Based on my current evaluation, none of the system logs provide any feedback in this regard. I've enabled verbose logging for Jenkins and Docker, examined their logs on both the master and the agents, looked at the sys logs, etc., and none of them give any indication of how or why the containers are being closed. In fact, the only indication of any error at all is the message in the build log, "ERROR: script returned exit code -1", which is misleading at best. It appears that the script (the one being run as part of the 'sh' build step) isn't actually returning that error code. Perhaps it's an error code produced by a Docker command run by the Jenkins API under the hood. Not sure. Either way, it appears to be of little to no help in debugging this problem.

          Any suggestions on how to gather more intel on the problem would be appreciated.
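One way to picture how a -1 can show up without the script itself returning it: the comment above notes that Jenkins may use 'ps'-style checks to decide whether a script in the container has finished. The sketch below is a simplified model of that idea only, not the plugin's actual code. The function name `poll_status`, the result-file argument, and the "running" marker are all hypothetical. Under this assumed model, if the liveness check wrongly concludes that the process is gone and no exit code has been recorded yet, the caller ends up with -1.

```shell
#!/bin/sh
# Simplified, hypothetical model of a liveness-based status poll.
# This is NOT the Jenkins plugin's real implementation.

# poll_status PID RESULT_FILE
# Prints the recorded exit code if RESULT_FILE is non-empty,
# "running" if the process PID is still visible to this shell,
# and -1 if the process is gone and no exit code was recorded.
poll_status() {
    pid=$1
    result_file=$2
    if [ -s "$result_file" ]; then
        cat "$result_file"
    elif kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        printf '%s\n' -1
    fi
}
```

In this model a long-running command is the exposed case: a short command records its exit code before the next poll, while a slow one depends on the liveness check being right on every poll.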

          Ioannis Iosifidis added a comment -

          I had the same issue in my build, where a Jenkins container runs a Jenkinsfile which in turn runs commands inside another container, based on the latest ubuntu image, using withDockerContainer or docker.image.inside. A script that took a few minutes to check out some git repositories would end prematurely with ERROR: script returned exit code -1.

          There is some discussion on Stack Overflow suggesting that this issue is related to ps not being installed in the container you want to run commands in. Indeed, after manually entering my container with docker exec -it mycontainer bash, I noticed that running ps failed with "command not found". I am not sure what the exact connection is, but I can confirm that adding apt-get install -y procps to the Dockerfile of the container I want to run commands in solved the issue for me, at least as a temporary workaround.
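The manual check described above can be scripted. This is a minimal sketch meant to be run inside the container (or baked into an image smoke test); the function name `check_tool` is hypothetical, and the procps package name comes from the Debian/Ubuntu case reported in this comment, so other base images may use a different package.

```shell
#!/bin/sh
# Sketch: report whether a tool is on PATH in the current environment.
# `check_tool` is a hypothetical helper name, not part of Jenkins or Docker.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# On Debian/Ubuntu-based images, ps is provided by the procps package.
check_tool ps || echo "ps not found; on Debian/Ubuntu images try: apt-get install -y procps"
```

From the host, the same test can be run against a running container with something like docker exec mycontainer sh -c 'command -v ps', where mycontainer is the container name used in the comment above.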

          Jesse Glick added a comment -

          Possibly solved by JENKINS-47791, not sure.


          Cosmin Stroe added a comment - - edited

          I'm seeing this issue in Jenkins ver. 2.63.  I am running a python script inside a docker container and getting:

          ...
          [Pipeline] stage
          [Pipeline] { (Publish Classes)
          [Pipeline] sh
          [***] Running shell script
          + ./post_classes.py
          ...
          ERROR: script returned exit code -1
          Finished: FAILURE

          I've tried to reproduce it with a simpler script, but I can't.  It happens only with certain builds.

           

          Update:

          The Docker container I was building didn't contain ps (thank you iiosifidis for your message above). Adding ps fixed the issue.


          wei lan added a comment -

          I had the same issue. I checked my Docker container and the "ps" command runs in it, so I don't think the problem is related to whether "ps" is installed in the Docker image. Are there any other solutions?


          Jesse Glick added a comment -

          There could be many reasons, not necessarily related to one another.


            Assignee: Unassigned
            Reporter: zbigniew jasinski (penszo)
            Votes: 6
            Watchers: 20