Jenkins / JENKINS-46969

Docker container closes prematurely



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: docker-workflow-plugin
    • Labels: None
    • Environment: Jenkins 2.46.3 / 2.60.2, Docker Workflow plugin v1.10 / 1.12


      I've been seeing intermittent build failures in some of our Jenkins Pipeline builds when running build steps inside Docker containers. The affected builds create a Docker image from a Dockerfile on the fly, then run the build steps in a container started from that image using the image's inside() method (docker.build(...).inside { ... }). From what I can tell, the operations running in the container occasionally stop before they complete. I can't be sure of the exact cause, because the output in the build logs is largely useless. Below is an example from one of my trial cases:

      10:13:54 [ksp_delme] Running shell script
      10:13:54 + for i in '{1..24}'
      10:13:54 + sleep 1200
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      ERROR: script returned exit code -1

      As the timestamps show, the container is launched and an sh step runs sleep inside it. Although the sleep should have lasted 20 minutes (sleep 1200), the container was stopped about two minutes later, and the log reports that the script exited with return code -1.
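      The "about two minutes" figure can be checked directly from the log timestamps. The following is a minimal bash sketch; the helper name to_seconds is hypothetical, and the two timestamps are copied from the log above (the line where "+ sleep 1200" is logged and the line where "docker stop" is logged).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an HH:MM:SS log timestamp into seconds since midnight.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  # The 10# prefix forces base 10, so values such as "08" are not parsed as octal.
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

start=$(to_seconds 10:13:54)   # "+ sleep 1200" is logged
stop=$(to_seconds 10:16:07)    # "docker stop" is logged
echo "elapsed: $(( stop - start )) s, expected at least 1200 s"
```

      This prints an elapsed time of 133 seconds (2 min 13 s), far short of the 1200 seconds the sleep alone should have taken.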

      I can assure you that the sleep did not fail with -1, and even if it had, Jenkins would have reported 255, since a negative exit value appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case, and manually force-terminate the container running the build, I get exactly the same error code and log output. So I suspect that something is causing the container to terminate prematurely.
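      The exit-code reasoning above can be checked in a plain bash shell, outside Jenkins and Docker. In this sketch the helper observed_status is hypothetical and exists only for illustration. It shows that exit statuses are truncated to 8 bits, so exit -1 is observed as 255, and that a shell killed by a signal is reported as 128 plus the signal number (143 for SIGTERM, 137 for SIGKILL). Neither path produces -1. That is consistent with the -1 in the Jenkins log not being an exit status produced by the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exit status the calling shell observes for a command.
# "|| status=$?" keeps a non-zero status from aborting a script run with set -e.
observed_status() {
  local status=0
  "$@" || status=$?
  echo "$status"
}

# Exit statuses are 8 bits wide, so exit -1 is observed as 255.
echo "exit -1 -> $(observed_status bash -c 'exit -1')"

# A process killed by a signal is reported as 128 + the signal number.
# docker stop sends SIGTERM first, then SIGKILL once the --time grace period expires.
echo "SIGTERM -> $(observed_status bash -c 'kill -TERM $$')"
echo "SIGKILL -> $(observed_status bash -c 'kill -KILL $$')"
```

      The three lines print 255, 143 and 137. Bash may also write a "Terminated" or "Killed" notice to stderr for the last two.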

      To make matters worse, I have to run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on a preliminary review of our production systems, I believe the load on the affected agents plays a role, although I'm not sure how. Running many parallel builds on the same agent, each in its own Docker container, may be a factor, but I'm not certain.

      Any help debugging this problem further would be appreciated.




            Kevin Phillips (leedega) added a comment:

            For reference, the Pipeline DSL code I was using for my test above looks like this:

            node () {
                catchError {
                    timestamps {
                        def build_env
                        stage ("Init") {
                            // git checkout ...
                            build_env = docker.build("build_env", './docker')
                        }
                        build_env.inside {
                            stage ("Test") {
                                sh 'for i in {1..24}; do sleep 1200; echo "Still Running"; done'
                            }
                        }
                    }
                }
            }


              Assignee: Unassigned
              Reporter: Kevin Phillips (leedega)