• Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • docker-workflow-plugin
    • None
    • Jenkins 2.46.3 / 2.60.2
      Docker Workflow plugin v1.10/1.12

      I've been seeing intermittent build failures with some of our Jenkins Pipeline builds when running build steps within Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps within an instance of the image using the docker.inside() method. From what I can tell, the operations run within the container occasionally cease execution before completion. If I sound unsure of the exact cause, it's because the output produced in the build logs is largely useless. Below is an example of one of my trial cases:

       
      10:13:54 [ksp_delme] Running shell script
      10:13:54 + for i in '{1..24}'
      10:13:54 + sleep 1200
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      ERROR: script returned exit code -1

      As the timestamps in the output show, a container gets launched and I then run an "sh 'sleep'" operation within it. The sleep should have lasted 20 minutes, but the container exited about 2 minutes later, and the log indicates that the "script" exited with a return code of "-1".
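To put numbers on that gap, here is a minimal Python sketch (not part of the original test case; the helper name `elapsed_seconds` is hypothetical) that computes the time between the two log timestamps above and compares it with the requested 1200-second sleep:

```python
from datetime import datetime

def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps taken on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Timestamps copied from the build log: sleep started, then "docker stop" ran.
ran_for = elapsed_seconds("10:13:54", "10:16:07")
print(ran_for)         # 133
print(1200 - ran_for)  # 1067 seconds of the sleep never ran
```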

      I can assure you that the sleep operation did not error out with a -1, and even if it had, the error value reported by Jenkins would have been 255, since the negative integer appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case and manually force-terminate the container running the build, I get the exact same error code / result in the log. So I'm guessing that something somewhere is causing the container to terminate prematurely.
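The 255 figure can be checked outside Jenkins. The following is a small Python sketch, assuming a POSIX system (Linux or macOS), where the parent process sees only the low 8 bits of a child's exit status. It spawns a child interpreter that tries to exit with -1. It illustrates the wrap-around only and says nothing about how Jenkins arrives at its own "-1".

```python
import subprocess
import sys

# Child process that attempts to exit with status -1.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-1)"])

# On POSIX the status is truncated to 8 bits, so -1 is observed as 255.
print(result.returncode)  # 255 on Linux/macOS
print(-1 & 0xFF)          # 255, the same truncation done by hand
```

So a literal "-1" in the build log is unlikely to be a status that the shell script itself returned.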

      To make matters worse, I can run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on my preliminary review of our production systems, I believe the load on the affected agents plays a part, although I'm not entirely sure how. Running many parallel builds on the same agent, all using Docker containers, may be a factor, but I'm not certain.

      Any help debugging this problem further would be appreciated.

          [JENKINS-46969] Docker container closes prematurely

          Kevin Phillips created issue -
          Kevin Phillips made changes -
          Link New: This issue relates to JENKINS-42322 [ JENKINS-42322 ]

          For reference, the Pipeline DSL code I was using for my test above looks like this:

           

          node () {
              catchError {
                  timestamps {
                      def build_env
                      stage ("Init") {
                          // git checkout ...
                          build_env = docker.build("build_env", './docker')
                      }
                      build_env.inside {
                          stage ("Test") {
                              sh 'for i in {1..24}; do sleep 1200; echo "Still Running"; done'
                          }
                      }
                  }
              }
          }

          Kevin Phillips added a comment - the Pipeline DSL code shown above.
          Kevin Phillips made changes -
          Link New: This issue relates to JENKINS-40101 [ JENKINS-40101 ]
          Kevin Phillips made changes -
          Link New: This issue relates to JENKINS-35370 [ JENKINS-35370 ]
          Kevin Phillips made changes -
          Link New: This issue relates to JENKINS-42166 [ JENKINS-42166 ]
          Kevin Phillips made changes -
          Link New: This issue relates to JENKINS-34289 [ JENKINS-34289 ]
          Kevin Phillips made changes -
          Link New: This issue duplicates JENKINS-35370 [ JENKINS-35370 ]
          Kevin Phillips made changes -
          Resolution New: Duplicate [ 3 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Assignee: Unassigned
            Reporter: Kevin Phillips (leedega)
            Votes: 0
            Watchers: 1