  1. Jenkins
  2. JENKINS-46969

Docker container closes prematurely


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Component/s: docker-workflow-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.46.3 / 2.60.2
      Docker Workflow plugin v1.10/1.12

      Description

      I've been seeing intermittent build failures with some of our Jenkins Pipeline builds when running build steps within Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps within an instance of the image using the docker.inside() method. From what I can tell, the operations run within the container occasionally cease execution before completion. If I sound unsure of the exact cause, it's because the output produced in the build logs is largely useless. Below is an example of one of my trial cases:

       
      10:13:54 [ksp_delme] Running shell script
      10:13:54 + for i in '{1..24}'
      10:13:54 + sleep 1200
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      ERROR: script returned exit code -1

      As you can see from the timestamps in the output, a container gets launched and I run an "sh 'sleep'" step within it. The sleep should have lasted 20 minutes, but the container exited about 2 minutes later, and the log indicates that the "script" exited with a return code of "-1".

      I can assure you that the sleep operation did not error out with a -1, and even if it had, the error value reported by Jenkins would have been 255, since the negative integer appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case, and manually force-terminate the container running the build, I get the exact same error code / result in the log. So I'm guessing that something somewhere is causing the container to terminate prematurely.
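
To illustrate the exit-code reasoning above, here is a minimal shell sketch (not taken from the affected build). It assumes a POSIX shell on Linux. It shows that a status of -1 masked to 8 bits is 255, and that a child process killed by SIGKILL is reported by the shell as 128 + 9. Neither case can produce a literal -1, which suggests the value in the log is not a status returned by the script itself. The variable names are illustrative only.

```shell
#!/bin/sh
# A process exit status is truncated to its low 8 bits,
# so a hypothetical exit(-1) would surface as 255.
wrapped=$(( -1 & 255 ))
echo "exit(-1) would surface as: $wrapped"

# A child terminated by SIGKILL is reported by the parent shell
# as 128 + signal number (9 for SIGKILL).
killed=0
sh -c 'kill -KILL $$' || killed=$?
echo "SIGKILLed child surfaces as: $killed"
```

Either way the status a shell can observe falls in the range 0 to 255.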

      To make matters worse, I can run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on my preliminary review of our production systems, I believe the load on the affected agents plays a part in the problem, although I'm not entirely sure how. Running many parallel builds on the same agent, all using Docker containers, may be a factor, but I'm not certain.
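
Since the failure only shows up after many runs, a small rerun harness can help capture the first failing attempt together with the machine load at that moment. The sketch below is an assumption about how one might do this from a shell on the agent; `run_until_failure` is a hypothetical helper, and the command passed to it stands in for whatever triggers the test case.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]
# Reruns CMD up to MAX times and stops at the first non-zero status,
# printing the attempt number, the status, and the current load.
run_until_failure() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "attempt $attempt failed with status $status"
            # Record the load average at failure time, if uptime is available.
            uptime 2>/dev/null || true
            return "$status"
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max attempts"
    return 0
}

# Hypothetical usage: run_until_failure 50 ./run_test_case.sh
```

Running several copies of such a loop in parallel on one agent would be one way to test the suspicion that load is a factor.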

      Any help anyone can give to help debug this problem further would be appreciated.

            Activity

            leedega Kevin Phillips created issue -
            leedega Kevin Phillips made changes -
            Field Original Value New Value
            Link This issue relates to JENKINS-42322 [ JENKINS-42322 ]
            leedega Kevin Phillips made changes -
            Description edited: log excerpt reformatted and the paragraph on reproducibility and agent load added (current text appears under Description above)
            leedega Kevin Phillips made changes -
            Link This issue relates to JENKINS-40101 [ JENKINS-40101 ]
            leedega Kevin Phillips made changes -
            Link This issue relates to JENKINS-35370 [ JENKINS-35370 ]
            leedega Kevin Phillips made changes -
            Link This issue relates to JENKINS-42166 [ JENKINS-42166 ]
            leedega Kevin Phillips made changes -
            Link This issue relates to JENKINS-34289 [ JENKINS-34289 ]
            leedega Kevin Phillips made changes -
            Link This issue duplicates JENKINS-35370 [ JENKINS-35370 ]
            leedega Kevin Phillips made changes -
            Resolution: Duplicate [ 3 ]
            Status: Open [ 1 ] -> Resolved [ 5 ]
            iceiceice Alexey Grigorov made changes -
            Link This issue is related to JENKINS-47822 [ JENKINS-47822 ]

              People

              Assignee:
              Unassigned
              Reporter:
              leedega Kevin Phillips
              Votes:
              0
              Watchers:
              1
