Jenkins / JENKINS-46969

Docker container closes prematurely


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component/s: docker-workflow-plugin
    • Labels: None
    • Environment: Jenkins 2.46.3 / 2.60.2
      Docker Workflow plugin v1.10/1.12

      I've been seeing intermittent build failures with some of our Jenkins Pipeline builds when running build steps within Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps within an instance of that image using the docker.inside() method. From what I can tell, the operations run within the container occasionally cease execution before completion. If I sound unsure of the exact cause, it's because the output produced in the build logs is largely useless. Below is an example from one of my trial cases:

       
      10:13:54 [ksp_delme] Running shell script
      10:13:54 + for i in '{1..24}'

      10:13:54 + sleep 1200
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }

      10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // timestamps
      [Pipeline] }
      ERROR: script returned exit code -1

      As the timestamps in the output show, a container gets launched and I then run an "sh 'sleep'" operation within it. The sleep should have lasted 20 minutes, but the container was stopped a little over 2 minutes later (10:13:54 to 10:16:07), and the log indicates that the "script" exited with a return code of "-1".
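The gap between the two log timestamps can be checked with a short sketch. It assumes both timestamps fall on the same day and uses only the values visible in the log excerpt above. The variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the build log excerpt (same day assumed).
started = datetime.strptime("10:13:54", "%H:%M:%S")  # sleep begins
stopped = datetime.strptime("10:16:07", "%H:%M:%S")  # `docker stop` is issued

elapsed_s = int((stopped - started).total_seconds())
expected_s = 1200  # argument passed to `sleep` in the test script

print(f"elapsed: {elapsed_s} s, expected at least: {expected_s} s")
```

The elapsed time is far short of the 1200 seconds the sleep was asked for, which is consistent with the step being interrupted rather than finishing on its own.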

      I can assure you that the sleep operation did not exit with -1. Even if it had, the value reported by Jenkins would have been 255, since a negative integer appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case, and manually force-terminate the container running the build, I get exactly the same error code and result in the log. So I'm guessing that something somewhere is causing the container to terminate prematurely.
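The 255 figure follows from how POSIX exit statuses work: the parent process only sees the low 8 bits of the value passed to exit. A minimal sketch of that truncation is below. The helper name `as_exit_status` is hypothetical and is not part of Jenkins or Docker.

```python
def as_exit_status(value: int) -> int:
    """Keep only the low 8 bits, as a POSIX parent sees an exit status."""
    return value & 0xFF


# A process that tried to exit with -1 would be observed as 255.
print(as_exit_status(-1))
```

Under that assumption, a literal -1 in the build log is unlikely to be a status returned by the sleep process itself.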

      To make matters worse, I can run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on a preliminary review of our production systems, I believe the load on the agents encountering this problem plays a part, although I'm not entirely sure how. Running many parallel builds on the same agent, all using Docker containers, may be a factor, but I'm not certain.

      Any help in debugging this problem further would be appreciated.

            Assignee: Unassigned
            Reporter: Kevin Phillips (leedega)
