Type: Bug
Resolution: Duplicate
Priority: Major
None
Jenkins 2.46.3 / 2.60.2
Docker Workflow plugin v1.10/1.12
I've been seeing intermittent build failures in some of our Jenkins Pipeline builds when running build steps inside Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps in a container of that image using the image's inside() method. From what I can tell, the operations running in the container occasionally stop before they complete. If I sound unsure of the exact cause, it's because the output in the build logs is largely useless. Below is an example from one of my trial cases:
10:13:54 [ksp_delme] Running shell script
10:13:54 + for i in '{1..24}'
10:13:54 + sleep 1200
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
ERROR: script returned exit code -1
As you can see from the timestamps, a container is launched and I then run an sh 'sleep' step inside it. Although the sleep should have lasted 20 minutes, the container exited about 2 minutes later, and the log reports that the "script" returned exit code -1.
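To put a number on that gap, here is a small sketch that reads the `HH:MM:SS` prefixes from the log excerpt and measures the time between the `sleep 1200` line and the `docker stop` line. The helper names `timestamp_of` and `gap_seconds` are illustrative only. The sketch assumes every relevant line starts with a timestamp, as in the excerpt.

```python
from datetime import datetime

# Timestamped lines copied from the log excerpt above.
LOG = """\
10:13:54 [ksp_delme] Running shell script
10:13:54 + for i in '{1..24}'
10:13:54 + sleep 1200
10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
"""

def timestamp_of(log: str, needle: str) -> datetime:
    """Return the HH:MM:SS timestamp of the first line containing `needle`."""
    for line in log.splitlines():
        if needle in line:
            return datetime.strptime(line.split()[0], "%H:%M:%S")
    raise ValueError(f"no line contains {needle!r}")

def gap_seconds(log: str, start: str, end: str) -> int:
    """Seconds elapsed between the first `start` line and the first `end` line."""
    return int((timestamp_of(log, end) - timestamp_of(log, start)).total_seconds())

if __name__ == "__main__":
    elapsed = gap_seconds(LOG, "+ sleep 1200", "docker stop")
    print(f"sleep ran {elapsed} s of the requested 1200 s")
```

For this excerpt the gap is 133 s (2 min 13 s), which is roughly 11% of the requested 20 minutes.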
I can assure you that the sleep command did not fail with -1. Even if it had, Jenkins would have reported 255, because a negative exit status appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case, and forcibly terminate the container running the build, I get exactly the same error code and log output. So my guess is that something is causing the container to terminate prematurely.
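Both exit-code observations can be checked on any POSIX host without Jenkins. The sketch below uses only the Python standard library, and its helper names are hypothetical. It shows that a process asking to exit with -1 is reported to its parent as 255, because only the low 8 bits of the status survive. It also shows that a negative code only appears when the *observer* supplies one: Python's `subprocess` reports a child killed by signal N as `-N`. This illustrates the reasoning above and does not describe how Jenkins itself arrives at -1.

```python
import signal
import subprocess
import sys

def status_after_exit(code: int) -> int:
    """Run a child that calls sys.exit(code); return the status the parent sees."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode

def status_after_kill() -> int:
    """Start a long-running child, kill it with SIGKILL, return its returncode."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)
    return child.wait()

if __name__ == "__main__":
    print("exit(-1) is seen as", status_after_exit(-1))  # POSIX: 255
    print("-1 & 0xFF =", -1 & 0xFF)                      # 255
    print("SIGKILL is seen as", status_after_kill())      # -9 (Python's convention)
```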
To make matters worse, I can run and re-run this test case dozens of times before I hit the error, so it is very difficult to reproduce. Based on a preliminary review of our production systems, I believe the load on the affected agents plays a part, although I'm not sure how. Running many parallel builds on the same agent, each with its own Docker container, may be a factor, but I'm not certain.
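One way to probe the load hypothesis outside Jenkins is to launch many copies of a long-running command in parallel and flag any that finish early or return a non-zero status. Below is a minimal harness that uses only the Python standard library. `find_early_exits` is a hypothetical name, and the default workload is a stand-in. To exercise Docker, you could substitute something like `["docker", "run", "--rm", "<your-image>", "sleep", "1200"]`. That substitution is an assumption about your setup and is not what the Docker Pipeline plugin runs internally.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def run_once(cmd: list[str]) -> tuple[int, float]:
    """Run cmd to completion; return (exit status, wall-clock seconds)."""
    start = time.monotonic()
    status = subprocess.run(cmd).returncode
    return status, time.monotonic() - start

def find_early_exits(cmd: list[str], copies: int, expected_seconds: float,
                     tolerance: float = 0.9) -> list[int]:
    """Run `copies` instances of cmd in parallel and return the indices of runs
    that exited non-zero or finished in under tolerance * expected_seconds."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        results = list(pool.map(run_once, [cmd] * copies))
    return [
        i for i, (status, elapsed) in enumerate(results)
        if status != 0 or elapsed < tolerance * expected_seconds
    ]

if __name__ == "__main__":
    # Stand-in workload: a child process that sleeps for 2 seconds.
    sleeper = [sys.executable, "-c", "import time; time.sleep(2)"]
    bad = find_early_exits(sleeper, copies=8, expected_seconds=2)
    print("early or failed runs:", bad or "none")
```

Because each run is timed independently, a run that is cut short shows up even when the others complete normally. Raising `copies` is a rough way to approximate the load of many parallel builds on a single agent.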
Any help debugging this problem further would be appreciated.
- duplicates
  - JENKINS-35370 Workflow shell step ERROR: script returned exit code -1 (Reopened)
- is related to
  - JENKINS-47822 docker pipeline finish beforehand when tcp socket is used (Closed)
- relates to
  - JENKINS-42322 Docker rm/stop/... commands killed by the timeout, failing builds (Resolved)
  - JENKINS-40101 Different behavior between debian container using docker.inside (Open)
  - JENKINS-35370 Workflow shell step ERROR: script returned exit code -1 (Reopened)
  - JENKINS-34289 docker.image.inside fails unexpectedly with Jenkinsfile (Resolved)
  - JENKINS-42166 ProcessLiveness.workingLaunchers heuristic is flaky (Resolved)
[JENKINS-46969] Docker container closes prematurely
Link: This issue duplicates JENKINS-35370
Resolution: Duplicate
Status: changed from Open to Resolved
For reference, the Pipeline DSL code I was using for my test above looks like this: