-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Jenkins 2.46.3 / 2.60.2
Docker Workflow plugin v1.10/1.12
I've been seeing intermittent build failures with some of our Jenkins Pipeline builds when running build steps within Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps within an instance of the image using the docker.inside() method. From what I can tell, the operations run within the container occasionally cease execution before completion. If I sound unsure of the exact cause, it's because the output produced in the build logs is largely useless. Below is an example of one of my trial cases:
10:13:54 [ksp_delme] Running shell script
10:13:54 + for i in '{1..24}'
10:13:54 + sleep 1200
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
ERROR: script returned exit code -1
As you can see from the timestamps in the output, a container gets launched and I run an "sh 'sleep'" operation within it. While the sleep should have lasted 20 minutes, the container exited about 2 minutes later, and the log indicates that the "script" exited with a return code of "-1".
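To put a number on that gap, here is a small Python sketch that takes the two timestamps from the log above and compares the elapsed time with the requested sleep. The helper name `elapsed_seconds` is made up for this illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps, assumed to be on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps copied from the build log above.
sleep_started = "10:13:54"
container_stopped = "10:16:07"
requested_sleep = 1200  # seconds, from "sleep 1200"

ran_for = elapsed_seconds(sleep_started, container_stopped)
print(f"ran for {ran_for}s of a requested {requested_sleep}s")
```

The container was stopped after 133 seconds, roughly a tenth of the way through the 1200-second sleep.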
I can assure you that the sleep operation did not error out with a -1, but even if it had, the error value reported by Jenkins would have been 255, since the negative integer appears to be converted to an unsigned 8-bit value. Conversely, I did discover that if I log in to our build box, rerun this test case, and manually force-terminate the container running the build, I get the exact same error code / result in the log. So I'm guessing that something somewhere is causing the container to terminate prematurely.
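The 255 claim can be checked outside Jenkins. The sketch below is Python on a POSIX system, and `as_exit_status` and `child_exit_status` are names made up for this example. It shows that a process asking to exit with -1 is seen by its parent as 255, because only the low 8 bits of the status are passed on.

```python
import subprocess
import sys


def as_exit_status(code: int) -> int:
    """Keep only the low 8 bits, as a POSIX parent sees a child's exit status."""
    return code & 0xFF


def child_exit_status(code: int) -> int:
    """Run a child Python process that exits with `code`; return what the parent observes."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


print(as_exit_status(-1))
print(child_exit_status(-1))  # assumes a POSIX system
```

If both values come out as 255, a literal -1 in the build log is unlikely to be a status returned by the script itself, which is consistent with the container being terminated from outside.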
To make matters worse, I can run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on my preliminary review of our production systems, I believe the load on the agents encountering this problem plays a part, although I'm not entirely sure how. Running many parallel builds on the same agent, all using Docker containers, may be a factor, but I'm not certain.
Any help anyone can give to debug this problem further would be appreciated.
- duplicates
-
JENKINS-35370 Workflow shell step ERROR: script returned exit code -1
- Reopened
- is related to
-
JENKINS-47822 docker pipeline finish beforehand when tcp socket is used
- Closed
- relates to
-
JENKINS-42322 Docker rm/stop/... commands killed by the timeout, failing builds
- Resolved
-
JENKINS-40101 Different behavior between debian container using docker.inside
- Open
-
JENKINS-35370 Workflow shell step ERROR: script returned exit code -1
- Reopened
-
JENKINS-34289 docker.image.inside fails unexpectedly with Jenkinsfile
- Resolved
-
JENKINS-42166 ProcessLiveness.workingLaunchers heuristic is flaky
- Resolved