[JENKINS-46969] Docker container closes prematurely - Jenkins Jira

Type: Bug
Resolution: Duplicate
Priority: Major
Component/s: docker-workflow-plugin
Labels:
None
Environment:
Jenkins 2.46.3 / 2.60.2
Docker Workflow plugin v1.10/1.12


I've been seeing intermittent build failures with some of our Jenkins Pipeline builds when running build steps within Docker containers. The environments in question build a Docker image from a Dockerfile on the fly, then run the build steps within an instance of the image using the docker.inside() method. From what I can tell, the operations run within the container occasionally cease execution before completion. If I sound unsure of the exact cause, it's because the output produced in the build logs is largely useless. Below is an example from one of my trial cases:

10:13:54 [ksp_delme] Running shell script
10:13:54 + for i in '{1..24}'
10:13:54 + sleep 1200
[Pipeline] }
[Pipeline] // stage
[Pipeline] }

10:16:07 $ docker stop --time=1 d8d8049ea42e9b7ee12a6363ea1a31439c18d0f9f3ea0068d605eb77562e4208
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
ERROR: script returned exit code -1

As the timestamps in the output show, a container gets launched and I run an "sh 'sleep'" operation within it. The sleep should have lasted 20 minutes, but the container exited about 2 minutes later, and the log indicates that the "script" exited with a return code of "-1".

I can assure you that the sleep operation did not error out with a -1, and even if it had, the error value reported by Jenkins would have been 255, since the negative integer appears to be converted to an unsigned 8-bit value. However, I did discover that if I log in to our build box, rerun this test case, and manually force-terminate the container running the build, I get the exact same error code / result in the log. So I'm guessing that something somewhere is causing the container to terminate prematurely.
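To illustrate the point about -1 versus 255: on a POSIX system only the low 8 bits of a process exit code reach the parent, so a command that really exited with -1 would be seen as 255. The sketch below is not taken from the build in question; the helper names as_wait_status and child_exit_status are made up for illustration, and it assumes a Linux or macOS host.

```python
import subprocess
import sys


def as_wait_status(code: int) -> int:
    """Keep only the low 8 bits, as POSIX does for a process exit status."""
    return code & 0xFF


def child_exit_status(code: int) -> int:
    """Run a child Python process that exits with `code`; return what the parent observes."""
    result = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return result.returncode


if __name__ == "__main__":
    print(as_wait_status(-1))     # 255
    # On a POSIX host the observed status of a real child should match the masked value.
    print(child_exit_status(-1))
```

If that holds, a literal -1 in the Jenkins log is unlikely to be the exit status of the sleep command itself, which is consistent with the reasoning above.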

To make matters worse, I can run and re-run this test case dozens of times before I get the error, so it is very difficult to reproduce. Based on my preliminary review of our production systems, I believe the load on the agents encountering this problem plays a part, although I'm not entirely sure how. Running many parallel builds on the same agent, all using Docker containers, may be a factor, but I'm not certain.
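For a failure this intermittent, one way to avoid re-running by hand is a small loop that repeats a command until it first fails. This is only a sketch: it assumes the test case can be started from a command line, and the function name run_until_failure and whatever command is passed in are placeholders rather than anything from this setup.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_until_failure(cmd: Sequence[str], max_attempts: int) -> Optional[Tuple[int, int]]:
    """Run cmd up to max_attempts times.

    Returns (attempt_number, exit_code) for the first non-zero exit,
    or None if every attempt succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt, result.returncode
    return None


if __name__ == "__main__":
    # Usage: python rerun.py <command> [args...]; the command is supplied by the caller.
    outcome = run_until_failure(sys.argv[1:], max_attempts=50)
    print("no failure seen" if outcome is None else f"failed on attempt {outcome[0]} with code {outcome[1]}")
```

Recording the attempt number alongside the agent load at that moment may help confirm or rule out the suspicion that load is a factor.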

Any help debugging this problem further would be appreciated.

duplicates

JENKINS-35370 Workflow shell step ERROR: script returned exit code -1

Reopened

is related to

JENKINS-47822 docker pipeline finish beforehand when tcp socket is used

Closed

relates to

JENKINS-42322 Docker rm/stop/... commands killed by the timeout, failing builds

Resolved

JENKINS-40101 Different behavior between debian container using docker.inside

Open

JENKINS-35370 Workflow shell step ERROR: script returned exit code -1

Reopened

JENKINS-34289 docker.image.inside fails unexpectedly with Jenkinsfile

Resolved

JENKINS-42166 ProcessLiveness.workingLaunchers heuristic is flaky

Resolved


Assignee: Unassigned

Reporter: Kevin Phillips

Votes: 0

Watchers: 1

Created: 2017-09-19 18:28

Updated: 2017-11-03 20:36

Resolved: 2017-09-20 12:34
