Bug
Resolution: Unresolved
Minor
None
Jenkins ver. 2.89.3 and 2.89.4, docker commons 1.9 and 1.11, docker pipeline 1.15 and 1.15.1
We have some load tests that run ~50 tests at a time overnight, in loops, so thousands of tests run in a night. About 1% of them hang indefinitely and must be killed manually.
Jenkins log:
Started by upstream project "tools/release-validator" build number 92 originally caused by: Started by timer Obtained Jenkinsfile from git [...] Running in Durability level: MAX_SURVIVABILITY Loading library TestRunner@master Attempting to resolve master from remote references... > git --version # timeout=10 > git ls-remote -h -t [...] # timeout=10 Found match: refs/heads/master revision 4f9f1287a87cedcccbe456d96176084fbfb2500c > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url [...] # timeout=10 Fetching without tags Fetching upstream changes from [...] > git --version # timeout=10 > git fetch --no-tags --progress [...] +refs/heads/*:refs/remotes/origin/* Checking out Revision 4f9f1287a87cedcccbe456d96176084fbfb2500c (master) > git config core.sparsecheckout # timeout=10 > git checkout -f 4f9f1287a87cedcccbe456d96176084fbfb2500c Commit message: "[...]" > git rev-list --no-walk 4f9f1287a87cedcccbe456d96176084fbfb2500c # timeout=10 [Pipeline] node Running on Jenkins in /var/jenkins_home/workspace/staging-load-tests/load-native_android_eu@10 [Pipeline] { [Pipeline] stage [Pipeline] { (checkout) [Pipeline] checkout > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url [...] # timeout=10 Fetching upstream changes from [...] > git --version # timeout=10 > git fetch --tags --progress [...] 
+refs/heads/*:refs/remotes/origin/* > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision 4d6e39a68e488aa7c9e130d664326af6c646d1cb (refs/remotes/origin/master) > git config core.sparsecheckout # timeout=10 > git checkout -f 4d6e39a68e488aa7c9e130d664326af6c646d1cb Commit message: "Merge pull request #31 from [...]" > git rev-list --no-walk 4d6e39a68e488aa7c9e130d664326af6c646d1cb # timeout=10 [Pipeline] } [Pipeline] // stage [Pipeline] stage [Pipeline] { (run test) [Pipeline] sh [load-native_android_eu@10] Running shell script + docker inspect -f . maven:3.5.2 . [Pipeline] withDockerContainer Jenkins seems to be running inside container 5c894538586c4a19e2a60ca784403fbfda24cc75781a52ea8ae54028fecbe5ff $ docker run -t -d -u 0:0 -v /root/.m2:/root/.m2 -w /var/jenkins_home/workspace/staging-load-tests/load-native_android_eu@10 --volumes-from 5c894538586c4a19e2a60ca784403fbfda24cc75781a52ea8ae54028fecbe5ff -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** maven:3.5.2 cat $ docker top 901d717402c013afccae3074ec7e46c6ec70ce2e66f3e7e773ba9015d58c3cfa -eo pid,comm [Pipeline] // withDockerContainer [spinning wheel here]
I notice that the "[Pipeline] // withDockerContainer" line seems out of place; normally it doesn't appear until much later in the log.
Thread dump:
Thread #6
    at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(jar:file:/var/jenkins_home/plugins/docker-workflow/WEB-INF/lib/docker-workflow.jar!/org/jenkinsci/plugins/docker/workflow/Docker.groovy:129)
    at org.jenkinsci.plugins.docker.workflow.Docker.node(jar:file:/var/jenkins_home/plugins/docker-workflow/WEB-INF/lib/docker-workflow.jar!/org/jenkinsci/plugins/docker/workflow/Docker.groovy:66)
    at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(jar:file:/var/jenkins_home/plugins/docker-workflow/WEB-INF/lib/docker-workflow.jar!/org/jenkinsci/plugins/docker/workflow/Docker.groovy:123)
    at TestRunner.runTest(/var/jenkins_home/jobs/staging-load-tests/jobs/load-native_android_eu/builds/35486/libs/TestRunner/vars/TestRunner.groovy:51)
    at DSL.stage(Native Method)
    at TestRunner.runTest(/var/jenkins_home/jobs/staging-load-tests/jobs/load-native_android_eu/builds/35486/libs/TestRunner/vars/TestRunner.groovy:44)
    at DSL.node(running on )
    at TestRunner.runTest(/var/jenkins_home/jobs/staging-load-tests/jobs/load-native_android_eu/builds/35486/libs/TestRunner/vars/TestRunner.groovy:36)
    at TestRunner.call(/var/jenkins_home/jobs/staging-load-tests/jobs/load-native_android_eu/builds/35486/libs/TestRunner/vars/TestRunner.groovy:17)
    at WorkflowScript.run(WorkflowScript:8)
The pipeline script itself delegates to a pipeline library script. Here's the Jenkinsfile that triggers it:
#!groovy
@Library('TestRunner') _

def test = {
    sh "mvn -q clean test -DthreadCount=${env.PARALLEL_TESTS ?: 5} -Dtest=${env.TESTS}"
}

TestRunner {
    steps = test
}
TestRunner has a bunch of code for flexibility but essentially runs something like:
node {
    // checkout
    stage("test") {
        docker.image("maven").inside {
            steps()
        }
    }
}
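Since the hung builds currently have to be killed by hand, one stopgap would be to bound the container block with the standard Pipeline timeout step. The following is only a minimal sketch of how vars/TestRunner.groovy could look under that assumption. It is not the actual library code: the config map, the closure-delegate wiring, and the 30-minute limit are all illustrative, and the image tag is taken from the log above. Whether the abort actually clears the hang described here is untested.

```groovy
// vars/TestRunner.groovy (hypothetical sketch, not the real library)
def call(Closure body) {
    // Collect settings such as `steps = test` from the Jenkinsfile closure.
    def config = [:]
    body.resolveStrategy = Closure.DELEGATE_FIRST
    body.delegate = config
    body()

    node {
        // checkout
        stage("test") {
            // Assumed limit: abort the container block after 30 minutes
            // so a hung run fails on its own instead of spinning forever.
            timeout(time: 30, unit: 'MINUTES') {
                docker.image("maven:3.5.2").inside {
                    // Run the closure passed in from the Jenkinsfile.
                    config.steps.call()
                }
            }
        }
    }
}
```

The Jenkinsfile shown earlier would not need to change; a run that exceeds the limit would be aborted by the timeout step rather than waiting for someone to kill it.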
I can provide more detail if needed.
[JENKINS-49710] Pipelines run under heavy load sometimes hang running Docker
Component/s | New: docker-workflow-plugin [ 20625 ]
Component/s | Original: docker [ 20834 ]
Component/s | Original: pipeline [ 21692 ]
Assignee | Original: Nicolas De Loof [ ndeloof ]
Environment | Original: Jenkins ver. 2.89.3 and 2.89.4, docker commons 1.11, docker pipeline 1.15.1 | New: Jenkins ver. 2.89.3 and 2.89.4, docker commons 1.9 and 1.11, docker pipeline 1.15 and 1.15.1
Component/s | New: durable-task-plugin [ 18622 ]
Component/s | New: pipeline [ 21692 ]
Labels | New: groo
Labels | Original: groo