Bug
Resolution: Unresolved
Major
None
We have a pipeline that creates and launches jobs in parallel inside Docker containers, with the workload spread across multiple machines. When the pipeline below launches 0-5 jobs in parallel, they complete successfully 100% of the time. When it launches 35 jobs, 1 or 2 of them fail on every run, and the failure occurs AFTER the workload itself has completed successfully (the workload is marked by "python job runs here" in the pipeline below).
The error is always the same: the Docker plugin fails to stop a container. The Docker logs show that the container exited with code 137 (128 + 9, i.e. SIGKILL), meaning Docker was eventually able to stop the container with kill -9.
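As a quick sanity check on the exit code, the following is a minimal sketch of how a shell-style exit status above 128 maps back to a signal. The helper name `describe_exit_status` is hypothetical and not part of Docker or Jenkins; it only relies on Python's standard `signal` module and assumes a Linux host, where signal 9 is SIGKILL.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status such as the one Docker reports.

    Hypothetical helper for illustration: statuses above 128 conventionally
    mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name} (128 + {signum})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```

Note that 137 only shows that SIGKILL was delivered to the container's main process; on its own it does not say which component sent it.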
pipeline {
    agent {
        node {
            label 'master'
        }
    }
    stages {
        stage('prev stage running docker containers') {}
        stage('problematic stage') {
            steps {
                script {
                    unstash name: 'playbook'
                    def playbook = readJSON file: 'playbook.json'
                    // Map of branch name -> closure, passed to 'parallel' below
                    def python_jobs = [:]
                    int counter = 0
                    playbook.each { job ->
                        python_jobs["worker ${counter++}"] = {
                            node(label: 'label') {
                                ws(dir: 'workspace/python') {
                                    script {
                                        docker.withRegistry(env.PROJECT_DOCKER_REGISTRY, env.PROJECT_DOCKER_REGISTRY_CREDENTIAL_ID) {
                                            docker.image(env.PROJECT_DOCKER_IMAGE).inside('-e http_proxy -e https_proxy -e no_proxy') {
                                                // python job runs here
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                    // Let the remaining branches continue when one branch fails
                    python_jobs.failFast = false
                    parallel python_jobs
                }
            }
        }
    }
}
Found unhandled java.io.IOException exception:
Failed to kill container 'fd4059a173c0bbf107e9231194747ecfc28595f9579ecbd77b82209cf5b219eb'.
    org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:187)
    org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:111)
    org.jenkinsci.plugins.docker.workflow.WithContainerStep$Callback.finished(WithContainerStep.java:415)
    org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:119)
    org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:375)
    com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:70)
    com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:144)
    org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:17)
    org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:49)
    org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:180)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$wrap$4(CpsVmExecutorService.java:136)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
    jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:53)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:50)
    org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
    org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$categoryThreadFactory$0(CpsVmExecutorService.java:50)
    java.base/java.lang.Thread.run(Unknown Source)