- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Component/s: docker-workflow-plugin
We have a pipeline that creates and launches jobs in parallel inside Docker containers, with the workload spread across multiple machines. When the pipeline below launches 0-5 jobs in parallel, they complete successfully 100% of the time. When it launches 35 jobs, 1 or 2 of them fail 100% of the time, AFTER the workload has completed successfully (marked by "python job runs here" in the pipeline below).
The error is always the same: the docker plugin fails to stop a container. The docker logs show that the container exited with code 137, meaning docker was eventually able to stop the container with kill -9.
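For readers unfamiliar with the 137 exit code: by the usual shell and Docker convention, a process terminated by a signal reports 128 plus the signal number, and SIGKILL (kill -9) is signal 9, so 128 + 9 = 137. A minimal sketch in plain Groovy, where the helper name `describeExit` is hypothetical and not part of Jenkins or the docker-workflow plugin:

```groovy
// Hypothetical helper for illustration only (not a Jenkins or plugin API).
// Assumes the common convention that an exit status above 128
// means "terminated by signal (status - 128)".
String describeExit(int status) {
    if (status > 128) {
        int signal = status - 128
        return "terminated by signal ${signal}"
    }
    return "exited on its own with status ${status}"
}

println describeExit(137)   // terminated by signal 9
println describeExit(0)     // exited on its own with status 0
```

This only decodes the status; it does not explain why the container did not stop before docker fell back to kill -9.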
pipeline {
    agent {
        node {
            label 'master'
        }
    }
    stages {
        stage('prev stage running docker containers') {}
        stage('problematic stage') {
            steps {
                script {
                    unstash name: 'playbook'
                    def playbook = readJSON file: 'playbook.json'
                    // Map of branch name -> closure, passed to `parallel` below
                    def python_jobs = [:]
                    int counter = 0
                    playbook.each { job ->
                        python_jobs["worker ${counter++}"] = {
                            node(label: 'label') {
                                ws(dir: 'workspace/python') {
                                    script {
                                        docker.withRegistry(env.PROJECT_DOCKER_REGISTRY, env.PROJECT_DOCKER_REGISTRY_CREDENTIAL_ID) {
                                            docker.image(env.PROJECT_DOCKER_IMAGE).inside('-e http_proxy -e https_proxy -e no_proxy') {
                                                // python job runs here
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                    python_jobs.failFast = false
                    parallel python_jobs
                }
            }
        }
    }
}
Found unhandled java.io.IOException exception:
Failed to kill container 'fd4059a173c0bbf107e9231194747ecfc28595f9579ecbd77b82209cf5b219eb'.
    org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:187)
    org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:111)
    org.jenkinsci.plugins.docker.workflow.WithContainerStep$Callback.finished(WithContainerStep.java:415)
    org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:119)
    org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:375)
    com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:70)
    com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:144)
    org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:17)
    org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:49)
    org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:180)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
    org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$wrap$4(CpsVmExecutorService.java:136)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
    jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:53)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:50)
    org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
    org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
    org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$categoryThreadFactory$0(CpsVmExecutorService.java:50)
    java.base/java.lang.Thread.run(Unknown Source)