
Docker rm/stop/... commands killed by the timeout, failing builds

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: docker-workflow-plugin
    • Environment: Jenkins 2.46
      Debian GNU/Linux 7.8 / Docker 1.13.1
      Docker workflow plugin 1.10

      Hi,

      I've recently upgraded the Docker workflow plugin from 1.8 to 1.10. With 1.8 my pipeline worked perfectly well; it uses two external containers plus one that the build actions are run inside.
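
      (For context, a pipeline of that shape would look roughly like the following — a sketch with placeholder image names and build steps, not the reporter's actual Jenkinsfile:)

      // Illustrative sketch only — image names and the build command are placeholders.
      node {
          docker.image('postgres:9.6').withRun('-e POSTGRES_PASSWORD=secret') { db ->
              docker.image('redis:3').withRun { cache ->
                  // Build steps run inside this third container; when the body
                  // exits, the plugin runs `docker stop --time=1 <id>` and
                  // `docker rm -f <id>`, as shown in the log below.
                  docker.image('maven:3-jdk-8').inside("--link ${db.id}:db --link ${cache.id}:cache") {
                      sh 'mvn -B verify'
                  }
              }
          }
      }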

      In 1.10 I get the following error on the container launched with the .inside { } method:

      $ docker stop --time=1 6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02
      $ docker rm -f 6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02
      ERROR: Timeout after 10 seconds
      

      And the job fails with

      java.io.IOException: Failed to rm container '6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02'.
      	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.rm(DockerClient.java:158)
      	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:145)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:107)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.access$400(WithContainerStep.java:74)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Callback.finished(WithContainerStep.java:302)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362)
      	at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:33)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:165)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:328)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      

      This seems to be related to https://github.com/jenkinsci/docker-workflow-plugin/pull/65. I tried downgrading manually to 1.8 and it works well again (as it no longer passes the --time=1 option).

      Is there a way to disable this option?

      Thanks,

          [JENKINS-42322] Docker rm/stop/... commands killed by the timeout, failing builds

          Benjamin Henrion added a comment -

          This is my Jenkins in a docker-compose file that works; the JAVA_OPTS environment variable has to be set for the parameter to be picked up:

              jenkins:
                  image: myregistry.com/jenkins:2.83
                  command: --prefix="/jenkins"
                  user: root
                  ports:
                      - "8081:8080"
                      - "8050:50000"
                  volumes:
                      - jenkins-dv:/var/jenkins_home
                  environment:
                      - JAVA_OPTS=-Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=240
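
          The same value can also be inspected, and changed until the next restart, from the Jenkins Script Console (a sketch, assuming CLIENT_TIMEOUT is the public static field that recent plugin versions initialize from this system property):

              // Script Console sketch — assumes DockerClient.CLIENT_TIMEOUT is a public
              // static field initialized from the system property shown above.
              import org.jenkinsci.plugins.docker.workflow.client.DockerClient
              println DockerClient.CLIENT_TIMEOUT   // current timeout in seconds (default 180)
              DockerClient.CLIENT_TIMEOUT = 240     // takes effect immediately; lost on restart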


          Gustavo Chaves added a comment -

          I'm commenting to report that we began facing similar errors last week, after we changed several pipeline scripts, which increased the frequency with which we launch Docker containers.

          We're using Jenkins 2.107.2 and docker-workflow-plugin 1.17.

          I increased the plugin timeout to 250s and decreased the number of executors in our slaves to no avail.

          In despair, I patched the plugin to make it ignore docker-rm errors. Using the patched plugin we were able to finish all of our weekend builds successfully.
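
          (The patch itself is not attached to the comment; a minimal sketch of the idea — with illustrative names, not Gustavo's actual change — is to log the rm failure at the plugin's stop/rm call site instead of propagating it:)

              // Hypothetical sketch of "ignore docker-rm errors"; the surrounding method
              // and the `listener` variable are illustrative, not the plugin's exact code.
              try {
                  rm(launchEnv, containerId)
              } catch (IOException x) {
                  // Leave the container behind rather than failing the build.
                  listener.error("Ignoring failure to rm container ${containerId}: ${x.message}")
              }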

          I know this is not a decent solution. I feared that our builds could be delayed waiting for the stuck containers, but I didn't notice this or any other side effect. So, for now, we'll use this patched plugin.

          I don't understand yet what makes the docker-rm command delay. Do you have some hints to share?


          Sam Van Oort added a comment -

          gnustavo Which version of Docker itself are you using, and do you see the timeout error, or just the "Failed to rm container" error?


          Chris Maes added a comment -

          jglick: thanks for the hint, this seems to help. We have now had one timeout even with 280 seconds... You say values around 300 and above will probably not work. Has this been verified? What kind of error would you expect if we put a value above 300?


          Andrius Semionovas added a comment -

          Joshua Noble's comment keeps me wondering: why does this plugin choose to use the `cat` program as the first container process? I did some tests and found that `cat` is not killable with SIGTERM, while `bash` is killed perfectly fine. Also, `docker stop` on the cat container took 10 seconds, which means it was forcefully killed, while the same `docker stop` on bash happens immediately. Could this be related to this issue?
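
          (The timing test is straightforward to script; a reproduction sketch with an arbitrary image and container names — the expected timings are those reported above:)

              // Reproduction sketch (image and names are arbitrary). Note the -t flag:
              // the plugin starts its container with a tty, which is what keeps `cat`
              // blocked instead of exiting on EOF.
              node {
                  sh '''#!/bin/bash
                      docker run -dt --name stop-cat ubuntu cat
                      time docker stop --time=10 stop-cat    # reported: ~10s, killed after the grace period
                      docker rm stop-cat

                      docker run -dt --name stop-bash ubuntu bash
                      time docker stop --time=10 stop-bash   # reported: returns almost immediately
                      docker rm stop-bash
                  '''
              }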


          Jesse Glick added a comment -

          cat with empty stdin is just a way to hang. Could use e.g. sleep 999999 if that binary is just as ubiquitous.


          Andrius Semionovas added a comment -

          jglick, yes! `cat` does its job, but it is not killable with `SIGTERM`. In the end everything is killable, but it takes more effort from Docker.

          The problem is that, for some reason, 180s is not enough for `docker rm` in our infrastructure, and I have no idea why. The problem looks illogical to me, so here is just a random idea: maybe it could be improved by using something different from `cat`?

          Or another idea: maybe it is possible to do the `docker rm` outside the job, asynchronously?
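
          (As an illustration of that second idea only — a hypothetical detached cleanup, not something the plugin currently does:)

              // Hypothetical sketch: detach the cleanup so a slow `docker rm` can neither
              // block nor fail the build. CONTAINER_ID is an illustrative variable.
              sh 'nohup docker rm -f "$CONTAINER_ID" >/dev/null 2>&1 &'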


          Jenkins User added a comment - edited

          I have tried to redefine the CLIENT_TIMEOUT parameter via the JAVA_OPTS environment variable in the Docker agent definition in the Jenkins DSL, without success:

           

          agent {
              docker {
                  image "**/**:1.0"
                  args "-u jenkins -v /var/run/docker.sock:/var/run/docker.sock --security-opt seccomp=unconfined --name \"${BUILD_TAG}\" -e JAVA_OPTS=\"-Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=240\""
                  reuseNode true
                  alwaysPull true
                  label node_label
              }
          }
            

          I checked the environment variables inside the Docker container during the pipeline execution, and the JAVA_OPTS variable is set correctly:

          ~$ docker inspect ******* | jq '.[] | .Config.Env'
          ...  "JAVA_OPTS=-Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=240",
          

          Please, I need your help. The pipeline works correctly, except for a container with a size of 1.5 GB:

           

          [Pipeline] }
          $ docker stop --time=1 c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c
          $ docker rm -f c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c
          ERROR: Timeout after 180 seconds
          [Pipeline] // withDockerContainer
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          java.io.IOException: Failed to rm container 'c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c'.
          	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.rm(DockerClient.java:201)
          	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:187)
          	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:109)
          	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.access$400(WithContainerStep.java:76)
          	...

          Jenkins version: 2.277.4 – Docker pipeline: 1.26

          Please, any help would be appreciated. Thanks in advance.

           


          Jenkins User added a comment - edited

          chrismaes How did you manage to increase the client timeout to 280 seconds? Thanks.


          Jenkins User added a comment -

          jglick Please, could you help me? I don't know how to configure the CLIENT_TIMEOUT property. Is it possible in the Jenkins Pipeline DSL? If not, I understand that it is possible to define it in the configuration of the Jenkins nodes, isn't it? You can check my questions two messages above. Thanks in advance.


            Assignee: Jesse Glick (jglick)
            Reporter: Kevin REMY (kevanescence)
            Votes: 34
            Watchers: 51
