JENKINS-42322

Docker rm/stop/... commands killed by the timeout, failing builds

Details

    Description

      Hi,

      I recently upgraded the Docker Workflow plugin from 1.8 to 1.10. With 1.8 my pipeline worked perfectly well. It uses two external containers and one container inside which the actions are run.

      With 1.10 I get the following error for the container launched with the .inside { } method:

      $ docker stop --time=1 6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02
      $ docker rm -f 6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02
      ERROR: Timeout after 10 seconds
      

      And the job fails with:

      java.io.IOException: Failed to rm container '6b4b512400884e660dc4cd4eda6e9b3d7c358317f08a1c46399b5253ec7e1b02'.
      	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.rm(DockerClient.java:158)
      	at org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:145)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:107)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep.access$400(WithContainerStep.java:74)
      	at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Callback.finished(WithContainerStep.java:302)
      	at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:114)
      	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:362)
      	at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:33)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable$1.call(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:108)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:30)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:165)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:328)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$100(CpsThreadGroup.java:80)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:240)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:228)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:64)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      

      This seems to be related to
      https://github.com/jenkinsci/docker-workflow-plugin/pull/65
      I manually downgraded to 1.8 and it works well again (since 1.8 does not pass the --time=1 option).

      Is there a way to disable this option?

      Thanks,

          Activity

            jglick Jesse Glick added a comment -

            cat with empty stdin is just a way to hang. Could use e.g. sleep 999999 if that binary is just as ubiquitous.

            neworldlt Andrius Semionovas added a comment -

            jglick, yes! `cat` does its job, but it cannot be killed with `SIGTERM`. In the end, everything is killable, but it takes more effort from Docker.

            The problem is that, for some reason, 180s is not enough for `docker rm` in our infrastructure. I have no idea why. The problem looks illogical to me, so this is just a random idea: maybe it could be improved by using something other than `cat`?

            Or another idea: would it be possible to run `docker rm` asynchronously, outside the job?
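
The "Timeout after N seconds" errors reported in this issue are what one would expect from a client that runs an external command with a deadline and kills it when the deadline passes. The following is a minimal sketch of that general pattern, not the plugin's actual code; the class name `TimedCommand` and the method `run` are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class TimedCommand {

    /**
     * Runs a command and waits at most timeoutSeconds for it to finish.
     * Returns the exit code, or throws IOException if the deadline passes.
     * Hypothetical helper for illustration only.
     */
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard output so a chatty command cannot block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // The command did not finish in time: kill it and report a timeout.
            process.destroyForcibly();
            throw new IOException("Timeout after " + timeoutSeconds + " seconds");
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // A command that finishes well within the deadline.
        System.out.println("exit code: " + run(List.of("sleep", "0"), 5));
        // A command that outlives the deadline and is killed.
        try {
            run(List.of("sleep", "3"), 1);
        } catch (IOException e) {
            System.out.println("ERROR: " + e.getMessage());
        }
    }
}
```

With this pattern, a slow `docker rm` fails the caller even though the removal might have completed if given more time, which is why raising the timeout or moving the cleanup out of the build's critical path are the two ideas raised in the comments.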
            paferdie Jenkins User added a comment - - edited

            I have tried to redefine the CLIENT_TIMEOUT parameter via the JAVA_OPTS environment variable in the Docker agent definition in Jenkins DSL, without success:

             

            agent {
                docker {
                    image "**/**:1.0"
                    args "-u jenkins -v /var/run/docker.sock:/var/run/docker.sock --security-opt seccomp=unconfined --name \"${BUILD_TAG}\" -e JAVA_OPTS=\"-Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=240\""
                    reuseNode true
                    alwaysPull true
                    label node_label
                }
            }
              

            I checked the environment variables inside the Docker container during the pipeline execution, and the JAVA_OPTS variable is set correctly:

            ~$ docker inspect ******* | jq '.[] | .Config.Env'
            ...  "JAVA_OPTS=-Dorg.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT=240",
            

            Please, I need your help. The pipeline works correctly, except for a container with a size of 1.5 GB:

             

            [Pipeline] }
            $ docker stop --time=1 c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c
            $ docker rm -f c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c
            ERROR: Timeout after 180 seconds
            [Pipeline] // withDockerContainer
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] End of Pipeline
            java.io.IOException: Failed to rm container 'c4bf7a64a476881420dd7d03aba9d2596ce799f309a10ca66e5cdd739b6afe9c'.
                at org.jenkinsci.plugins.docker.workflow.client.DockerClient.rm(DockerClient.java:201)
                at org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:187)
                at org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:109)
                at org.jenkinsci.plugins.docker.workflow.WithContainerStep.access$400(WithContainerStep.java:76)
                ...
             

            Jenkins version: 2.277.4 – Docker pipeline: 1.26

            Please, any help would be appreciated. Thanks in advance.

             

            paferdie Jenkins User added a comment - - edited

            chrismaes How did you manage to increase the client timeout to 280 seconds? Thanks.

            paferdie Jenkins User added a comment -

            jglick Please, could you help me? I don't know how to configure the CLIENT_TIMEOUT property. Is it possible in the Jenkins Pipeline DSL? If not, I understand that it can be defined in the configuration of the Jenkins nodes, is that right? My questions are described two comments above. Thanks in advance.
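
A possible explanation for why the `-e JAVA_OPTS=...` attempt has no effect, offered as a sketch under one assumption: that the plugin reads `CLIENT_TIMEOUT` as a Java system property of the JVM in which the plugin code itself runs. `Integer.getInteger` is the standard Java call for that kind of lookup, and it consults system properties of the current JVM only, never environment variables of some other process. The class `ClientTimeoutLookup` below is hypothetical; only the property name and the 180-second default are taken from this thread.

```java
public class ClientTimeoutLookup {

    // Property name as used in the comments above.
    static final String PROPERTY =
            "org.jenkinsci.plugins.docker.workflow.client.DockerClient.CLIENT_TIMEOUT";

    /**
     * Assumed lookup style: read a system property of the current JVM,
     * falling back to a default when it is unset or not a valid integer.
     */
    static int resolveTimeout(int defaultSeconds) {
        return Integer.getInteger(PROPERTY, defaultSeconds);
    }

    public static void main(String[] args) {
        // Nothing set in this JVM: the default applies, regardless of any
        // JAVA_OPTS variable exported inside a build container.
        System.out.println("before: " + resolveTimeout(180));

        // Equivalent to starting this JVM with -D<PROPERTY>=240.
        System.setProperty(PROPERTY, "240");
        System.out.println("after: " + resolveTimeout(180));
    }
}
```

If the assumption holds, the value would need to be passed as a `-D` option to the Jenkins JVM that executes the plugin (most likely the controller), not as an environment variable on the build container. A `JAVA_OPTS` variable inside the container can at most influence Java processes started in that container.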


            People

              jglick Jesse Glick
              kevanescence Kevin REMY
              Votes: 34
              Watchers: 50
