
Process leaked file descriptors message within docker top command

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: docker-workflow-plugin
    • Labels: None
    • Environment: Jenkins 2.204.2
      Docker Pipeline 1.22
      Jenkins Master and Agents running on CentOS 7 Linux 64 bit

      We are running the Jenkins master and agent nodes on CentOS 7 Linux 64 bit. The agent nodes have Docker (version 19.03.5) installed. All builds execute on the agent nodes.
      We have a declarative pipeline job that runs parallel stages on 10 Docker nodes. Each parallel stage pulls a Docker image from an internal registry, starts a container from it on a Docker node, and then executes build actions inside the container. Sometimes we hit the following intermittent issue just after the container has started on a Docker node (log from Console Output):

      ----------------

      docker pull <docker_image>

      docker run -t -d -u <uid>:<uid> --user root:root --shm-size=2g -w <workspace_path> -v <workspace_path>:<workspace_path>:rw,z -v <workspace_path>@tmp:<workspace_path>@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** <docker_image> cat

      docker top "ee632b4c84ed5aaea15608d5180169dfc7eedeaf7021a6724232e61c1f4d5d4c Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information" -eo pid,comm

      ...

      java.io.IOException: Failed to run top 'ee632b4c84ed5aaea15608d5180169dfc7eedeaf7021a6724232e61c1f4d5d4c Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information'.  Error: Error response from daemon: page not found
          at org.jenkinsci.plugins.docker.workflow.client.DockerClient.listProcess(DockerClient.java:145)
          at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Execution.start(WithContainerStep.java:199)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:286)
          at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:179)
          at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
          at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
          at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
          at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
          at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
          at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:126)
          at org.jenkinsci.plugins.docker.workflow.Docker.node(Docker.groovy:66)
          at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:114)
          at org.jenkinsci.plugins.docker.workflow.declarative.DockerPipelineScript.runImage(DockerPipelineScript.groovy:57)
          at org.jenkinsci.plugins.docker.workflow.declarative.AbstractDockerPipelineScript.configureRegistry(AbstractDockerPipelineScript.groovy:70)
          at org.jenkinsci.plugins.docker.workflow.Docker.withRegistry(Docker.groovy:41)
          at __cps.transform__(Native Method)
          at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
          at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
          at sun.reflect.GeneratedMethodAccessor332.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
          at com.cloudbees.groovy.cps.impl.ClosureBlock.eval(ClosureBlock.java:46)
          at com.cloudbees.groovy.cps.Next.step(Next.java:83)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
          at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
          at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
          at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
          at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
          at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
          at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:400)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:312)
          at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:276)
          at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
          at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

      Finished: FAILURE

      ----------------

       

      This leaves a running container on the Docker node, but no further steps are executed inside it and the stage fails. The issue is intermittent and we cannot reproduce it on demand; it occurs at random in roughly 3-5% of all builds.
      Note that a successful "docker top" command looks like "docker top <container_hash> -eo pid,comm", after which the stage goes on to execute all build steps inside the started container.
      We checked resources such as ulimits (especially the per-user nofile limit), CPU, memory, and disk at the time of the failure, and all looked fine.
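
      For anyone who wants to repeat the descriptor check on an agent, here is a minimal sketch of one way to do it. It is not the exact procedure we used: the helper names nofile_limits and open_fd_count are made up for illustration, and the /proc lookup assumes a Linux host. By default it inspects the Python process itself; to look at another process you would pass its PID and need permission to read its /proc entry.

```python
import os
import resource
from typing import Optional, Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid: str = "self") -> Optional[int]:
    """Count open descriptors of a process via /proc (Linux only).

    Returns None when the /proc entry is missing or not readable.
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


soft, hard = nofile_limits()
print(f"nofile soft={soft} hard={hard} open={open_fd_count()}")
```

      Comparing the open count against the soft limit around the time of a failure shows whether the process was close to running out of descriptors.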

      We cannot determine why the "Process leaked file descriptors ..." message ends up inside the docker top command. It appears that the message is appended to the container ID, which breaks "docker top" and causes the stage to fail.
      Has anyone else faced a similar issue with docker-workflow-plugin?
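
      To illustrate the failure mode: the log suggests that the warning text ended up in the same string as the container ID, and that the whole string was then passed to "docker top". The sketch below is not the plugin's code. It only shows, with a hypothetical helper named extract_container_id, how a 64-character hex ID could be pulled out of such polluted output before it is used in a follow-up command.

```python
import re

# A full Docker container ID is 64 lowercase hexadecimal characters.
CONTAINER_ID = re.compile(r"\b[0-9a-f]{64}\b")


def extract_container_id(raw_output: str) -> str:
    """Return the first 64-character hex ID found in raw command output.

    Raises ValueError when no ID is present, so that a corrupted value is
    never passed on to a later command such as `docker top`.
    """
    match = CONTAINER_ID.search(raw_output)
    if match is None:
        raise ValueError(f"no container ID found in: {raw_output!r}")
    return match.group(0)


# String taken from the failing build: the ID followed by the Jenkins warning.
polluted = (
    "ee632b4c84ed5aaea15608d5180169dfc7eedeaf7021a6724232e61c1f4d5d4c "
    "Process leaked file descriptors. See "
    "https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors "
    "for more information"
)
container_id = extract_container_id(polluted)
print(f"docker top {container_id} -eo pid,comm")
```

      With the sample string from our log, this rebuilds the command in the same form as the successful "docker top <container_hash> -eo pid,comm" call. It works around the symptom only and does not explain why the warning is emitted in the first place.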

      Let me know if you need any more details.

          [JENKINS-63628] Process leaked file descriptors message within docker top command

          Nik Reiman added a comment -

          We have also observed this, but our company is not using declarative pipelines. We've only seen it once so far, but I'm sure it'll resurface given enough builds.


          Alex B added a comment -

          I can also confirm the problem when using declarative/multi-branch pipelines.

          The problem occurs in roughly 1 in 200 builds on our machine, mostly when it is running under full load.


          Martin Karing added a comment - - edited

          I found that the issue is still present and also affects the WindowsDockerClient when the agent is running on Windows Server 2022.
          It happens a lot when the agent is very busy.

          The stack trace in this case looks like this:

          java.io.IOException: Failed to run top '5fc8e16a415725a31d105f4b9847cf0023e26fba4171f8fd043c074bbbd9031f
          Process leaked file descriptors. See https://www.jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information'. Error: Error response from daemon: page not found
          	at org.jenkinsci.plugins.docker.workflow.client.WindowsDockerClient.listProcess(WindowsDockerClient.java:66)
          	at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Execution.start(WithContainerStep.java:201)
          	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:322)
          	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
          	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
          	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)
          	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
          	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
          	at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
          	at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:140)
          	at org.jenkinsci.plugins.docker.workflow.Docker.node(Docker.groovy:66)
          	at org.jenkinsci.plugins.docker.workflow.Docker$Image.inside(Docker.groovy:125)
          	at de.tkichemnitz.buildserver.msbuild.MSBuildProject.findDuplicatedCode(MSBuildProject.groovy:146)
          	at org.jenkinsci.plugins.docker.workflow.Docker.withServer(Docker.groovy:50)
          	at ___cps.transform___(Native Method)
          


            Assignee: Unassigned
            Reporter: Nikola Naydenov (ninaydenov)
            Votes: 3
            Watchers: 4
