
Wrapper process leaves zombie when no init process present

      The merge of PR-98 moved the wrapper process to the background so that the launching process can exit quickly. However, that very act orphans the wrapper process. This is only a problem in environments where there is no init process (e.g. Docker containers run without the --init flag).
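      A minimal reproduction sketch of the underlying mechanism (not taken from the plugin; the image, container name, and timings are placeholders, and it assumes the image's ps supports -o):

          # PID 1 here ends up being a plain sleep, which never reaps adopted children
          docker run -d --name zombie-demo alpine sh -c 'sh -c "sleep 1 &"; exec sleep 300'
          sleep 5                                            # give the orphaned sleep time to exit
          docker exec zombie-demo ps -o pid,ppid,stat,comm   # the finished sleep shows up with STAT "Z"
          docker rm -f zombie-demo
          # re-run with --init and the zombie disappears: Docker's injected tini becomes PID 1 and reaps it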

      Unit tests did not discover this bug due to a race condition between when the last ps was called and when the wrapper process exited. If another ps is called after the test detects that the script has finished running, the zombie state of the wrapper process is revealed.
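      In other words, the race looks roughly like this (a sketch, not the actual test code):

          ps -o pid,stat,comm   # checked while the wrapper may still be running: no zombie yet
          sleep 1               # the backgrounded wrapper exits in the meantime
          ps -o pid,stat,comm   # checked again: the wrapper now shows STAT "Z"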

      I'm not sure how much of an issue this really is, as there are numerous solutions for enabling zombie reaping in containers, but since there is an explicit check for zombies in the unit tests, it seemed worth mentioning.
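      For reference, the usual container-side reaping workarounds look roughly like this (a sketch; the image name is a placeholder):

          # let Docker inject tini as PID 1, which reaps orphaned children
          docker run --init --rm my-agent-image
          # or bake an init into the image, e.g. with a Dockerfile entrypoint like
          #   ENTRYPOINT ["/sbin/tini", "--"]
          # so the real command runs as a child of tini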

          [JENKINS-58656] Wrapper process leaves zombie when no init process present

          Jesse Glick added a comment -

          But it seems that PID namespace sharing is off by default, so most users with Jenkins running in a K8s cluster would not be able to rely on that.

          As per this comment it seems that to avoid a memory leak for kubernetes plugin users, we would need to patch docker-jnlp-slave to run Tini (or equivalent). But if I understand correctly, that would only help for subprocesses of the default agent container, not for other containers listed in the pod template and used via the container step: to fix these, we would need to ensure that the command run via (the API equivalent of) kubectl exec by ContainerExecDecorator waits for all its children. Right?

          Is there a known easy way to check for this condition in a realistic cluster, say GKE? Run some straightforward Pipeline build (using podTemplate + node + container + sh) a bunch of times, and then somehow get a root shell into the raw node and scan for zombie processes?
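          For the "scan for zombie processes" part, something along these lines should work once you have a shell on the node (a sketch, not from the thread):

              ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # list defunct processes (STAT "Z")
              ps -eo stat= | grep -c '^Z'                   # or just count them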


          Carroll Chiou added a comment - edited

          Yes, just to clarify, I ran my Jenkinsfile on a Jenkins instance that was deployed to GKE. The Jenkinsfile has two stages, with each stage running a different pod template. The only difference between the two pod templates is that I add shareProcessNamespace: true to the last stage. You have to look a bit carefully, but in the ps output for the first stage you will see a zombie process, whereas in the second stage there is no zombie process.
          Now this instance is running my latest version of `durable-task` from PR-106. I can also confirm that the behavior is the same with the latest version on `master`.

          I only need to run sh once and wait a bit to turn up a zombie process. durable-task is guaranteed to create a zombie every time it is executed due to the background process requirement. This only happens within the container itself, so once the container goes away, so do the zombies. My understanding of zombie processes is that the only resource they consume is their entry in the process table. So I guess if you have a long-running container that is doing a serious number of shell steps, then you could run into trouble? For reference, I looked at /proc/sys/kernel/pid_max for the jenkins/jnlp-slave image and got 99,999. Apparently pid_max can be raised to just over 4 million (2^22) entries on 64-bit systems. And this is all for just one container.
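          For reference, the pod-level setting described above is a one-line addition to the pod spec (a minimal YAML sketch; the names and command are placeholders, and the same field can go in the kubernetes plugin's pod template YAML):

              apiVersion: v1
              kind: Pod
              metadata:
                name: agent-demo
              spec:
                shareProcessNamespace: true    # the pause container becomes PID 1 and reaps orphaned children
                containers:
                - name: jnlp
                  image: jenkins/jnlp-slave    # image mentioned earlier in this thread
                  command: ["sleep", "3600"]   # placeholder so the pod stays up for inspection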


          Jesse Glick added a comment -

          This only happens within the container itself, so once the container goes away, so do the zombies.

          If true, then there is no problem; most pods run only one sh step, and it is rare for more than a handful to be run. My concern is whether after running a bunch of builds in a cluster, after all the pods are gone there are still zombie processes left over on the nodes, which would eventually cause the node to crash and need to be rebooted. That is why I asked specifically about

          get a root shell into the raw node and scan for zombie processes

          I just learned of shareProcessNamespace: true today. We could enable it automatically in agent pods if that is what we have to do, but first I would like to understand the current behavior.


          Stephan Kirsten added a comment - edited

          We also ran into this problem. We have long-running builds based on hundreds of Makefiles, which invoke a bash for every command, and those shells are never reaped. After some time we exhausted the PIDs and received the following message:

          fork: Resource temporarily unavailable

          The "shareProcessNamespace: true" setting solves this situation for us. But we think this should be documented in some way, because the behavior has definitely changed.
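          (A quick way to confirm this failure mode inside an affected container, as a sketch:)

              cat /proc/sys/kernel/pid_max   # ceiling on process IDs
              ps -eo stat= | grep -c '^Z'    # number of defunct entries still occupying PIDs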


          Jesse Glick added a comment -

          stephankirsten can you elaborate a bit here? You are using the kubernetes plugin? How many sh steps per pod? What kind of K8s installation?

          I do not want to have to document this; I want things to work out of the box. If adding this option to pod definitions reliably fixes what is otherwise a reproducible leak, and does not cause ill effects, then we can automate that in the kubernetes plugin. I see that there is a feature gate for this which might be turned off, and we would need to check what happens if the option is specified but the feature is disabled.


          Stephan Kirsten added a comment -

          We use the kubernetes plugin with Kubernetes 1.15.3 on premise. Regarding sh steps per pod, we have only around 10, but we invoke our build system via shell scripts which then work through Makefiles and invoke a bash for every step of the Makefiles. That sums up to around 27k defunct bash processes that are not getting reaped, and eventually we run into the error I mentioned above.


          Jesse Glick added a comment -

          So reading between the lines, repeatedly running a build which has one sh step that runs a script that launches a thousand subprocesses should eventually result in an error. That is something that can be tested and, if true, worked around.


          Kenneth Rogers added a comment -

          I have been unable to reproduce the `Resource temporarily unavailable` error when attempting to run pipelines that simulate the situation described.

          I created a cluster in GKE using the gcloud CLI: gcloud container clusters create <cluster-name> --machine-type=n1-standard-2 --cluster-version=latest. I installed CloudBees Core for Modern Platforms version 2.204.3.7 (the latest public release at the time I started testing) using Helm, used kubectl get nodes to find the names of the nodes, and used gcloud beta compute ssh to connect to the nodes via SSH. I then ran watch 'ps fauxwww | fgrep Z' to watch for zombie processes on each node.

          Using groovy while (true) { sh 'sleep 1' } I was able to produce zombie processes on the node the build agent was assigned to. The process ran for 5 hours 17 minutes before using up all the process resources. After the processes were exhausted, the job exited with an error message that no processes were available. After the pod running the job exited, the zombie processes on the node were removed and the node continued to function.

          Using `while :; do /usr/bin/sleep .01; done` as a way to generate subprocesses, I tested it as the direct parameter of an `sh` step in a pipeline using both the `jenkins/jnlp-slave` and `cloudbees/cloudbees-core-agent` images. Neither produced any zombie processes on the worker nodes of the Kubernetes cluster. To introduce another layer of subprocesses, I also put that `while` line into a file and had the `sh` step execute that file, but it did not produce any zombie processes on the worker nodes either. Additionally, I made that while loop a step in a Makefile and executed it that way, which also did not produce any zombies on the nodes.
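          For reference, the steps above gathered into one sketch (the cluster and node names are placeholders):

              gcloud container clusters create zombie-test --machine-type=n1-standard-2 --cluster-version=latest
              kubectl get nodes                     # find the node names
              gcloud beta compute ssh <node-name>   # get a shell on a node
              watch 'ps fauxwww | fgrep Z'          # watch that node for zombie processes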


          chetan shivashankar added a comment -

          I have been observing some issues on nodes. I have been using Amazon EKS for the cluster, and once in a while a node ends up either with a soft lockup or flips to NodeNotReady. I have done a lot of troubleshooting, but nothing conclusive has turned up so far. I was working with AWS support and they told me there were a couple of other cases reporting similar behavior with Jenkins pods. One of the other patterns I observed is that all the nodes which had issues had a very high number of zombie processes, at least 4000+. I still don't have conclusive evidence that the issue is caused by zombie processes or by Jenkins, but the patterns all indicate that there could be something in the Jenkins kubernetes plugin causing it.

          Did any of you face the same issue?


          junwei ning added a comment -

          I deployed bitnami/jenkins with Helm and ran into this issue as well.


            Assignee: Unassigned
            Reporter: Carroll Chiou
            Votes: 2
            Watchers: 15