Jenkins / JENKINS-58656

Wrapper process leaves zombie when no init process present


      Description

      The merge of PR-98 (https://github.com/jenkinsci/durable-task-plugin/pull/98) moved the wrapper process to the background to allow the launching process to exit quickly. However, that very act orphans the wrapper process. This is only a problem in environments where there is no init process (e.g. Docker containers that are run without the --init flag).

      Unit tests did not discover this bug because of a race between when the last ps was called and when the wrapper process exited. If another ps is called after the test detects that the script has finished running, the zombie state of the wrapper process is revealed.

      I'm not sure how much of an issue this really is, as there are numerous solutions for enabling zombie reaping in containers, but since there is an explicit check for zombies in the unit tests, it seemed worth mentioning.
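
      For illustration, a minimal scripted-pipeline sketch of how the zombie can be revealed (the node label and timings are made up, not taken from the plugin's test suite): run any sh step on an agent that has no init/zombie reaper, give the wrapper time to exit, then call ps again.

          // Hypothetical sketch; 'container-agent-without-init' is an illustrative label
          // for an agent running in a container started without an init process
          // (e.g. docker run without --init).
          node('container-agent-without-init') {
              sh 'echo hello'                 // launches the backgrounded durable-task wrapper
              sleep 15                        // give the orphaned wrapper time to exit
              sh 'ps -o pid,ppid,stat,comm'   // a second ps now shows the wrapper with STAT "Z"
          }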

          Issue Links

            • This issue is caused by JENKINS-58290

            Activity

            jglick Jesse Glick added a comment - edited

            Adding to kubernetes plugin as it is important to check whether there is a practical impact on a Kubernetes node. Does something in K8s itself reap zombies? Can we reproduce a PID exhaustion error by repeatedly running brief sh steps? The defense for command-launcher + jenkins/slave is documented (just use docker run --init) if not enforced at runtime, but it is unknown at this time whether this affects a typical Kubernetes pod using jenkins/jnlp-slave.

            carroll Carroll Chiou added a comment - edited

            It looks like if you enable PID namespace sharing, the pause container will handle zombie reaping (Kubernetes 1.7+ and Docker 1.13.1+). Otherwise each container has to handle zombie reaping independently.

            https://www.ianlewis.org/en/almighty-pause-container
            https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-pid-namespace.md

            Update: ran a quick test to confirm this does work (see the attached Jenkinsfile and console output).

            jglick Jesse Glick added a comment

            But it seems that PID namespace sharing is off by default, so most users with Jenkins running in a K8s cluster would not be able to rely on that.

            As per this comment it seems that to avoid a memory leak for kubernetes plugin users, we would need to patch docker-jnlp-slave to run Tini (or equivalent). But if I understand correctly, that would only help for subprocesses of the default agent container, not for other containers listed in the pod template and used via the container step: to fix these, we would need to ensure that the command run via (the API equivalent of) kubectl exec by ContainerExecDecorator waits for all its children. Right?

            Is there a known easy way to check for this condition in a realistic cluster, say GKE? Run some straightforward Pipeline build (using podTemplate + node + container + sh) a bunch of times, and then somehow get a root shell into the raw node and scan for zombie processes?
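
            For concreteness, the kind of repeated check being described might look roughly like the sketch below. This is only a sketch: the image, labels, and iteration count are illustrative, and it assumes a kubernetes plugin version whose podTemplate step accepts a raw yaml block and exposes POD_LABEL. The node itself would still have to be inspected out of band (e.g. a root shell scanning ps output for Z-state processes) to see whether zombies outlive the pods.

              // Hypothetical reproduction sketch (not a tested case): repeatedly run sh
              // inside a non-default container of the pod, i.e. through the
              // ContainerExecDecorator code path, then scan the raw node afterwards.
              podTemplate(yaml: '''
              apiVersion: v1
              kind: Pod
              spec:
                containers:
                - name: maven
                  image: maven:3-jdk-8
                  command: ['cat']
                  tty: true
              ''') {
                  node(POD_LABEL) {
                      for (int i = 0; i < 50; i++) {
                          container('maven') {
                              sh 'true'   // each step leaves one wrapper behind inside the container
                          }
                      }
                  }
              }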

            carroll Carroll Chiou added a comment - edited

            Yes, just to clarify, I ran my Jenkinsfile on a Jenkins instance that was deployed to GKE. The Jenkinsfile has two stages, each running a different pod template. The only difference between the two pod templates is that I add shareProcessNamespace: true to the last stage. You have to look a bit carefully, but in the ps output for the first stage you will see a zombie process, whereas in the second stage there is no zombie process.
            Now this instance is running my latest version of `durable-task` from PR-106. I can also confirm that the behavior is the same with the latest version on `master`.
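
            A stripped-down sketch of the two-stage comparison described above (illustrative only; the image, the yaml-flavored podTemplate usage, and POD_LABEL are assumptions, not the actual attached Jenkinsfile):

              // Hypothetical two-stage sketch: identical pod templates except for
              // spec.shareProcessNamespace. With it set to true, the pod's pause
              // container owns PID 1 of the shared namespace and reaps the orphaned
              // wrapper; without it, the wrapper lingers as a zombie.
              for (share in ['false', 'true']) {
                  stage("shareProcessNamespace=${share}") {
                      podTemplate(yaml: """
              apiVersion: v1
              kind: Pod
              spec:
                shareProcessNamespace: ${share}
                containers:
                - name: shell
                  image: ubuntu
                  command: ['sleep', 'infinity']
              """) {
                          node(POD_LABEL) {
                              container('shell') {
                                  sh 'echo hello'   // this step's wrapper is orphaned once it exits
                                  sleep 15          // give the wrapper time to finish
                                  sh 'ps -ef'       // first stage: a <defunct> entry; second stage: none
                              }
                          }
                      }
                  }
              }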

            I only need to run sh once and wait a bit to pull up a zombie. durable-task is guaranteed to create a zombie every time it is executed due to the background process requirement. This only happens within the container itself, so once the container goes away, so do the zombies. My understanding of zombie processes is that the only resource they consume is an entry in the process table. So I guess if you have a long-running container that runs a serious number of shell steps then you could run into trouble? For reference, I looked at /proc/sys/kernel/pid_max for the jenkins/jnlp-slave image and got 99,999. Apparently on 64-bit systems pid_max can be configured up to >4 million (2^22) entries. And this is all for just one container.

            jglick Jesse Glick added a comment

            This only happens within the container itself, so once the container goes away, so do the zombies.

            If true, then there is no problem; most pods run only one sh step, and it is rare for more than a handful to be run. My concern is whether after running a bunch of builds in a cluster, after all the pods are gone there are still zombie processes left over on the nodes, which would eventually cause the node to crash and need to be rebooted. That is why I asked specifically about

            get a root shell into the raw node and scan for zombie processes

            I just learned of shareProcessNamespace: true today. We could enable it automatically in agent pods if that is what we have to do, but first I would like to understand the current behavior.

            stephankirsten Stephan Kirsten added a comment - edited

            We also ran into this problem. We have long-running builds based on hundreds of Makefiles which invoke a bash for every command, and those bash processes are no longer reaped. After some time we exhausted the PIDs and received the following message:

            fork: Resource temporarily unavailable

            The "shareProcessNamespace: true" solves this situation for us. But we think this should be documented in someway, because the behavior definitely changed.

            jglick Jesse Glick added a comment

            Stephan Kirsten can you elaborate a bit here? You are using the kubernetes plugin? How many sh steps per pod? What kind of K8s installation?

            I do not want to have to document this; I want things to work out of the box. If adding this option to pod definitions reliably fixes what is otherwise a reproducible leak, and does not cause ill effects, then we can automate that in the kubernetes plugin. I see that there is a feature gate for this which might be turned off, and we would need to check what happens if the option is specified but the feature is disabled.

            stephankirsten Stephan Kirsten added a comment

            We use the kubernetes plugin with Kubernetes 1.15.3 on premise. Regarding sh steps per pod, we have only around 10, but we invoke our build system via shell scripts which then work through Makefiles and invoke bash for every step of the Makefiles. It sums up to around 27k defunct bash processes that are not getting reaped, and eventually we run into the error I mentioned above.

            jglick Jesse Glick added a comment

            So reading between the lines, repeatedly running a build which has one sh step that runs a script that launches a thousand subprocesses should eventually result in an error. That is something that can be tested and, if true, worked around.

            kerogers Kenneth Rogers added a comment

            I have been unable to reproduce the `Resource temporarily unavailable` error when attempting to run pipelines that simulate the situation described.

            I created a cluster in GKE using the gcloud CLI: gcloud container clusters create <cluster-name> --machine-type=n1-standard-2 --cluster-version=latest. I installed CloudBees Core for Modern Platforms version 2.204.3.7 (the latest public release at the time I started testing) using Helm, used kubectl get nodes to find the names of the nodes and gcloud beta compute ssh to connect to the nodes via SSH, and then ran watch 'ps fauxwww | fgrep Z' to watch for zombie processes on each node.

            Using groovy while (true) { sh 'sleep 1' } I was able to produce zombie processes on the node the build agent was assigned to. The build ran for 5 hours 17 minutes before using up all the process resources. After the processes were exhausted the job exited with an error message that there were no processes available. After the pod running the job exited, the zombie processes on the node were removed and the node continued to function.

            Using `while :; do /usr/bin/sleep .01; done` as a way to generate subprocesses, I tested it as the direct parameter of an `sh` step in a pipeline using both the `jenkins/jnlp-slave` and `cloudbees/cloudbees-core-agent` images. Neither produced any zombie processes on the worker nodes of the Kubernetes cluster. To induce another layer of subprocesses I also put that `while` line into a file and had the `sh` step execute that file, but it also did not produce any zombie processes on the worker nodes. Additionally, I made that while loop a step in a Makefile and executed it that way, which also did not produce any zombies on the nodes.

            cshivashankar chetan shivashankar added a comment

            I have been observing some issues on nodes. I have been using Amazon EKS for the cluster, and once in a while a node ends up either with a soft lockup or flips to NodeNotReady. I have tried to troubleshoot a lot, but so far nothing concrete has been figured out. I was working with AWS support and they told me there were a couple of other cases where they saw similar behavior with Jenkins pods. One of the other patterns I observed is that all the nodes which had issues had a very high number of zombie processes, at least 4000+. I still haven't got any conclusive evidence that the issue is due to zombie processes/Jenkins, but the patterns all indicate that there could be something in the k8s plugin of Jenkins which may be causing the issue.

            Did any of you face the same issue?


              People

              Assignee: Unassigned
              Reporter: Carroll Chiou
              Votes: 2
              Watchers: 12