Type: Bug
Resolution: Unresolved
Priority: Critical
Running version 1.30-rc422.f179f78e479f
The merge of PR-98 moved the wrapper process to the background so that the launching process can exit quickly. However, doing so orphans the wrapper process. This is only a problem in environments with no init process to reap it (e.g. Docker containers run without the --init flag).
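To make the failure mode concrete, here is a minimal Linux-only sketch (not code from the plugin) of what happens when a child exits and its parent never calls wait: the child stays in state Z until it is reaped. In a container without an init process, PID 1 plays the role of the non-reaping parent for the orphaned wrapper. The names `child_state` and `demo_unreaped_child` are made up for this illustration, and the sketch assumes /proc is mounted.

```python
import os
import time


def child_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        text = f.read()
    # The state field follows the command name, which is wrapped in parentheses.
    return text[text.rindex(")") + 2]


def demo_unreaped_child(timeout=5.0):
    """Fork a child that exits at once and observe it before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: deliberately skip wait() for now, as a non-reaping PID 1 would.
    deadline = time.monotonic() + timeout
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    os.waitpid(pid, 0)  # reaping removes the zombie entry
    reaped = not os.path.exists(f"/proc/{pid}")
    return state, reaped
```

An init such as Tini does the equivalent of the `waitpid` call for every orphan that gets reparented to it.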
Unit tests did not catch this bug because of a race between the last ps call and the exit of the wrapper process. If ps is called again after the test detects that the script has finished running, the zombie state of the wrapper process is revealed.
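One way a test could close that race, sketched here under assumptions rather than taken from the actual test suite: sample the zombie list once when the script is reported finished, wait a short settle period, then sample again. `late_zombies`, `list_zombies` and `settle_seconds` are hypothetical names; `list_zombies` stands for whatever the test uses to run ps and return the PIDs in state Z.

```python
import time


def late_zombies(list_zombies, settle_seconds=1.0, sleep=time.sleep):
    """Return zombie PIDs that show up only in a second sample taken after a delay.

    list_zombies is a caller-supplied function returning an iterable of PIDs
    currently in state Z. The sleep parameter is injectable so tests can
    avoid real waiting.
    """
    early = set(list_zombies())   # sample taken when the script is reported done
    sleep(settle_seconds)         # give the wrapper process time to exit
    late = list_zombies()         # the "one more ps" that reveals the zombie
    return [pid for pid in late if pid not in early]
```

A test would then assert that the returned list is empty. The settle period is a trade-off: too short and the race remains, too long and the suite slows down.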
I'm not sure how much of an issue this really is, as there are numerous solutions for enabling zombie reaping in containers, but since the unit tests explicitly check for zombies, it seemed worth mentioning.
is caused by: JENKINS-58290 WebSocket / OkHttp thread leak from BourneShellScript + ContainerExecDecorator (Resolved)
But it seems that PID namespace sharing is off by default, so most users with Jenkins running in a K8s cluster would not be able to rely on that.
As per this comment it seems that to avoid a memory leak for kubernetes plugin users, we would need to patch docker-jnlp-slave to run Tini (or equivalent). But if I understand correctly, that would only help for subprocesses of the default agent container, not for other containers listed in the pod template and used via the container step: to fix these, we would need to ensure that the command run via (the API equivalent of) kubectl exec by ContainerExecDecorator waits for all its children. Right?
Is there a known easy way to check for this condition in a realistic cluster, say GKE? Run some straightforward Pipeline build (using podTemplate + node + container + sh) a bunch of times, and then somehow get a root shell into the raw node and scan for zombie processes?
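For the scanning step, a rough sketch of what could be run from a root shell on the node, assuming Python is available there and /proc is the host's procfs: walk /proc and report every process whose state is Z. The function names `parse_stat` and `find_zombies` are made up for this example. `ps` with a filter on the state column would work as well.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the LAST ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 2:].split()
    return pid, comm, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """Return a list of (pid, ppid, comm) for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process vanished or stat was unreadable
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"zombie pid={pid} ppid={ppid} comm={comm}")
```

The parent PID in the output shows which process is failing to reap. Running the scan before and after a batch of builds and comparing the counts would show whether zombies accumulate.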