-
Bug
-
Resolution: Unresolved
-
Minor
-
Jenkins v2.26, Debian Jessie on EC2
When multiple jobs trigger git fetches at the same time, our system ooms and dies, leaving the workspaces broken because the git lockfile is present from a dead git process.
We have the "# of executors" for the node set to 1, to avoid this sort of problem, and while this does stop it from running the actual jobs in parallel, it will still perfrom git fetches in parallel, and there doesn't seem to be any way to stop it from doing so (does anyone know a workaround?).
This has been triggered recently by the github api going down, then coming back up, causing all of our open pull requests to be rebuilt at the same time, running ~10 git fetches on large repos, which causes the whole instance to die and need a hard reboot.
I see this same condition frequently at startup of my lts-with-plugins docker instance. The instance includes job definitions for repositories which have a bug verification job on each branch. When the instance starts, a series of parallel git operations are started which overload the master node.
I don't know of a work-around for the problem, and suspect that it will require some form of coordination to share polling results for a single repository. stephenconnolly or jglick may have better suggestions for a technique.