Status: Open
Jenkins v2.26, Debian Jessie on EC2
When multiple jobs trigger git fetches at the same time, our system runs out of memory (OOM) and dies, leaving the workspaces broken because the git lockfile left behind by a dead git process is still present.
We have the "# of executors" for the node set to 1 to avoid this sort of problem. While this does stop it from running the actual jobs in parallel, it will still perform git fetches in parallel, and there doesn't seem to be any way to stop it from doing so (does anyone know a workaround?).
This was triggered recently when the GitHub API went down and then came back up, causing all of our open pull requests to be rebuilt at the same time. That ran ~10 git fetches on large repos, which caused the whole instance to die and need a hard reboot.
I'd be happy to have it just block git operations until all other git processes have finished. I would wrap git in a shell script that uses flock to do that, but I suspect that would cause Jenkins to think it timed out.
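For illustration, the flock idea could look roughly like the sketch below, written in Python rather than shell. It is untested against Jenkins, and REAL_GIT, LOCK_PATH, run_serialized and main are hypothetical names chosen for the example, not anything Jenkins or git provides. It only serializes git invocations on one node and does nothing about the timeout concern.

```python
import fcntl
import subprocess

# Hypothetical paths (assumptions): adjust for the actual node.
REAL_GIT = "/usr/bin/git"
LOCK_PATH = "/tmp/jenkins-git.lock"


def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run cmd while holding an exclusive advisory lock on lock_path."""
    with open(lock_path, "w") as lock_file:
        # Blocks until every other holder of the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Returns the exit code of the child process.
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def main(argv):
    """Forward argv to the real git binary, one invocation at a time."""
    return run_serialized([REAL_GIT] + list(argv))
```

A wrapper script would call sys.exit(main(sys.argv[1:])) and be placed ahead of the real git on the PATH, or configured as the git executable for the node. Because the lock is advisory and released when the process exits, a wrapper that is killed does not leave the lock held, but every git call now waits behind the slowest fetch.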
No idea, without knowing (a) what is actually triggering the fetch in the observed case, and (b) why even a bunch of concurrent fetches would require so much heap as to trigger an OOME.
The GitHub pull request builder plugin causes the large number of jobs to run. A bunch of concurrent fetches on large repos causes an OOM... because git uses a lot of memory on fetches of large repos.
I see this same condition frequently at startup of my lts-with-plugins docker instance. The instance includes job definitions for repositories which have a bug verification job on each branch. When the instance starts, a series of parallel git operations is started, which overloads the master node.
I don't know of a workaround for the problem, and suspect that it will require some form of coordination to share polling results for a single repository. stephenconnolly or jglick may have better suggestions for a technique.