  1. Jenkins
  2. JENKINS-41320

Concurrent git fetches cause OOM

Details

    Description

      When multiple jobs trigger git fetches at the same time, our system runs out of memory (OOM) and dies, leaving the workspaces broken because the git lock file left behind by the dead git process is still present.
      We have "# of executors" for the node set to 1 to avoid this sort of problem. While this does stop the actual jobs from running in parallel, Jenkins still performs git fetches in parallel, and there doesn't seem to be any way to stop it from doing so (does anyone know a workaround?).

      This was triggered recently when the GitHub API went down and then came back up, causing all of our open pull requests to be rebuilt at the same time. That ran ~10 git fetches on large repos, which caused the whole instance to die and need a hard reboot.

        Activity

          markewaite Mark Waite added a comment -

          I see this same condition frequently at startup of my lts-with-plugins docker instance. The instance includes job definitions for repositories which have a bug verification job on each branch. When the instance starts, a series of parallel git operations are started which overload the master node.

          I don't know of a work-around for the problem, and suspect that it will require some form of coordination to share polling results for a single repository. stephenconnolly or jglick may have better suggestions for a technique.

          tom_artomatix Tom Mason added a comment -

          I'd be happy to have it just block git operations until all other processes have finished. I would wrap git in a shell script that used flock to do that, but I suspect that would cause jenkins to think it timed out.
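          The flock wrapper idea above could be sketched roughly as follows. This is a minimal illustration, not a tested workaround: the lock path, the real git path, the two environment variable names, and the `run_serialized` function are all assumptions made up for this example. It takes an exclusive lock, runs the real git with the same arguments, and releases the lock afterwards, so only one git process runs at a time on the node.

```python
#!/usr/bin/env python3
"""Hypothetical git wrapper that serializes git invocations with flock."""
import fcntl
import os
import subprocess

# Assumed defaults; both paths and both variable names are illustrative only.
LOCK_PATH = os.environ.get("GIT_WRAPPER_LOCK", "/tmp/git-wrapper.lock")
REAL_GIT = os.environ.get("GIT_WRAPPER_REAL_GIT", "/usr/bin/git")


def run_serialized(argv, lock_path=LOCK_PATH, real_git=REAL_GIT):
    """Run real_git with argv while holding an exclusive lock on lock_path.

    Returns the exit code of the wrapped process.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until every other holder of the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call([real_git, *argv])
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

          To use it as a drop-in wrapper, the script would end with something like `sys.exit(run_serialized(sys.argv[1:]))` and be configured as the git executable for the node. The time spent waiting for the lock still counts against whatever timeout the caller applies to the git command, which is the concern raised above.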

          jglick Jesse Glick added a comment -

          No idea, without knowing (a) what is actually triggering the fetch in the observed case, (b) why even a bunch of concurrent fetches would require so much heap as to trigger an OOME.

          tom_artomatix Tom Mason added a comment -

          The GitHub pull request builder plugin causes the large number of jobs to run. A bunch of concurrent fetches on large repos causes an OOM because git uses a lot of memory when fetching large repos.


          People

            Unassigned Unassigned
            tom_artomatix Tom Mason
            Votes: 0
            Watchers: 3
