Jenkins / JENKINS-38362

Poll on a full clone on master, shallow clone using refspec on slaves

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: git-plugin
    • None

      We have a large git repo with lots of history and lots of branches. As of JENKINS-31393, we can now specify a refspec and have the initial fetch respect it.

      We have two jobs for each repo: one that builds master, and one that builds everything except master. Specifying a refspec works for our master job, but it won't work for the feature-branches job, since that job needs to be able to fetch more than just the specified refspec.

      However, a shallow clone of all branches is too expensive and takes about 18 minutes to complete from inside the datacenter that contains our git repo.

      It seems like the ideal is for the master to keep a full clone of our repo, so it can do polling and figure out which branch / revision to build, and then tell a slave to do a shallow checkout of that revision with a full refspec, so the slave only checks out that one branch.
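A rough sketch of that split, using a throwaway local repo as a stand-in for the real server (the `feature-x` branch name and all paths are hypothetical, not from the ticket):

```shell
set -e
tmp=$(mktemp -d)

# Hypothetical stand-in for the real git server.
git init -q "$tmp/server"
git -C "$tmp/server" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m init
git -C "$tmp/server" branch feature-x

# Master side: one full clone, kept up to date for polling across all branches.
git clone -q "file://$tmp/server" "$tmp/master-poll"
git -C "$tmp/master-poll" fetch -q origin

# Agent side: shallow clone of only the branch the master decided to build.
git clone -q --depth 1 --branch feature-x "file://$tmp/server" "$tmp/agent"
git -C "$tmp/agent" rev-parse --abbrev-ref HEAD
```

The agent never pays for the full history or the other branches; only the master's polling clone does.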


          Mark Waite added a comment -

          That's a good suggestion and is very similar to something that is implemented in the Mercurial plugin. The Mercurial plugin clones a copy of the mercurial repository to the master, then uses the Jenkins client / server protocol to deliver the repository to individual agents. Unfortunately, that technique has not been implemented for the git plugin.

          You might be able to get most of the benefit of that technique by referring to my "Jenkins hints for large git repos" talk from Jenkins World 2016. In particular, a reference repository can dramatically reduce the clone time and the disk use of a repository. I've used it with 18 GB git repositories quite successfully. The other potentially significant win is a sparse checkout, if you can use one to narrow the directories you check out into the working directory.
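For illustration, the reference-repository idea boils down to two git commands (a minimal sketch with hypothetical paths and a throwaway local repo standing in for the real server):

```shell
set -e
tmp=$(mktemp -d)

# Hypothetical stand-in for the real git server.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m init

# One-time, per machine: maintain a local mirror acting as the reference repo.
git clone -q --mirror "file://$tmp/origin" "$tmp/cache.git"

# Per build: the clone borrows objects from the cache via
# .git/objects/info/alternates, so only objects missing from the
# cache have to cross the network.
git clone -q --reference "$tmp/cache.git" "file://$tmp/origin" "$tmp/work"
cat "$tmp/work/.git/objects/info/alternates"
```

In the git plugin this corresponds to the "Advanced clone behaviours" option "Path of the reference repo to use during clone".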


          Eli White added a comment -

          Thanks. We've been through those slides (and other documents on the web) that give ideas of how to speed this process up.

          The limited-refspec approach doesn't work for us because we want to build everything except master, so our refspec has to be a wildcard.
          The limited-folders approach doesn't work for us either, because our repo isn't a monorepo; we need the entire working directory.

          I'm not super familiar with the reference-repo approach, but I have some thoughts based on what I've read. We currently run our git server and our Jenkins nodes on AWS in the same region. We dynamically spin up our slaves, so there is no persistent spot on disk for a reference repo to live local to the machine. A proxy cache doesn't really help us either, because it would still mean a network hop to another machine on AWS, which is what we're already doing when we hit our normal git server.

          Ideally we only have one full copy of the repo that can be used for figuring out what to build. The dynamically spun up slaves can then do a shallow clone with a refspec that contains a single branch. A shallow clone with a single refspec takes about 15 seconds for us. A shallow clone with all refspecs takes about 18 minutes.
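The fast path described above corresponds to a fetch with a single-branch refspec and --depth 1. A minimal sketch (hypothetical branch name; a local throwaway repo stands in for the real server, with enough history that depth 1 actually truncates):

```shell
set -e
tmp=$(mktemp -d)

# Hypothetical stand-in for the real server, with two commits of history.
git init -q "$tmp/server"
git -C "$tmp/server" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m one
git -C "$tmp/server" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m two
git -C "$tmp/server" branch feature-x

# Agent side: fetch ONE branch, one commit deep, instead of all refs.
git init -q "$tmp/ws"
git -C "$tmp/ws" remote add origin "file://$tmp/server"
git -C "$tmp/ws" fetch -q --depth 1 origin \
    '+refs/heads/feature-x:refs/remotes/origin/feature-x'
git -C "$tmp/ws" checkout -q -b feature-x origin/feature-x
```

The resulting workspace holds a single commit of a single branch, which is why it finishes in seconds rather than minutes.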

          This plugin has the power to make builds way, way faster, especially when used in conjunction with the pipeline plugin, which currently does 3 checkouts for every run (jglick).


            Assignee: Unassigned
            Reporter: Eli White (eliwhite)
            Votes: 0
            Watchers: 3
            Created:
            Updated: