-
New Feature
-
Resolution: Won't Do
-
Major
-
None
git clone is very efficient at downloading large repositories.
git fetch can take twice as long, because it has to check that each object exists.
Our repository has 700-800K objects; it's a moderately sized repo with a lot of history.
git fetch takes 14 minutes; git clone takes about 4 minutes.
It would be nice to have some control over which method is used.
[JENKINS-23345] Git Plugin should have an option to use clone instead of init/fetch
Hey Mark,
I've been working with Dave on this issue, so I can expand on some of the details. Note that my information is second-hand: I received it from someone else, but that someone works on git, so I'm satisfied with the answer. Here's hoping I'm repeating it correctly.
When you use git clone, you receive cryptographic/verifiable assurance that the server has provided you all the objects. When you run 'git clone', the directory is set up, the server collects all the objects, the transfer happens. After it's complete, the receiving end runs a quick verification on the packfile(s), and then puts all the objects in place on the local filesystem, and the process is done.
This is as opposed to git fetch, which is meant for incremental updates. After a fetch completes, git walks the object graph. In most cases this is merely from your previous head forward through all of the new objects, verifying that everything is intact. In Dave's case, walking the entire history of the repository accounts for the vast majority of the execution time: longer than the default fetch timeout of 10 minutes, for the record.
Receiving data itself isn't any faster, but the verification that takes place after receiving objects via fetch is dramatically different.
I came across this ticket while searching for the reason why Jenkins is doing init+fetch instead of clone. I understand the reasoning, but it feels like a hack. For instance, it seems I need to set Additional behaviours > Check out to specific local branch to master so the Jenkins workspace is not left in a "detached head" situation. If the default value for Branches to build is */master, why do I have to specify this twice?
Case in point: I'm trying to troubleshoot a release script (editing a version file, tagging, pushing the tag, etc.) that works on my machine™ but not in Jenkins, because the repository is set up differently in Jenkins, and I don't even know how to simulate the Jenkins behaviour on my computer.
So, while I understand that having both options (init+fetch or clone) would double the possible bugs and necessary tests, I believe having only the clone option would be the best choice, and consistent with the principle of least surprise.
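For anyone trying to reproduce that workspace state locally, here is a rough sketch. It assumes the plugin's sequence is roughly: init, fetch into refs/remotes/origin/*, then check out the resolved commit SHA. The exact commands the plugin runs may differ. The throwaway repository, directory names, and the demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: an init + fetch workspace that ends up with a detached HEAD.
set -e
work=$(mktemp -d)

# Throwaway "remote" with one commit on master (stand-in for the real repository).
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Workspace built with init + fetch instead of git clone.
git init -q "$work/workspace"
cd "$work/workspace"
git config remote.origin.url "$work/remote"
git fetch -q --tags "$work/remote" '+refs/heads/*:refs/remotes/origin/*'

# Checking out the resolved commit rather than a branch detaches HEAD.
sha=$(git rev-parse 'origin/master^{commit}')
git checkout -q -f "$sha"
head_state=$(git symbolic-ref -q HEAD || echo detached)
echo "after checkout of SHA: $head_state"

# Rough equivalent of "Check out to specific local branch": local master at that commit.
git checkout -q -B master "$sha"
branch_state=$(git symbolic-ref -q HEAD)
echo "after checkout -B: $branch_state"
```

Running a release script inside the resulting workspace directory, before and after the `checkout -B` step, should show whether the detached HEAD is what breaks it.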
dserodio I think "least surprise" includes "don't break workflows for the 30000+ installations of the git plugin and git client plugin" and also includes "don't hang the git command line by prompting for authentication".
In one sense, it is a hack, since git does not have a "--no-interactive" option to prevent the git command from prompting for authentication. The subversion command line has that option, but git hasn't yet reached the point of adding that option.
It would really be great to be able to use "git clone" instead of "git init + git fetch", but I think it is more important to not break existing users than it is to switch from "git init + git fetch" to "git clone".
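As a side note for readers on newer git: releases from 2.3 onward read the GIT_TERMINAL_PROMPT environment variable, and setting it to 0 makes git fail rather than ask for a username or password on the terminal. It does not cover every prompt (ssh passphrase prompts, for example, are handled by ssh itself), so treat the following as a sketch, not as what the plugin does. The helper name `fetch_no_prompt` and the local throwaway repository are hypothetical.

```shell
#!/bin/sh
# Sketch only: fetch with terminal prompting disabled (needs git 2.3 or newer).
set -e
work=$(mktemp -d)

# Throwaway local remote so the sketch runs without network access or credentials.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Hypothetical helper: with GIT_TERMINAL_PROMPT=0, git errors out instead of
# waiting on the terminal for credentials.
fetch_no_prompt() {
    GIT_TERMINAL_PROMPT=0 git fetch -q "$1" '+refs/heads/*:refs/remotes/origin/*'
}

git init -q "$work/workspace"
cd "$work/workspace"
if fetch_no_prompt "$work/remote"; then fetch_status=ok; else fetch_status=failed; fi
echo "fetch: $fetch_status"
```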
To satisfy my curiosity about the performance difference between "git clone" and "git fetch", I ran a pair of tests to compare them. I used git 2.2.1 on an Ubuntu 14.04 64-bit machine with a solid-state disk as the file system hosting both the source repository and the destination repository. I used a local copy of the linux kernel repository as it exists at commit 69e273c0b0a3c337a521d083374c918dc52c666f. That repository on my disk is about 1.3 GB and contains many, many objects. For a report on the relative activity of the linux kernel repository, refer to
What I found:
$ time git clone ssh://mark-pc1/var/lib/git/mwaite/linux.git - 2m41s
$ time (mkdir fetch;cd fetch;git init; git fetch ssh://mark-pc1/var/lib/git/mwaite/linux.git) - 2m52s
The "git fetch" time was consistently about 7% slower than the "git clone" time.
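For reference, the percentage can be rechecked from the two wall-clock times quoted above. This is just the arithmetic, not a new measurement:

```shell
#!/bin/sh
# Wall-clock times from the two runs above, converted to seconds.
clone_s=$((2 * 60 + 41))   # 2m41s
fetch_s=$((2 * 60 + 52))   # 2m52s

# Relative slowdown of init + fetch compared with clone, in percent.
slowdown=$(LC_ALL=C awk -v c="$clone_s" -v f="$fetch_s" \
    'BEGIN { printf "%.1f", (f - c) / c * 100 }')
echo "init + fetch was ${slowdown}% slower than clone"
```

That works out to 11 extra seconds on 161, which rounds to the 7% mentioned.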
Your experience is significantly different, since you note in the original report that "git fetch takes 14 mins, git clone takes about 4 minutes." I don't plan to make any change in the git plugin based on that, but wanted to record my observed results in case others are concerned about the apparent difference between "git clone" and "git fetch".
Another problem with init+fetch instead of clone:
We are using header expansion with git filter + git attribute. The clean and smudge filters are perl scripts.
Unfortunately it doesn't work with the git plugin.
After reading this issue, I recreated the setup manually using git init + git pull, and indeed the filters don't work.
It is very annoying, since it means that I won't be able to use the git plugin and will have to do the clone with a script.
Is there a way to use git just to check changes without pulling the code in the workspace?
The problem is that, with the filters configured, the init-based checkout done by the git plugin takes several hours instead of a few minutes.
The scripts are: https://github.com/turon/git-rcs-keywords
I would be grateful for any ideas.
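For readers unfamiliar with the mechanism, here is a small sketch of how a clean/smudge filter can be wired into an init + fetch workspace. It does not diagnose the failure or the slowdown described above, and it is unrelated to the git-rcs-keywords scripts: the filter name `demo`, the `$Demo$` keyword, and the two sed scripts are hypothetical. The one point it illustrates is that the filter commands live in git config, not in the repository, so a freshly initialised workspace only applies them if they are configured before the checkout.

```shell
#!/bin/sh
# Sketch only: a toy keyword filter applied in an init + fetch workspace.
set -e
work=$(mktemp -d)

# Hypothetical filter scripts; each reads the file on stdin and writes it to stdout.
cat > "$work/smudge.sh" <<'EOF'
sed 's/\$Demo\$/$Demo: expanded$/'
EOF
cat > "$work/clean.sh" <<'EOF'
sed 's/\$Demo[^$]*\$/$Demo$/'
EOF

# Throwaway remote: one file with a keyword placeholder plus a .gitattributes entry.
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
printf '%s\n' '$Demo$' > header.txt
printf '%s\n' '*.txt filter=demo' > .gitattributes
git add header.txt .gitattributes
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add header"

# init + fetch workspace; the filter is configured BEFORE the checkout.
git init -q "$work/workspace"
cd "$work/workspace"
git config filter.demo.smudge "sh '$work/smudge.sh'"
git config filter.demo.clean "sh '$work/clean.sh'"
git fetch -q "$work/remote" '+refs/heads/*:refs/remotes/origin/*'
git checkout -q -f "$(git rev-parse 'origin/master^{commit}')"

# The smudge filter ran during checkout, so the working copy holds the expanded keyword.
expanded=$(cat header.txt)
echo "$expanded"
```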
mhallak As far as I can tell, yours is the first case of someone using git in Jenkins for header expansion. Even if I were willing to accept the increased complexity and reliability risk of having a git clone based implementation, I would still be unlikely to add support for header expansion.
Of course, there is no need to support header expansion. I just wanted to explain why we need the clone functionality and not init+pull. The support for filters in .gitattributes is an official feature of git, and we cannot use it with Jenkins. I'll have to write the cloning script, and we won't be able to rely on the built-in git implementation.
The clone functionality is also required to use git extensions for large files such as git-fat.
Without git clone, Git LFS does not work at all, because LFS is not initialized properly. We are going to need a solution to this soon, as Git LFS is becoming quite popular among game development projects. Is there anything in progress?
Nothing is in progress to switch from init + fetch to clone. There are no pending pull requests for an implementation that will allow switching from one to the other.
I had a quick scan over the plugin code and I understand that it would be a complex change, for sure. I've raised a ticket over here to discuss solutions that might have less of an impact: https://issues.jenkins-ci.org/browse/JENKINS-30318
Jacob Keller has proposed git plugin PR342 and git client plugin PR180 to implement submodule authentication. One of the side effects of his changes for submodule authentication may be that it will be easier / feasible to allow the option to switch from init/fetch to clone.
Because the git client plugin change alters the authentication technique, we'll need a solid beta test phase before delivering that change to the larger community of Jenkins users.
Is there an ETA on when this will be resolved? We're also invested in the use of "git clone" being implemented.
There is no ETA for any implementation that will replace the current init/fetch with clone.
Is your use case one of the use cases already described, or something different?
Won't be implemented. We hope in the future to provide a credential binding wrapper so that users who want precise control of the git command can use the wrapper and then call command line git from their pipeline script.
This is the first I've heard that clone is faster than fetch at retrieving the remote repository. My Google searches turn up no mention that "git fetch" is slower than "git clone". Can you give some pointers that support the statement that clone is more efficient than fetch?
Even if clone is faster, I'm hesitant to support such an addition to the plugin. Git fetch was chosen because there are fewer ways it can hang (prompting for authentication information) in the various authentication scenarios supported by the plugin. It is even more challenging because there are currently no unit tests for the authentication scenarios, so they must be tested interactively (or not tested) at each plugin release. Adding the option to use "clone" instead of "fetch" would effectively double the cases we need to test in an already very complicated portion of the code.