[JENKINS-23345] Git Plugin should have an option to use clone instead of init/fetch

Type: New Feature
Resolution: Won't Do
Priority: Major
Component/s: git-client-plugin
Labels: None

git clone is very efficient at downloading large repositories.

git fetch can take twice as long, because it has to check that each object exists.

Our repository has 700-800K objects; it's a moderately sized repo with a lot of history.

git fetch takes 14 mins, git clone takes about 4 minutes.

It would be nice to have some control over which method is used.

is related to

JENKINS-56404 Impossible exclude redundant/double fetches

Closed

davedash created issue - 2014-06-05 23:20

Mark Waite added a comment - 2014-06-06 12:10

This is the first I've heard that clone is faster than fetch at retrieving the remote repository. I don't see any mention in my google searches that indicate "git fetch" is slower than "git clone". Can you give some pointers that provide support for the statement that clone is more efficient than fetch?

Even if clone is faster, I'm hesitant to support such an addition to the plugin. Git fetch was chosen because there are fewer ways it can hang (prompting for authentication information) in the various authentication scenarios supported by the plugin. It is even more challenging because there are currently no unit tests for the authentication scenarios, so they must be tested interactively (or not tested) at each plugin release. Adding the option to use "clone" instead of "fetch" would effectively double the cases we need to test in an already very complicated portion of the code.


Jason Salaz added a comment - 2014-06-06 20:14

Hey Mark,

I've been working with Dave on this issue so I can expand on some of the details. I'll note that my information is still second hand, as I've received this information from someone else, but that someone works on git, so I'm satisfied with the answer. Here's hoping I'm repeating the information correctly.

When you use git clone, you receive cryptographic/verifiable assurance that the server has provided you all the objects. When you run 'git clone', the directory is set up, the server collects all the objects, the transfer happens. After it's complete, the receiving end runs a quick verification on the packfile(s), and then puts all the objects in place on the local filesystem, and the process is done.

This is as opposed to git fetch, which is meant for incremental updates. After a fetch completes, git walks the object graph. In most cases this is merely from your previous head forward through all of the new objects, verifying that everything is intact. In Dave's case, walking the entire history of the repository accounts for the vast majority of the execution time. For the record, that is longer than the default fetch timeout of 10 minutes.

Receiving data itself isn't any faster, but the verification that takes place after receiving objects via fetch is dramatically different.


Daniel Serodio added a comment - 2014-07-08 14:07

I came across this ticket while searching for the reason why Jenkins is doing init+fetch instead of clone. I understand the reasoning, but it feels like a hack. For instance, it seems I need to set Additional behaviours > Check out to specific local branch to master so the Jenkins workspace is not left in a "detached head" situation. If the default value for Branches to build is */master, why do I have to specify this twice?

Case in point: I'm trying to troubleshoot a release script (editing a version file, tagging, pushing the tag, etc) that works on my machine™ but not in Jenkins, because the repository is set up differently in Jenkins. I don't even know how to simulate the Jenkins behaviour on my computer.

So, while I understand that having both options (init+fetch or clone) would double the possible bugs and necessary tests, I believe having only the clone option would be the best choice, and consistent with the principle of least surprise.
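For anyone trying to reproduce that workspace state locally, here is a minimal sketch. It assumes the sequence is roughly init, fetch into refs/remotes/origin/*, then checkout of a commit by SHA-1. The exact commands and options the plugin runs are not confirmed anywhere in this thread, and the scratch paths, the VERSION file and the demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates an init+fetch workspace; the command sequence is an assumption.
set -eu

work=$(mktemp -d)        # scratch directory (hypothetical layout)
src="$work/source"       # stand-in for the remote repository
ws="$work/workspace"     # stand-in for the Jenkins workspace

# Build a tiny source repository with a single commit on master.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/master
echo "1.0.0" > "$src/VERSION"
git -C "$src" add VERSION
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial version"

# init + fetch instead of clone: only remote-tracking refs are created, no local branch.
git init -q "$ws"
git -C "$ws" config remote.origin.url "$src"
git -C "$ws" fetch -q --tags "$src" '+refs/heads/*:refs/remotes/origin/*'

# Checking out the fetched commit by SHA-1 leaves HEAD detached.
sha=$(git -C "$ws" rev-parse 'refs/remotes/origin/master^{commit}')
git -C "$ws" checkout -q -f "$sha"

# symbolic-ref exits non-zero on a detached HEAD, so fall back to a label.
head_state=$(git -C "$ws" symbolic-ref -q --short HEAD || echo "detached")
echo "HEAD is: $head_state"
```

Running `git -C "$ws" checkout -b master "$sha"` afterwards puts the workspace on a local branch, which is presumably close to what the "Check out to specific local branch" behaviour does.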


Mark Waite added a comment - 2014-07-08 16:34

dserodio I think "least surprise" includes "don't break workflows for the 30000+ installations of the git plugin and git client plugin" and also includes "don't hang the git command line by prompting for authentication".

In one sense, it is a hack, since git does not have a "--no-interactive" option to prevent the git command from prompting for authentication. The subversion command line has that option, but git hasn't yet reached the point of adding that option.

It would really be great to be able to use "git clone" instead of "git init + git fetch", but I think it is more important to not break existing users than it is to switch from "git init + git fetch" to "git clone".


Mark Waite made changes - 2014-12-26 05:15

Assignee

Original: Nicolas De Loof [ ndeloof ]

Mark Waite added a comment - 2015-02-01 23:31

In order to satisfy my curiosity about the performance difference between "git clone" and "git fetch", I ran a pair of tests to compare them. I used git 2.2.1 on an Ubuntu 14.04 64-bit machine with a solid state disc as the file system hosting both the source repository and the destination repository. I used a local copy of the linux kernel repository as it exists at commit 69e273c0b0a3c337a521d083374c918dc52c666f. That repository on my disc is about 1.3 GB and contains many, many objects. For a report on the relative activity of the linux kernel repository, refer to

What I found:

$ time git clone ssh://mark-pc1/var/lib/git/mwaite/linux.git - 2m41s
$ time (mkdir fetch;cd fetch;git init; git fetch ssh://mark-pc1/var/lib/git/mwaite/linux.git) - 2m52s

The "git fetch" time was consistently about 7% slower than the "git clone" time.
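As a quick check of that figure, the arithmetic can be sketched as below. It uses only the two timings quoted above plus the 14 and 4 minute figures from the original report; the variable names are illustrative.

```shell
#!/bin/sh
# Convert the quoted timings to seconds.
clone_s=$((2 * 60 + 41))   # 2m41s -> 161 seconds
fetch_s=$((2 * 60 + 52))   # 2m52s -> 172 seconds

# Relative slowdown of init+fetch in percent (awk, since POSIX sh has no floating point).
slowdown=$(LC_ALL=C awk -v c="$clone_s" -v f="$fetch_s" 'BEGIN { printf "%.1f", (f - c) / c * 100 }')

# Ratio from the original report: fetch 14 minutes, clone about 4 minutes.
reported_ratio=$(LC_ALL=C awk 'BEGIN { printf "%.1f", 14 / 4 }')

echo "measured: init+fetch ${slowdown}% slower than clone"
echo "reported: fetch took ${reported_ratio}x as long as clone"
```

The measured gap is 11 seconds on 161, or roughly 7%, while the original report describes a factor of 3.5.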

Your experience is significantly different, since you note in the original report that "git fetch takes 14 mins, git clone takes about 4 minutes." I don't plan to make any change in the git plugin based on that, but wanted to record my observed results in case others are concerned about the apparent difference between "git clone" and "git fetch".


michele hallak-stamler added a comment - 2015-03-11 08:18

Another problem with init+fetch instead of clone:
We are using header expansion with git filter + git attribute. The clean and smudge filters are perl scripts.
Unfortunately it doesn't work with the git plugin.
After reading this issue, I reproduced it manually using git init + git pull and, indeed, the filters don't work.
It is very annoying, since it means that I won't be able to use the git plugin and will have to do the clone with a script.
Is there a way to use git just to check for changes without pulling the code into the workspace?
The problem is that, when configured with the filter, the git init+plugin takes several hours instead of a few minutes.
The scripts are: https://github.com/turon/git-rcs-keywords
I would be grateful for any ideas.
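On the question of checking for changes without pulling code: plain git can list the refs of a remote with `git ls-remote`, which transfers no objects. The sketch below compares a recorded tip with the current one. The throwaway local repository stands in for the real remote, and the names `remote_tip` and `last_built` are hypothetical. This illustrates the git command only, not how the plugin is configured.

```shell
#!/bin/sh
# Sketch only: detect remote changes with ls-remote, without fetching objects.
set -eu

work=$(mktemp -d)
remote="$work/remote"    # hypothetical stand-in for the real remote URL

# Create a small repository to query.
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/master
echo one > "$remote/file.txt"
git -C "$remote" add file.txt
git -C "$remote" -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# Print the SHA-1 at the tip of master on the remote (first tab-separated field).
remote_tip() { git ls-remote "$remote" refs/heads/master | cut -f1; }

last_built=$(remote_tip)   # value a build script would have stored earlier

# Simulate someone pushing a new commit.
echo two > "$remote/file.txt"
git -C "$remote" -c user.name=demo -c user.email=demo@example.com commit -q -a -m "second"

current=$(remote_tip)
if [ "$current" != "$last_built" ]; then changed=yes; else changed=no; fi
echo "remote changed: $changed"
```

A script that does its own clone could run this comparison first and only clone when the tip has moved.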


Mark Waite added a comment - 2015-03-12 03:40

mhallak As far as I can tell, yours is the first case of someone using git in Jenkins for header expansion. Even if I were willing to accept the increased complexity and reliability risk of having a git clone based implementation, I would still be unlikely to add support for header expansion.


michele hallak-stamler added a comment - 2015-03-12 06:24

Of course, there is no need to support header expansion. I just wanted to explain why we need the clone functionality and not init+pull. The support for filters in .gitattributes is an official feature of git and we cannot use it with Jenkins. I'll have to write the cloning script and we'll not be able to rely on the built-in git implementation....


Assignee: Unassigned
Reporter: davedash
Votes: 8
Watchers: 10
Created: 2014-06-05 23:20
Updated: 2020-12-10 04:34
Resolved: 2020-12-10 04:34
