Bug
Resolution: Fixed
Minor
None
git plugin 4.4.0
When performing a git clone operation,
git-client and git-plugin both trigger a git fetch.
Below are the two lines of code that run in the same git clone command, both doing fetches.
The second link is part of the GitSCM step. It triggers the git-client-plugin's CloneCommand (first link) to perform the clone operation, and once the CloneCommand completes, the GitSCM step performs a fetch.
However, within the CloneCommand (first link) itself, a fetch has already been performed.
I am aware that there is a need for git clone to fetch all refs first (JENKINS-31393). The suggestion provided in that ticket was to use "honor git refspec", so shouldn't "honor git refspec" skip the first fetch here?
Thanks,
Sam
[JENKINS-49757] Git plugin calls fetch twice per checkout
Hi,
The clone itself isn't being performed twice; however, it is triggering fetch twice.
Both fetches are being triggered by the same clone operation: once by the git-plugin after doing a clone, and once more by the git-client-plugin after calling the git-plugin to do the clone.
Below is the log output
00:06:32 Cloning repository ssh://[git project]
00:06:32 > /usr/bin/git init /var/lib/jenkins/workspace/workspace # timeout=10
00:06:32 Fetching upstream changes from ssh://[git project]
00:06:32 > /usr/bin/git --version # timeout=10
00:06:32 using GIT_SSH to set credentials Gerrit SSH Key
00:06:32 > /usr/bin/git fetch --tags --progress ssh://[git project] +refs/heads/*:refs/remotes/origin/* # timeout=20
00:07:49 > /usr/bin/git config remote.origin.url ssh://[git project] # timeout=10
00:07:49 > /usr/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
00:07:49 > /usr/bin/git config remote.origin.url ssh://[git project] # timeout=10
00:07:49 Fetching upstream changes from ssh://[git project]
00:07:49 using GIT_SSH to set credentials fpajenkins Gerrit SSH Key
00:07:49 > /usr/bin/git fetch --tags --progress ssh://[git project] refs/changes/14/3174614/2 # timeout=20
00:08:18 > /usr/bin/git rev-parse cc8266a2cecb79d475d5c3b1dbaa5b2ea3171400^{commit} # timeout=10
00:08:18 Checking out Revision cc8266a2cecb79d475d5c3b1dbaa5b2ea3171400 (detached)
00:08:18 > /usr/bin/git config core.sparsecheckout # timeout=10
00:08:18 > /usr/bin/git config core.sparsecheckout true # timeout=10
00:08:18 > /usr/bin/git read-tree -mu HEAD # timeout=10
If you look at the first link, it's calling fetchcommand.execute directly and skipping all the fetchCommand decorators.
The second link is part of the GitSCM step. It triggers the git-client-plugin's CloneCommand (first link) to perform the clone operation, and once the CloneCommand completes, the GitSCM step performs a fetch.
However, within the CloneCommand (first link) itself, a fetch has already been performed.
see GitSCM#1140 on the call from GITSCM into CloneCommand,
see CliGitAPIImpl#609 inside CloneCommand for the first fetch.
see GitSCM#1153 for the second fetch
Thanks a lot for looking into this
I think I see the same behavior in the log file of my jobs. I have a Freestyle job configured which generates the following log output:
> git init /home/beemarkwaite/mark-pc2.markwaite.net-agent/workspace/Bugs-Individual/JENKINS-48064-Fisheye-url-validator-ignored # timeout=10
Fetching upstream changes from https://github.com/aeshell/aesh.git
> git --version # timeout=10
> git fetch --tags --progress https://github.com/aeshell/aesh.git +refs/heads/master:refs/remotes/origin/master +refs/heads/0.56.1:refs/remotes/origin/0.56.1
> git config remote.origin.url https://github.com/aeshell/aesh.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master # timeout=10
> git config --add remote.origin.fetch +refs/heads/0.56.1:refs/remotes/origin/0.56.1 # timeout=10
> git config remote.origin.url https://github.com/aeshell/aesh.git # timeout=10
Fetching upstream changes from https://github.com/aeshell/aesh.git
> git fetch --tags --progress https://github.com/aeshell/aesh.git +refs/heads/master:refs/remotes/origin/master +refs/heads/0.56.1:refs/remotes/origin/0.56.1
That shows two calls are made to git fetch. Since git fetch is an incremental operation, I'm not planning to make any change in this area of the code. The second fetch operation is not free, but it is usually very lightweight, since it will only request changes made since the preceding fetch.
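The point about a repeated fetch transferring nothing new can be checked locally. What follows is a minimal sketch, not the plugin's code: it replays the init, fetch, config, fetch sequence from the log against a throwaway repository. The temporary paths, the demo identity, and the `count_objects` helper are assumptions made for illustration. It only counts stored objects; it says nothing about the time a real server needs to negotiate the second fetch.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: total loose plus packed objects in a repository.
count_objects() {
    git -C "$1" count-objects -v |
        awk '/^(count|in-pack):/ { n += $2 } END { print n + 0 }'
}

# Throwaway "remote" with a single commit (paths are illustrative only).
tmp=$(mktemp -d)
git init -q "$tmp/origin"
echo "hello" > "$tmp/origin/file.txt"
git -C "$tmp/origin" add file.txt
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first commit"

# Same order as the log: init, fetch, configure the remote, fetch again.
git init -q "$tmp/work"
git -C "$tmp/work" fetch -q --tags "$tmp/origin" '+refs/heads/*:refs/remotes/origin/*'
objects_after_first=$(count_objects "$tmp/work")

git -C "$tmp/work" config remote.origin.url "$tmp/origin"
git -C "$tmp/work" config --add remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git -C "$tmp/work" fetch -q --tags "$tmp/origin" '+refs/heads/*:refs/remotes/origin/*'
objects_after_second=$(count_objects "$tmp/work")

echo "objects after first fetch:  $objects_after_first"
echo "objects after second fetch: $objects_after_second"
```

If the two counts match, the second fetch stored no additional objects, which is the sense in which it is incremental.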
For the sake of precision, your earlier comments are not describing the implementation of the code. You said:
The first link is part of the Git-client clone step, it triggers the git-plugin CloneCommand (second link) to perform the clone operations. And afterwards, the Git-client plugin performs a fetch.
As far as I can tell, that's not the way the code works. The git client plugin does not call the git plugin. It does not have any dependency on the git plugin and does not know about the git plugin implementation. The git client plugin provides an API that is called by the git plugin.
The stack trace you provided shows two calls to git fetch, but the two calls are using different credentials. I'm assuming that the difference in credentials is because you edited the log output to remove identifying information.
Facing same issue when using gerrit trigger.
My solution is using pipeline job like below
stage('CheckOut') {
    steps {
        sh "git init"
        checkout(scm)
    }
}
post {
    cleanup {
        deleteDir()
    }
}
Initializing the git repo first skips the clone step.
Do not use WipeWorkspace in the scm step; clean the workspace in the post closure instead.
dogin2006 I believe you are seeing a different condition. The `sh 'git init'` initializes an empty git repository in the workspace. The `checkout(scm)` step likely erases the newly created empty git repository and creates a new git repository populated with the results of fetching changes from the remote server.
The deleteDir() step removes the workspace contents at the end.
As far as I understand declarative Pipeline, it will perform the initial checkout implicitly (unless you've added the skipDefaultCheckout property), then it sees your `checkout scm` and does it again.
First of all, my job is using Gerrit Trigger.
When my Jenkinsfile is like
def scm = [
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    doGenerateSubmoduleConfigurations: false,
    extensions: [
        [$class: 'WipeWorkspace'],
        [$class: 'CloneOption', depth: 2, honorRefspec: true, noTags: true, reference: '', shallow: true, timeout: 10],
        [$class: 'BuildChooserSetting', buildChooser: [$class: 'GerritTriggerBuildChooser']]
    ],
    submoduleCfg: [],
    userRemoteConfigs: [
        [
            credentialsId: 'hide',
            refspec: '$GERRIT_REFSPEC',
            url: 'hide'
        ]
    ]
]

pipeline {
    agent {
        node {
            label 'gerrit'
        }
    }
    options {
        skipDefaultCheckout true
    }
    stages {
        stage('CheckOut') {
            steps {
                checkout(scm)
            }
        }
    }
}
The console log
Wiping out workspace first.
Cloning the remote Git repository
Using shallow clone
shallow clone depth 2
Avoid fetching tags
Honoring refspec on initial clone
Cloning repository hide
> git init hide # timeout=10
Fetching upstream changes from hide
> git --version # timeout=10
using GIT_SSH to set credentials
> git fetch --no-tags --progress hide $GERRIT_REFSPEC --depth=2 # timeout=10
> git config remote.origin.url hide # timeout=10
> git config --add remote.origin.fetch $GERRIT_REFSPEC # timeout=10
> git config remote.origin.url hide # timeout=10
Fetching upstream changes from hide
using GIT_SSH to set credentials
> git fetch --no-tags --progress hide refs/changes/64/4264/7 --depth=2 # timeout=10
> git rev-parse 4b5226960edd4551bbafc058db4afc0e69bd103f^{commit} # timeout=10
Checking out Revision 4b5226960edd4551bbafc058db4afc0e69bd103f (refs/changes/64/4264/7)
> git config core.sparsecheckout # timeout=10
> git checkout -f 4b5226960edd4551bbafc058db4afc0e69bd103f
Commit message: "hide"
> git rev-parse FETCH_HEAD^{commit} # timeout=10
> git rev-list --no-walk a1323feb6fa57ce13c87a5fd6886b14aa3f75dab # timeout=10
This fetches twice, and the first fetch does not expand $GERRIT_REFSPEC.
Then I changed my Jenkinsfile like
def scm = [
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    doGenerateSubmoduleConfigurations: false,
    extensions: [
        [$class: 'CloneOption', depth: 2, honorRefspec: true, noTags: true, reference: '', shallow: true, timeout: 10],
        [$class: 'BuildChooserSetting', buildChooser: [$class: 'GerritTriggerBuildChooser']]
    ],
    submoduleCfg: [],
    userRemoteConfigs: [
        [
            credentialsId: 'hide',
            refspec: '$GERRIT_REFSPEC',
            url: 'hide'
        ]
    ]
]

pipeline {
    agent {
        node {
            label 'gerrit'
        }
    }
    options {
        skipDefaultCheckout true
    }
    stages {
        stage('CheckOut') {
            steps {
                sh('git init')
                checkout(scm)
            }
        }
    }
    post {
        cleanup {
            cleanWs()
        }
    }
}
The console log
[gerrit_test_2] Running shell script
+ git init
Initialized empty Git repository in hide
[Pipeline] checkout
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url hide # timeout=10
Fetching upstream changes from hide
> git --version # timeout=10
using GIT_SSH to set credentials
> git fetch --no-tags --progress hide refs/changes/64/4264/7 --depth=2 # timeout=10
> git rev-parse 4b5226960edd4551bbafc058db4afc0e69bd103f^{commit} # timeout=10
Checking out Revision 4b5226960edd4551bbafc058db4afc0e69bd103f (refs/changes/64/4264/7)
> git config core.sparsecheckout # timeout=10
> git checkout -f 4b5226960edd4551bbafc058db4afc0e69bd103f
Commit message: "hide"
> git rev-parse FETCH_HEAD^{commit} # timeout=10
> git rev-list --no-walk a1323feb6fa57ce13c87a5fd6886b14aa3f75dab # timeout=10
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] cleanWs
[WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS
This fetches only once.
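The single-fetch sequence in this log (init, one shallow fetch of the change ref, rev-parse, checkout) can be replayed by hand. The sketch below is an illustration under assumptions, not what the plugin runs internally: the throwaway repository, the demo identity, and the use of `update-ref` to stand in for a Gerrit change ref are all made up for the example. A `file://` URL is used so that `--depth` goes through the normal transport.

```shell
#!/bin/sh
set -eu

# Throwaway "remote" with three commits (paths are illustrative only).
tmp=$(mktemp -d)
git init -q "$tmp/origin"
for n in 1 2 3; do
    git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $n"
done

# Stand-in for a Gerrit change ref; a real Gerrit server creates these itself.
git -C "$tmp/origin" update-ref refs/changes/64/4264/7 HEAD

# Pre-initialized workspace, then one shallow fetch of just that ref.
git init -q "$tmp/work"
git -C "$tmp/work" fetch -q --no-tags --depth=2 \
    "file://$tmp/origin" refs/changes/64/4264/7

# Resolve and check out what was fetched, as the log does.
fetched=$(git -C "$tmp/work" rev-parse 'FETCH_HEAD^{commit}')
git -C "$tmp/work" checkout -q -f "$fetched"
depth=$(git -C "$tmp/work" rev-list --count HEAD)

echo "checked out $fetched with $depth commits of history"
```

With `--depth=2` only two of the three commits should be reachable in the workspace, which mirrors the shallow clone option in the Jenkinsfile.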
Jenkins Version: 2.120
Gerrit Trigger: 2.27.5
Git client plugin: 2.7.1
Git plugin: 3.8.0
dogin2006 since you're using Pipeline, you can expand the value of GERRIT_REFSPEC without relying on the git plugin internal environment variable expansion. Use:
refspec : "${GERRIT_REFSPEC}",
instead of
refspec : '$GERRIT_REFSPEC',
The bug report is specific to Freestyle jobs because the variable expansion for strings is not available to Freestyle jobs.
Using "${GERRIT_REFSPEC}" fixes the first problem.
10:38:45 Cloning the remote Git repository
10:38:45 Using shallow clone
10:38:45 shallow clone depth 2
10:38:45 Avoid fetching tags
10:38:45 Honoring refspec on initial clone
10:38:45 Cloning repository hide
10:38:45 > git init hide # timeout=10
10:38:45 Fetching upstream changes from hide
10:38:45 > git --version # timeout=10
10:38:45 using GIT_SSH to set credentials
10:38:45 > git fetch --no-tags --progress hide refs/changes/50/4750/1 --depth=2 # timeout=10
10:39:34 > git config remote.origin.url hide # timeout=10
10:39:34 > git config --add remote.origin.fetch refs/changes/50/4750/1 # timeout=10
10:39:34 > git config remote.origin.url hide # timeout=10
10:39:34 Fetching upstream changes from hide
10:39:34 using GIT_SSH to set credentials
10:39:34 > git fetch --no-tags --progress hide refs/changes/50/4750/1 --depth=2 # timeout=10
10:40:34 > git rev-parse 44fd5ca794ef1d4ccbfb437d94ef804626dea726^{commit} # timeout=10
10:40:34 Checking out Revision 44fd5ca794ef1d4ccbfb437d94ef804626dea726 (refs/changes/50/4750/1)
10:40:34 > git config core.sparsecheckout # timeout=10
10:40:34 > git checkout -f 44fd5ca794ef1d4ccbfb437d94ef804626dea726 # timeout=10
10:40:38 Commit message: "hide"
10:40:38 > git rev-parse FETCH_HEAD^{commit} # timeout=10
10:40:38 > git rev-list --no-walk eb696053ba8f4c94c388db0334b886879e6aaa42 # timeout=10
However, fetch is still triggered twice, and the second fetch is not free.
In our case, each fetch takes about one minute.
dogin2006 you're welcome to propose a pull request to the plugin to improve its behavior. Alternately, you could try replacing the "depth=2" with a reference repository to see if that would reduce the fetch time.
Using a reference repository doesn't fix the issue.
Using command line git clone with a reference repository takes approximately 11 seconds, while the same configuration via the checkout scm option takes 1 minute 3x seconds.
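For anyone who wants to repeat the command line side of that comparison, the following is a rough sketch of a timed `git clone --reference`. Everything in it is an assumption for illustration: the repositories are local throwaways and the reference repository is a fresh mirror. Timings from such a toy setup are not comparable to the figures reported above; the sketch only shows the shape of the measurement.

```shell
#!/bin/sh
set -eu

# Throwaway "remote" (paths are illustrative only).
tmp=$(mktemp -d)
git init -q "$tmp/origin"
echo "hello" > "$tmp/origin/file.txt"
git -C "$tmp/origin" add file.txt
git -C "$tmp/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first commit"

# Local mirror that plays the role of the reference repository.
git clone -q --mirror "$tmp/origin" "$tmp/reference.git"

# Time a clone that borrows objects from the reference repository.
start=$(date +%s)
git clone -q --reference "$tmp/reference.git" "file://$tmp/origin" "$tmp/work"
end=$(date +%s)
echo "clone with reference took $((end - start)) seconds"

# A clone that uses a reference records it in the alternates file.
alternates="$tmp/work/.git/objects/info/alternates"
cat "$alternates"
```

Pointing the same commands at the real remote and reference repository would give a baseline to compare against the timestamps in the Jenkins console log.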
I have attempted to fix this issue:
https://github.com/jenkinsci/git-plugin/pull/845
The fix from rishabhbudhouliya has been merged to the master branch and will be included in the next release of the git plugin.
I appreciate the analysis very much.
Unfortunately, the analysis doesn't seem (to me) like evidence of a bug. The git plugin (your second link) uses the git client plugin (your first link) to perform the fetch. I don't see from your description that two fetch operations are being performed for each repository being cloned.
Please provide log output which shows two fetches being performed.
Since you've provided those call sites, I assume you could also include stack traces to show that a clone is being performed twice.