Jenkins / JENKINS-51542

Git checkout is slower than the command line execution

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Component: git-plugin

      The git checkout is much slower than command line git, even after using reference repositories, shallow clones, etc.

      Running the same commands via the command line is much faster. I am using the latest version of all the plugins.

      If you look at the output below, the fetch runs more than once.

      09:53:13 Cloning the remote Git repository
      09:53:13 Using shallow clone
      09:53:13 Avoid fetching tags
      09:53:13 Cloning repository git@xxxxxxx:xxxx/xxxxxxx.git
      09:53:13  > git init /srv/jenkins/workspace/shared-buck-2-master # timeout=10
      09:53:14 Using reference repository: /var/lib/jenkins/reference-repositories/xxxxxx.git
      09:53:14 Fetching upstream changes from git@xxxxx:xxxx/xxxxxxx.git
      09:53:14  > git --version # timeout=10
      09:53:14  > git fetch --no-tags --progress git@xxxxxx:xxxx/xxxxxxx.git +refs/heads/*:refs/remotes/xxxxxxx/* --depth=1
      09:54:15  > git config remote.xxxxxxx.url git@xxxxxxxx:xxxxx/xxxxxxxx.git # timeout=10
      09:54:15  > git config --add remote.xxxxxxxx.fetch +refs/heads/*:refs/remotes/xxxxxxx/* # timeout=10
      09:54:15  > git config remote.xxxxxxx.url git@xxxxxxxx:xxxxx/xxxxxxx.git # timeout=10
      09:54:15 Fetching upstream changes from git@xxxxxxxx:xxxx/xxxxxxx.git
      09:54:15  > git fetch --no-tags --progress git@xxxxxxxx:xxxxx/xxxxxxxx.git +refs/heads/*:refs/remotes/xxxxxx/* --depth=1
      09:54:18  > git rev-parse 87dc72cf506dcf684775c7e3be56184e09c44701^{commit} # timeout=10
      09:54:18 Checking out Revision 87dc72cf506dcf684775c7e3be56184e09c44701 (detached)
      09:54:18  > git config core.sparsecheckout # timeout=10
      09:54:18  > git checkout -f 87dc72cf506dcf684775c7e3be56184e09c44701
      09:54:46 Commit message: "@MS-123 - Increase the jvm memory size for the bat tests"


          Hosh added a comment - edited

          markewaite, jglick, any updates on this? This seems to be affecting us as well. I've run the exact same commands Jenkins runs, directly on the instance that runs them (as the jenkins user, in /tmp), and the checkout ran at what we consider normal speed: copy-pasting the commands from the log output, it took less than ~30 seconds. I don't have the exact timings from Jenkins (not sure how to produce them), but looking through the Blue Ocean UI, it shows the initial `checkout from version control` step taking over 7 minutes. This isn't a particularly large repository either; based on GitHub's API it's only about 27 MB.

          EDIT: oddly, the Blue Ocean UI changes the time taken for the step to 4 seconds after a refresh.


          Mark Waite added a comment -

          thehosh I've started a draft of a Google Summer of Code project idea that proposes to add a new pipeline task that will allow sh, bat, and powershell steps to perform authenticated git commands. However, Google Summer of Code implementations won't start until May 2021.

          You may want to investigate other possible causes of the performance difference you are seeing. For example, some of the questions I asked earlier included:

          • Are you using a narrow refspec to reduce the amount of data the git client requests from the git server?
          • Are you using a reference repository to reduce the amount of data the git client copies from the git server?
          • Are you mistakenly using shallow clone with a default refspec and hoping that it will improve performance? In most cases where I've used shallow clone, it provided less performance improvement than I hoped. A GitHub performance blog post noted that shallow clone can be especially demanding on the git server.
          • Is the agent workspace empty before the first fetch or does it have existing content? If it has existing content, is that content needing a "git gc" to restore it to good performance?


          Hosh added a comment - edited

          markewaite Is there a reason to believe that the proposal would fix the issue? I thought that under the hood Jenkins simply ran git commands?

          • Are you using a narrow refspec to reduce the amount of data the git client requests from the git server? - Not that I'm aware of.
          • Are you using a reference repository to reduce the amount of data the git client copies from the git server? - I'm not sure what you mean with this.
          • Are you mistakenly using shallow clone with a default refspec and hoping that it will improve performance? - Again, not that I'm aware of. But I might be misunderstanding.
          • Is the agent workspace empty before the first fetch or does it have existing content? - As you can see below, we have "wipe out repository & force clone" enabled.

          This is a screenshot of our pipeline setup. Note that the wipe-out step also seems to take extremely long. However, once the initial clone is done, I can add additional clones, and they seem to run fine.


          Mark Waite added a comment -

          thehosh I do not know if the proposed withGitCredentials pipeline step would help in your case. I've not yet found a case where the original claims in this bug report could be verified. As far as I can tell, the git plugin uses command line git to fetch and checkout content from the remote repository. As far as I can tell, it does that with speed that is comparable to git clone.

          Since you're using Pipeline, you probably don't want "Wipe out repository and force clone". That same operation can be done from the pipeline itself with a pipeline task. Moving that into the pipeline definition places one more part of the job definition inside source control.

          You probably also do not want the branch specifier set to "$BRANCH", because that makes the change history in the job unusable. The change history shows the changes from one build to the next, but if build n-1 builds a different branch than build n, then the change log is not very useful. In most cases that I've seen, it is better to use a multibranch pipeline and allow Jenkins to create and destroy jobs as branches are created and destroyed in the repository.

          If your git provider is GitHub, Bitbucket, GitLab, or Gitea, then you should probably use the plugin that is specific to those implementations, rather than the general purpose "Git" provider that you've selected as your SCM provider. The Git SCM provider does not know that GitHub, Bitbucket, GitLab, and Gitea all provide REST APIs that can make some git operations (like polling and reading the Jenkinsfile) much faster.


          Hosh added a comment -

          Thanks markewaite.

          I suppose withGitCredentials will be useful either way. I've had a need for it previously, and instead have had to work around the issue.

          We're using job-dsl, so everything is already in git. I'd love to move the wipe out repository option into the pipeline, but I have not found the option in the directive generator. Maybe I'm missing something?

          I'm aware of the change history; it's not a huge issue for us. This specific job runs a test suite against our live environment, and in that scenario multibranch isn't usable for us. We're planning to use multibranch pipelines for other things, though we're held back by the limitation of not being able to specify a subdirectory to watch in monorepo setups (like github-branch-pr-change-filter, but for branches). There's a Jira issue raised on that, but it doesn't seem to have been acted on.

          Re the GitHub provider, non-multibranch pipelines don't seem to support anything but Git and Mercurial, so we're unable to use it.

          That's all useful feedback, and I appreciate it. The slow checkout is still an issue, though, and it's extremely painful. Is there anything I can do to help get this debugged? Running almost the exact same setup locally (through Docker), the checkout is as fast as I expect it to be.


          Mark Waite added a comment -

          thehosh one of the earlier comments mentions that memory pressure on the container process can significantly slow the git process. You might check the memory available to the agent process that performs the git operations. If it is a Kubernetes agent, then you'll need to ensure that the JNLP agent has enough memory allocated.

          I can't duplicate the problem and haven't seen new information that indicates I should again attempt to duplicate the problem. If you find a way that allows others to duplicate the problem, I would be willing to try to duplicate it.


          Hosh added a comment -

          Thank you markewaite, that was helpful.

          After reading your comment, and having noticed moments earlier that our Jenkins backup job was taking >10 hours, something clicked in my brain and made the connection. I realised this might not be a Jenkins issue at all, and after some more investigating, it looked like a disk IO issue. We're using AWS EFS (which is NFS under the hood) to store the Jenkins home. When I said I tested it earlier, I was sadly testing in /tmp, which is not on the NFS storage, so that explains why it ran fine when I ran the commands manually. Those manual tests were the main reason I thought it might be Jenkins.
          It turned out that, due to our backup job (which we had misconfigured in the first place), all our IOPS were being eaten up. We're going to change our EFS storage so that we get a little more oomph from it, and of course we'll fix our backup job too.

          For anyone else running into this issue, I suggest you look at resources available to Jenkins.
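          As a rough way to confirm this kind of disk IO bottleneck, one can compare write throughput on the Jenkins home filesystem against local disk. A minimal sketch, assuming a Linux host with GNU dd (the 64 MB probe size is arbitrary, and JENKINS_HOME falls back to /tmp here so the snippet runs anywhere):

```shell
# Write a 64 MB file with fdatasync and report dd's throughput line
# for a given directory.
probe_write() {
    f="$1/io-probe.$$"
    dd if=/dev/zero of="$f" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
    rm -f "$f"
}

# Compare the (assumed EFS-backed) Jenkins home against local /tmp.
for dir in "${JENKINS_HOME:-/tmp}" /tmp; do
    echo "$dir:"
    probe_write "$dir"
done
```

          A large gap between the two numbers would point at the storage rather than at Jenkins or the git plugin.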


          Mark Waite added a comment -

          Good to hear that, thehosh. Thanks for sharing. Experiences with git on network file systems are often complicated by the different locking semantics and performance characteristics of network file systems. You've provided excellent advice to network file system users. Thanks again.


          Mark Waite added a comment -

          The withCredentials step that has been implemented in the git plugin allows users to perform their own authenticated operations with command line git in sh, bat, and powershell steps. In those cases where a user finds that the git plugin is much slower than command line git (memory pressure in the JNLP container, a need for specific settings on the git command line, etc.), the user can replace the checkout scm call and use a withCredentials block with the git commands inside an sh, bat, or powershell step.

          Mark Waite added a comment - The withCredentials step that has been implemented in the git plugin allows users to perform their own authenticated operations with command line git in sh, bat, and powershell steps. In those cases where a user finds that the git plugin is much slower than command line git (memory pressure in the JNLP container, need specific settings on git command line, etc.), the user can replace the checkout scm call and use a withCredentials block with the git commands inside an sh, bat, or powershell step
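          As a sketch of what that replacement might look like in a Jenkinsfile: the gitUsernamePassword binding from the git plugin makes the job's credentials available to command line git inside the block. The credentialsId, URL, and branch here are hypothetical:

```groovy
// Hypothetical pipeline fragment: replace `checkout scm` with an
// explicit, authenticated command line git invocation.
node {
    withCredentials([gitUsernamePassword(credentialsId: 'my-git-creds')]) {
        // Plain command line git, tuned however the job needs
        // (narrow refspec, reference repository, extra flags, etc.).
        sh '''
            git init .
            git fetch --no-tags https://example.com/org/repo.git \
                +refs/heads/main:refs/remotes/origin/main
            git checkout -f refs/remotes/origin/main
        '''
    }
}
```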

          Andrew Somerville added a comment -

          For me it was because of LFS "filtering content".

            Assignee: Unassigned
            Reporter: Oliver Pereira (oliverp)
            Votes: 7
            Watchers: 18