JENKINS-51542

Git checkout is slower than the command line execution


    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: git-plugin
    • Labels:
      None
    • Environment:
      Development
      Description

The git checkout is much slower than command-line git, even after using reference repositories, shallow clones, etc.

Running the same commands from the command line is much faster. I am using the latest versions of all the plugins.

If you look at the output below, it runs the fetch more than once.

      09:53:13 Cloning the remote Git repository
      09:53:13 Using shallow clone
      09:53:13 Avoid fetching tags
      09:53:13 Cloning repository git@xxxxxxx:xxxx/xxxxxxx.git
      09:53:13  > git init /srv/jenkins/workspace/shared-buck-2-master # timeout=10
      09:53:14 Using reference repository: /var/lib/jenkins/reference-repositories/xxxxxx.git
      09:53:14 Fetching upstream changes from git@xxxxx:xxxx/xxxxxxx.git
      09:53:14  > git --version # timeout=10
      09:53:14  > git fetch --no-tags --progress git@xxxxxx:xxxx/xxxxxxx.git +refs/heads/*:refs/remotes/xxxxxxx/* --depth=1
      09:54:15  > git config remote.xxxxxxx.url git@xxxxxxxx:xxxxx/xxxxxxxx.git # timeout=10
      09:54:15  > git config --add remote.xxxxxxxx.fetch +refs/heads/*:refs/remotes/xxxxxxx/* # timeout=10
      09:54:15  > git config remote.xxxxxxx.url git@xxxxxxxx:xxxxx/xxxxxxx.git # timeout=10
      09:54:15 Fetching upstream changes from git@xxxxxxxx:xxxx/xxxxxxx.git
      09:54:15  > git fetch --no-tags --progress git@xxxxxxxx:xxxxx/xxxxxxxx.git +refs/heads/*:refs/remotes/xxxxxx/* --depth=1
      09:54:18  > git rev-parse 87dc72cf506dcf684775c7e3be56184e09c44701^{commit} # timeout=10
      09:54:18 Checking out Revision 87dc72cf506dcf684775c7e3be56184e09c44701 (detached)
      09:54:18  > git config core.sparsecheckout # timeout=10
      09:54:18  > git checkout -f 87dc72cf506dcf684775c7e3be56184e09c44701
      09:54:46 Commit message: "@MS-123 - Increase the jvm memory size for the bat tests"
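The plugin's sequence above (init, shallow `--no-tags` fetch into remote-tracking refs, forced detached checkout) can be reproduced outside Jenkins to time each step in isolation. The sketch below uses a throwaway local repository and placeholder paths, not the masked repository from the log; pointing the fetch at the real remote (and prefixing it with `time`) gives a direct command-line comparison:

```shell
# Replicate the plugin's init / shallow-fetch / detached-checkout sequence
# against a throwaway local repository. All paths and the branch name are
# placeholders, not the masked repository from the log above.
set -e
tmp=$(mktemp -d)

# Stand-in for the remote: a repository with one commit on a known branch.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.name=t -c user.email=t@t \
    commit -q --allow-empty -m "initial"
git -C "$tmp/origin" branch -M main

# The same steps the plugin logs: git init, shallow --no-tags fetch into
# remote-tracking refs, then a forced detached checkout of the commit.
git init -q "$tmp/ws"
git -C "$tmp/ws" fetch --no-tags --depth=1 "$tmp/origin" \
    "+refs/heads/*:refs/remotes/origin/*"
rev=$(git -C "$tmp/ws" rev-parse "refs/remotes/origin/main^{commit}")
git -C "$tmp/ws" checkout -qf "$rev"
git -C "$tmp/ws" log -1 --format="Commit message: %s"
```

If this sequence is fast from a terminal but slow from Jenkins on the same machine, the difference is environmental (workspace filesystem, agent resources) rather than the commands themselves.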

        Attachments

          Issue Links

            Activity

Mark Waite added a comment -

            Hosh I do not know if the proposed withGitCredentials pipeline step would help in your case. I've not yet found a case where the original claims in this bug report could be verified. As far as I can tell, the git plugin uses command line git to fetch and checkout content from the remote repository. As far as I can tell, it does that with speed that is comparable to git clone.

            Since you're using Pipeline, you probably don't want "Wipe out repository and force clone". That same operation can be done from the pipeline itself with a pipeline task. Moving that into the pipeline definition places one more part of the job definition inside source control.

You probably also do not want the branch specifier "$BRANCH", because it makes the job's change history unusable. The change history shows the changes from one build to the next, but if build n-1 is for a different branch than build n, the change log is not very useful. In most cases that I've seen, it is better to use a multibranch pipeline and allow Jenkins to create and destroy jobs as branches are created and destroyed in the repository.

            If your git provider is GitHub, Bitbucket, GitLab, or Gitea, then you should probably use the plugin that is specific to those implementations, rather than the general purpose "Git" provider that you've selected as your SCM provider. The Git SCM provider does not know that GitHub, Bitbucket, GitLab, and Gitea all provide REST APIs that can make some git operations (like polling and reading the Jenkinsfile) much faster.

Hosh added a comment -

            Thanks Mark Waite.

I suppose withGitCredentials will be useful either way. I've had a need for it previously and have had to work around the issue instead.

We're using job-dsl, so everything is already in git. I'd love to move the wipe-out-repository option into the pipeline, but I have not found the option in the directive generator. Maybe I'm missing something?

I'm aware of the change history; it's not a huge issue for us. This specific job runs a test suite against our live environment, and in this scenario multibranch isn't usable for us. We're planning to use multibranch pipelines for other things, though we're blocked by the limitation of not being able to specify a subdirectory to watch in monorepo setups (like github-branch-pr-change-filter, but for branches). There's a Jira issue raised on that, but it doesn't seem to have been acted on.

Re the GitHub provider: a non-multibranch pipeline doesn't seem to support anything but Git and Mercurial, so we're unable to use it.

That's all useful feedback, and I appreciate it. The slow checkout is still an issue, though, and it's extremely painful. Is there anything I can do to help debug this? Running almost exactly the same setup locally (through Docker), the checkout is as fast as I expect it to be.

Mark Waite added a comment -

Hosh one of the earlier comments mentions that memory pressure on the container process can significantly slow the git process. You might check the memory available to the agent process that performs the git operations. If it is a Kubernetes agent, you'll need to ensure that the JNLP agent has enough memory allocated.
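On a Linux agent, two quick checks show the memory actually available to the process: host-level free memory, and the cgroup limit that caps a containerised agent. The cgroup path below assumes cgroup v2 (v1 uses a different path, handled by the fallback); adjust for your setup:

```shell
# Check memory available to the agent. `free` shows host-level memory;
# containerised agents are capped by the cgroup limit instead, so read
# that too. Path assumes cgroup v2; the first fallback is the cgroup v1
# location.
free -m
cat /sys/fs/cgroup/memory.max 2>/dev/null \
    || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
    || echo "no cgroup memory limit found"
```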

            I can't duplicate the problem and haven't seen new information that indicates I should again attempt to duplicate the problem. If you find a way that allows others to duplicate the problem, I would be willing to try to duplicate it.

Hosh added a comment -

Thank you Mark Waite, that was helpful.

After reading your comment, and having noticed moments earlier that our Jenkins backup job was taking more than 10 hours, something clicked and I made the connection. I realised this might not be a Jenkins issue at all, and after some more investigation it looked like a disk IO issue. We're using AWS EFS (which is NFS under the hood) to store the Jenkins home. Sadly, when I said I had tested it earlier, I was testing in /tmp, which is not on the NFS storage; that explains why it ran fine when I ran it manually, and those manual tests were the main reason I thought it might be Jenkins.

It turned out that our backup job (which we had misconfigured in the first place) was eating up all our IOPS. We're going to change our EFS storage so that we get a bit more oomph from it, and of course we'll fix the backup job too.

            For anyone else running into this issue, I suggest you look at resources available to Jenkins.
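A quick way to spot this class of problem is to compare raw write throughput on the volume holding the workspace (EFS/NFS in this case) against local disk such as /tmp. The sketch below uses `dd` as a rough probe (`conv=fdatasync` is GNU dd and forces data to disk so the page cache doesn't hide slow storage); `WORKDIR` is a placeholder for your own network-backed path, and a tool like fio gives better numbers if available:

```shell
# Rough write-throughput probe: compare the Jenkins workspace volume with
# local /tmp. conv=fdatasync (GNU dd) flushes data to disk so the page
# cache doesn't mask slow storage. WORKDIR is a placeholder for your
# NFS/EFS-backed path.
WORKDIR="${WORKDIR:-$PWD}"
for d in "$WORKDIR" /tmp; do
    f="$d/ddprobe.$$"
    echo "--- $d"
    dd if=/dev/zero of="$f" bs=1M count=32 conv=fdatasync 2>&1 | tail -1
    rm -f "$f"
done
```

A large gap between the two numbers points at the storage backend (or, as here, at something exhausting its IOPS) rather than at Jenkins or the git plugin.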

Mark Waite added a comment -

Good to hear that, Hosh. Thanks for sharing. Experiences with git on network file systems are often complicated by their different locking semantics and performance characteristics. You've provided excellent advice to network file system users. Thanks again.


              People

Assignee:
Unassigned
Reporter:
Oliver Pereira
Votes:
7
Watchers:
18

                Dates

                Created:
                Updated: