
Many branches on a multibranch repo are slowed by cache lock on master

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • git-plugin
    • None

      Our project has a large number of feature branches across several versions, each of which runs as a separate build on Jenkins. The run times of our multibranch builds have been creeping up, with no obvious cause in the code changes. After some digging, it seems that even though each build runs independently, there is a single lock per unique remote name that every git process waits to acquire. Each fetch and checkout is relatively fast, but the multibranch pipeline starts each build on dev before requisitioning a slave and executing the specific tasks needed, so this appears to create a huge backlog of builds contending for the lock.

      I have been digging and have not found anyone else who seems to have hit the same issue, so it is likely that we have an unusual setup that doesn't match standard usage. It still seems strange that the lock is on the remote name rather than the directory (which is more likely to have synchronisation issues).

      The part of the plugin that acquires the lock seems to be AbstractGitSCMSource.getCacheLock(). I've attached a thread dump showing the various threads all stacked up on that single lock.
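To illustrate the kind of contention described above, here is a minimal sketch of a per-key lock registry. It is not the git plugin's code: the class name `CacheLockSketch`, the `simulate` helper, the key `hypothetical-remote`, and the sleep that stands in for a fetch are all assumptions made for illustration. The only point is that threads sharing a key run one after another, so many fast operations still add up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a per-key lock registry, NOT the git plugin's actual code.
class CacheLockSketch {
    private static final ConcurrentMap<String, Lock> LOCKS = new ConcurrentHashMap<>();

    // Hypothetical stand-in for getCacheLock(): exactly one lock per cache key.
    static Lock getCacheLock(String cacheKey) {
        return LOCKS.computeIfAbsent(cacheKey, k -> new ReentrantLock());
    }

    // Starts `builds` threads that each hold the key's lock for `holdMillis`,
    // then returns the elapsed wall-clock time in milliseconds.
    static long simulate(int builds, long holdMillis, String cacheKey) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < builds; i++) {
            Thread t = new Thread(() -> {
                Lock lock = getCacheLock(cacheKey);
                lock.lock();
                try {
                    Thread.sleep(holdMillis); // stands in for a fast fetch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = simulate(10, 50, "hypothetical-remote");
        System.out.println("10 builds with a 50 ms fetch each took about " + ms + " ms");
    }
}
```

Because every thread passes the same key, the sleeps are serialised, and the total time grows roughly linearly with the number of builds instead of staying near the duration of one fetch.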

          [JENKINS-61439] Many branches on a multibranch repo are slowed by cache lock on master

          Mark Waite added a comment (edited)

          I'm not sure what you mean by "remote name". Can you explain further?

          I would assume that the lock is being acquired on the URL of the remote repository. A single remote repository URL has a single cache directory on the Jenkins master. That cache is locked for the duration of an update to retrieve the Jenkinsfile for each branch.

          If your repository has grown significantly or if the git cache on the Jenkins master has not recently been garbage collected, you might try performing a git gc on the specific cache directory on the master, just as an exploration to see if that would improve performance.

          We could consider keeping multiple copies of the cache on the Jenkins master, with some way of choosing one from the pool. That seems more complicated and brings its own challenges: deciding how deep to make the pool and how to avoid disk space bloat across the copies.
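As a rough sketch of what choosing one cache from a pool could look like, the class below hands out a free slot when one exists and otherwise waits on a randomly chosen slot. This is purely hypothetical: the git plugin does not implement anything like it as far as this discussion shows, and the names `CachePoolSketch`, `acquire`, and `release` are invented here. It also ignores the hard parts mentioned above, such as sizing the pool and keeping several cache copies up to date on disk.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of a "pool of caches"; not something the git plugin implements.
class CachePoolSketch {
    // One single-permit semaphore per cache copy (non-reentrant, unlike ReentrantLock).
    private final Semaphore[] slots;

    CachePoolSketch(int size) {
        slots = new Semaphore[size];
        for (int i = 0; i < size; i++) {
            slots[i] = new Semaphore(1);
        }
    }

    // Returns the index of the cache copy acquired; the caller must pass it to release().
    int acquire() {
        // Prefer the first copy that is currently free.
        for (int i = 0; i < slots.length; i++) {
            if (slots[i].tryAcquire()) {
                return i;
            }
        }
        // All copies are busy: wait on one chosen at random to spread the queue.
        int i = ThreadLocalRandom.current().nextInt(slots.length);
        slots[i].acquireUninterruptibly();
        return i;
    }

    void release(int index) {
        slots[index].release();
    }

    public static void main(String[] args) {
        CachePoolSketch pool = new CachePoolSketch(2);
        int a = pool.acquire();
        int b = pool.acquire();
        System.out.println("acquired cache copies " + a + " and " + b);
        pool.release(a);
        pool.release(b);
    }
}
```

Each slot multiplies the disk footprint of the cache, which is the bloat concern raised above.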

          You might also confirm that you are using the best branch source for your git provider. If you're using GitHub, then you should use the GitHub branch source rather than the branch source provided by the git plugin. The GitHub branch source can use REST API calls to retrieve the Jenkinsfile. The same applies to Bitbucket, Gitea, and GitLab: use their higher-level branch source if you're using one of those git providers.

          If your multibranch job definition has a "lightweight checkout" option, you might also compare the results when it is enabled and when it is disabled. We've seen cases with large repositories where enabling lightweight checkout makes the operations slower.


            Assignee: Unassigned
            Reporter: Matthew Ellams (mellams)