Optimize Jenkinsfile loading and branch detection

      Currently, if you use the Git branch source, your repository gets cloned three times:

      1. Once to determine the head revision of the branch.
      2. Once to load Jenkinsfile.
      3. Once when you checkout scm as part of your build. (Normally once. Could be multiple times, or even zero.)

      This could be a performance issue for large repositories. What we would rather do in the second step is use SCMFileSystem to locate Jenkinsfile. Unfortunately this is not yet implemented in any of the current scm-api clients (git, subversion, mercurial, github-branch-source).

      One of the first two clones would still be necessary for Git since the remote protocol functionality here is limited to git-ls-remote, which could yield the head revision needed for step one (which is currently implemented using a local cache of the repository), but not the contents of Jenkinsfile (which, unlike for Literate, we need to have before the build starts). The Git plugin could be enhanced to avoid needing a cache for the first step, but this would not help workflow-multibranch at all since it would wind up needing it for an implementation of SCMFileSystem in the second step anyway.
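The point about git-ls-remote can be illustrated with a minimal sketch. It assumes a throwaway local repository standing in for the real remote; the paths, branch name, and file content are made up for the demonstration. It only shows that the head revision is available without a clone, not how the Git plugin actually implements step one.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the real remote (hypothetical content).
work=$(mktemp -d)
remote="$work/remote"
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
echo "node { echo 'hi' }" > "$remote/Jenkinsfile"
git -C "$remote" add Jenkinsfile
git -C "$remote" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "Add Jenkinsfile"

# Step one without a clone: ask the remote which commit the branch points at.
# ls-remote prints "<sha><TAB><ref>", so the first field is the revision.
head_rev=$(git ls-remote "file://$remote" refs/heads/main | cut -f1)
echo "head revision: $head_rev"
```

Note that ls-remote returns only ref names and commit IDs, which is why the contents of Jenkinsfile still need some other mechanism.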

      For Mercurial, the situation is somewhat similar in that the wire protocol does not support remote file access. It might in principle support remote head revision calculation, but not using the standard hg binary, which this plugin is based on. So the Mercurial plugin, when used as a branch source, just requests that you enable the "caching" feature, which existed long before this, and which maintains a local clone of the repository. That could serve as the implementation of SCMFileSystem as well as providing the tip revision. Unlike with the Git plugin, the Mercurial plugin will actually reuse this cache during a regular workspace checkout, which has some advantages (for example, your slave need not be able to make a direct connection to the Mercurial server, so long as the master can) but may also be undesirable for performance, since it uses the Jenkins remoting channel. Probably the Mercurial plugin needs an intermediate mode, whereby caches are maintained on both master and slave, yet the slave cache is synchronized directly with the remote server rather than with the master. Or whereby the master maintains a cache, but the slave does not use caching at all.

      For Subversion the situation is simpler since the wire protocol (and, as far as I know, SVNKit) supports both remote head revision calculation and remote file retrieval. Therefore the only actual checkout would be in the user workspace.

      Note that in principle the first two steps could be collapsed: check out the SCM including Jenkinsfile to the master's jobname@script workspace, as with pre-multibranch CpsScmFlowDefinition, and then inspect the checkout after the fact to find its revision somehow for SCMBinder: for example, using git rev-parse HEAD, or hg log -r . --template '{node}\n', or svnversion .. Unfortunately scm-api offers no generic way of doing this; you need to call build with an SCMRevision but that revision can only be gotten by the repository inspection. So if this approach is to be taken, a new API would need to be introduced and implemented in the major SCM plugins. Anyway this approach is less desirable for the case of a massive working copy.
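As a rough sketch of the "inspect the checkout after the fact" idea, the three commands mentioned above could be wrapped in one helper. The function name checkout_revision is hypothetical and is not part of scm-api; the demo repository is a throwaway created only for illustration, and only the Git branch is exercised here.

```shell
#!/bin/sh
set -eu

# Print the revision of the working copy in directory $1.
# checkout_revision is a hypothetical helper, not an scm-api function.
checkout_revision() {
    if [ -d "$1/.git" ]; then
        git -C "$1" rev-parse HEAD
    elif [ -d "$1/.hg" ]; then
        hg --cwd "$1" log -r . --template '{node}\n'
    elif [ -d "$1/.svn" ]; then
        svnversion "$1"
    else
        echo "unknown SCM in $1" >&2
        return 1
    fi
}

# Demo with a throwaway Git working copy (hypothetical content).
work=$(mktemp -d)
git init -q "$work/wc"
echo "node { }" > "$work/wc/Jenkinsfile"
git -C "$work/wc" add Jenkinsfile
git -C "$work/wc" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "Add Jenkinsfile"

rev=$(checkout_revision "$work/wc")
echo "checked-out revision: $rev"
```

A per-SCM switch like this is exactly the kind of logic that would have to live behind a new generic API in each SCM plugin.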

          [JENKINS-33273] Optimize Jenkinsfile loading and branch detection

          Manuel Recena Soto added a comment -

          jglick The goal here is to reduce the number of checkouts, right? github-branch-source only does the third clone, but according to the changes needed for JENKINS-33237, github-branch-source will need a second checkout.

          I agree that all of these checkouts should be reduced to one.

          Jesse Glick added a comment -

          Need to update ReadTrustedStep as well.

          Jesse Glick added a comment -

          The first copy has not been a problem in github-branch-source for a long time, but the second still is.

          Peter Wiseman added a comment -

          I work on a multi-team project where each checkout is 3GB, we have team branches and are considering a move to feature branches. A checkout takes 5 minutes. Hence both the storage and time are rather significant.

          For SVN, another option may be to add support for depth=files, as is provided for a Single Repository.

          John Zila added a comment -

          An additional issue is that the cloned Jenkinsfile workspace is never cleaned up from the master, so my master is filling up. I have to go in manually and wipe previous workspaces. Is there a way to deal with this?

          Kevin Phillips added a comment - edited

          Would there not be a way to simply have a pipeline / multibranch job check out the code for a specific branch on a specific node, and then execute the DSL from the Jenkinsfile directly on that node, so that the checkout and build may be performed in the same workspace on the same agent? This would seem to be more in line with how other, more modern CI systems like travis-ci work.

          I could be wrong, but it seems to me the design of the pipeline / multibranch plugins assumes that the configuration file will be stored independently from the code being built by the job orchestration. This seems like a design flaw to me, seeing as how the main benefit of these plugins is in having the config file stored with the application code. If storing the Jenkinsfile with the code is in fact a requirement for these plugins, then any solution that requires the application code (i.e. an entire repository in the Git case) to be checked out multiple times in multiple locations for each build is simply unscalable.

          Kevin Phillips added a comment - edited

          pwiseman In some cases, where checkouts take dozens of minutes or more, this overhead can be even more significant. Besides the obvious performance impact of having multiple checkouts of the same repository across a build farm, there is also the storage overhead this causes on the Jenkins master. Suppose you have a large repository with dozens of gigabytes of data, and your teams are encouraged to follow agile methodologies and create feature branches regularly. Each one of these short-lived branches would then be checked out in its entirety on the master, consuming enormous amounts of space. Combined with the fact that these "temporary" workspaces seem to persist indefinitely, this would inevitably result in the Jenkins master running out of storage space.

          IMO this is a fundamental design flaw with the pipeline plugin that completely prevents it from being usable at scale.

          PS: Even though some SCM tools like SVN offer ways to check out subsets of a repository, many popular alternatives (e.g. Git) either can't do such operations at all or at the very least make them very difficult.

          Jesse Glick added a comment -

          Could be useful for Global Variable Reference support in JENKINS-31155.

          Alexander Vorobiev added a comment -

          It would be great to have one reference repo on the master, or one per node. Nearby nodes could also clone from the master rather than from GitHub.

          The 1st fetch can be done on the reference repo on the master.

          The 2nd fetch or checkout on the master to retrieve Jenkinsfile can be avoided by using git archive. Something like:

          git archive --format=tar --remote=file://"${REF_CLONE}"/.git "${GIT_BRANCH}" Jenkinsfile | tar xf -

          The 3rd fetch on the slave can be done from a local reference repo, which itself might be cloned from the reference repo on the master. This can be done with git-worktree.

          The 3rd fetch on slaves can also be optimized in the Jenkinsfile by using a manually created local reference repo in ${WORKSPACE}/../ref_repo. A bit dirty, but it still works:

          def ws = pwd()
          checkout([
              $class: 'GitSCM',
              // Map multibranch names such as PR-12 to the pr/12 refs fetched below.
              branches: [[name: "**/${env.BRANCH_NAME}".replaceAll('PR-','pr/')]],
              doGenerateSubmoduleConfigurations: false,
              extensions: [[$class: 'CloneOption',
                            depth: 0,
                            noTags: false,
                            // Borrow objects from the local reference repo instead of refetching them.
                            reference: "${ws}/../ref_repo/.git",
                            shallow: false]],
              submoduleCfg: [],
              userRemoteConfigs: [[refspec: '+refs/pull/*/head:refs/remotes/origin/pr/*',
                                   // ${ORG} and ${REPO} are placeholders; in single quotes Groovy does not interpolate them.
                                   url: 'git@github.com:${ORG}/${REPO}.git']]
          ])
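The git archive idea for the second step can be tried out with a minimal sketch. It assumes a throwaway local repository standing in for the reference clone (the paths, branch name, and file contents are hypothetical) and reads from it directly with git -C rather than through --remote.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the reference clone (hypothetical content).
work=$(mktemp -d)
ref_clone="$work/ref_repo"
git init -q "$ref_clone"
git -C "$ref_clone" symbolic-ref HEAD refs/heads/main
echo "node { echo 'build' }" > "$ref_clone/Jenkinsfile"
echo "large payload" > "$ref_clone/data.bin"
git -C "$ref_clone" add Jenkinsfile data.bin
git -C "$ref_clone" -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "Initial commit"

# Extract only Jenkinsfile from the branch tip; no working tree is checked out.
out="$work/script"
mkdir -p "$out"
git -C "$ref_clone" archive --format=tar main Jenkinsfile | tar -xf - -C "$out"
ls "$out"
```

git archive expects a tree-ish followed by optional paths, so the branch and the file name are passed as separate arguments.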

          Alex Ehlke added a comment -

          Just chiming in that this is a complete blocker for us (our repo takes too long to do a clone just to get the Jenkinsfile for each step). Thanks.

          Martin Ringehahn added a comment - edited

          We're using this (git) workaround for now: https://github.com/tophatmonocle/workflow-multibranch-plugin/commit/JENKINS-33273-git-hack

          Christophe Carpentier added a comment -

          Total blocker here too: our 15-year-old C++ repo takes a whoppin' 5 gigabytes.

          Edward Easton added a comment -

          Hi there,
          A big +1 from me too on this one. Some of our repos are 15gigs (full of test data mainly, not ideal I know!).
          One suggestion I have:

          • if we can configure which file the server uses for the build instructions (i.e., defaults to 'Jenkinsfile'), then we'd only need to check out that one file as part of steps 1) and 2) in the original ticket
          • if a project's Jenkinsfile loads other groovy files as well, then these could be specified in the build config to check out as well as the Jenkinsfile, for step 2) above

          Thanks!

          Eli White added a comment -

          If these checkouts on the build slaves could be done as a shallow clone with a specific refspec, then the steps would take about 15 seconds, down from 20 minutes.

          I'd be curious to know if that would work for others' use cases as well.
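For reference, a shallow fetch with a specific refspec might look like the following minimal sketch. The remote here is a throwaway local repository with three commits, accessed over a file:// URL so that --depth is honored; the branch name and workspace path are assumptions for the demo, and no timing claim is implied.

```shell
#!/bin/sh
set -eu

# Throwaway remote with three commits (hypothetical content).
work=$(mktemp -d)
remote="$work/remote"
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
    echo "revision $i" > "$remote/file.txt"
    git -C "$remote" add file.txt
    git -C "$remote" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "Commit $i"
done

# Workspace: fetch only the tip of one branch, using one explicit refspec.
ws="$work/workspace"
git init -q "$ws"
git -C "$ws" remote add origin "file://$remote"
git -C "$ws" fetch -q --depth=1 origin "+refs/heads/main:refs/remotes/origin/main"
git -C "$ws" checkout -q --detach origin/main

depth=$(git -C "$ws" rev-list --count HEAD)
echo "commits in workspace history: $depth"
```

The workspace ends up with the latest file content but only one commit of history, which is where the time saving on large repositories would come from.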

          Jesse Glick added a comment -

          Currently limiting scope to individual files requested to load. Could perhaps make SCMSourceRetriever (for external libraries) also use SCMFileSystem, though it is probably a toss-up since (a) Pipeline libraries are expected to be small repositories, (b) they may have multiple files in them, which would be more efficiently loaded from a single checkout.

          Eli White added a comment -

          When you say pipeline libraries, do you mean repos that contain groovy scripts for use in pipeline? Repos that use pipeline to run their tests? Or something else?

          It would not seem like a reasonable assumption that repos that use pipeline to run their tests would be small repositories. Just making sure we are thinking the same thing here.

          Jesse Glick added a comment -

          When you say pipeline libraries, do you mean

          Documentation

          These are intended to be very small, often just a single source file.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          demo/Dockerfile
          demo/Makefile
          demo/plugins.txt
          src/main/java/org/jenkinsci/plugins/github_branch_source/GitHubSCMFileSystem.java
          http://jenkins-ci.org/commit/github-branch-source-plugin/29125b2e488fc9cd60a6c914b7fb38ce97d3d606
          Log:
          Merge pull request #104 from jglick/SCMFileSystem-JENKINS-33273

          JENKINS-33273 Jenkinsfile from SCMFileSystem

          Compare: https://github.com/jenkinsci/github-branch-source-plugin/compare/a6058dfc17b8...29125b2e488f

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/ReadTrustedStep.java
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinder.java
          src/test/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinderTest.java
          http://jenkins-ci.org/commit/workflow-multibranch-plugin/683cce9db06b33cb675c745053b98e8567d2d873
          Log:
          JENKINS-33273 Make SCMBinder use SCMFileSystem where available.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/ReadTrustedStep.java
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinder.java
          http://jenkins-ci.org/commit/workflow-multibranch-plugin/8431c36d9dd7c3150fc0cca97f1a3910ee483a0d
          Log:
          JENKINS-33273 Release resources

          Compare: https://github.com/jenkinsci/workflow-multibranch-plugin/compare/973fc794757d...8431c36d9dd7

          Roman Bäriswyl added a comment -

          Would it be possible to move the Jenkinsfile to a subfolder and then let the Jenkins master check out only this directory? This is quite an issue if you have one Jenkins instance for many repositories.

          Christian Kulenkampff added a comment -

          We tried this successfully with SVN SCM, but the multibranch-pipeline-plugin will append the sub-directory to the pipeline name (e.g. "trunk/jenkins"). Another option is to reduce the checkout depth when you do not use the multibranch-pipeline-plugin.

          Roman Bäriswyl added a comment -

          Unfortunately I am using (and need) the multibranch-pipeline-plugin. There I have no way to define a subfolder or reduce the checkout depth.

          Christian Kulenkampff added a comment - edited

          You can use a subfolder, but it's really just a workaround...

          Use this for the "Include branches"-field:

          trunk/jenkins,branches/*/jenkins,tags/*/jenkins,sandbox/*/jenkins
          

          And then start with this in the Jenkinsfile (which resides in a subfolder named jenkins):

          def scmLoc = scm.locations[0].withRemote(scm.locations[0].remote.replaceAll(/\/jenkins\/?(@\d+)$/,'$1'))
          
          checkout([$class: 'SubversionSCM', additionalCredentials: scm.additionalCredentials, excludedCommitMessages: scm.excludedCommitMessages, excludedRegions: scm.excludedRegions, excludedRevprop: scm.excludedRevprop, excludedUsers: scm.excludedUsers, filterChangelog: scm.filterChangelog, ignoreDirPropChanges: scm.ignoreDirPropChanges, includedRegions: scm.includedRegions, locations: [scmLoc], workspaceUpdater: [$class: 'UpdateWithCleanUpdater']])
          

          Then the master will only checkout the jenkins folders and the script will checkout the parent folder of the jenkins-folder.
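To make the effect of the replaceAll pattern above concrete, here is a small sketch of the same rewrite done with sed. The function name parent_location and the example URLs are hypothetical. As in the Groovy regex, the rewrite only applies when the location ends in an @revision suffix.

```shell
#!/bin/sh
set -eu

# Strip a trailing "/jenkins" (or "/jenkins/") that precedes an "@<revision>"
# suffix, mirroring replaceAll(/\/jenkins\/?(@\d+)$/, '$1').
# parent_location is a hypothetical helper name.
parent_location() {
    printf '%s\n' "$1" | sed -E 's#/jenkins/?(@[0-9]+)$#\1#'
}

# Hypothetical SVN location as the branch source would report it.
rewritten=$(parent_location "https://svn.example.invalid/repo/trunk/jenkins@1234")
echo "$rewritten"

# Without a revision suffix the pattern does not match and nothing changes.
unchanged=$(parent_location "https://svn.example.invalid/repo/trunk/jenkins")
echo "$unchanged"
```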

          Roman Bäriswyl added a comment - edited

          Very clever solution, but unfortunately it does not work for Git. Also, the Bitbucket Branch Source Plugin does not seem to help: the entire branch is still checked out on the master.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/ReadTrustedStep.java
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinder.java
          src/test/java/org/jenkinsci/plugins/workflow/multibranch/ReadTrustedStepTest.java
          src/test/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinderTest.java
          http://jenkins-ci.org/commit/workflow-multibranch-plugin/8aa119c1fa33a53bd5e84f3abab67aedcdbe6e5a
          Log:
          Merge pull request #49 from jglick/SCMFileSystem-JENKINS-33273

          JENKINS-33273 Use SCMFileSystem where available

          Compare: https://github.com/jenkinsci/workflow-multibranch-plugin/compare/c26d3dc85bf0...8aa119c1fa33

          Christian Höltje added a comment -

          An additional concern is security:

          If you have enterprise-wide requirements about where source can "rest", then storing more than the Jenkinsfile (e.g. the .git directory) on the master can cause headaches.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/ReadTrustedStep.java
          src/main/java/org/jenkinsci/plugins/workflow/multibranch/SCMBinder.java
          http://jenkins-ci.org/commit/workflow-multibranch-plugin/a979843802fff36c31338dffe556e803a8449557
          Log:
          JENKINS-33273 kill switch.

          Chris Denneen added a comment - edited

          jglick I have referenced this issue from https://issues.jenkins-ci.org/browse/JENKINS-35282
          I'm curious... it seems the problem stems from Pipeline from SCM checking out into the JOB@script workspace rather than the plain JOB workspace.
          Since the files are already in the JOB@script workspace, I'm not sure why a git checkout would be necessary.
          Is there any way to have the Pipeline just do the checkout in the correct JOB workspace and not in JOB@script?
          That should resolve the issues and allow for something like this to work:

          node {
            jobDsl targets: ['jobs/*.groovy'].join('\n'),
              removedJobAction: 'DELETE',
              removedViewAction: 'DELETE',
              lookupStrategy: 'SEED_JOB'
          }
          

          workspace for test is empty
          workspace of test@script has the checked out repo with Jenkinsfile and the jobs subfolder with the groovy scripts to load.

          jenkins@50eeb9a40763:~/workspace$ ls -la *
          test:
          total 8
          drwxr-xr-x 2 jenkins jenkins 4096 Mar  1 18:03 .
          drwxr-xr-x 4 jenkins jenkins 4096 Mar  1 18:03 ..
          
          test@script:
          total 20
          drwxr-xr-x 4 jenkins jenkins 4096 Mar  1 18:19 .
          drwxr-xr-x 4 jenkins jenkins 4096 Mar  1 18:03 ..
          drwxr-xr-x 8 jenkins jenkins 4096 Mar  1 18:19 .git
          -rw-r--r-- 1 jenkins jenkins  835 Mar  1 18:19 Jenkinsfile
          drwxr-xr-x 2 jenkins jenkins 4096 Mar  1 18:03 jobs
          

          test job runs and fails:

          ERROR: no Job DSL script(s) found at jobs/*.groovy
          

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          pom.xml
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java
          src/main/resources/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition/config.jelly
          src/main/resources/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition/help-lightweight.html
          src/test/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinitionTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/5960d4bb4d84e704d1154d9623ee04c269889506
          Log:
          JENKINS-33273 Allow lightweight checkouts to be used from CpsScmFlowDefinition.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java
          src/main/resources/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition/config.jelly
          src/main/resources/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition/help-lightweight.html
          src/test/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinitionTest.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/89b4e8df3a9f417283e942de0d837a7e2c50367d
          Log:
          Merge pull request #97 from jglick/SCMFileSystem-JENKINS-33273

          JENKINS-33273 Allow lightweight checkouts to be used from CpsScmFlowDefinition

          Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/6adb71c47c2e...89b4e8df3a9f

          Léo Gillot-Lamure added a comment -

          Hi.

          We are badly affected by this problem (our Git repo is 2 GB), and it is still present for us even though we're on an up-to-date Jenkins.

          Here is a transcript of the first build for a PR:

          Branch event
          21:46:11 Connecting to https://api.github.com using navaati/****** (navaati github superpower token)
          Checking out git https://github.com/H3IO/HyperCube.git https://github.com/H3IO/HyperCube.git to read Jenkinsfile
          Cloning the remote Git repository
          Cloning repository https://github.com/H3IO/HyperCube.git
           > git init /var/lib/jenkins/workspace/H3IO_HyperCube_PR-2456-QRWOPV3KS7P4NT653VHEA3YTIWGDI2HVFN6AWXB4RVJQQDYPR3UA@script # timeout=10
          Fetching upstream changes from https://github.com/H3IO/HyperCube.git
           > git --version # timeout=10
          using GIT_ASKPASS to set credentials navaati github superpower token
           > git fetch --tags --progress https://github.com/H3IO/HyperCube.git +refs/heads/*:refs/remotes/origin/*
           > git config remote.origin.url https://github.com/H3IO/HyperCube.git # timeout=10
           > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
           > git config remote.origin.url https://github.com/H3IO/HyperCube.git # timeout=10
          Fetching upstream changes from https://github.com/H3IO/HyperCube.git
          using GIT_ASKPASS to set credentials navaati github superpower token
           > git fetch --tags --progress https://github.com/H3IO/HyperCube.git +refs/heads/*:refs/remotes/origin/*
           > git config remote.origin1.url https://github.com/H3IO/HyperCube.git # timeout=10
          Fetching upstream changes from https://github.com/H3IO/HyperCube.git
          using GIT_ASKPASS to set credentials navaati github superpower token
           > git fetch --tags --progress https://github.com/H3IO/HyperCube.git +refs/pull/*/head:refs/remotes/origin/pr/*
          Merging master commit eb08b54b49658077f34f3a5e71010ae9ccbbf879 into PR head commit a674225cb3cbd2dc4a126996a50d73262d5132f4
           > git config core.sparsecheckout # timeout=10
           > git checkout -f a674225cb3cbd2dc4a126996a50d73262d5132f4
           > git merge eb08b54b49658077f34f3a5e71010ae9ccbbf879 # timeout=10
           > git rev-parse HEAD^{commit} # timeout=10
          Merge succeeded, producing a674225cb3cbd2dc4a126996a50d73262d5132f4
          Checking out Revision a674225cb3cbd2dc4a126996a50d73262d5132f4 (PR-2456)
           > git config core.sparsecheckout # timeout=10
           > git checkout -f a674225cb3cbd2dc4a126996a50d73262d5132f4

          As you can see, a full fetch is still done, in a new directory for each PR.
          Do I need to do something in particular for this fix to take effect?

          Regards,
          Léo Gillot-Lamure.


          Peter Wiseman added a comment -

          When I tried to validate this, it appeared to be resolved only for the Pipeline job, not the Multibranch Pipeline.  And I need it for Subversion, which hasn't been implemented.  As this issue is marked as resolved, I expect new issues will need to be created for those additional bits?


          Shannon Kerr added a comment - - edited

          We are blocked by this as well.  Using Subversion with a working copy that is ~6.5G.  If we use the "Jenkins" sub-directory work-around, I think we'll lose the change tracking from build to build, which would be a big loss.  Also, we want to be able to share a workspace between product branches (Client, Server, Web etc), so all Server branch builds share a common workspace, Client in a shared workspace etc.  We don't want each Client branch to have a full 6.5G workspace + the build artifacts.  If we go this shared workspace route, we need to be able to throttle Client branch builds so we only have one build at a time (per slave).


          Jesse Glick added a comment -

          Please do not reopen.


          Jesse Glick added a comment -

          First of all, make sure you have actually updated all plugins, and check release notes.

          Second, there is no implementation yet for Subversion. That would be an RFE for subversion-plugin. AFAIK no one is actively developing that plugin.

          Third, there is no implementation currently for github-branch-source-plugin in the case of a PR job configured to merge with the base branch, as GitHub does not offer an API dedicated to this purpose. In the case that the merge can be assumed to be a fast-forward (there is no base branch change subsequent to the common ancestor), this plugin could in principle load content via API from the PR branch; it would still need to fall back to full checkout and Git merge otherwise. Again this would be an RFE for that plugin.
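
          The fast-forward condition described above can be sketched as a small ancestry check. This is only an illustration, assuming the commit graph is available as a map from each commit to its parents; the class and method names (LightweightMergeCheck, isAncestor, lightweightRevision) are hypothetical and not part of github-branch-source-plugin or any Jenkins API. If the base branch head is an ancestor of the PR head, merging the base adds nothing, so the Jenkinsfile at the PR head is the one a merge would produce; otherwise a full checkout and Git merge would still be needed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper for illustration; not a Jenkins or plugin class. */
final class LightweightMergeCheck {

    /** Returns true if 'ancestor' equals 'descendant' or is reachable from it via parent links. */
    static boolean isAncestor(Map<String, List<String>> parents, String ancestor, String descendant) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(descendant);
        while (!pending.isEmpty()) {
            String commit = pending.pop();
            if (commit.equals(ancestor)) {
                return true;
            }
            if (seen.add(commit)) {
                // Unknown commits are treated as having no parents.
                for (String parent : parents.getOrDefault(commit, List.of())) {
                    pending.push(parent);
                }
            }
        }
        return false;
    }

    /**
     * Returns the revision whose Jenkinsfile could be read directly, or empty
     * when the base branch has moved on and a real merge would be required.
     */
    static Optional<String> lightweightRevision(Map<String, List<String>> parents,
                                                String baseHead, String prHead) {
        return isAncestor(parents, baseHead, prHead) ? Optional.of(prHead) : Optional.empty();
    }

    public static void main(String[] args) {
        // Base head B is an ancestor of PR head C: no base change since the common ancestor.
        Map<String, List<String>> linear = Map.of(
                "C", List.of("B"),
                "B", List.of("A"),
                "A", List.of());
        System.out.println(lightweightRevision(linear, "B", "C"));

        // Base moved on to D after the PR branched from B: heavyweight fallback.
        Map<String, List<String>> diverged = Map.of(
                "D", List.of("B"),
                "C", List.of("B"),
                "B", List.of("A"),
                "A", List.of());
        System.out.println(lightweightRevision(diverged, "D", "C"));
    }
}
```

          With a local clone, the same question can be answered by git merge-base --is-ancestor; the point of the sketch is only that the check needs commit ancestry, not file contents.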


          Viacheslav Dubrovskyi added a comment -

          I'm affected by this problem too (our Git repo is ~300 GB, with 200+ pull requests).
          When indexing runs, it scans the repository and triggers jobs for new PRs. This kills the master after 50+ git processes.
          If I add the property "suppresses the normal SCM commit trigger coming from branch indexing", it also blocks jobs triggered from the webhook.
          Is it possible to keep the webhook enabled without running PR jobs from indexing?

          Christophe Carpentier added a comment -

          dubrsl The issue was that it was done several times; that is no longer the case.

          At one point or another, the system has to use git and create a local repo. It's done in a cache folder located in the Jenkins folder; maybe you can reuse an existing one.

          But, seriously, 300 GB? Our 6 GB repo feels wrong, but we have ways to work around that. But one third of a terabyte, seriously? I'd suggest you look into dependency management systems.

          Viacheslav Dubrovskyi added a comment -

          It is still the case. It runs git in parallel for all detected PRs, and that is the problem.

          Yes, we use a monolithic repository. Don't ask why. Yes, we need to keep some third-party libraries.

          We still need some solution. How can I remove PR-* without running indexing?

          Roman Bäriswyl added a comment -

          We had a similar problem. What I ended up doing was cleaning all third-party libraries out of the repository with BFG Repo-Cleaner and then adding a submodule containing those libraries. The benefit is that the original repo is small and, when checking out, only the latest version of the submodule needs to be pulled rather than the entire history.

          Jesse Glick added a comment -

          Please stick to the user list (or stackoverflow.com, #jenkins IRC, etc.) when discussing tips and usage questions.


          trejkaz added a comment -

          If we're saying this is fixed, can I assume that the 30 second wait time to check each Jenkinsfile is some other performance issue which is tracked in another ticket?

           


          Steve Berube added a comment -

          Would love to see this fixed too. One of our repos is gigantic, and the prep phase takes forever just to get a single Jenkinsfile.

          Steve Berube added a comment - - edited

          The issue appears to be fixed on main branches, however pull requests still appear to be checking out the entire repo vs just getting the Jenkinsfile.

           

          E.g. pull request:

          Pull request #14924 opened
          05:15:28 Connecting to https://github.houston.softwaregrp.net/api/v3 using steve-berube/****** (GITHUB Service Account (Using Steve Berube))
          Checking out git https://github.houston.softwaregrp.net/CSA/csa.git into /var/jenkins_home/workspace/A_CSA-PIPELINE_csa_PR-14924-Z4QJL7VGNYZFORO2QDHYKCLF6JHNT2KX6T7GPPU2XP5ECHCOF7EQ@script to read Jenkinsfile
          Cloning the remote Git repository

          E.g. non-pull request:

          originally caused by:
          Push event to branch v04.93.000
          22:58:31 Connecting to https://github.houston.softwaregrp.net/api/v3 using steve-berube/****** (GITHUB Service Account (Using Steve Berube))
          Obtained Jenkinsfile from 2bfe852c7aad46b7dc90ffb6e53c2b177f07ae00
          Running in Durability level: MAX_SURVIVABILITY
          

           

          Is this a limitation or a defect?

           


          Steve Berube added a comment -

          Seems a limitation: https://support.cloudbees.com/hc/en-us/articles/115002991272-Why-is-my-multibranch-project-cloning-the-whole-repository-on-the-master-

          Steve Berube added a comment -

          One more update. If you configure your PR strategy to be Build Pull Request Revision, this works around the issue, and the Jenkinsfile can be read via the API.

          Adam Bialas added a comment -

          I configured my PR strategy to Build Pull Request Revision. However, it throws an exception:
          ERROR: Could not do lightweight checkout, falling back to heavyweight
          java.io.FileNotFoundException: URL: /rest/api/1.0/projects/sample/repos/sample-repo/browse/Jenkinsfile?at=PR-62&start=0&limit=500
          and switches to heavy checkout.
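
          The behavior in that log (try the lightweight route, and on FileNotFoundException fall back to a full checkout) follows a common pattern that can be sketched as below. This is a minimal illustration under stated assumptions, not the plugin's actual code: ScriptLoader, CheckoutFallback and loadScript are hypothetical names, and the two loaders stand in for "read one file over the API" and "clone, then read from the workspace".

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical illustration of the lightweight-then-heavyweight pattern. */
final class CheckoutFallback {

    /** Stand-in for any way of obtaining the script text for a given path. */
    interface ScriptLoader {
        String load(String path) throws IOException;
    }

    /**
     * Tries the lightweight loader first. If the file cannot be found that way,
     * logs a message modeled on the one in the build log and uses the heavyweight loader.
     */
    static String loadScript(ScriptLoader lightweight, ScriptLoader heavyweight, String path)
            throws IOException {
        try {
            return lightweight.load(path);
        } catch (FileNotFoundException e) {
            System.err.println("Could not do lightweight checkout, falling back to heavyweight: "
                    + e.getMessage());
            return heavyweight.load(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Lightweight lookup fails, as in the log above; the heavyweight loader supplies the script.
        ScriptLoader failingApi = path -> {
            throw new FileNotFoundException("URL: " + path);
        };
        ScriptLoader fullCheckout = path -> "node { echo 'from full checkout' }";
        System.out.println(loadScript(failingApi, fullCheckout, "Jenkinsfile"));
    }
}
```

          The fallback keeps the build working, but it also hides why the lightweight lookup failed, which is why the exception text in the log is the useful part of a bug report.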


          Jesse Glick added a comment -

          This issue is closed. Please file separate issues with complete steps to reproduce from scratch if you observe any issues using the latest releases of all applicable software that are not already tracked in JIRA.


            jglick Jesse Glick
            Votes: 59
            Watchers: 91