
Environment variables not resolved in Pipeline SCM -> Advanced clone behaviours -> Path of the reference repo

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component: git-plugin
    • Environment: Jenkins 2.89.3, workflow-job 2.18-SNAPSHOT

      Environment variables are not resolved in 'Advanced clone behaviours' -> 'Path of the reference repo to use during clone' when configuring git reference repository in the Pipeline SCM section.

      Steps to reproduce:

      • Create pipeline project
      • In configure: fill in repository URL, Branch Specifier and Script Path (Jenkinsfile).
      • In 'Additional Behaviours' select 'Advanced Clone Behaviours'. Set 'Path of the reference repo to use during clone' to some value depending on an environment variable, e.g. ${HOME}/jenkins/reference/big-git-repo.git
      • At the repository URL, create a git repository containing a Jenkinsfile. Put 'node('master') { checkout scm }' in the Jenkinsfile.
      • Run the job. It will report: 
        Cloning repository ...URL...
         > git init ... # timeout=10
        ERROR: Reference path does not exist: ${HOME}/jenkins/reference/big-git-repo.git

      It would be very handy to have this feature. In my case, I want to use the 'checkout scm' shortcut in a pipeline script that runs several builds in parallel on different platforms, so I need to specify where the reference repository lives on each node. For instance, on a Windows node I want to put it in D:\jenkins\reference, on a Mac in /Users/jenkins/reference and, finally, on Linux in /home/jenkins/reference. I could set an environment variable ${REFERENCE} to the above values in the node configuration and then use it in 'Path of the reference repo to use during clone' as ${REFERENCE}.

       

          [JENKINS-43894] Environment variables not resolved in Pipeline SCM -> Advanced clone behaviours -> Path of the reference repo

          Anna Tikhonova added a comment -

          Originally, I asked about this behaviour here: https://issues.jenkins-ci.org/browse/JENKINS-28447

          Was told to submit an RFE for git plugin.

          Mark Waite added a comment -

          As another alternate, since you're using pipeline, the git plugin could be extended to automatically use a reference repository. The mercurial plugin uses that technique to reduce duplication on disc.


          Anna Tikhonova added a comment -

          markewaite Could you please clarify a bit how it works in the Mercurial plugin? I've no experience with Mercurial and can't find anything resembling a reference repository implementation in the source code.

          Mark Waite added a comment -

          I believe Jesse Glick described that the SCMFileSystem implementation in the Mercurial plugin uses the initial checkout of the repository (for polling purposes) as a cache for other copies on the disc. The concept was what was interesting to me, more than the specifics of the implementation in the Mercurial plugin.

          It will be a different implementation with the git plugin, since git's concept of a reference repository is something that Mercurial does not have (as far as I know).


          Anna Tikhonova added a comment -

          markewaite Ah, I see. Thanks! Having an automatic cache for git repositories would be totally cool for Multi-branch pipelines. Last time I checked this project type (a long time ago), it created a new clone for each branch, which is not good for large repositories. Maybe it's been fixed somehow, dunno. Would be nice to have it as well.

          This issue is unassigned right now. Do I have any chance to have it fixed soon?

          Mark Waite added a comment -

          I won't be working on this for a long time. My focus is on evaluating and including the pull requests submitted by others, while tracking bug reports and resolving bugs.


          Anna Tikhonova added a comment -

          markewaite I have an initial implementation of what you suggested. It's very short and simple. It works with certain limitations/assumptions (single SCM is used, all nodes are online, slave workspaces do not exist before the first run, etc).

          I've extended GitSCM to use the same caches implemented in GitSCMSource for GitSCMFileSystem on master and manage similar caches on all slaves. Could you please have a look at it and advise me on improvements, so that we can have it in the upstream someday?

          Surely I will remove the limitations. What bothers me is the architectural decisions. E.g., sharing caches with GitSCM might require refactoring the cache implementation out of AbstractGitSCMSource, which means some invasion into GitSCMFileSystem. It's not absolutely necessary because of the implementation details (the cache entry is calculated from the remote URL). But it might be a better implementation architecture-wise. I'm willing to do it the best way.

          The implementation: https://github.com/atikhono/git-plugin

          Mark Waite added a comment -

          atikhono thanks very much! I will review it.


          Jesse Glick added a comment -

          Actually no. The linked PR is implementing something quite unrelated: master-based caching. The stated bug here (really an RFE) is to expand environment variables in one Git extension.


          Jesse Glick added a comment -

          Now the root desire here may have been master-based caching. That is what markewaite suggested as an alternative path to the solution you were working on. But the stated issue here is that a certain behavior does not expand variables, which would have been necessary for you to use your original workaround. So the JIRA situation here is all messed up.


          Jesse Glick added a comment -

          JENKINS-44729 might better reflect what the posted PR is about.


          Anna Tikhonova added a comment - - edited

          I've done a preliminary investigation of the original issue. Reference path expansion is done in hudson.plugins.git.extensions.impl.CloneOption.decorateCloneCommand:

          cmd.reference(build.getEnvironment(listener).expand(reference))
          

          TaskListener doesn't get any other environment variables but:

            BUILD_DISPLAY_NAME
            BUILD_ID
            BUILD_NUMBER
            BUILD_TAG
            BUILD_URL
            CLASSPATH
            HUDSON_HOME
            HUDSON_SERVER_COOKIE
            HUDSON_URL
            JENKINS_HOME
            JENKINS_SERVER_COOKIE
            JENKINS_URL
            JOB_BASE_NAME
            JOB_DISPLAY_URL
            JOB_NAME
            JOB_URL
            RUN_CHANGES_DISPLAY_URL
            RUN_DISPLAY_URL

          So if I reconfigure the job with JENKINS_HOME instead of HOME in 'Path of the reference repo to use during clone', it will expand that path properly.
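
The behaviour described above, where ${JENKINS_HOME} resolves but ${HOME} stays literal, can be illustrated with a small stand-alone sketch. This is not the Jenkins implementation: the class `ReferenceExpansionSketch`, its `expand` method, and the sample value for JENKINS_HOME are all hypothetical, and the only assumption carried over from the thread is that names missing from the build environment are left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for variable expansion; NOT the Jenkins EnvVars code.
final class ReferenceExpansionSketch {
    private static final Pattern VAR =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces ${NAME} with env.get(NAME); unknown names are kept as-is.
    static String expand(String template, Map<String, String> env) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Only build-level variables are present, as in the list above.
        Map<String, String> buildEnv = new LinkedHashMap<>();
        buildEnv.put("JENKINS_HOME", "/var/lib/jenkins"); // hypothetical value
        buildEnv.put("JOB_NAME", "demo");                 // hypothetical value

        // HOME is not in the map, so the placeholder survives unchanged.
        System.out.println(expand("${HOME}/jenkins/reference/big-git-repo.git", buildEnv));
        // JENKINS_HOME is in the map, so it is substituted.
        System.out.println(expand("${JENKINS_HOME}/reference/big-git-repo.git", buildEnv));
    }
}
```

The unresolved placeholder is then passed on as a literal path, which matches the "Reference path does not exist: ${HOME}/..." error in the issue description.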


          Patrick Ruckstuhl added a comment - - edited

          As none of those variables provide us with something that can be configured on a slave level, this is what we did as an (ugly) workaround. We define an environment variable called REFERENCE_REPO on each slave and the master.

          Inside our jenkins pipeline library we create a step called checkoutCurrent

          import java.util.regex.Pattern
          import java.lang.reflect.Field
          import java.lang.reflect.Modifier
          
          // checkout current scm and resolve the REFERENCE_REPO environment variable, workaround for JENKINS-43894
          def call(originalScm) {
          	Field field = null
          	def updatedExtension = null
          	def originalReference = null;
          
          	try {
          		if (originalScm.hasProperty('extensions') && originalScm.extensions) {
          			print('has extensions')
          			def extensions = originalScm.extensions
          
          			for (int i = 0; i < extensions.size(); i++) {
          				def extension = extensions[i]
          				if (extension.hasProperty('reference') && extension.reference instanceof String) {
          					updatedExtension = extension
          					originalReference = extension.reference
          					print('replacing reference: ' + originalReference)
          					// String.replace is literal; replaceAll would treat '\' and '$' in the replacement (e.g. a Windows path) as escapes
          					def reference = originalReference.replace('${REFERENCE_REPO}', env.REFERENCE_REPO)
          					print('with: ' + reference)
          
          					// https://gist.github.com/pditommaso/263721865d84dee6ebaf
          					field = extension.class.getDeclaredField("reference")
          					Field modifiersField = Field.class.getDeclaredField("modifiers")
          					modifiersField.setAccessible(true)
          					modifiersField.setInt(field, field.getModifiers() & ~Modifier.FINAL)
          					field.setAccessible(true)
          					field.set(extension, reference)
          				}
          			}
          		}
          		checkout changelog: false, poll: false, scm: originalScm
          	} finally {
          		if (field) {
          			print('reset reference to: ' + originalReference)
          			field.set(updatedExtension, originalReference)
          		}
          	}
          }
          
          

          and then in the pipeline instead of

          checkout changelog: false, poll: false, scm: scm
          

          we do

          checkoutCurrent(scm)
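
When a node-specific path is substituted into a placeholder this way, the choice of String method matters for Windows paths. The sketch below, in plain Java (Groovy uses the same String methods), compares three ways of doing the substitution. The class name `ReferenceSubstitutionSketch`, its method names, and the sample path are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReferenceSubstitutionSketch {
    static final String PLACEHOLDER = "${REFERENCE_REPO}";

    // Regex-based: the replacement string is parsed, so '\' and '$' are special.
    static String viaReplaceAll(String template, String value) {
        return template.replaceAll(Pattern.quote(PLACEHOLDER), value);
    }

    // Regex-based, with the replacement escaped so it is taken literally.
    static String viaQuotedReplaceAll(String template, String value) {
        return template.replaceAll(Pattern.quote(PLACEHOLDER),
                Matcher.quoteReplacement(value));
    }

    // Literal substitution: no regex handling on either argument.
    static String viaReplace(String template, String value) {
        return template.replace(PLACEHOLDER, value);
    }

    public static void main(String[] args) {
        String template = "${REFERENCE_REPO}/big-git-repo.git";
        String windowsPath = "D:\\jenkins\\reference"; // i.e. D:\jenkins\reference

        // Backslashes are consumed as escape characters and disappear.
        System.out.println(viaReplaceAll(template, windowsPath));
        // Both of these keep the path intact.
        System.out.println(viaQuotedReplaceAll(template, windowsPath));
        System.out.println(viaReplace(template, windowsPath));
    }
}
```

With a value containing no backslashes or dollar signs (for example a path written with forward slashes), all three variants produce the same result.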
          


          Anna Tikhonova added a comment -

          tario So 'scm' extensions can be overridden actually... awesome tip, thanks! I have submitted a PR to git-plugin some time ago: https://github.com/jenkinsci/git-plugin/pull/575. No one who can merge seems to have time / be interested in it. I have just built the plugin with my patch and installed it in my instance.

          Patrick Ruckstuhl added a comment -

          Updated my code a little bit, now it will revert the changes after doing the checkout (as otherwise this modified the general config I had in the job).

          Would be really nice to get your PR merged.

          Steven Foster added a comment -

          Does this only work for environment variables configured on the node in Jenkins, and so wouldn't work for non-static agents, i.e. those launched from a cloud provider?

          Or does it work for environment variables set on the agent itself?

          Anna Tikhonova added a comment -

          stevenfoster For cloud slaves you will need to support setting environment variables via Node Properties in Jenkins. That should be implemented in a plugin that provides those slaves.

          The patch above should work for environment variables set on an agent.

          Patrick Ruckstuhl added a comment -

          As far as I can tell the fix for this (https://github.com/jenkinsci/git-plugin/pull/575) was merged to master, so it should be in the next 4.0 based release.

          Jesse Glick added a comment -

          By the way the use case described in the issue description seems like a bad idea to me. More straightforward to have the script determine the Git commit being built, and then rather than running checkout scm on each node, explicitly construct a GitSCM configured exactly the way you want, according to whatever Groovy logic you need. The kind of plugin-based environment variable expansion seen in this PR should only be necessary in freestyle projects.


          Patrick Ruckstuhl added a comment -

          jglick maybe I'm missing something but to me this looks totally normal to use in pipeline jobs.
          We have some linux slaves and some windows slaves and they each have a different location of the reference repo. I can even think of cases where you have e.g. linux nodes which for whatever reason have different paths to the reference repo. Having this configurable on a slave level and then just use it in the multibranch config is perfect, that way the pipeline does not care about how the checkout happens.

          Mathias Hasselmann added a comment -

          Exactly that. Reference repo paths, library paths, tool paths: these are all node-specific settings which have no place in the actual pipeline description. As an extreme example imagine that your project is published on Github or the like. Do you really want to share with the world the gritty details of your local Jenkins configuration?

          Jesse Glick added a comment -

          "library paths, tool paths"

          These are of course better handled via Docker containers or the like if you can manage it.

          "node specific settings which have no place in the actual pipeline description"

          I did not mean to imply that the Jenkinsfile must list actual filesystem paths on each possible node, merely that the program has some way of determining this information on the fly, preferably not involving Jenkins global configuration. That could be an environment variable set on the agent’s account, etc. The point is that the selection and configuration of a reference repo is something under the control of the job maintainer or Jenkins admin, rather than being baked into a plugin.

          "imagine that your project is published on Github"

          I do not have to imagine that, it is a daily reality.

          "Do you really want to share with the world the gritty details of your local Jenkins configuration?"

          In the case of the Jenkins project, in fact we do, but in case you do not, there are various alternatives.

          D Pasto added a comment -

          Any updates on releasing this?  I verified it does not work on my v4.0 plugin (can resolve $JENKINS_HOME but not $HOME so is useless) and the changelog does not list it for that or v4.1 beta

          We have pipelines across many nodes, including different platforms, so we can't take full advantage of reference repos (the workaround of manually re-defining the checkout for every stage would be maintenance disaster) — we need a way of dynamically building the reference repo path per node per pipeline.


          Mark Waite added a comment -

          I've proposed automatic repository caching as a Google Summer of Code idea and have offered the pull request as a starting point for consideration.

          Have you tried the alternative that is described by Jesse Glick? It seems like it would work in your case if you perform the environment variable expansion in the Jenkins checkout statement.


          Patrick Ruckstuhl added a comment -

          wgc123 did you actually specify HOME as an explicit variable on the node, or are you just depending on an existing environment variable? If I remember correctly we've been using this with some 4.* version for a while, but always with explicitly specified node variables.

          D Pasto added a comment - - edited

          Good call tario  It works!  I assumed I could just use environment variables that I believe existed, but whether it doesn't really exist, or the plugin only sees a subset of environment variables, it correctly found the reference repo using an env defined in the node!  It's not as clean as an automatically defined variable would be, but it looks like it works!  We have a good sized repo so even though we're on-prem, I was seeing 2+ minute fetches for those stages where the hard-coded path was wrong, but this brings it down to 15 seconds in my first few runs

          Edit: forgot the most important part - this lets me define a single path (using forward slashes) that works on both Linux and Windows nodes.


            Assignee: atikhono Anna Tikhonova
            Reporter: atikhono Anna Tikhonova
            Votes: 3
            Watchers: 11