JENKINS-59806 (project: Jenkins)

Want ability to stash and unstash between a parent job and a child job

      Before I explain my use case (which does not work today), I will explain a related use case (which does work today) to help illustrate my point.

      Working Use Case

      The working use case is as follows. We start with the naïve version:

      1. The user starts a Pipeline job against a branch of a fork in GitHub/GitLab.
      2. The job runs many tests in parallel using a parallel step (for example, unit tests, integration tests, and functional tests). Each branch of the parallel step does a Git checkout of the user's code and then runs the relevant tests.
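The naïve version might look like the following Scripted Pipeline sketch. The repository URL, branch name, and `make` targets are hypothetical placeholders. Because each parallel branch runs the `git` step itself, each branch resolves the branch tip at the moment that branch actually starts.

```groovy
// Naïve version (sketch): every parallel branch clones the repository itself.
// repoUrl, branchName and the make targets are hypothetical placeholders.
def repoUrl = 'https://github.com/example-user/example-repo.git'
def branchName = 'my-feature'

parallel(
    unit: {
        node {
            git url: repoUrl, branch: branchName   // clones whatever the branch tip is right now
            sh 'make unit-test'
        }
    },
    integration: {
        node {
            git url: repoUrl, branch: branchName
            sh 'make integration-test'
        }
    },
    functional: {
        node {
            // If this branch waits for an executor, the clone may pick up a newer push.
            git url: repoUrl, branch: branchName
            sh 'make functional-test'
        }
    }
)
```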

      The above tests take a while to run, during which the user might push again to their branch on GitHub/GitLab. Our users expect that once they start a Jenkins job, the job will test their code at a single, fixed revision. Because the implementation described above checks out the Git repository separately in each parallel branch, this expectation can be violated, as the following sequence of events shows:

      1. The user starts a Pipeline job against a branch of a fork in GitHub/GitLab.
      2. The job clones their Git repository and starts the unit tests. But there is no executor available for the functional tests, so the job blocks in that branch.
      3. The user pushes again to the same branch of their fork in GitHub/GitLab.
      4. The job now enters the branch for the functional tests and clones the Git repository.

      At this point, the job is testing an inconsistent set of code: the parallel branch from step 2 is testing the code before the user pushed the second commit, but the parallel branch from step 4 is testing the code after the user pushed the second commit.

      Since the Git repository is a moving target that is out of our control, we clearly need to "snapshot" its state at the beginning of the job to avoid this scenario. One might therefore use the stash step at the beginning of the job and the unstash step in each parallel branch, so that every branch restores the repository at exactly the same commit before starting its tests. We can correct the naïve implementation with this in mind:

      1. The user starts a Pipeline job against a branch of a fork in GitHub/GitLab.
      2. The job checks out the Git repository and uses the stash step to "snapshot" it.
      3. The job runs many tests in parallel using a parallel step (for example, unit tests, integration tests, and functional tests). Each branch of the parallel step runs unstash to retrieve the snapshot and then runs the relevant tests.
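A minimal Scripted Pipeline sketch of the corrected version, using the same hypothetical repository, branch, and `make` targets: the checkout happens once, `stash` records the files, and each parallel branch restores them with `unstash` before running its suite.

```groovy
// Corrected version (sketch): clone once, stash, and unstash in every branch.
// repoUrl, branchName and the make targets are hypothetical placeholders.
def repoUrl = 'https://github.com/example-user/example-repo.git'
def branchName = 'my-feature'

node {
    git url: repoUrl, branch: branchName
    // Stashes all workspace files; by default Ant's default excludes (e.g. .git) are skipped.
    stash name: 'source'
}

def suites = ['unit-test', 'integration-test', 'functional-test']
def branches = [:]
for (s in suites) {
    def suite = s   // fresh variable per iteration so each closure sees its own suite
    branches[suite] = {
        node {
            unstash 'source'   // same files on every agent, whenever the branch starts
            sh "make ${suite}"
        }
    }
}
parallel branches
```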

      This solves the problem.


      Now let me describe my use case, where stash and unstash cannot solve this problem.

      Failing Use Case

      The problem comes when the tests being run in parallel are not run from the same pipeline, but rather from different pipelines altogether (invoked with the build step). There could be a variety of legitimate reasons for this: for example, to do "parallel within parallel" testing (which is not possible within a single pipeline), or to visually separate functional testing or deployment processes that are sufficiently complex to warrant their own full-fledged job. The naïve implementation of this is as follows:

      1. The user starts a Pipeline job against a branch of a fork in GitHub/GitLab.
      2. The parent job starts many child Jenkins jobs (unit tests, integration tests, and functional tests, for example), using the build step. Each child job does a Git checkout of the user's code and then runs the relevant tests.

      This has the same problem as the naïve implementation of the first use case; namely, a user can push to their fork on GitHub/GitLab while the job is running and affect the consistency of the job. However, this problem cannot be solved with stash and unstash. Suppose one tries the following:

      1. The user starts a Pipeline job against a branch of a fork in GitHub/GitLab.
      2. The job checks out the Git repository and uses the stash step to "snapshot" it.
      3. The parent job starts many child Jenkins jobs (unit tests, integration tests, and functional tests, for example), using the build step. Each child job runs unstash to retrieve the snapshot and then runs the relevant tests.

      This does not work, because the stash operation was done in the parent job. The child job does not have access to the stash and fails with ERROR: No such saved stash 'stash'.
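The failing attempt might look like the following sketch. The two scripts are the Jenkinsfiles of two separate jobs; the job names, the stash name `source`, and the `make` target are hypothetical.

```groovy
// ---- Parent job's Jenkinsfile (sketch) ----
node {
    checkout scm
    stash name: 'source'
}
parallel(
    unit: { build job: 'unit-tests' },
    functional: { build job: 'functional-tests' }
)

// ---- Child job 'unit-tests' Jenkinsfile (sketch) ----
node {
    // Stashes belong to the build that created them, so this fails with
    // ERROR: No such saved stash 'source'
    unstash 'source'
    sh 'make unit-test'
}
```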


          Basil Crow added a comment -

          I could see two ways of adding support for the failing use case:

          1. The stash from the parent job isn't deleted until the parent job ends. Presumably each child job has a reference to its parent job. So in the unstash step, if the requested stash is not available in the current job, we could try to find the stash in its parent job recursively until we get to the top of the parent/child hierarchy, failing only if the topmost job does not have the requested stash.
          2. The unstash step could be modified to take an optional project name and selector (like the copyArtifacts step does). When such a project name and selector are provided, the unstash step could unstash from a different job.

          jglick and dnusbaum, am I understanding the problem correctly, or is there some other solution for my use case that I've missed? If I've understood the problem correctly, what do you think of my two ideas for fixing this? Are there any security implications about having access to stashes from other jobs? If either of my ideas sounds acceptable, I could start working on a fix.


          Jesse Glick added a comment -

          At this point, the job is testing an inconsistent set of code: the parallel branch from step 2 is testing the code before the user pushed the second commit, but the parallel branch from step 4 is testing the code after the user pushed the second commit.

          True if you are using the git step with, say, a simple branch argument. If you use checkout scm you are guaranteed to get the same commit regardless of when this step runs: the commit ID is fixed when the Jenkinsfile is loaded.
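A sketch of the approach described above, assuming a multibranch Pipeline job (where `scm` is bound to the revision the Jenkinsfile was loaded from); the suite names and `make` targets are hypothetical. No stash is involved: each branch runs `checkout scm` on its own and should still land on the same commit.

```groovy
// Sketch, assuming a multibranch Pipeline job: `scm` refers to the revision
// from which this Jenkinsfile was loaded, so every `checkout scm` below
// checks out the same commit, even if the user pushes while a branch waits.
parallel(
    unit: {
        node {
            checkout scm
            sh 'make unit-test'
        }
    },
    integration: {
        node {
            checkout scm
            sh 'make integration-test'
        }
    },
    functional: {
        node {
            checkout scm
            sh 'make functional-test'
        }
    }
)
```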


          Jesse Glick added a comment -

          "parallel within parallel" testing (which is not possible within a single pipeline)

          Sure it is. You can freely nest parallel steps (in Scripted syntax). Blue Ocean will not currently display the results neatly, but I consider that just a limitation of Blue Ocean.
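A minimal sketch of nested `parallel` in Scripted syntax. The suite names, the `BROWSER` variable, and the `make` targets are hypothetical; the point is only that a `parallel` step may appear inside a branch of another `parallel` step.

```groovy
// Outer parallel: one branch per test type.
parallel(
    unit: {
        node {
            checkout scm
            sh 'make unit-test'
        }
    },
    functional: {
        // Inner parallel nested inside one branch of the outer parallel.
        parallel(
            chrome: {
                node {
                    checkout scm
                    sh 'make functional-test BROWSER=chrome'
                }
            },
            firefox: {
                node {
                    checkout scm
                    sh 'make functional-test BROWSER=firefox'
                }
            }
        )
    }
)
```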


          Jesse Glick added a comment -

          The stash from the parent job isn't deleted until the parent job ends.

          This is already true. (In some cases it is kept longer.)

          in the unstash step, if the requested stash is not available in the current job, we could try to find the stash in its parent job recursively

          Perhaps.

          Are there any security implications about having access to stashes from other jobs?

          Yes, and copyartifact security is a minefield.

          The recommendation is to use checkout scm within a single job. If for whatever reason you really need to have nested jobs and use the build step, you have at least two options available:

          • Use def commit = checkout(scm).GIT_COMMIT (IIRC) to save the commit ID, then pass this to git steps in child builds using a string build parameter.
          • Use archiveArtifacts in the parent build to save a checkout (and anything else you want), and copyartifact in the child build to retrieve it. Unlike stashes, copyartifact can freely refer to arbitrary builds of arbitrary projects (subject to certain permission checks). The recommendation is to pass the parent build number as a (string) build parameter to the child build.
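A sketch of the second option, assuming the Copy Artifact plugin is installed. The child job names, the `PARENT_JOB`/`PARENT_BUILD` parameter names, the `source.tgz` tarball, and the `make` target are hypothetical. The parent passes its own job name and build number down, so the child copies from exactly that build rather than, say, the last successful one.

```groovy
// ---- Parent job's Jenkinsfile (sketch; requires the Copy Artifact plugin) ----
properties([
    // Allow the (hypothetical) child jobs to copy this job's artifacts; whether
    // this is required depends on Copy Artifact's security settings.
    copyArtifactPermission('unit-tests,functional-tests')
])

node {
    dir('src') {
        checkout scm
    }
    // Archive one tarball so the snapshot (including .git) is restored intact.
    sh 'tar czf source.tgz -C src .'
    archiveArtifacts artifacts: 'source.tgz'
}

def branches = [:]
for (c in ['unit-tests', 'functional-tests']) {
    def child = c   // fresh variable per iteration for the closure
    branches[child] = {
        build job: child, parameters: [
            string(name: 'PARENT_JOB', value: env.JOB_NAME),
            string(name: 'PARENT_BUILD', value: env.BUILD_NUMBER)
        ]
    }
}
parallel branches

// ---- Child job's Jenkinsfile (sketch) ----
properties([
    parameters([
        string(name: 'PARENT_JOB', defaultValue: ''),
        string(name: 'PARENT_BUILD', defaultValue: '')
    ])
])

node {
    // Copy the tarball from exactly the parent build that triggered this one.
    copyArtifacts projectName: params.PARENT_JOB,
                  selector: specific(params.PARENT_BUILD),
                  filter: 'source.tgz'
    sh 'mkdir -p src && tar xzf source.tgz -C src'
    dir('src') {
        sh 'make unit-test'
    }
}
```

Parameters declared through the `properties` step only take effect once the child job has run at least once, so the child may need one manual run before the parent can pass values to it.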


          Basil Crow added a comment -

          Thanks for the reply, and I agree with all your points. I didn't realize the checkout step could return a Map that includes GIT_COMMIT. That definitely sounds like it would work, and I'll give it a shot. I still think the RFE described in this issue might be useful, but without a specific use case there doesn't seem to be a driver for implementing it at present.
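A sketch of the first option. The job names and the `COMMIT`/`REPO_URL` parameter names are hypothetical. It assumes the commit is reachable by fetching the repository's branches with the default refspec (a commit that only exists under a pull-request ref would need a custom refspec). It also uses the generic `checkout` step with a `GitSCM` branch spec instead of the `git` step, since the `git` step's `branch` argument is, as far as I know, meant for branch names rather than commit IDs.

```groovy
// ---- Parent job's Jenkinsfile (sketch) ----
def scmVars
node {
    // checkout returns a Map of SCM variables; for Git it includes GIT_COMMIT and GIT_URL.
    scmVars = checkout(scm)
}

def childParams = [
    string(name: 'COMMIT', value: scmVars.GIT_COMMIT),
    string(name: 'REPO_URL', value: scmVars.GIT_URL)
]
parallel(
    unit: { build job: 'unit-tests', parameters: childParams },
    functional: { build job: 'functional-tests', parameters: childParams }
)

// ---- Child job's Jenkinsfile (sketch) ----
properties([
    parameters([
        string(name: 'COMMIT', defaultValue: ''),
        string(name: 'REPO_URL', defaultValue: '')
    ])
])

node {
    // A full commit ID used as the branch spec checks out exactly that commit.
    // Add credentialsId to the remote config if the repository is private.
    checkout([$class: 'GitSCM',
              branches: [[name: params.COMMIT]],
              userRemoteConfigs: [[url: params.REPO_URL]]])
    sh 'make unit-test'
}
```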


            Assignee: Unassigned
            Reporter: Basil Crow (basil)
            Votes: 1
            Watchers: 2