JENKINS-47163

Use same workspace or node in multiple stages in pipeline

      A question that always crops up when I try to introduce pipelines in my organisation is how to use the same workspace across different stages that are run in sequence (but where the later stage may be joined by other stages that are being run in parallel). For example:

      pipeline {
        agent none
        stages {
          stage('first stage') { agent { label 'linux' } steps { ... } }
          stage('parallel stage') {
            parallel {
              stage('second stage') { agent { label 'linux' } steps { ... } }
              stage('other stage') { agent { label 'windows' } steps { ... } }
            }
          }
        }
      }
      

      Here, we would like some way of representing that "second stage" should use the same node, and the same workspace, that was used for "first stage". They are both going to be run on a node labeled "linux", but we would like to force them to use the same node and even the same workspace. The last stage, called "other stage", is probably destined for a different type of node, since it is labeled differently. But it is important that "second stage" and "other stage" be able to run in parallel, if you decide to implement the feature we are requesting.

      Our particular scenario is that we write a cross-platform C++ product, which conceptually has three phases: building, testing and bundling. The build directories are typically quite big, around 10-20 GB spread across something on the order of 50k-100k files. So stashing or uploading the interesting parts of the workspace as an artifact, and downloading them on another (or the same) node, is not something we want to do.

      Moreover, we wouldn't be comfortable doing this anyway, due to technicalities of what we do: there could be minor version differences in libraries on nodes, and there is the matter of debug symbols needing to match the code being executed, etc.

      We have more than one node for each supported platform.

      The current workaround is to put building, testing and bundling into the same stage. This has the following disadvantages:

      • It doesn't make it possible to start building and testing in parallel on another platform as a consequence of just the build going well on the first platform.
      • Feedback to developers about the state of the build is delayed until the tests have also had time to run (which can take some time).
      • The typical visualization of the pipeline will not allow developers to discern the health of the build and test stages separately, since they will be illustrated together by a single node in the graph.
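
For illustration, the single-stage workaround could look roughly like the sketch below. The script names (build.sh, run-tests.sh, bundle.sh) are placeholders and are not taken from the attached Jenkinsfile.

```groovy
// Sketch of the current workaround: everything runs in one stage so that
// it shares a single node and workspace. Script names are hypothetical.
pipeline {
  agent none
  stages {
    stage('build, test and bundle (linux)') {
      agent { label 'linux' }
      steps {
        sh './build.sh'      // build result is reported only together with the tests
        sh './run-tests.sh'
        sh './bundle.sh'
      }
    }
  }
}
```

Because all three phases are one stage, the stage view shows a single box for them, which is the visualization problem described in the list above.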

      We would therefore like to request that the declarative pipeline syntax be extended to support some way of expressing that when a node matching some label is selected, it should be the same node as was selected in an earlier stage that used the same label, in the same pipeline.

      Alternatively, the declarative pipeline syntax could have some way of assigning an identifier to the workspace used for the job at one stage in the pipeline, and some way of requiring that same workspace to be present at a later stage in the pipeline - effectively forcing it to be the same node as well.

      See attachments for a simplified version of our current Jenkinsfile, and a diagram showing the kind of pipeline we would like to be able to make.


          Martin d'Anjou added a comment -

          Is the requirement to run on the same node, or to use the same workspace? Does the External Workspace Manager Plugin address your need?

          Andrew Bayer added a comment -

          This is pretty nontrivial, sadly. It'd need a combination of some behind-the-scenes logic in Declarative that checks the agent used for a particular stage and forces reusing that agent, and an even more complex set of logic to allow for guaranteeing the same workspace is used as well. As deepchip pointed out, External Workspace Manager is probably the best bet for the workspace issue. I'll see if I can come up with something for the agent reuse, but probably won't try to handle the workspace reuse too, since External Workspace Manager is probably a better solution than anything I could come up with in Declarative that didn't make me cry. =)
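
As a rough idea of what the External Workspace Manager route might look like in Scripted Pipeline, here is a minimal sketch. It assumes the plugin is installed, that a disk pool with the hypothetical ID 'diskpool1' is configured globally, and that the agents involved mount that disk; the script names are placeholders. Note that this shares a workspace on a common disk rather than pinning the later stage to the same node.

```groovy
// Allocate a workspace on a shared disk pool (ID is an assumption).
def extWorkspace = exwsAllocate diskPoolId: 'diskpool1'

node('linux') {
  // Run the build inside the externally managed workspace.
  exws(extWorkspace) {
    checkout scm
    sh './build.sh'        // hypothetical build script
  }
}

node('linux') {
  // A later node block, possibly on another agent, sees the same files
  // because it enters the same external workspace.
  exws(extWorkspace) {
    sh './run-tests.sh'    // hypothetical test script
  }
}
```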


          Andrew Bayer added a comment -

          Thinking out loud on how I'd do the agent reuse...

          • Add a new agent type for reusing a previous stage's node, i.e., something like
            agent {
              reuseLabelAgent "previousStageName"
            }
            
            • This would only be allowed on stages, not the top level, which means adding an additional field to agent type descriptors to mark where they're allowed.
            • Validation would need to determine that the stage name provided is one that was defined earlier in the Pipeline than the current stage. We'd also need to make sure that the previous stage was using a label (aka node) or possibly any agent type - no trying to support docker etc. Too painful. I think we'd also want to error out if the previous stage didn't have an explicit agent definition - i.e., if it's just using the top-level agent, there's no actual point in reusing its agent, since that's the default you get if you don't specify an agent on the later stage anyway!
          • When we hit the new reuseLabelAgent type at runtime, we'd need to then find the stage with that name. I'm not 100% sure what the most efficient way to do that graph query is, but I'm not that worried about it - I know it's possible, and trial and error + code review will give the right answer. =)
          • Once we've got that stage, we need to determine what agent it actually ran on. I think the way we'd do that is to look for the first node step within the previous stage and grab its WorkspaceAction.getNode() value. Again, the exact mechanics of how we do that query, I'm not sure of yet.
          • Now that we've got the name of the agent used on the previous stage, we'd do something like AnyScript but with the name of the agent rather than null.

          So yeah, I think the agent reuse is possible. I want to give it a couple weeks for me to think on it to figure out whether it's actually a good idea to implement, though.
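
The reuseLabelAgent syntax above is only a proposal. Until something like it exists, a similar effect can be approximated in Scripted Pipeline by recording the node name and workspace path in the first stage and passing them to node and ws later. The following is a minimal sketch under that assumption, not the Declarative implementation discussed here; the script names are placeholders.

```groovy
def buildNode   // name of the agent that ran the first stage
def buildDir    // absolute workspace path used on that agent

stage('first stage') {
  node('linux') {
    buildNode = env.NODE_NAME   // every agent can be selected by its own name
    buildDir = pwd()            // current workspace directory
    sh './build.sh'             // hypothetical build script
  }
}

stage('parallel stage') {
  parallel(
    'second stage': {
      // Request the same agent by name, then the same directory.
      node(buildNode) {
        ws(buildDir) {
          sh './run-tests.sh'   // hypothetical test script
        }
      }
    },
    'other stage': {
      node('windows') {
        bat 'build.bat'         // hypothetical Windows build script
      }
    }
  )
}
```

The executor and workspace are released between the two node blocks, so another build may use them in the meantime; this sketch does not guard against that.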


          Steven Foster added a comment -

          I thought about this kind of suggestion when I started working with parallel stages in Declarative recently, and with JENKINS-46809 underway. More flexibility in agent use would be very helpful. I think sequential stages in a parallel branch could really benefit from allowing an agent to be assigned to the entire group (as well as post / other directives as a bonus), because it gets very long-winded and wasteful to constantly assign and clean up an agent for each stage if you are looking for node-based parallelism.


          Andrew Bayer added a comment -

          This'll be part of the benefits of sequential stages.
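
A minimal sketch of how sequential stages could cover the original request, assuming Declarative Pipeline 1.3 or later: each parallel branch declares its agent once, and the nested stages inside it run in order on that agent and in that workspace. The script names are placeholders.

```groovy
pipeline {
  agent none
  stages {
    stage('platforms') {
      parallel {
        stage('linux') {
          agent { label 'linux' }   // allocated once for the whole branch
          stages {
            stage('build (linux)')  { steps { sh './build.sh' } }
            stage('test (linux)')   { steps { sh './run-tests.sh' } }
            stage('bundle (linux)') { steps { sh './bundle.sh' } }
          }
        }
        stage('windows') {
          agent { label 'windows' }
          stages {
            stage('build (windows)') { steps { bat 'build.bat' } }
            stage('test (windows)')  { steps { bat 'run-tests.bat' } }
          }
        }
      }
    }
  }
}
```

Build and test appear as separate stages in the visualization, while still sharing one node and workspace per platform.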


          Stefan Rademacher added a comment -

          Is it correct that the following example cannot be implemented with the current features, as described in https://jenkins.io/blog/2018/07/02/whats-new-declarative-piepline-13x-sequential-stages/ ?

          pipeline {
            agent none
            stages {
              stage("sequential-1") {
                agent { label "linux" }
                stages {
                  stage("first") { }
                  // ...
                }
              }

              stage("sequential-2") {
                agent { label "windows" }
                stages {
                  stage("second") { }
                  // ...
                }
              }

              stage("parallel") {
                parallel {
                  stage("third") {
                    // enforce usage of same linux agent as in "sequential-1"
                  }
                  stage("fourth") {
                    // enforce usage of same windows agent as in "sequential-2"
                  }
                }
              }
            }
          }

          If there is a way to enforce the usage of the previously used node/workspace in the stages "third" and "fourth" without giving up the parallelization, I would appreciate your suggestions.

          Thanks!

          Liam Newman added a comment -

          Bulk closing resolved issues.


            victorbjelkholm Victor Bjelkholm
            harms Kristian Harms