
Unstash painfully slow for large artifacts (due to compression?)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: workflow-api-plugin

      When sending a stash across the network, the data transfer can be significantly slower than the underlying network can sustain. This seems to be due to some slow on-the-fly compression happening during the transfer. Using the following script, the Unstash phase takes 1-2 seconds for 10 MB of /dev/zero, and 49 seconds for 10 MB of /dev/urandom. The Stash phase consistently takes 1-2 seconds. This makes me think there's a compression step, since /dev/zero should compress much faster than /dev/urandom.

      node {
          deleteDir()
          stage "Stash"
          sh "dd if=/dev/zero of=data bs=1M count=10"
          stash name: 'build_outputs', includes: 'data'
          sh "date"
          node('lab') {
              deleteDir()
              stage "Unstash"
              unstash 'build_outputs'
              sh "ls -al"
          }
      }
      

      This looks similar to JENKINS-36914, in particular this comment:
      https://issues.jenkins-ci.org/browse/JENKINS-36914?focusedCommentId=268472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-268472

      I'm not calling this a dupe of that because the original text refers to stash only, and this bug seems isolated to unstash. Also this is on x86, not ARM.

          [JENKINS-38640] Unstash painfully slow for large artifacts (due to compression?)

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/resources/org/jenkinsci/plugins/workflow/support/steps/stash/StashStep/help.html
          http://jenkins-ci.org/commit/workflow-basic-steps-plugin/413df48bdcb832261e8fb110150eeb8069e77c33
          Log:
          JENKINS-38640 JENKINS-36914 Warn users to avoid stash/unstash of large files


          John Koleszar added a comment -

          Thanks for the workaround. It won't help in my application, as the master can only initiate connections to the slaves via ssh, and no connections can be initiated in the other direction. A shared filesystem via NFS or similar isn't really practical, unfortunately. Stash does what I want, and I'm willing to take some performance hit, but it should be able to do much better than 200KB/s on this hardware. Looking forward to trying out the greedy change to see how that does.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/resources/org/jenkinsci/plugins/workflow/support/steps/stash/StashStep/help.html
          http://jenkins-ci.org/commit/workflow-basic-steps-plugin/f541cd2cda5f316042d38af556f4160f7e470ccf
          Log:
          Merge pull request #23 from jenkinsci/jglick-stash-docs

          JENKINS-38640 JENKINS-36914 Warn users to avoid stash/unstash of large files

          Compare: https://github.com/jenkinsci/workflow-basic-steps-plugin/compare/95e202bec553...f541cd2cda5f


          Sean Flanigan added a comment -

          The External Workspace Manager plugin seems to require sharing drives between master and slaves, which rules it out in a lot of situations.

          If, instead of stashing, the official recommendation is to use an external repository manager such as Nexus or Artifactory, it would be good to have some pipeline examples. For instance, how to stash the majority of workspace files, whilst letting the repository manager handle the larger files, and still ensuring that the right files are used together throughout the pipeline.
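          For illustration, a rough sketch of that split might look like the following. This is only a sketch, not an official recommendation; the repository URL, the 'repo-creds' credentials ID, the node labels, and the file names are all placeholders. Small files travel via stash as usual, the large binary is pushed to a repository manager under a path keyed by the build number, and the test stage pulls that same build's copy back down so the right files stay together:

          pipeline {
              agent { label 'build' }
              environment {
                  // Hypothetical repository manager URL, versioned by build number so that
                  // downstream stages retrieve exactly the artifact this build produced.
                  REPO_URL = "https://repo.example.com/releases/myapp/${env.BUILD_NUMBER}"
              }
              stages {
                  stage('Build') {
                      steps {
                          sh 'make all'   // assume this produces small-config.xml and big-image.bin
                          // Small files travel with the pipeline via stash...
                          stash name: 'configs', includes: 'small-config.xml'
                          // ...while the large file is uploaded to the repository manager.
                          withCredentials([usernamePassword(credentialsId: 'repo-creds',
                                                            usernameVariable: 'REPO_USER',
                                                            passwordVariable: 'REPO_PASS')]) {
                              sh 'curl -fsS -u "$REPO_USER:$REPO_PASS" -T big-image.bin "$REPO_URL/big-image.bin"'
                          }
                      }
                  }
                  stage('Test') {
                      agent { label 'test' }
                      steps {
                          // Small files come from the stash; the large file is fetched from
                          // the same build-numbered location it was uploaded to.
                          unstash 'configs'
                          withCredentials([usernamePassword(credentialsId: 'repo-creds',
                                                            usernameVariable: 'REPO_USER',
                                                            passwordVariable: 'REPO_PASS')]) {
                              sh 'curl -fsS -u "$REPO_USER:$REPO_PASS" -O "$REPO_URL/big-image.bin"'
                          }
                          sh 'ls -al'
                      }
                  }
              }
          }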


          John Koleszar added a comment -

          The admonition not to use stash is too strong, in my opinion. I can see plenty of situations where having a repository manager or setting up shared storage are completely unnecessary complexity. The latest release of this plugin performs as I'd expect.


          Oleg Nenashev added a comment -

          There is NO official recommendation to avoid stash/unstash. I've just mentioned it as a possible workaround.


          John Koleszar added a comment -

          Thanks, Oleg. This is the documentation I was referring to about stash only being suitable for small files:
          https://github.com/jenkinsci/workflow-basic-steps-plugin/commit/413df48bdcb832261e8fb110150eeb8069e77c33

          Maybe there's an upper limit where it falls over, but I haven't hit it yet. In my opinion, if there were one, it'd be a bug worth fixing, rather than something to document away as WAI (working as intended).


          Oleg Nenashev added a comment -

          The fix has been backported to 2.19.3


          Heiko Nardmann added a comment -

          This ticket is quite old, but a lot has changed since then, and I wonder what the current network usage with stashing is. Did the Jenkins developers improve stashing?

          My background is that the CI builds various C++ testsuites and I would like to split up building and testing. That's where stashing would be useful, if the performance of copying the involved files were acceptable. In the past this has been really bad - with or without compression enabled.

          Any current experiences?

          I'm talking about folders of e.g. 500 MB to be copied.
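          For context, the build/test split being described is roughly the following scripted pipeline. This is only a sketch: the node labels, build command, and 'build/**' include pattern are assumptions, and with payloads around 500 MB it is exactly the stash/unstash transfer in it whose performance this ticket is about:

          node('build') {
              stage('Build') {
                  // Build the C++ testsuites; the command is a placeholder.
                  sh 'make testsuites'
                  // Capture the build outputs the test stage needs (~500 MB in this scenario).
                  stash name: 'testsuites', includes: 'build/**'
              }
          }
          node('test') {
              stage('Test') {
                  deleteDir()
                  // Transfer the stashed outputs to the test node and run the suites.
                  unstash 'testsuites'
                  sh './run-testsuites.sh'
              }
          }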


          tsondergaard added a comment -

          heiko_nardmann, I've been wondering the same thing - the following warning is still present in /pipeline-syntax:

          Note that the stash and unstash steps are designed for use with small files. For large data transfers, use the External Workspace Manager plugin, or use an external repository manager such as Nexus or Artifactory. This is because stashed files are archived in a compressed TAR, and with large files this demands considerable resources on the controller, particularly CPU time. There's not a hard stash size limit, but between 5-100 MB you should probably consider alternatives. If you use the Artifact Manager on S3 plugin, or another plugin with a remote artifact manager, you can use this step without affecting controller performance since stashes will be sent directly to S3 from the agent (and similarly for unstash).

          I ran the following two pipelines as a test while watching memory and CPU consumption to compare stash/unstash vs archiveArtifacts/curl with a 2GB file:

          stash/unstash:

          pipeline {
              agent { label "linux" }
              stages {
                  stage('Create and Stash File') {
                      steps {
                          script {
                              sh 'dd if=/dev/urandom of=randomfile bs=1M count=2048'                   
                              stash includes: 'randomfile', name: 'randomFileStash'
                          }
                      }
                  }
                  stage('Unstash') {
                      steps {
                          script {
                              unstash 'randomFileStash'
                              sh 'ls -lh randomfile'
                          }
                      }
                  }
              }
          } 

          archiveArtifacts/curl:

          pipeline {
              agent { label "linux" }
              stages {
                  stage('Create and Archive File') {
                      steps {
                          script {
                              sh 'dd if=/dev/urandom of=randomfile bs=1M count=2048'
                              archiveArtifacts artifacts: 'randomfile', onlyIfSuccessful: true
                          }
                      }
                  }
                  stage('Retrieve') {
                      steps {
                          script {
                              sh "curl -O ${env.JENKINS_URL}/job/${env.JOB_NAME}/${env.BUILD_NUMBER}/artifact/randomfile"
                              sh 'ls -lh randomfile'
                          }
                      }
                  }
              }
          }  

          For both pipelines the first stage takes 2 minutes on my installation - the dd part is very fast, so it is dominated by the stash/archiveArtifacts step. For the second stage, the unstash takes about 55 seconds, whereas curl takes about 8 seconds.

          I informally watched the CPU and memory usage on the main server and the build agent and did not notice any significant difference in memory usage. CPU usage on the main server was also about the same in both cases. The CPU usage on the Jenkins agent was much higher with stash than with archiveArtifacts. In summary:

          • stash and archiveArtifacts take about the same wall-clock time, but the Jenkins agent uses more CPU with stash
          • unstash takes about 55 seconds vs about 8 seconds for curl


            Assignee: Andres Rodriguez (andresrc)
            Reporter: John Koleszar (jkoleszar)
            Votes: 0
            Watchers: 8
