• Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Environment: Solaris/x86, cluster of 5 nodes, all machines in cluster are identical

      Clone workspace performance is poor. Whilst compilation takes around 4 minutes, archiving the workspace afterwards takes the total job time out to 35 minutes! Un-archiving it at the start of downstream jobs is similarly slow.

      We have 4 steps: Compile, quick test, package, full test.
      Compile uses the Perforce SCM; we use Clone Workspace after that to ensure downstream builds use identical files while other compiles can continue in parallel. So we need almost all the files in the entire workspace.
      The filesystem workspace size is about 1.6 GB.
      The archived workspace.zip file size is about 1.4 GB.

      An "exclude" filter may help a little but I think there is something slow going on.
      Note that we use slaves as well so the piped data may have an impact, but all the machines are very close together.
      When the compile runs on the master node it doesn't seem any quicker.

          [JENKINS-10520] clone-workspace-scm performance is poor

          Ed Randall added a comment - - edited

          I did some tests: archiving a workspace from a slave to the master using "tar | ssh" takes around 2 minutes 15 seconds (2 min 10 with -z compression).

          The clone-workspace plugin took 22 minutes to archive the same workspace on the same node.
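          For illustration, a minimal sketch of the kind of "tar | ssh" pipeline used in that test, assuming passwordless ssh from the slave to the master and a writable archive directory there (the host name and destination path are placeholders, not our actual setup):

              # "Execute Shell" step on the slave: stream the workspace as a
              # gzip-compressed tar over ssh and write it to a file on the master.
              tar -czf - -C "${WORKSPACE}" . \
                  | ssh jenkins-master "cat > /var/jenkins/archives/${BUILD_TAG}.tar.gz"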


          Ed Randall added a comment - - edited

          This is our build pipeline:

          1) Compile - checks files out of Perforce, cleans, configures for the target architecture and compiles. Archives the workspace for subsequent steps in the pipeline.

          2) Zip - builds the .ear file, verifies it, builds a complete shippable .zip of all project deliverables.

          3) Quick test - runs a relatively short test suite.

          2a) Zip is "promoted" if all tests in QuickTest pass. This delivers the Zip to the System test team (outside of Jenkins).

          Since we didn't want each step to sync the files down from Perforce and compile afresh, the clone-workspace plugin seemed ideal for us. Unfortunately there are a number of problems with it, including:

          a) Only the latest workspace from step (1) is maintained. So if another change triggers a Compile whilst Zip is in progress, it could be that (3) Quick test actually tests that subsequent build and not the one it is supposed to be testing. This has regularly caused us considerable confusion as the jobs were out-of-step.

          b) It takes so long: our project is somewhat monolithic, including all binaries required within the workspace. The workspace.zip archive is about 1.5 GB and creating it takes around 50 minutes using the clone-workspace plugin vs. only 10 minutes for the actual compile, which is unacceptable. Un-zipping at the start of steps (2) and (3) is not quite so bad, but still 10-20 minutes. It seems to be something to do with the way data is piped between master and slave.

          The solution we now have working is to replace this plugin with a shell script. This script takes 2 minutes to archive the workspace at the end of job (1) and 2 minutes to un-archive it at the start of the other steps. It is attached to this report to illustrate the use case we would like this plugin to support.

          Use the script as follows:

          At the end of job (1) invoke the script using "Execute Shell":

          ${JENKINS_BIN}/clone-workspace PACK "${BUILD_TAG}"

          Optional INCLUDE= and EXCLUDE= filters can be used to limit the contents of the archive. We actually create two archives, one of which is much smaller and contains just the information required for the "promote" step (2a) to work.

          Also add a Post-Build action "Trigger parameterized build on other projects" to start job (2) using the parameter:

          UPSTREAM_BUILD_TAG=${BUILD_TAG}

          Jobs (2) and (3) are parameterized, accepting UPSTREAM_BUILD_TAG. The first task that each of these downstream jobs runs is:

          Execute Shell

          ${JENKINS_BIN}/clone-workspace UNPACK "${UPSTREAM_BUILD_TAG}"

          The archive dir must exist on the master node and be referenced by the JOBS_ARCHIVE environment variable.
          The last job in the pipeline removes the redundant archive by invoking:

          ${JENKINS_BIN}/clone-workspace CLEAN "${UPSTREAM_BUILD_TAG}"

          We also purge all archives from it overnight using a "cron" job, just to be sure.

          One remaining issue is that the downstream jobs don't show the change history; we solved that by re-instating the clone-workspace plugin, configured to archive just one file.
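          For illustration only (the attached script remains the reference), a minimal sketch of the PACK/UNPACK/CLEAN interface described above might look like this, assuming the archives live under ${JOBS_ARCHIVE} on the master and the slaves can reach it over ssh; MASTER_HOST is a placeholder:

              #!/bin/sh
              # Illustrative sketch only -- not the attached script.
              # Usage: clone-workspace PACK|UNPACK|CLEAN <build-tag>
              # Requires JOBS_ARCHIVE (archive dir on the master) and WORKSPACE to be set.
              set -e
              MODE="$1"
              TAG="$2"
              ARCHIVE="${JOBS_ARCHIVE}/${TAG}.tar.gz"

              case "${MODE}" in
                  PACK)
                      # Optional INCLUDE/EXCLUDE filters would be turned into extra
                      # tar options here; omitted to keep the sketch short.
                      tar -czf - -C "${WORKSPACE}" . \
                          | ssh "${MASTER_HOST}" "cat > '${ARCHIVE}'"
                      ;;
                  UNPACK)
                      ssh "${MASTER_HOST}" "cat '${ARCHIVE}'" | tar -xzf - -C "${WORKSPACE}"
                      ;;
                  CLEAN)
                      ssh "${MASTER_HOST}" "rm -f '${ARCHIVE}'"
                      ;;
                  *)
                      echo "usage: clone-workspace PACK|UNPACK|CLEAN <build-tag>" >&2
                      exit 1
                      ;;
              esac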


          Pedro Rodriguez added a comment -

          I'm getting this slow performance locally (unzipping and zipping locally, no slaves or anything involved) and it is still very slow (4 minutes for something that takes about 1 second when done manually).

          Looking at the code, it looks like the plugin is just using the FilePath.zip and tar methods, so I am not sure whether this is an issue in the plugin or in FilePath.


          David Schmitt added a comment -

          Here are two logs (with "Elapsed Time" timestamps):

          master log
          00:02:46.164 	 20 Warning(s)
          00:02:46.164 	 0 Error(s)
          00:02:46.164 
          00:02:46.166 Time Elapsed 00:00:11.2877140
          00:02:46.216 Archiving artifacts
          00:02:48.510 Archiving workspace
          00:03:20.968 Deleting old workspace snapshot from #197.
          00:03:22.846 Recording fingerprints
          00:03:54.751 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #173
          00:03:54.751 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #150
          00:03:54.755 Finished: SUCCESS
          
          slave log
          00:02:58.645     0 Warning(s)
          00:02:58.645     0 Error(s)
          00:02:58.645 
          00:02:58.645 Time Elapsed 00:00:09.07
          00:02:58.899 Archiving artifacts
          00:04:05.101 Archiving workspace
          00:05:38.241 Deleting old workspace snapshot from #197.
          00:05:38.261 Recording fingerprints
          00:05:44.987 Triggering a new build of zetbox.Parties-develop-Linux-Deploy #172
          00:05:45.014 Triggering a new build of zetbox.Parties-develop-Windows-Deploy #149
          00:05:45.018 Finished: SUCCESS
          

          master and slave are virtual machines running on the same host, communicating directly via the JNLP-launched slave service.

          The master is running Debian GNU/Linux 7.2 (wheezy) and the slave is a Windows 7 box. The Jenkins version is 1.537.

          As you can see from the first few lines, the compilation power of both machines is similar. The big difference is in the archiving steps: locally on the master it takes 2.2 seconds for "Archiving artifacts" and ~30s for "Archiving workspace", both of which are around 800 MB.


          Sebastian Schuberth added a comment -

          A simple way to improve this would probably be to optionally allow specifying native tar / gzip executables for the plugin to use instead of doing the compression in Java code.

          Another, more complex idea would be to implement something like copy-on-write semantics.
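          As an illustration only, a native stop-gap along those lines from an "Execute Shell" step might look like the line below, assuming the archive directory (here reusing the ${JOBS_ARCHIVE} variable from the earlier comment) is reachable from the node running the step, for example over an NFS mount:

              # Use the platform's own tar and gzip instead of in-Java compression;
              # gzip -1 trades compression ratio for speed.
              tar -cf - -C "${WORKSPACE}" . | gzip -1 > "${JOBS_ARCHIVE}/${BUILD_TAG}.tar.gz"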


          David Schmitt added a comment - - edited

          Can someone with insight into the code please check whether this is caused by something trivial like the issue Kohsuke described in jenkins-ci.org/content/maven-job-type-performance-improvements-maven-plugin-22 ?


            Assignee: Unassigned
            Reporter: Ed Randall (edrandall)
            Votes: 13
            Watchers: 13