Jenkins / JENKINS-36914

stash step is excessively slow on ARM


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Defect
    • Component/s: workflow-api-plugin

      Description

      When using the stash build step on a Tegra X1 build slave, it is orders of magnitude slower than it should be: a single 20 MB file takes 29 seconds to archive, and larger files take proportionally longer. The device is connected via 1 Gb/s Ethernet and the file is stored on an SSD, so the transfer should take well under 1 s. While this is happening, the java process on the slave sits at 100% CPU usage.

      Sample pipeline script:

      stage 'prepare'
      node('tegra-cuda') {
          deleteDir()
          sh 'dd if=/dev/zero of=dummy bs=1M count=20'
          sh 'date'
          stash name: 'source', includes: 'dummy'
          sh 'date'
      }
      

      Output:

      [Pipeline] stage (prepare)
      Entering stage prepare
      Proceeding
      [Pipeline] node
      Running on e5f011df5aa1-e069b02f in /var/lib/jenkins/workspace/ARM stash test
      [Pipeline] {
      [Pipeline] deleteDir
      [Pipeline] sh
      [ARM stash test] Running shell script
      + dd if=/dev/zero of=dummy bs=1M count=20
      20+0 records in
      20+0 records out
      20971520 bytes (21 MB) copied, 0.0732756 s, 286 MB/s
      [Pipeline] sh
      [ARM stash test] Running shell script
      + date
      Mon Jul 25 11:02:05 UTC 2016
      [Pipeline] stash
      Stashed 1 file(s)
      [Pipeline] sh
      [ARM stash test] Running shell script
      + date
      Mon Jul 25 11:02:34 UTC 2016
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      Finished: SUCCESS
      

      Thread dump for the relevant slave:

      Channel reader thread: channel
      
      "Channel reader thread: channel" Id=11 Group=main RUNNABLE (in native)
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:152)
      	at java.net.SocketInputStream.read(SocketInputStream.java:122)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
      	-  locked java.io.BufferedInputStream@b24124
      	at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
      	at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
      	at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
      	at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
      	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      
      main
      
      "main" Id=1 Group=main WAITING on hudson.remoting.Engine@13108c4
      	at java.lang.Object.wait(Native Method)
      	-  waiting on hudson.remoting.Engine@13108c4
      	at java.lang.Thread.join(Thread.java:1281)
      	at java.lang.Thread.join(Thread.java:1355)
      	at hudson.remoting.jnlp.Main.main(Main.java:137)
      	at hudson.remoting.jnlp.Main._main(Main.java:130)
      	at hudson.remoting.jnlp.Main.main(Main.java:96)
      	at hudson.plugins.swarm.SwarmClient.connect(SwarmClient.java:239)
      	at hudson.plugins.swarm.Client.run(Client.java:107)
      	at hudson.plugins.swarm.Client.main(Client.java:68)
      
      Ping thread for channel hudson.remoting.Channel@195804b:channel
      
      "Ping thread for channel hudson.remoting.Channel@195804b:channel" Id=16 Group=main TIMED_WAITING
      	at java.lang.Thread.sleep(Native Method)
      	at hudson.remoting.PingThread.run(PingThread.java:90)
      
      pool-1-thread-274 for channel
      
      "pool-1-thread-274 for channel" Id=1122 Group=main RUNNABLE
      	at com.jcraft.jzlib.Deflate.fill_window(Deflate.java:966)
      	at com.jcraft.jzlib.Deflate.deflate_slow(Deflate.java:1125)
      	at com.jcraft.jzlib.Deflate.deflate(Deflate.java:1587)
      	at com.jcraft.jzlib.Deflater.deflate(Deflater.java:140)
      	at com.jcraft.jzlib.DeflaterOutputStream.deflate(DeflaterOutputStream.java:129)
      	at com.jcraft.jzlib.DeflaterOutputStream.write(DeflaterOutputStream.java:102)
      	at org.apache.commons.compress.utils.CountingOutputStream.write(CountingOutputStream.java:48)
      	at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.writeRecord(TarArchiveOutputStream.java:571)
      	at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.write(TarArchiveOutputStream.java:435)
      	at hudson.util.io.TarArchiver.visit(TarArchiver.java:100)
      	at hudson.util.DirScanner.scanSingle(DirScanner.java:49)
      	at hudson.util.DirScanner$Glob.scan(DirScanner.java:131)
      	at hudson.FilePath$1.invoke(FilePath.java:463)
      	at hudson.FilePath$1.invoke(FilePath.java:459)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2772)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:120)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:326)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at hudson.remoting.Engine$1$1.run(Engine.java:62)
      	at java.lang.Thread.run(Thread.java:745)
      
      	Number of locked synchronizers = 1
      	- java.util.concurrent.ThreadPoolExecutor$Worker@17d329e
      
      pool-1-thread-275 for channel
      
      "pool-1-thread-275 for channel" Id=1126 Group=main RUNNABLE
      	at sun.management.ThreadImpl.dumpThreads0(Native Method)
      	at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:446)
      	at hudson.Functions.getThreadInfos(Functions.java:1196)
      	at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:98)
      	at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:95)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:120)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:326)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at hudson.remoting.Engine$1$1.run(Engine.java:62)
      	at java.lang.Thread.run(Thread.java:745)
      
      	Number of locked synchronizers = 1
      	- java.util.concurrent.ThreadPoolExecutor$Worker@a0b835
      
      RemoteInvocationHandler [#1]
      
      "RemoteInvocationHandler [#1]" Id=10 Group=main TIMED_WAITING on java.lang.ref.ReferenceQueue$Lock@d2df5d
      	at java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.ref.ReferenceQueue$Lock@d2df5d
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:415)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
      	at java.lang.Thread.run(Thread.java:745)
      
      Thread-1
      
      "Thread-1" Id=9 Group=main TIMED_WAITING on hudson.remoting.Channel@195804b
      	at java.lang.Object.wait(Native Method)
      	-  waiting on hudson.remoting.Channel@195804b
      	at hudson.remoting.Channel.join(Channel.java:948)
      	at hudson.remoting.Engine.run(Engine.java:267)
      
      Finalizer
      
      "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@466bb5
      	at java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.ref.ReferenceQueue$Lock@466bb5
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
      
      process reaper
      
      "process reaper" Id=1123 Group=system TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@6e9009
      	at sun.misc.Unsafe.park(Native Method)
      	-  waiting on java.util.concurrent.SynchronousQueue$TransferStack@6e9009
      	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
      	at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
      	at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
      	at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
      	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      
      Reference Handler
      
      "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1496b81
      	at java.lang.Object.wait(Native Method)
      	-  waiting on java.lang.ref.Reference$Lock@1496b81
      	at java.lang.Object.wait(Object.java:503)
      	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
      
      Signal Dispatcher
      
      "Signal Dispatcher" Id=4 Group=system RUNNABLE
      

      I've taken a guess at the appropriate component, but I'm not sure which JIRA component corresponds to "workflow-basic-steps" (https://jenkins.io/doc/pipeline/steps/workflow-basic-steps/).
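The thread dump above shows the agent worker pinned in jzlib's `Deflate` while writing the tar stream, which suggests the stash is gzip-compressed on the agent and the operation is CPU-bound there. As a rough cross-check, here is a minimal single-threaded gzip throughput probe using only the JDK's `java.util.zip` (a hedged diagnostic sketch, not Jenkins code; the class name is invented). Run it on the agent hardware and compare against an x86 host:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class DeflateBench {
    // Gzip a buffer entirely in memory and return the compressed bytes.
    public static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[20 * 1024 * 1024]; // 20 MB of zeros, like the dd test file
        long t0 = System.nanoTime();
        byte[] out = gzip(data);
        double seconds = (System.nanoTime() - t0) / 1e9;
        System.out.printf("compressed %d -> %d bytes in %.2f s (%.1f MB/s)%n",
                data.length, out.length, seconds, 20.0 / seconds);
    }
}
```

On a 20 MB all-zeros buffer (matching the `dd` test), a result in the ~0.7 MB/s range would indicate that compression alone accounts for the 29 s stash time.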

            Activity

            bmerry Bruce Merry created issue -
            rtyler R. Tyler Croy made changes -
            Workflow: JNJira [ 173667 ] → JNJira + In-Review [ 185273 ]
            miklelappo Mikhail Lappo added a comment - edited

            I also have a problem with the stash operation on ARM (Raspberry Pi). The main build server is Intel; the slave is ARM. Stashed data transfer (Intel → ARM) runs at about 300 KB/s, while scp between the same machines achieves 3 MB/s.

            jglick Jesse Glick made changes -
            Link This issue duplicates JENKINS-38640 [ JENKINS-38640 ]
            jglick Jesse Glick made changes -
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Resolved [ 5 ]
            jglick Jesse Glick added a comment -

            Possibly not a duplicate, unclear.

            jglick Jesse Glick made changes -
            Assignee rsandell [ rsandell ]
            Resolution Duplicate [ 3 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            jglick Jesse Glick made changes -
            Component/s workflow-api-plugin [ 21711 ]
            Component/s pipeline-utility-steps-plugin [ 21135 ]
            jglick Jesse Glick made changes -
            Labels arm pipeline arm performance
            jglick Jesse Glick made changes -
            Status Reopened [ 4 ] Open [ 1 ]
            jglick Jesse Glick added a comment -

            stash and unstash are not intended for large files. Use the External Workspace Manager plugin, or an external artifact manager like Nexus or Artifactory.
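A sketch of that workaround in the scripted-pipeline style used in the description (the host name, paths, and second node label are made up for illustration; any external store such as Nexus or Artifactory would play the same role as the `scp` target):

```groovy
stage 'prepare'
node('tegra-cuda') {
    deleteDir()
    sh 'dd if=/dev/zero of=dummy bs=1M count=20'
    // Push the large file to external storage instead of stash,
    // avoiding the agent-side gzip that stash performs.
    sh 'scp dummy artifacts.example.com:/srv/artifacts/dummy'
}
node('consumer') {
    // A later stage pulls the file back without unstash.
    sh 'scp artifacts.example.com:/srv/artifacts/dummy .'
}
```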

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            src/main/resources/org/jenkinsci/plugins/workflow/support/steps/stash/StashStep/help.html
            http://jenkins-ci.org/commit/workflow-basic-steps-plugin/413df48bdcb832261e8fb110150eeb8069e77c33
            Log:
            JENKINS-38640 JENKINS-36914 Warn users to avoid stash/unstash of large files

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Jesse Glick
            Path:
            src/main/resources/org/jenkinsci/plugins/workflow/support/steps/stash/StashStep/help.html
            http://jenkins-ci.org/commit/workflow-basic-steps-plugin/f541cd2cda5f316042d38af556f4160f7e470ccf
            Log:
            Merge pull request #23 from jenkinsci/jglick-stash-docs

            JENKINS-38640 JENKINS-36914 Warn users to avoid stash/unstash of large files

            Compare: https://github.com/jenkinsci/workflow-basic-steps-plugin/compare/95e202bec553...f541cd2cda5f

            svanoort Sam Van Oort added a comment -

            Bruce Merry Unfortunately I'm quite confident this is a result of the GZIP compression applied to stashes – the hardware itself probably does not have high performance for that algorithm. This operation thus becomes CPU-bound rather than network or I/O bound.

            I don't think we can really resolve this one since it's tied to the algorithm itself – using the latest compatible JDKs may help some. Jesse has suggested some workarounds as well. Finally, we may be able to switch out some of the GZIP implementations to higher-performance versions within the Jenkins core, but even so I doubt it'll yield great performance on ARM hardware.

            svanoort Sam Van Oort added a comment -

            Closing since the root cause here is that GZIP is demanding on ARM hardware.

            svanoort Sam Van Oort made changes -
            Resolution Not A Defect [ 7 ]
            Status Open [ 1 ] Closed [ 6 ]
            jglick Jesse Glick added a comment -

            Do you have some evidence of this being an issue with Gzip on ARM? If so, it would be appropriate to use an uncompressed transfer method on that platform. Really it might be better to do so all the time—the Remoting transport is generally responsible for compression.

            Of course if you are using https://plugins.jenkins.io/artifact-manager-s3 this would not be an issue.

            svanoort Sam Van Oort added a comment -

            Jesse Glick I know that the discrepancy vs. raw IO speeds is almost certainly due to GZIP compress/decompress. Using an uncompressed method might be beneficial in some cases (especially with poorly-compressible data).

            I'd have to benchmark the GZIP implementation on that specific platform and compare to the same one on an Intel/AMD laptop processor – but the benchmarks here show pretty large differences in performance between ARM and Intel processors: https://quixdb.github.io/squash-benchmark/

            And if you're only using a single CPU thread to do the compression, with a pure-Java implementation that is potentially less than optimal for the platform, then 0.7 MB/s is not an unreasonable compression rate (the 20 MB file took 29 s). It's in the ballpark anyway - I'm seeing <10 MB/s compression rates reported for various quad-core+ ARM chips for native-code implementations of that compression algorithm, using multiple threads. Remember we're talking about processors that only have a few watts to play with and fairly small cache sizes.

            jglick Jesse Glick added a comment -

            Could just switch to ArchiverFactory.TAR and do the compression on the master side (and similarly for unstashing). There is generally no benefit to using compression at this level regardless of the processor—a LAN is generally quite capable of handling high bandwidth, and many agent connection methods add transport-level compression anyway. On the other hand it is better to burn CPU on an agent, even at the expense of longer build times, if it saves a little CPU on the master, and we do not want to store stashes uncompressed.

            svanoort Sam Van Oort added a comment -

            Jesse Glick I'd agree we should burn agent CPU over master CPU, but do think it's worth saving some storage space on-master. The happy medium would be a high-performance algorithm such as LZ4, LZO, or LZF which gets most of the benefits of compression and can shrink highly-compressible content with a much lower CPU cost than Deflate (used by GZIP).

            I've seen very positive results with those algorithms in the past – they're fast enough that if you're transmitting fairly compressible content (e.g. JSON or XML payloads, or source code) you can see time savings even in a data center with gigabit links.
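LZ4, LZO, and LZF are not in the JDK, but the CPU-versus-ratio tradeoff described above can be illustrated with the stdlib `Deflater` alone by varying the compression level (a hedged demo, not how stashes are actually produced; the class name is invented):

```java
import java.util.zip.Deflater;

public class LevelDemo {
    // Deflate a buffer at the given level and return the compressed bytes.
    public static byte[] deflate(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        byte[] buf = new byte[data.length + 64]; // ample room for compressible input
        int n = d.deflate(buf);
        d.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4 * 1024 * 1024]; // highly compressible: all zeros
        for (int level : new int[]{Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long t0 = System.nanoTime();
            byte[] out = deflate(data, level);
            System.out.printf("level %d: %d bytes in %.1f ms%n",
                    level, out.length, (System.nanoTime() - t0) / 1e6);
        }
    }
}
```

On hardware where Deflate is slow, level 1 typically cuts CPU time severalfold at a modest ratio cost; LZ4-class codecs push that tradeoff further still.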


              People

              Assignee:
              Unassigned
              Reporter:
              bmerry Bruce Merry
              Votes:
              2
              Watchers:
              6
