
Slave is slow copying maven artifacts to master

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Component: remoting
    • Labels: None
    • Platform: All, OS: All

      The artifact transfer is currently a 3-4x penalty for the project that I am
      working on. I have reproduced the issue with a simple test pom that does
      nothing but jar hudson.war. I performed this test in a heterogeneous
      environment: both master and slave are running Fedora 10, but the master is a
      faster machine. Still, it highlights the issue.

      Here are some stats (all stats are after caching dependencies in the local repos):
      Master build through Hudson: 19s
      Master build from command line (no Hudson): 9s
      Slave build through Hudson: 1m46s
      Slave build from command line (no Hudson): 16s

      To be fair, we should at least account for the time to do a straight scp of the
      artifact from slave to master. The two nodes share a 100 Mbit switch:

      $ scp target/slow-rider-1.0.0-SNAPSHOT.jar master_node:
      slow-rider-1.0.0-SNAPSHOT.jar    100%   25MB  12.7MB/s   00:02

      Of course this example exaggerates the issue to make it clearer, but not by
      too much. I originally noticed this in a completely separate environment that
      was all virtual. I reproduced it on two physical machines using a different
      switch and different ethernet drivers (both virtual and physical). The
      reproducibility, plus the comparison against command line + scp, leads me to
      suspect eager flushing.
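
      To make the eager-flushing suspicion concrete, here is a purely illustrative
      sketch (not Jenkins code) that pushes roughly the same 25 MB over a socket
      twice: once with a flush after every 8 KB chunk, and once through a large
      buffer flushed at the end. The host name and port are placeholders for a
      simple TCP listener on the master side.

        // Illustrative only: eager flushing per small chunk vs. buffered writes.
        // "master-node" and port 9999 are placeholders; a TCP listener must exist there.
        import java.io.BufferedOutputStream;
        import java.io.OutputStream;
        import java.net.Socket;

        public class FlushDemo {
            public static void main(String[] args) throws Exception {
                byte[] chunk = new byte[8 * 1024];

                try (Socket s = new Socket("master-node", 9999)) {
                    OutputStream raw = s.getOutputStream();
                    long start = System.nanoTime();
                    for (int i = 0; i < 3200; i++) {   // ~25 MB, like the test jar
                        raw.write(chunk);
                        raw.flush();                   // eager flush: one small TCP send per chunk
                    }
                    System.out.printf("eager flush: %d ms%n", (System.nanoTime() - start) / 1_000_000);
                }

                try (Socket s = new Socket("master-node", 9999)) {
                    OutputStream buffered = new BufferedOutputStream(s.getOutputStream(), 256 * 1024);
                    long start = System.nanoTime();
                    for (int i = 0; i < 3200; i++) {
                        buffered.write(chunk);         // writes coalesce in the 256 KB buffer
                    }
                    buffered.flush();                  // single flush at the end
                    System.out.printf("buffered: %d ms%n", (System.nanoTime() - start) / 1_000_000);
                }
            }
        }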

          [JENKINS-3922] Slave is slow copying maven artifacts to master

          John McNair added a comment -

          Created an attachment (id=754)
          pom that simply jars hudson.war

          John McNair added a comment -

          Oops. I just noticed that I forgot to upgrade this environment. The above
          stats were collected on 309. On 312 we have:

          Master: 20s
          Slave: 1m18s

          There seems to be a definite improvement, but still a big penalty for the slave.

          Kohsuke Kawaguchi added a comment -

          Would it be possible for you to run a packet capturing tool like Wireshark to
          obtain the network packet dump between the master and the slave?

          John McNair added a comment -

          You want the ssh traffic? Is that helpful? Also, it is ~63MB for this build.
          Is there a subset that you would want to see?

          trodriguez added a comment -

          watching this issue

          Andrew Bayer added a comment -

          Just noting that this is still very definitely the case - and for what it's worth, I had the same speed problems both with the current MavenArtifact contents and with a test I did using FilePath.copyRecursiveTo instead of FilePath.copyTo.

          protocol7b added a comment -

          We're seeing similar problems in the ASF Hudson environment. Archiving frequently makes up 90% of the build time for projects. Is there currently any work ongoing in this area? Is there anything we could assist with to help debug the problems we're seeing?

          orekutin added a comment -

          I diagnosed this issue a bit and I see a stack trace being sent with every 8 KB data chunk for these transfers. Something like 12 packets are being sent per 8 KB chunk of data.
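
          As a rough illustration of why that hurts (the Chunk class below is a
          stand-in of mine, not the real hudson.remoting.ProxyInputStream$Chunk),
          serializing a command that carries a diagnostic Exception ships a full
          serialized stack trace alongside every 8 KB payload:

            // Illustrative only: how a "created at" Exception captured for diagnostics
            // inflates every serialized chunk. Class names here are stand-ins.
            import java.io.ByteArrayOutputStream;
            import java.io.ObjectOutputStream;
            import java.io.Serializable;

            public class ChunkOverheadDemo {
                static class Chunk implements Serializable {
                    final byte[] payload;
                    final Exception createdAt = new Exception("diagnostic origin"); // stack trace travels with the chunk
                    Chunk(byte[] payload) { this.payload = payload; }
                }

                public static void main(String[] args) throws Exception {
                    byte[] data = new byte[8 * 1024];

                    ByteArrayOutputStream withTrace = new ByteArrayOutputStream();
                    try (ObjectOutputStream oos = new ObjectOutputStream(withTrace)) {
                        oos.writeObject(new Chunk(data));   // payload plus serialized stack frames
                    }

                    ByteArrayOutputStream rawOnly = new ByteArrayOutputStream();
                    try (ObjectOutputStream oos = new ObjectOutputStream(rawOnly)) {
                        oos.writeObject(data);              // payload only
                    }

                    System.out.printf("with stack trace: %d bytes, raw array: %d bytes%n",
                            withTrace.size(), rawOnly.size());
                }
            }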

          166_MMX added a comment -

          Linking issues together

          Aaron Simmons added a comment -

          I'm seeing this problem also. It turns a 10-minute build into a 60-minute build.

          John McNair added a comment -

          Hudson claims to have resolved this ticket in their system:
          http://issues.hudson-ci.org/browse/HUDSON-3922

          More details in a duplicate:
          http://issues.hudson-ci.org/browse/HUDSON-7813

          Fix is here:
          https://github.com/hudson/hudson/commit/953af4eabc03be58abe8405a35090b4e5fd08933

          The fix is to give users an option to disable compression altogether for remoting. Can we have something similar in Jenkins? I realize that there is a class of use cases where good, working compression makes sense, but in a setup where all Jenkins nodes are on the same physical switch, compression rarely makes sense. So the deeper fix is probably to provide an option to turn off compression, perhaps per slave node, AND to fix the compression performance on Linux. I'd be ecstatic to get the first part in the short term.

          orekutin added a comment -

          While I haven't tested that fix, I am highly skeptical that it addresses this problem. From my debugging, the problem seemed to be excess network traffic, not the overhead of compression. Specifically, there were something like 10-12 packets per 8 KB chunk of data sent, there was a stack trace attached to every chunk (that last one seems like an easy fix), and who knows what else.

          Just spend a little time stepping through the file transmission loop and watch that Wireshark output.

          John McNair added a comment -

          I'm open to considering multiple causes, but I'll add another data point or two. I changed the chunking size to multiple megabytes to limit the number of object serializations, and it had a very small impact. I hacked together an enhancement of the remoting API so that files could be exchanged in a single object serialization. Even testing with one big file showed only limited improvement. I initially had exactly the same suspicion that there were simply too many packets exchanged for the amount of data flowing. My own testing showed otherwise for my particular setup. I convinced myself that there were some small gains to be made along these lines, but the real problem was elsewhere. I never got to the bottom of it, though. I don't have proof, but the testing done on Hudson and the explanation of their fix jibe with my experience.

          John McNair added a comment -

          Never mind. I changed TarCompression.GZIP to TarCompression.NONE and tested that. No difference. Back to square one.
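
          For anyone who wants to repeat that experiment outside Jenkins, the switch
          amounts to wrapping the archive stream in a GZIPOutputStream or not. The
          sketch below uses plain java.util.zip as a stand-in for TarCompression;
          the file names are placeholders.

            // Illustrative stand-in for TarCompression.GZIP vs. TarCompression.NONE.
            // Input/output file names are placeholders.
            import java.io.FileInputStream;
            import java.io.FileOutputStream;
            import java.io.InputStream;
            import java.io.OutputStream;
            import java.util.zip.GZIPOutputStream;

            public class CompressionToggleDemo {
                public static void main(String[] args) throws Exception {
                    boolean useGzip = Boolean.getBoolean("demo.useGzip"); // run with -Ddemo.useGzip=true to compress

                    try (InputStream in = new FileInputStream("target/slow-rider-1.0.0-SNAPSHOT.jar");
                         OutputStream raw = new FileOutputStream("artifact.out");
                         // GZIP wraps the stream roughly like this; NONE passes it through unchanged.
                         OutputStream out = useGzip ? new GZIPOutputStream(raw) : raw) {
                        byte[] buf = new byte[8 * 1024];
                        long start = System.nanoTime();
                        for (int n; (n = in.read(buf)) != -1; ) {
                            out.write(buf, 0, n);
                        }
                        out.flush();
                        System.out.printf("useGzip=%s took %d ms%n",
                                useGzip, (System.nanoTime() - start) / 1_000_000);
                    }
                }
            }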

          Marcin Walach added a comment -

          We have the same issue in our environment. All our slaves are started over SSH. I moved a single job to a JNLP slave and artifacts are copied much faster. Retrieving files from git, and the console output, is also almost instant in comparison to the job running over SSH.
          Do you think it is worth moving all jobs to JNLP slaves, or will the congestion simply move along with them?

          Vitalii Tymchyshyn added a comment -

          For me (FreeBSD), it is locked in the following stack trace:

          "Channel reader thread: Channel to Maven [/usr/local/openjdk6//bin/java, -cp, /usr/home/builder/jenkins/builder/maven-agent.jar:/usr/home/builder/jenkins/builder/classworlds.jar, hudson.maven.agent.Main, /usr/local/share/java/maven2, /usr/home/builder/jenkins/builder/slave.jar, /usr/home/builder/jenkins/builder/maven-interceptor.jar, 40088, /usr/home/builder/jenkins/builder/maven2.1-interceptor.jar] / waiting for hudson.remoting.Channel@77cd18d:builder" prio=5 tid=0x0000000851778000 nid=0x84e460740 in Object.wait() [0x00007ffffa9ac000..0x00007ffffa9ac920]
          java.lang.Thread.State: TIMED_WAITING (on object monitor)
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:127)

          - locked <0x000000083eb71ca8> (a hudson.remoting.ProxyInputStream$Chunk)
          at hudson.remoting.ProxyInputStream._read(ProxyInputStream.java:74)
          - locked <0x000000081ae366b8> (a hudson.remoting.ProxyInputStream)
          at hudson.remoting.ProxyInputStream.read(ProxyInputStream.java:80)
          at hudson.remoting.RemoteInputStream.read(RemoteInputStream.java:91)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
          - locked <0x000000081ae36668> (a java.io.BufferedInputStream)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
          - locked <0x000000081ae36638> (a java.io.BufferedInputStream)
            at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2264)
            at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2666)
            at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2696)
            at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1648)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1323)
            at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1945)
            at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1869)
            at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
            at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
            at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
            at hudson.remoting.Channel$ReaderThread.run(Channel.java:1087)

          Looking at the code, there is a comment here:
          // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
          // but in production I've observed that in rare occasion it can block forever, even after a channel
          // is gone. So be defensive against that.
          wait(30*1000);

          It seems that it is time to find out when this actually occurs.

          Vitalii Tymchyshyn added a comment -

          BTW: Something is brain-damaging in this stack trace. It is the channel reader thread that is executing. It sends a read request (hudson.remoting.ProxyInputStream$Chunk) to the remote side and waits for an answer. But the answer should be read by exactly the thread that is waiting (unless there are two channels and two threads).

          Vitalii Tymchyshyn added a comment -

          OK, it seems I've tracked this down. The problem is in SSH connection buffering. The fix is in https://github.com/jenkinsci/ssh-slaves-plugin/pull/4
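
          The idea of the change, sketched with a placeholder buffer size (the method
          names follow the trilead Session API as I understand it; the actual pull
          request may differ):

            // Rough sketch only, not the ssh-slaves-plugin code: don't hand the remoting
            // channel the session's raw, unbuffered streams. Session is com.trilead.ssh2.Session.
            import java.io.BufferedInputStream;
            import java.io.BufferedOutputStream;
            import java.io.InputStream;
            import java.io.OutputStream;

            import com.trilead.ssh2.Session;

            public class BufferedChannelStreams {
                /** Reads from the slave process coalesce through a larger buffer. */
                public static InputStream slaveStdout(Session session) {
                    return new BufferedInputStream(session.getStdout(), 64 * 1024);
                }

                /** Writes to the slave process coalesce; the channel must still flush when needed. */
                public static OutputStream slaveStdin(Session session) {
                    return new BufferedOutputStream(session.getStdin(), 64 * 1024);
                }
            }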

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Seiji Sogabe
          Path:
          src/main/java/hudson/plugins/sshslaves/SSHLauncher.java
          http://jenkins-ci.org/commit/ssh-slaves-plugin/aa61fad787d7c49d5c5b417d6e38371ffa7e6397
          Log:
          Merge pull request #4 from tivv/master

          A fix for https://issues.jenkins-ci.org/browse/JENKINS-3922

          Compare: https://github.com/jenkinsci/ssh-slaves-plugin/compare/2c0afb6...aa61fad

          Boris Granveaud added a comment -

          I don't see the fix in the last released versions?

          Jimmi Dyson added a comment -

          The fix is present, but doesn't actually seem to fix the problem. Rather it just improves the speed a bit, but it is still very slow compared to native SSH.

          I have done some tests & this seems to be due to the library that Jenkins uses for SSH - org.jvnet.hudson:trilead-ssh2:build212-hudson-5. Simple tests show really slow SFTPing. I've experimented with JSch & get comparable speeds against native SFTP. I'm working on porting the SSH slaves plugin to use JSch (which is released under a BSD-style license - any compatibility issues there?).

          One drawback of the JSch library is the lack of Putty key support. I don't know if this is such a big deal as users can always convert Putty keys to OpenSSH format keys using puttygen?

          I notice that the org.jvnet.hudson:trilead-ssh2:build212-hudson-5 dependency comes as transitive from jenkins-core. Should the SSH library actually be a part of core dependencies?

          Jimmi Dyson added a comment -

          The fix slightly speeds it up, but it is still slow enough to extend build times considerably with big artifacts. Experimenting with a different SSH library that shows good initial signs of speeding things up to near-native SSH speed.

          Jimmi Dyson added a comment - edited

          I've updated the ssh slaves plugin to use JSch & all works fine for connection, starting, running builds, disconnecting, etc. But it doesn't solve the SFTP speed issue... I now realise that it doesn't actually use SFTP for archiving artifacts back to the master - that is done through the FilePath abstraction I believe, although using the streams created by the SSHLauncher.

          So why is this slow? In our environment, a native SSH transfer takes around 10 seconds for 100MB. Jenkins archiving a 100MB artifact takes about 50 seconds using an SSH slave. Using an SFTP client built on JSch, 100MB is transferred in the same time as native (10 seconds).
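
          A sketch of the kind of standalone JSch test I mean (host, user, key path
          and file paths are placeholders):

            // Sketch of a standalone JSch SFTP download used to compare against native sftp.
            // Host, user, key path and file paths are placeholders.
            import com.jcraft.jsch.ChannelSftp;
            import com.jcraft.jsch.JSch;
            import com.jcraft.jsch.Session;

            public class SftpSpeedTest {
                public static void main(String[] args) throws Exception {
                    JSch jsch = new JSch();
                    jsch.addIdentity("/home/builder/.ssh/id_rsa");

                    Session session = jsch.getSession("builder", "slave-node", 22);
                    session.setConfig("StrictHostKeyChecking", "no"); // test environment only
                    session.connect();

                    ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
                    sftp.connect();

                    long start = System.nanoTime();
                    sftp.get("/home/builder/bigartifact.jar", "/tmp/bigartifact.jar");
                    System.out.printf("transfer took %d ms%n", (System.nanoTime() - start) / 1_000_000);

                    sftp.disconnect();
                    session.disconnect();
                }
            }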

          David Reiss added a comment -

          This issue affected us in a big way once we moved our slaves to a remote datacenter. From the descriptions, it seems like not everyone has the same problem that we did, but I'll explain how we fixed it.

          Diagnostics

          • Make sure you can log into your master and run "scp slave:somefile ." and get the bandwidth that you expect. If not, Jenkins is not your problem. Check out http://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php if you are on a high-latency link.
          • Compute your bandwidth-delay product (BDP). This is the bandwidth you get from a raw scp, in bytes per second, times the round-trip time you get from ping. In my case, this was about 4,000,000 (4 MB/s) * 0.06 (60 ms) = 240,000 bytes (240 kB). The short sketch after this list walks through that arithmetic.
          • If you are using ssh slaves and your BDP is greater than 16 kB, you are definitely having the same problem that we were. This is the trilead ssh window problem.
          • If you are using any type of slave and your BDP is greater than 128 kB, then you are also affected by the jenkins remoting pipe window problem.
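
          A minimal sketch of that arithmetic with my example numbers plugged in
          (substitute your own measured bandwidth and RTT):

            // Bandwidth-delay product check; constants mirror the example above
            // (4 MB/s scp throughput, 60 ms ping) and the two window limits discussed below.
            public class BdpCheck {
                public static void main(String[] args) {
                    double bandwidthBytesPerSec = 4_000_000;   // from a raw scp between master and slave
                    double rttSeconds = 0.060;                 // from ping

                    double bdp = bandwidthBytesPerSec * rttSeconds;
                    System.out.printf("BDP = %.0f bytes (%.0f kB)%n", bdp, bdp / 1000);

                    if (bdp > 128 * 1024) {
                        System.out.println("Limited by both the trilead ssh window and the remoting pipe window.");
                    } else if (bdp > 16 * 1024) {
                        System.out.println("Limited by the trilead ssh window (ssh slaves only).");
                    } else {
                        System.out.println("Window sizes are probably not your bottleneck.");
                    }
                }
            }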

          trilead ssh window problem

          The ssh-slaves-plugin uses the trilead ssh library to connect to the slaves. Unfortunately, that library uses a hard-coded 30,000-byte receive buffer, which limits the amount of in-flight data to 30,000 bytes. In practice, the algorithm it uses for updating its receive window rounds that down to a power of two, so you only get 16kB.
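
          The rounding is just "largest power of two not exceeding the buffer size";
          whether trilead computes it exactly this way is my assumption, but the
          arithmetic is easy to check:

            // 30,000 bytes rounded down to a power of two is 16,384 (16 kB),
            // which is why the effective window is smaller than the configured one.
            public class WindowRounding {
                public static void main(String[] args) {
                    int configuredWindow = 30_000;
                    int effectiveWindow = Integer.highestOneBit(configuredWindow);
                    System.out.println(effectiveWindow); // prints 16384
                }
            }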

          I created a pull request at https://github.com/jenkinsci/trilead-ssh2/pull/1 to make this configurable at JVM startup time. Making this window large increased our bandwidth by a factor of almost 8. Note that two of these buffers are allocated for each slave, so turning this up can consume memory quickly if you have several slaves. In our case, we have memory to spare, so it wasn't a problem. It might be useful to switch to another ssh library that allocates window memory dynamically.

          Fixing this will get your BDP up to almost 128kB, but beyond that, you run into another problem.

          jenkins remoting pipe window problem

          The archiving process uses a hudson.remoting.Pipe object to send the data back. This object uses flow control to avoid overwhelming the receiver. By default, it only allows 128kB of in-flight data. There is already a system property that controls this constant, but it has a space in its name, which makes it a bit complicated to set. I created a pull request at https://github.com/jenkinsci/remoting/pull/4 to fix the name.

          Note that this property must be set on the slave's JVM, not the master's. Therefore, to set it, you must go into your ssh slave configuration, click the Advanced button, find the "JVM Options" input, and enter "-Dclass\ hudson.remoting.Channel.pipeWindowSize=1234567" (no quotes, change the number to whatever is appropriate for your environment). If my pull request is accepted, this will change to "-Dhudson.remoting.Channel.pipeWindowSize=1234567". Note that this window is not preallocated, so you can make this number fairly large and excess memory will not be consumed unless the master is unable to keep up with data from the slave.
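
          My guess at why the name contains a space: the key is probably built by
          concatenating the Channel Class object itself rather than its name, and a
          Class's string form starts with "class ". A stand-alone demonstration using
          a stand-in class (this is not the actual remoting source):

            // Demonstrates why a property key built from a Class object picks up a
            // leading "class " - the likely origin of the awkward space.
            public class PropertyNameDemo {
                public static void main(String[] args) {
                    Class<?> c = PropertyNameDemo.class; // stand-in for hudson.remoting.Channel

                    String buggyKey = c + ".pipeWindowSize";           // "class PropertyNameDemo.pipeWindowSize"
                    String fixedKey = c.getName() + ".pipeWindowSize"; // "PropertyNameDemo.pipeWindowSize"

                    System.out.println(buggyKey);
                    System.out.println(fixedKey);
                }
            }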

          Increasing both of these windows increased our bandwidth by a factor of about 15, matching the 4 MB/s we were getting from raw scp.

          Good luck!

          Jesse Glick added a comment -

          Probably improved by JENKINS-7813 fixes.

            Assignee: Kohsuke Kawaguchi (kohsuke)
            Reporter: John McNair (pamdirac)
            Votes: 35
            Watchers: 34
