• Task
    • Resolution: Fixed
    • Critical
    • maven-plugin
    • None

      Various people have reported over time that the Maven job type builds considerably more slowly than freestyle projects.

      The feature does have some overhead, in that it definitely does more (for example, artifacts get archived while Maven runs, whereas freestyle projects do that after Maven has run), but it's also good to take a deep look into where the overhead is and see if anything appears out of place.

      This issue tracks my investigation of this.

          [JENKINS-22354] Maven job type performance improvement

          Kohsuke Kawaguchi created issue -

          Kohsuke Kawaguchi added a comment - I've created a dummy single-module Maven project that produces a 16MB artifact consisting of random bytes, in an attempt to reproduce the artifact archiving overhead reported by some.
          Kohsuke Kawaguchi made changes -
          Attachment New: pom.xml [ 25622 ]

          Kohsuke Kawaguchi added a comment - Test project attached.

          Kohsuke Kawaguchi added a comment - Before the experiment, I ran exec sudo tc qdisc add dev lo root netem delay 200ms to introduce an artificial 400ms roundtrip delay into the remoting communication between the master and the Maven process, to really stretch the problem.

          Kohsuke Kawaguchi added a comment - I wrote a script to continuously monitor thread dumps of the Maven process via jstack. Mostly what I see is classloader-related activity. I still have to verify whether the remote jar file cache is properly taking effect.

          A noticeable amount of classloader activity came from instantiating XStream2, like this:

                  at com.thoughtworks.xstream.XStream.buildMapper(XStream.java:474)
                  at com.thoughtworks.xstream.XStream.<init>(XStream.java:451)
                  at com.thoughtworks.xstream.XStream.<init>(XStream.java:381)
                  at com.thoughtworks.xstream.XStream.<init>(XStream.java:336)
                  at hudson.util.XStream2.<init>(XStream2.java:88)
                  at jenkins.model.Jenkins.<clinit>(Jenkins.java:3941)
                  at hudson.model.Computer.<clinit>(Computer.java:1358)
                  at hudson.FilePath.act(FilePath.java:914)
                  at hudson.FilePath.act(FilePath.java:887)
                  at hudson.FilePath.digest(FilePath.java:1726)
                  at hudson.maven.reporters.MavenFingerprinter.record(MavenFingerprinter.java:219)

          Another instantiation of XStream was induced from MavenArtifact.<clinit> through Run.<clinit>. XStream instantiates a large number of converters, which causes a lot of classloading activity, which in turn requires multiple roundtrips to the master.

          When I refactored the code so that it initializes neither the Jenkins class nor the Run class, I was able to cut the execution time by more than 30% (3mins+ -> 2mins-).
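          To illustrate the mechanism described above, here is a minimal sketch of how a single static reference can trigger another class's static initializer, and how keeping a private copy of the value avoids it. All class names here (ClinitDemo, HeavyModel, CoupledArtifact, LeanArtifact) are hypothetical stand-ins, not the actual maven-plugin code; HeavyModel merely plays the role of a class like Run or Jenkins whose <clinit> is expensive.

```java
import java.util.ArrayList;
import java.util.List;

/** Entry point: shows which classes get initialized by each access pattern. */
final class ClinitDemo {
    /** Records heavyweight class initializations in the order they happen. */
    static final List<String> INITIALIZED = new ArrayList<>();

    public static void main(String[] args) {
        LeanArtifact.describe("a.jar");
        System.out.println("after LeanArtifact: " + INITIALIZED);    // prints "after LeanArtifact: []"
        CoupledArtifact.describe("a.jar");
        System.out.println("after CoupledArtifact: " + INITIALIZED); // prints "after CoupledArtifact: [HeavyModel]"
    }
}

/** Hypothetical stand-in for a heavyweight class such as Run or Jenkins. */
final class HeavyModel {
    // Not a compile-time constant, so reading it forces HeavyModel.<clinit> to run.
    static final String PREFIX = String.valueOf("artifact:");

    static {
        // Assumption for illustration: real code would do expensive work here
        // (for example building an XStream instance), loading many more classes.
        ClinitDemo.INITIALIZED.add("HeavyModel");
    }
}

/** Reads a static field of HeavyModel, so using it initializes HeavyModel too. */
final class CoupledArtifact {
    static String describe(String name) {
        return HeavyModel.PREFIX + name;
    }
}

/** Keeps its own copy of the value, so HeavyModel is never touched. */
final class LeanArtifact {
    private static final String PREFIX = "artifact:";

    static String describe(String name) {
        return PREFIX + name;
    }
}
```

          In a remoting setup, every class pulled in by such an initializer may have to be fetched from the other side, so the lean variant saves roundtrips rather than CPU time.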

          SCM/JIRA link daemon added a comment - Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/maven/PluginImpl.java
          src/main/java/hudson/maven/reporters/MavenArtifact.java
          http://jenkins-ci.org/commit/maven-plugin/a0d0100183b46294d229336d2f91bfdcadc2e318
          Log:
          JENKINS-22354

          MavenArtifact class is loaded into Maven process, so don't drag too many classes dependencies into it.

          Run class refers to a large number of classes, and in particular this code forces XStream instantiation which drags in quite a few number of classes

          Compare: https://github.com/jenkinsci/maven-plugin/compare/d62796891bd6...a0d0100183b4

          Kohsuke Kawaguchi added a comment - Archiving the 16MB artifact took an unmeasurably small amount of time when built on the master (29ms reported). Built on a slave, it took 13sec (scp took 7secs to copy).

          I hadn't realized this, but starting with Maven plugin 2.0, artifact archiving is queued up until the end of the module, and the copy is done between master and slave, not between master and Maven, which I think helps considerably.

          Kohsuke Kawaguchi added a comment - I found that one of the inefficiencies is in the use of RemoteInputStream when launching Maven on a slave.

          When a channel is built to a Maven process on a slave, it looks like this:

          Master                         slave                               Maven
          ========================================================================
          Channel                                                          Channel
          +- RemoteInputStream --> SocketInputStream --> <-- SocketOutputStream -+

          So each time the master's Channel reads something, it has to wait for a full roundtrip between master and slave. This is very bad if the slave is trying to send a large amount of data over the channel.

          I ran a simple Callable from the master to Maven that returns 16MB of data over this latency-induced network, and verified that it took a whopping 15mins.

          A better way to do this is to have the slave pump the SocketInputStream and feed the data to the master, then have the master buffer it.

          Master                         slave                                      Maven
          ===============================================================================

          Channel                                                          
          +- FastPipedInputStream                        pump thread
             +- FastPipedOutputStream <-- RemoteOutputStream | SocketInputStream -> ...

          This hides latency better. With this change, the artificial 16MB callable completes in just 11secs.
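          The pump arrangement can be sketched in a single JVM as follows. This is only an illustration under stated assumptions: java.io.PipedInputStream and PipedOutputStream stand in for FastPipedInputStream and FastPipedOutputStream, the network hop (RemoteOutputStream) is omitted entirely, and the names PumpDemo, startPump and readAllViaPump are hypothetical. It is not the code from the actual change in AbstractMavenProcessFactory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Arrays;
import java.util.Random;

final class PumpDemo {
    /**
     * Starts a thread that pushes everything from source into sink and then
     * closes both, so the reading side sees EOF. The producer side drives the
     * transfer; the consumer never has to ask for the next chunk.
     */
    static Thread startPump(InputStream source, OutputStream sink) {
        Thread pump = new Thread(() -> {
            byte[] buf = new byte[8192];
            try (InputStream in = source; OutputStream out = sink) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                // A real implementation would propagate this to the reader.
                e.printStackTrace();
            }
        }, "pump");
        pump.setDaemon(true);
        pump.start();
        return pump;
    }

    /** Reads until EOF from a local buffer that the pump thread keeps filling. */
    static byte[] readAllViaPump(InputStream source, int bufferSize) throws IOException {
        PipedInputStream readerSide = new PipedInputStream(bufferSize);
        PipedOutputStream pumpSide = new PipedOutputStream(readerSide);
        startPump(source, pumpSide);

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = readerSide.read(buf)) != -1) {
            result.write(buf, 0, n);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1 << 20];
        new Random(42).nextBytes(payload);
        byte[] received = readAllViaPump(new ByteArrayInputStream(payload), 64 * 1024);
        System.out.println("intact: " + Arrays.equals(payload, received));
    }
}
```

          The design point is the direction of control: with a pull-style stream each read costs a roundtrip, whereas here the sender writes as fast as the buffer allows and the reader only ever blocks on local data.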

          SCM/JIRA link daemon added a comment - Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/maven/AbstractMavenProcessFactory.java
          http://jenkins-ci.org/commit/maven-plugin/cea15ea5cb11dc9cdafb2caa44d18c4c350017fe
          Log:
          JENKINS-22354

          Avoid using RemoteInputStream that's inherently unsuitable for large
          "read till EOF" read workload.

            kohsuke Kohsuke Kawaguchi
            Votes: 3
            Watchers: 10