A user working with files in the 100+ Mb range asked whether fingerprinting them was expensive. Probably it is not that expensive but it is likely the implementation could be optimized and at least measured.

      As far as the I/O goes, FilePath.digest uses FileInputStream rather than NIO—hmm, maybe OK. It does not use buffering, which may or may not be an issue here. Then it calls Util.getDigestOf.

      That uses the JRE’s MD5 implementation which AFAIK takes advantage of any cryptographic hardware acceleration when available. It does read into a 1024-byte buffer however which seems excessively small. (And uses a proxy stream rather than simply calling the update method, which is odd.) http://stackoverflow.com/questions/9321912/very-slow-when-generaing-md5-using-java-with-large-file suggests using a custom library though this may be overkill; others suggest DigestUtils from Commons Codec. A

          [JENKINS-16301] Fingerprint performance

          Jesse Glick added a comment -

          New baseline times are a bit shorter: 0.942msec to save. Using a hand-written serializer, I can get that down to 0.7601msec, a 19% savings. Not exactly dramatic but this is based on a single thread doing serial operations; the hand-written version does not contend locks.

          So to check that, changed the test program to run under concurrency stress (each thread saving its own Fingerprint over and over), and also using a tmpfs so that actual I/O time can be ignored:

          package hudson.model;
          import hudson.Util;
          import java.io.File;
          import java.io.IOException;
          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.atomic.AtomicInteger;
          import java.util.concurrent.atomic.AtomicLong;
          public class Fingerprints {
              private static byte[] toByteArray(String md5sum) {
                  byte[] data = new byte[16];
                  for( int i=0; i<md5sum.length(); i+=2 )
                      data[i/2] = (byte)Integer.parseInt(md5sum.substring(i,i+2),16);
                  return data;
              }
              private static final byte[] SOME_MD5 = toByteArray(Util.getDigestOf("whatever"));
              public static void main(String[] args) throws InterruptedException {
                  for (int concurrency = 1; concurrency <= 10; concurrency++) {
                      List<Thread> threads = new ArrayList<Thread>();
                      final AtomicLong totalTime = new AtomicLong();
                      final AtomicInteger totalCount = new AtomicInteger();
                      for (int i = 0; i < concurrency; i++) {
                          final File x = new File("/run/user/" + System.getenv("USER") + "/fp" + i + ".xml");
                          Thread t = new Thread("saver-" + i) {
                              @Override public void run() {
                                  Fingerprint f = new Fingerprint(new Fingerprint.BuildPtr("foo", 13), "stuff.jar", SOME_MD5);
                                  f.addWithoutSaving("some", 1);
                                  f.addWithoutSaving("some", 2);
                                  f.addWithoutSaving("some", 3);
                                  f.addWithoutSaving("some", 10);
                                  f.addWithoutSaving("other", 6);
                                  for (int i = 0; i < 100; i++) {
                                      for (int b = 0; b < 100; b += 2) {
                                          f.addWithoutSaving("job" + i, b);
                                      }
                                  }
                                  int count = 10000;
                                  long start = System.currentTimeMillis();
                                  for (int i = 0; i < count; i++) {
                                      try {
                                          f.save(x);
                                      } catch (IOException e) {
                                          assert false : e;
                                      }
                                  }
                                  long end = System.currentTimeMillis();
                                  totalTime.addAndGet(end - start);
                                  totalCount.addAndGet(count);
                              }
                          };
                          t.start();
                          threads.add(t);
                      }
                      for (Thread t : threads) {
                          t.join();
                      }
                      System.out.println(concurrency + "×: " + (totalTime.floatValue() / totalCount.floatValue()) + "msec to save");
                  }
              }
          }
          

          Baseline times (on a quad-core):

          1×: 0.7479msec to save
          2×: 0.6742msec to save
          3×: 0.8395333msec to save
          4×: 1.042125msec to save
          5×: 1.2944msec to save
          6×: 1.5228167msec to save
          7×: 1.7666msec to save
          8×: 2.0307374msec to save
          9×: 2.297811msec to save
          10×: 2.55688msec to save
          

          With patch:

          1×: 0.2807msec to save
          2×: 0.234msec to save
          3×: 0.26313335msec to save
          4×: 0.329525msec to save
          5×: 0.41194msec to save
          6×: 0.49443334msec to save
          7×: 0.56754285msec to save
          8×: 0.650325msec to save
          9×: 0.7313111msec to save
          10×: 0.81052msec to save
          

          This shows a 2–3× speedup in CPU time, with the biggest gains under concurrent load. I am going to stop there for now; not bothering to optimize load which is I think called rather less often.

          Jesse Glick added a comment - New baseline times are a bit shorter: 0.942msec to save. Using a hand-written serializer, I can get that down to 0.7601msec, a 19% savings. Not exactly dramatic but this is based on a single thread doing serial operations; the hand-written version does not contend locks. So to check that, changed the test program to run under concurrency stress (each thread saving its own Fingerprint over and over), and also using a tmpfs so that actual I/O time can be ignored: package hudson.model; import hudson.Util; import java.io.File; import java.io.IOException; import java.util.ArrayList; import java.util.List; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicLong; public class Fingerprints { private static byte [] toByteArray( String md5sum) { byte [] data = new byte [16]; for ( int i=0; i<md5sum.length(); i+=2 ) data[i/2] = ( byte ) Integer .parseInt(md5sum.substring(i,i+2),16); return data; } private static final byte [] SOME_MD5 = toByteArray(Util.getDigestOf( "whatever" )); public static void main( String [] args) throws InterruptedException { for ( int concurrency = 1; concurrency <= 10; concurrency++) { List< Thread > threads = new ArrayList< Thread >(); final AtomicLong totalTime = new AtomicLong(); final AtomicInteger totalCount = new AtomicInteger(); for ( int i = 0; i < concurrency; i++) { final File x = new File( "/run/user/" + System .getenv( "USER" ) + "/fp" + i + ".xml" ); Thread t = new Thread ( "saver-" + i) { @Override public void run() { Fingerprint f = new Fingerprint( new Fingerprint.BuildPtr( "foo" , 13), "stuff.jar" , SOME_MD5); f.addWithoutSaving( "some" , 1); f.addWithoutSaving( "some" , 2); f.addWithoutSaving( "some" , 3); f.addWithoutSaving( "some" , 10); f.addWithoutSaving( "other" , 6); for ( int i = 0; i < 100; i++) { for ( int b = 0; b < 100; b += 2) { f.addWithoutSaving( "job" + i, b); } } int count = 10000; long start = System .currentTimeMillis(); for ( int i = 0; i < count; i++) { try { f.save(x); } catch (IOException e) { assert false : e; } } long end = System .currentTimeMillis(); totalTime.addAndGet(end - start); totalCount.addAndGet(count); } }; t.start(); threads.add(t); } for ( Thread t : threads) { t.join(); } System .out.println(concurrency + "×: " + (totalTime.floatValue() / totalCount.floatValue()) + "msec to save" ); } } } Baseline times (on a quad-core): 1×: 0.7479msec to save 2×: 0.6742msec to save 3×: 0.8395333msec to save 4×: 1.042125msec to save 5×: 1.2944msec to save 6×: 1.5228167msec to save 7×: 1.7666msec to save 8×: 2.0307374msec to save 9×: 2.297811msec to save 10×: 2.55688msec to save With patch: 1×: 0.2807msec to save 2×: 0.234msec to save 3×: 0.26313335msec to save 4×: 0.329525msec to save 5×: 0.41194msec to save 6×: 0.49443334msec to save 7×: 0.56754285msec to save 8×: 0.650325msec to save 9×: 0.7313111msec to save 10×: 0.81052msec to save This shows a 2–3× speedup in CPU time, with the biggest gains under concurrent load. I am going to stop there for now; not bothering to optimize load which is I think called rather less often.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/model/Fingerprint.java
          http://jenkins-ci.org/commit/jenkins/fff90c100d6b030f86d92cc4f0c23af3c648c75e
          Log:
          [FIXED JENKINS-16301] Optimized Fingerprint.save.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: changelog.html core/src/main/java/hudson/model/Fingerprint.java http://jenkins-ci.org/commit/jenkins/fff90c100d6b030f86d92cc4f0c23af3c648c75e Log: [FIXED JENKINS-16301] Optimized Fingerprint.save.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2555
          [FIXED JENKINS-16301] Optimized Fingerprint.save. (Revision fff90c100d6b030f86d92cc4f0c23af3c648c75e)

          Result = UNSTABLE
          Jesse Glick : fff90c100d6b030f86d92cc4f0c23af3c648c75e
          Files :

          • changelog.html
          • core/src/main/java/hudson/model/Fingerprint.java

          dogfood added a comment - Integrated in jenkins_main_trunk #2555 [FIXED JENKINS-16301] Optimized Fingerprint.save. (Revision fff90c100d6b030f86d92cc4f0c23af3c648c75e) Result = UNSTABLE Jesse Glick : fff90c100d6b030f86d92cc4f0c23af3c648c75e Files : changelog.html core/src/main/java/hudson/model/Fingerprint.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/Fingerprint.java
          core/src/test/java/hudson/model/FingerprintTest.java
          http://jenkins-ci.org/commit/jenkins/f1b513cf91663498aa1677580776a7c42c7322f2
          Log:
          JENKINS-16301 Need to create parent directory of fingerprint file too.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/Fingerprint.java core/src/test/java/hudson/model/FingerprintTest.java http://jenkins-ci.org/commit/jenkins/f1b513cf91663498aa1677580776a7c42c7322f2 Log: JENKINS-16301 Need to create parent directory of fingerprint file too.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2556
          JENKINS-16301 Need to create parent directory of fingerprint file too. (Revision f1b513cf91663498aa1677580776a7c42c7322f2)

          Result = SUCCESS
          Jesse Glick : f1b513cf91663498aa1677580776a7c42c7322f2
          Files :

          • core/src/main/java/hudson/model/Fingerprint.java
          • core/src/test/java/hudson/model/FingerprintTest.java

          dogfood added a comment - Integrated in jenkins_main_trunk #2556 JENKINS-16301 Need to create parent directory of fingerprint file too. (Revision f1b513cf91663498aa1677580776a7c42c7322f2) Result = SUCCESS Jesse Glick : f1b513cf91663498aa1677580776a7c42c7322f2 Files : core/src/main/java/hudson/model/Fingerprint.java core/src/test/java/hudson/model/FingerprintTest.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/Util.java
          test/src/test/java/hudson/maven/MavenBuildTest.java
          http://jenkins-ci.org/commit/jenkins/e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93
          Log:
          [FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds.
          Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies;
          if you specify a build extension that can override dependencies used by Jenkins core classes!
          A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: changelog.html core/src/main/java/hudson/Util.java test/src/test/java/hudson/maven/MavenBuildTest.java http://jenkins-ci.org/commit/jenkins/e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93 Log: [FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies; if you specify a build extension that can override dependencies used by Jenkins core classes! A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/FilePath.java
          core/src/main/java/hudson/Util.java
          maven-plugin/src/main/java/hudson/maven/ExecutedMojo.java
          maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifact.java
          maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java
          http://jenkins-ci.org/commit/jenkins/11953750ab422bcec28d12a436867d0888313aaa
          Log:
          [JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving.
          Introducing Util.getDigestOf(File) as a convenience.

          Compare: https://github.com/jenkinsci/jenkins/compare/d18a27e5bed4...11953750ab42

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/FilePath.java core/src/main/java/hudson/Util.java maven-plugin/src/main/java/hudson/maven/ExecutedMojo.java maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifact.java maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java http://jenkins-ci.org/commit/jenkins/11953750ab422bcec28d12a436867d0888313aaa Log: [ JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving. Introducing Util.getDigestOf(File) as a convenience. Compare: https://github.com/jenkinsci/jenkins/compare/d18a27e5bed4...11953750ab42

          dogfood added a comment -

          Integrated in jenkins_main_trunk #2748
          [FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. (Revision e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93)
          [JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving. (Revision 11953750ab422bcec28d12a436867d0888313aaa)

          Result = SUCCESS
          Jesse Glick : e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93
          Files :

          • core/src/main/java/hudson/Util.java
          • changelog.html
          • test/src/test/java/hudson/maven/MavenBuildTest.java

          Jesse Glick : 11953750ab422bcec28d12a436867d0888313aaa
          Files :

          • maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java
          • maven-plugin/src/main/java/hudson/maven/ExecutedMojo.java
          • maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifact.java
          • core/src/main/java/hudson/FilePath.java
          • core/src/main/java/hudson/Util.java

          dogfood added a comment - Integrated in jenkins_main_trunk #2748 [FIXED JENKINS-18178] Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. (Revision e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93) [ JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving. (Revision 11953750ab422bcec28d12a436867d0888313aaa) Result = SUCCESS Jesse Glick : e0a3a1ddf5fa08e80a94a565dd05c39f76de6e93 Files : core/src/main/java/hudson/Util.java changelog.html test/src/test/java/hudson/maven/MavenBuildTest.java Jesse Glick : 11953750ab422bcec28d12a436867d0888313aaa Files : maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java maven-plugin/src/main/java/hudson/maven/ExecutedMojo.java maven-plugin/src/main/java/hudson/maven/reporters/MavenArtifact.java core/src/main/java/hudson/FilePath.java core/src/main/java/hudson/Util.java

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/hudson/maven/ExecutedMojo.java
          src/main/java/hudson/maven/reporters/MavenArtifact.java
          src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java
          http://jenkins-ci.org/commit/maven-plugin/db2b69174c511a1cba440ce3683de4cd6676734b
          Log:
          [JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving.
          Introducing Util.getDigestOf(File) as a convenience.
          Originally-Committed-As: 11953750ab422bcec28d12a436867d0888313aaa

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: src/main/java/hudson/maven/ExecutedMojo.java src/main/java/hudson/maven/reporters/MavenArtifact.java src/main/java/hudson/maven/reporters/MavenArtifactArchiver.java http://jenkins-ci.org/commit/maven-plugin/db2b69174c511a1cba440ce3683de4cd6676734b Log: [ JENKINS-16301 Use a buffered input stream to get somewhat better performance also from Maven archiving. Introducing Util.getDigestOf(File) as a convenience. Originally-Committed-As: 11953750ab422bcec28d12a436867d0888313aaa

          Code changed in jenkins
          User: Jesse Glick
          Path:
          test/src/test/java/hudson/maven/MavenBuildTest.java
          http://jenkins-ci.org/commit/maven-plugin/1342c5f819d6672ce53238822a93dddc087f0bd8
          Log:
          Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies; if you specify a build extension that can override dependencies used by Jenkins core classes! A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: test/src/test/java/hudson/maven/MavenBuildTest.java http://jenkins-ci.org/commit/maven-plugin/1342c5f819d6672ce53238822a93dddc087f0bd8 Log: Reverting an inessential part of the fix of JENKINS-16301 since it broke some Maven builds. Apparently the Maven 2 process factory fails to adequately insulate the Maven process from Jenkins library dependencies; if you specify a build extension that can override dependencies used by Jenkins core classes! A real fix would involve changing class loader delegation, since presumably similar bugs could still occur.

            jglick Jesse Glick
            jglick Jesse Glick
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: