Several people have reported that the CLI has become slower. I'm not sure exactly when or why, but this is getting painful enough for many users that I'm opening this ticket to keep track of the problem.

          [JENKINS-22310] CLI slow performance investigation

          jglick suggests I should use "who-am-i" as a benchmark.

          % time java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s https://ci.jenkins-ci.org/ who-am-i
          Authenticated as: kohsuke
          Authorities:
            ROLE_ADMINS
            admins
            authenticated
            all
            ROLE_ALL
          java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s  who-am-i  1.08s user 0.07s system 17% cpu 6.475 total
          [/files/kohsuke/ws/jenkins/jenkins/cli/target@elf] (master)
          % time java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s https://jenkins.ci.cloudbees.com/ who-am-i
          Authenticated as: kkawaguchi@cloudbees.com
          Authorities:
            authenticated
            cloudbees-admin
          java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s  who-am-i  1.04s user 0.06s system 6% cpu 15.863 total
          


          ci.jenkins-ci.org runs Winstone, CloudBees runs on Tomcat. For comparison, "java -jar jenkins.jar" on my local machine:

          % time java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s http://localhost:8080/ who-am-i
          [WARN] Failed to authenticate with your SSH keys. Proceeding as anonymous
          Authenticated as: anonymous
          Authorities:
            anonymous
          java -jar cli-1.557-SNAPSHOT-jar-with-dependencies.jar -s  who-am-i  1.00s user 0.05s system 71% cpu 1.472 total
          


          I find the difference between DEV@cloud masters and ci.jenkins-ci.org rather stark. The ping time to ci.jenkins-ci.org is about 80ms; I can't measure that for jenkins.ci.cloudbees.com as it doesn't respond to ping.

          Here's the profiler measuring wall clock times and major contributors to the execution time:

          CloudBees

          • CLI.main() total: 15.6sec
          • 1.0sec to getCliTcpPort
          • 0.5sec to connectViaCliPort
          • 2.7sec to authenticate
            • including about 1.6sec to get the result of authentication in c.readBoolean()
          • Request.call() spent 10secs

          ci.jenkins-ci.org

          • CLI.main() total: 5.3sec
          • 1sec to getCliTcpPort
          • 0.5sec to connectViaCliPort
          • 1.5sec to authenticate
            • including 1.2sec to get the result back: c.readBoolean()
          • Request.call() spent 1.8sec

          Request.call() is the time between the client sending the command to the server and the server returning from that synchronous call, and to my surprise it accounts for almost all of the performance difference (roughly 8.2sec of the 10.3sec gap between the two totals). I initially suspected something like a chatty protocol, but it's starting to look like the issue is purely on the server side.

          Next step: measure the performance on the server side.


          Jesse Glick added a comment -

          CLICommand.registerOptionHandlers seems to be a problem. Which would suggest that this is just a special case of JENKINS-21579. And that JENKINS-18677 might have introduced a regression.


          Holy classloader recursion, Batman!

          import org.jvnet.hudson.annotation_indexer.*;
          import hudson.cli.declarative.*;
          import jenkins.model.Jenkins;
          
          // Time 10 batches of 10 annotation-index lookups each, going through the plugin uberClassLoader.
          for (int j=0; j<10; j++) {
              long l = System.currentTimeMillis();
              for (int i=0; i<10; i++) {
                  for (Class c : Index.list(OptionHandlerExtension.class, Jenkins.getInstance().pluginManager.uberClassLoader, Class.class))
                      ; // no-op; we only care about how long the lookup itself takes
              }
              println System.currentTimeMillis()-l; // elapsed milliseconds for 10 lookups
          }
          

          I've installed about 40 random plugins on Jenkins on Tomcat and ran the code above to get a sense of how long these lookups take; each number below is the total time in milliseconds for 10 lookups. Each full lookup is taking a whopping ~500ms!

          5595
          5628
          5604
          5610
          5573
          5622
          5783
          6417
          5703
          5687

          The problem is that AntClassLoader.findResources is calling parent.getResources() when it really shouldn't. The parent classloader in this case is DependencyClassLoader, which tries every dependency. The net result is that the Tomcat classloader gets hit O(E*V) times, where E and V are the numbers of edges and nodes in the plugin dependency graph.
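
          To make that blow-up concrete, here is a rough, hypothetical sketch (illustrative class names and a made-up resource path, not the real Jenkins/Ant loaders): when every plugin loader asks each of its dependency loaders and then falls back to the container loader, and each dependency loader does the same in turn, the container loader at the root gets consulted once per path through the dependency graph.

          import java.io.IOException;
          import java.net.URL;
          import java.util.ArrayList;
          import java.util.Collections;
          import java.util.Enumeration;
          import java.util.List;
          
          // Hypothetical sketch of the recursion described above; these are illustrative
          // stand-ins, not the real AntClassLoader/DependencyClassLoader/Tomcat classes.
          public class DelegationBlowUp {
              static int containerHits = 0;
          
              /** Stands in for the container (Tomcat) classloader at the root. */
              static class ContainerLoader extends ClassLoader {
                  @Override
                  public Enumeration<URL> getResources(String name) throws IOException {
                      containerHits++;
                      return Collections.emptyEnumeration();
                  }
              }
          
              /** Stands in for a plugin classloader that asks each dependency, then the container. */
              static class PluginLoader extends ClassLoader {
                  final List<PluginLoader> dependencies = new ArrayList<>();
                  final ClassLoader container;
          
                  PluginLoader(ClassLoader container) { this.container = container; }
          
                  @Override
                  public Enumeration<URL> getResources(String name) throws IOException {
                      for (PluginLoader dep : dependencies) {
                          dep.getResources(name);          // recurse into every dependency edge...
                      }
                      return container.getResources(name); // ...and always fall back to the container
                  }
              }
          
              public static void main(String[] args) throws IOException {
                  ContainerLoader tomcat = new ContainerLoader();
                  // A small layered dependency graph: each plugin depends on every plugin in the previous layer.
                  List<PluginLoader> previous = new ArrayList<>(), all = new ArrayList<>();
                  for (int layer = 0; layer < 4; layer++) {
                      List<PluginLoader> current = new ArrayList<>();
                      for (int i = 0; i < 3; i++) {
                          PluginLoader p = new PluginLoader(tomcat);
                          p.dependencies.addAll(previous);
                          current.add(p);
                          all.add(p);
                      }
                      previous = current;
                  }
                  // One resource lookup per plugin, as an index scan over all plugins would do:
                  for (PluginLoader p : all) {
                      p.getResources("META-INF/annotations/example"); // path is illustrative
                  }
                  System.out.println("container loader consulted " + containerHits + " times for 12 plugins");
              }
          }

          Even with only 12 hypothetical plugins in 4 layers, this toy graph consults the root loader well over a hundred times for a single resource name; with ~40 real plugins and the much slower Tomcat lookup, ~500ms per Index.list() call is not surprising.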

          But this makes no sense, because we specifically call AntClassLoader.findResources(String) to prevent recursion into the parent classloader. The reason this happens is that AntClassLoader contains the following code:

                  if (parent != null && (!parentHasBeenSearched || parent != getParent())) {
                      // Delegate to the parent:
                      base = parent.getResources(name);
                      // Note: could cause overlaps in case
                      // ClassLoader.this.parent has matches and
                      // parentHasBeenSearched is true
                  } else {
                      // ClassLoader.this.parent is already delegated to for example from
                      // ClassLoader.getResources, no need:
                      base = new CollectionUtils.EmptyEnumeration();
                  }
          

          parent != getParent() is indeed true, because Ant doesn't call the superclass constructor to pass in the specified parent classloader.
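
          In other words (a minimal, hypothetical sketch of the mismatch, not the actual Ant code): when a ClassLoader subclass keeps its delegation parent only in its own field and never hands it to the java.lang.ClassLoader constructor, getParent() reports a different loader, so the "parent has already been searched" branch above is never taken.

          // Minimal illustration of the parent != getParent() mismatch described above
          // (hypothetical classes, not the real AntClassLoader).
          class FieldParentLoader extends ClassLoader {
              private final ClassLoader parent; // delegation parent kept only in a field
          
              FieldParentLoader(ClassLoader parent) {
                  // No super(parent) call, so java.lang.ClassLoader never learns about it
                  // and falls back to its default parent (the system classloader).
                  this.parent = parent;
              }
          
              boolean parentMismatch() {
                  // getParent() returns the default parent set by the implicit no-arg super() call,
                  // not the delegation parent stored above, so this is true unless the
                  // delegation parent happens to be that same default.
                  return parent != getParent();
              }
          }
          
          class ConstructorParentLoader extends ClassLoader {
              ConstructorParentLoader(ClassLoader parent) {
                  super(parent); // registering the parent here makes getParent() agree with it
              }
          }

          That is why the parent.getResources(name) branch quoted above was always being taken before the fix mentioned below.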

          After fixing this in 9a2882dd704bece9b7ca51a52347dad15d79f843, the same measurement went down to the following:

          69
          13
          15
          14
          10
          10
          10
          9
          8
          9

          Problem solved!


          Oleg Nenashev added a comment -

          Thanks a lot for the fix. It works well on my dev installations.
          Marked the issue as lts-candidate.


          Marco Miller added a comment - edited

          fd9f273f3645fc670e1283bbed7967f789475c86
          and
          705f5eb8e737d1e0770f509edb5edb3d50f60cdc
          are the two commits by KK fixing this issue.
          I back-ported those commits to 1.554.1 (the current 'stable' branch).
          I did so after the following successful testing of ours:

          • jenkinsci/jenkins unit tests, for regressions (passing ones kept passing);
          • jenkinsci/acceptance-test-harness, incl. both parent-first and plugin-first class-loading plugins (failures looked unrelated);
          • Ericsson-internal CLI 'stress' tests, which used to reproduce JENKINS-22310; incl. parent-first/plugin-first plugins, too;
          • brief manual testing through quick clicking around, exercising some parent-first and plugin-first (plugin) class-loading.


          Daniel Beck added a comment -

          Marco Miller: Those commit IDs do not exist in https://github.com/jenkinsci/jenkins. Are they correct? Could you provide the URLs?


          Marco Miller added a comment -

          Corrected now (edited above); thx! =>
          https://github.com/jenkinsci/jenkins/commits/stable
          https://github.com/jenkinsci/jenkins/commit/fd9f273f3645fc670e1283bbed7967f789475c86
          https://github.com/jenkinsci/jenkins/commit/705f5eb8e737d1e0770f509edb5edb3d50f60cdc

          Oleg Nenashev added a comment -

          I also confirm that both commits can be easily backported to 1.509.x and 1.532.x.
          We have been using the fixes for about a week on 1.509.4...

