Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-47429

Absolutely atrocious user lookup performance when many legacy users

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Major Major
    • core
    • core 2.60.x, Git plugin, and user viewing changesets when a lot of users exist.

      The performance of user lookup when many users with legacy config files exist has been identified as the cause of a master's CPU usage spiking to the point of unresponsiveness. 

      Stack trace:

      at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
      at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
      at java.io.File.isFile(File.java:882)
      at hudson.model.User$3.accept(User.java:697)
      at java.io.File.listFiles(File.java:1291)
      at hudson.model.User.getLegacyConfigFilesFor(User.java:694)
      at hudson.model.User.getOrCreate(User.java:437)
      at hudson.model.User.getById(User.java:535)
      at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1086)
      at hudson.model.User.get(User.java:405)
      at hudson.model.User.get(User.java:374)
      at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379)
      at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448)

      The root cause of the problem is that for every user we're doing an O(n) scan through the directories in that folder looking for legacy user config files:

       private static final File[] getLegacyConfigFilesFor(final String id) {
              return getRootDir().listFiles(new FileFilter() {
                  @Override
                  public boolean accept(File pathname) {
                      return pathname.isDirectory() && new File(pathname, "config.xml").isFile() && idStrategy().equals(
                              pathname.getName(), id);
                  }
              });
          }
      
      

      Code here: https://github.com/jenkinsci/jenkins/blob/2d2101215dc39dfcb03279e3cb8898b8b9e5bc5f/core/src/main/java/hudson/model/User.java#L680

      The likely best fix is either to be able to check for a single file (if case sensitive mode is on) or to cache the list of entries in that folder so we don't need to do a listFiles and linear-time scan through all entries with a FileFilter.

          [JENKINS-47429] Absolutely atrocious user lookup performance when many legacy users

          Jesse Glick added a comment -

          What is a “modern Jenkins core”?

          Recheck after JENKINS-45737 in 2.71+; information about performance of this class predating that change is not really useful now.

          Jesse Glick added a comment - What is a “modern Jenkins core”? Recheck after  JENKINS-45737 in 2.71+; information about performance of this class predating that change is not really useful now.

          Sam Van Oort added a comment -

          jglick I think that change may actually make the issue worse because we're still triggering the search through all legacy config files. Except now we're triggering it up front, when the AllUsers.scanAll initializer runs – which calls getOrCreate for every id, and if a proper config folder does not exist, a scan through all legacy folders is triggered.

          This seems like it becomes O(n^2) in the number of legacy users – although thankfully it ought to only run once, after which migration of legacy users will be complete.

          I am going to downgrade this to "major" though because at least we have all the pain put up-front for users. And I'm going to hope that I've missed something obvious and some people aren't going to get bitten after all.

          Sam Van Oort added a comment - jglick I think that change may actually make the issue worse because we're still triggering the search through all legacy config files. Except now we're triggering it up front, when the AllUsers.scanAll initializer runs – which calls getOrCreate for every id, and if a proper config folder does not exist, a scan through all legacy folders is triggered. This seems like it becomes O(n^2) in the number of legacy users – although thankfully it ought to only run once, after which migration of legacy users will be complete. I am going to downgrade this to "major" though because at least we have all the pain put up-front for users. And I'm going to hope that I've missed something obvious and some people aren't going to get bitten after all.

          Ryan Campbell added a comment -

          The code has significantly changed in 2.71 so we need evidence as to whether this is still an issue or not.

          Ryan Campbell added a comment - The code has significantly changed in 2.71 so we need evidence as to whether this is still an issue or not.

          Sam Van Oort added a comment -

          recampbell I've reviewed the modified code in 2.71 (that was the basis for the preceding comment). It doesn't touch the key problem path here – please see details in that comment though.

          Sam Van Oort added a comment - recampbell I've reviewed the modified code in 2.71 (that was the basis for the preceding comment). It doesn't touch the key problem path here – please see details in that comment though.

          Jesse Glick added a comment -

          Right the recent modifications would not make the code in question any faster, but IIUC they would make it be run during startup and only once, which might make this much less of an issue in practice (TBD).

          Jesse Glick added a comment - Right the recent modifications would not make the code in question any faster, but IIUC they would make it be run during startup and only once, which might make this much less of an issue in practice (TBD).

          Jesse Glick added a comment -

          Similar observed in 2.73.2, which has the getAll change:

          "Handling GET /blue/rest/organizations/jenkins/pipelines/…/pipelines/…/branches/…/runs/2/ from … : RequestHandlerThread[#…]" … runnable …
             java.lang.Thread.State: RUNNABLE
          	at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
          	at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
          	at java.io.File.isDirectory(File.java:849)
          	at hudson.model.User$3.accept(User.java:688)
          	at java.io.File.listFiles(File.java:1291)
          	at hudson.model.User.getLegacyConfigFilesFor(User.java:685)
          	at hudson.model.User.getOrCreate(User.java:443)
          	at hudson.model.User.getById(User.java:541)
          	at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1107)
          	at hudson.model.User.get(User.java:411)
          	at hudson.model.User.get(User.java:380)
          	at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379)
          	at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448)
          	at io.jenkins.blueocean.service.embedded.rest.ChangeSetResource.getAuthor(ChangeSetResource.java:45)
          	at …
          	at io.jenkins.blueocean.commons.stapler.export.MethodProperty.getValue(MethodProperty.java:72)
          	at io.jenkins.blueocean.commons.stapler.export.ExportInterceptor$1.getValue(ExportInterceptor.java:43)
          	at io.jenkins.blueocean.commons.stapler.Export$BlueOceanExportInterceptor.getValue(Export.java:167)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:136)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:314)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeBuffered(Property.java:178)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:239)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:172)
          	at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:157)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
          	at io.jenkins.blueocean.commons.stapler.export.Model.writeTo(Model.java:199)
          	at io.jenkins.blueocean.commons.stapler.Export.writeOne(Export.java:148)
          	at io.jenkins.blueocean.commons.stapler.Export.serveExposedBean(Export.java:139)
          	at io.jenkins.blueocean.commons.stapler.Export.doJson(Export.java:79)
          	at io.jenkins.blueocean.commons.stapler.TreeResponse$Processor$1.generateResponse(TreeResponse.java:48)
          	at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124)
          	at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69)
          	at …
          

          There are multiple things going on here. First, Blue Ocean is using Stapler export to write out changelog information. Probably this is intentional (the information is slated to be consumed in the UI somewhere), as opposed to the common antipattern of external tools asking for /jobs/*/*/api/json (without tree) and then discarding most of the result. This particular instance had set -Dcom.cloudbees.workflow.rest.external.ChangeSetExt.resolveCommitAuthors=false, which would have bypassed the problem for pipeline-stage-view, but perhaps there is no equivalent in Blue Ocean.

          Next there is the fact that GitChangeSet is doing a kind of reverse lookup, from User.fullName to User.id, which forces use of the relatively expensive CanonicalIdResolver. I tried to suppress this at one point, since it crops up all the time, but it turns out that this behavior is intentional and part of the expected behavior of the plugin.

          Then we have the specific behaviors of ID resolvers. FullNameIdResolver should I guess have been improved in 2.71 by making getAll be fast. DefaultUserCanonicalIdResolver is trivial. The problem here is UserIDCanonicalIdResolver, which has two potential performance issues. One is the SECURITY-243 defense, which was left enabled in this instance. The other, visible in the stack trace, is getById, which is supposed to be a relatively cheap operation, but is in fact still doing a full filesystem search of $JENKINS_HOME/users/*/config.xml for each newly cached User.

          So there are two problems visible in User. First, getLegacyConfigFilesFor is poorly optimized:

          pathname.isDirectory() && new File(pathname, "config.xml").isFile()
          

          is doing two stats when the single stat

          new File(pathname, "config.xml").isFile()
          

          would produce the same result (if a child file exists obviously the parent must be a directory); and IdStrategy.equals could be run first, avoiding any system calls unless and until there is a candidate match.

          More broadly, it may be the case that after the getAll change in 2.71, the whole getLegacyConfigFilesFor check in getOrCreate could just be deleted: if there were some legacy config file lying around, presumably scanAll would have already found it at startup, adding the user to the in-memory cache. It looks like UserTest.migration is testing this logic but needs to be confirmed.

          Jesse Glick added a comment - Similar observed in 2.73.2, which has the getAll change: "Handling GET /blue/rest/organizations/jenkins/pipelines/…/pipelines/…/branches/…/runs/2/ from … : RequestHandlerThread[#…]" … runnable … java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) at java.io.File.isDirectory(File.java:849) at hudson.model.User$3.accept(User.java:688) at java.io.File.listFiles(File.java:1291) at hudson.model.User.getLegacyConfigFilesFor(User.java:685) at hudson.model.User.getOrCreate(User.java:443) at hudson.model.User.getById(User.java:541) at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1107) at hudson.model.User.get(User.java:411) at hudson.model.User.get(User.java:380) at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379) at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448) at io.jenkins.blueocean.service.embedded.rest.ChangeSetResource.getAuthor(ChangeSetResource.java:45) at … at io.jenkins.blueocean.commons.stapler.export.MethodProperty.getValue(MethodProperty.java:72) at io.jenkins.blueocean.commons.stapler.export.ExportInterceptor$1.getValue(ExportInterceptor.java:43) at io.jenkins.blueocean.commons.stapler.Export$BlueOceanExportInterceptor.getValue(Export.java:167) at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:136) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:314) at io.jenkins.blueocean.commons.stapler.export.Property.writeBuffered(Property.java:178) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:239) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:172) at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:157) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Model.writeTo(Model.java:199) at io.jenkins.blueocean.commons.stapler.Export.writeOne(Export.java:148) at io.jenkins.blueocean.commons.stapler.Export.serveExposedBean(Export.java:139) at io.jenkins.blueocean.commons.stapler.Export.doJson(Export.java:79) at io.jenkins.blueocean.commons.stapler.TreeResponse$Processor$1.generateResponse(TreeResponse.java:48) at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124) at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69) at … There are multiple things going on here. First, Blue Ocean is using Stapler export to write out changelog information. Probably this is intentional (the information is slated to be consumed in the UI somewhere), as opposed to the common antipattern of external tools asking for /jobs/*/*/api/json (without tree ) and then discarding most of the result. This particular instance had set -Dcom.cloudbees.workflow.rest.external.ChangeSetExt.resolveCommitAuthors=false , which would have bypassed the problem for pipeline-stage-view , but perhaps there is no equivalent in Blue Ocean. Next there is the fact that GitChangeSet is doing a kind of reverse lookup, from User.fullName to User.id , which forces use of the relatively expensive CanonicalIdResolver . I tried to suppress this at one point, since it crops up all the time, but it turns out that this behavior is intentional and part of the expected behavior of the plugin. Then we have the specific behaviors of ID resolvers. FullNameIdResolver should I guess have been improved in 2.71 by making getAll be fast. DefaultUserCanonicalIdResolver is trivial. The problem here is UserIDCanonicalIdResolver , which has two potential performance issues. One is the SECURITY-243 defense, which was left enabled in this instance. The other, visible in the stack trace, is getById , which is supposed to be a relatively cheap operation, but is in fact still doing a full filesystem search of $JENKINS_HOME/users/*/config.xml for each newly cached User . So there are two problems visible in User . First, getLegacyConfigFilesFor is poorly optimized: pathname.isDirectory() && new File(pathname, "config.xml" ).isFile() is doing two stats when the single stat new File(pathname, "config.xml" ).isFile() would produce the same result (if a child file exists obviously the parent must be a directory); and IdStrategy.equals could be run first , avoiding any system calls unless and until there is a candidate match. More broadly, it may be the case that after the getAll change in 2.71, the whole getLegacyConfigFilesFor check in getOrCreate could just be deleted: if there were some legacy config file lying around, presumably scanAll would have already found it at startup, adding the user to the in-memory cache. It looks like UserTest.migration is testing this logic but needs to be confirmed.

          Jesse Glick added a comment -

          Extending the scope to Blue Ocean since I feel there should at a minimum be some flag to disable potentially expensive changelog rendering; perhaps this UI should be populated lazily in a background thread, or displayed only upon some user gesture, or something.

          Jesse Glick added a comment - Extending the scope to Blue Ocean since I feel there should at a minimum be some flag to disable potentially expensive changelog rendering; perhaps this UI should be populated lazily in a background thread, or displayed only upon some user gesture, or something.

          Jake Mroz added a comment -

          We are also seeing this issue on Jenkins v. 2.73.1, and Blue Ocean v. 1.2.4.

          Our system load spiked significantly and we had 190 threads backed up similar to below.  We tried killing the threads but eventually had to restart jenkins because the system was unresponsive.

          Handling GET /blue/organizations/jenkins/.../detail/.../.../pipeline/ from 127.0.0.1 : http-nio-8080-exec-11 BlueOceanUI/index.jelly PageStatePreloadDecorator/header.jelly
              java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
              java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
              java.io.File.isFile(File.java:882)
              hudson.model.User$3.accept(User.java:684)
              java.io.File.listFiles(File.java:1291)
              hudson.model.User.getLegacyConfigFilesFor(User.java:681)
              hudson.model.User.getOrCreate(User.java:439)
              hudson.model.User.get(User.java:417)
              hudson.model.User.get(User.java:376)
              hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379)
              hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:452)
              io.jenkins.blueocean.service.embedded.rest.ChangeSetResource.getAuthor(ChangeSetResource.java:49)
              java.lang.invoke.LambdaForm$DMH/1349277854.invokeVirtual_L_L(LambdaForm$DMH)
              java.lang.invoke.LambdaForm$BMH/1654447340.reinvoke(LambdaForm$BMH)
              ...
          
          

          Jake Mroz added a comment - We are also seeing this issue on Jenkins v. 2.73.1, and Blue Ocean v. 1.2.4. Our system load spiked significantly and we had 190 threads backed up similar to below.  We tried killing the threads but eventually had to restart jenkins because the system was unresponsive. Handling GET /blue/organizations/jenkins/.../detail/.../.../pipeline/ from 127.0.0.1 : http-nio-8080-exec-11 BlueOceanUI/index.jelly PageStatePreloadDecorator/header.jelly java.io.UnixFileSystem.getBooleanAttributes0(Native Method) java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) java.io.File.isFile(File.java:882) hudson.model.User$3.accept(User.java:684) java.io.File.listFiles(File.java:1291) hudson.model.User.getLegacyConfigFilesFor(User.java:681) hudson.model.User.getOrCreate(User.java:439) hudson.model.User.get(User.java:417) hudson.model.User.get(User.java:376) hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379) hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:452) io.jenkins.blueocean.service.embedded. rest .ChangeSetResource.getAuthor(ChangeSetResource.java:49) java.lang.invoke.LambdaForm$DMH/1349277854.invokeVirtual_L_L(LambdaForm$DMH) java.lang.invoke.LambdaForm$BMH/1654447340.reinvoke(LambdaForm$BMH) ...

          Jesse Glick added a comment -

          PR 2251 purported to improve this situation, though as previously noted it may suffice to simply delete the affected code.

          Jesse Glick added a comment - PR 2251 purported to improve this situation, though as previously noted it may suffice to simply delete the affected code.

          James Dumay added a comment -

          jglick is there some assistance from vivek that could be provided to help get this over the line? (I see you've opened a PR but this one isn't assigned to you - assuming that PR isn't the whole story?)

          James Dumay added a comment - jglick is there some assistance from vivek that could be provided to help get this over the line? (I see you've opened a PR but this one isn't assigned to you - assuming that PR isn't the whole story?)

          Vivek Pandey added a comment - - edited

          As far as blueocean is concerned URL: https://ci.blueocean.io/blue/organizations/jenkins/blueocean/detail/BRANCH/RUN_ID/changes/ gets called only when Changes Tab on run details page gets clicked by user. Not in critical path, although for large change set it might makes page load slow. Eventually when it gets fixed in core or upstream library we are going to see it working optimally. Nothing to do for now in blueocean.

          Vivek Pandey added a comment - - edited As far as blueocean is concerned URL: https://ci.blueocean.io/blue/organizations/jenkins/blueocean/detail/BRANCH/RUN_ID/changes/ gets called only when Changes Tab on run details page gets clicked by user. Not in critical path, although for large change set it might makes page load slow. Eventually when it gets fixed in core or upstream library we are going to see it working optimally. Nothing to do for now in blueocean.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/model/User.java
          http://jenkins-ci.org/commit/jenkins/3853b3813967bd3d46e4dde7741eb97fa628345c
          Log:
          JENKINS-47429 User.getLegacyConfigFilesFor no longer seems to be necessary.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: core/src/main/java/hudson/model/User.java http://jenkins-ci.org/commit/jenkins/3853b3813967bd3d46e4dde7741eb97fa628345c Log: JENKINS-47429 User.getLegacyConfigFilesFor no longer seems to be necessary.

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/src/main/java/hudson/model/User.java
          http://jenkins-ci.org/commit/jenkins/53a320afd86844ad82c2d7fa7b64e4143e776c55
          Log:
          Merge pull request #3150 from jglick/User-JENKINS-47429

          JENKINS-47429 User.getLegacyConfigFilesFor no longer seems to be necessary

          Compare: https://github.com/jenkinsci/jenkins/compare/5f8f42624eb5...53a320afd868

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/src/main/java/hudson/model/User.java http://jenkins-ci.org/commit/jenkins/53a320afd86844ad82c2d7fa7b64e4143e776c55 Log: Merge pull request #3150 from jglick/User- JENKINS-47429 JENKINS-47429 User.getLegacyConfigFilesFor no longer seems to be necessary Compare: https://github.com/jenkinsci/jenkins/compare/5f8f42624eb5...53a320afd868

          Oleg Nenashev added a comment -

          The fix has been integrated towards 2.93

          Oleg Nenashev added a comment - The fix has been integrated towards 2.93

          James Dumay added a comment -

          James Dumay added a comment - oleg_nenashev jglick nice one

          Oleg Nenashev added a comment -

          Marked the fix as the lts-canididate, but I am not 100% sure it is safe. jglick WDYT?

          Oleg Nenashev added a comment - Marked the fix as the lts-canididate, but I am not 100% sure it is safe. jglick WDYT?

          Arnaud Héritier added a comment - oleg_nenashev , danielbeck , jglick , olivergondza https://jenkins.io/changelog-stable/  doesn't list  JENKINS-47429 as fixed in 2.89.2 ? But it seems to be here : https://github.com/jenkinsci/jenkins/commit/b450e9bdb60f6b3a0d492219a026438bf7b3b9d6 JENKINS-48157 , JENKINS-34138 are also missing ?

          Oleg Nenashev added a comment -

          2.89.2 was an out-of-order Security release. The changes listed in 2.89.2-fixed will be actually released in 2.89.3 IIUC, but somebody told me the different thing a while ago. Probably it was you

          Oleg Nenashev added a comment - 2.89.2 was an out-of-order Security release. The changes listed in 2.89.2-fixed will be actually released in 2.89.3 IIUC, but somebody told me the different thing a while ago. Probably it was you

          Daniel Beck added a comment -

          Daniel Beck added a comment - https://groups.google.com/d/msg/jenkinsci-dev/YNCLWs7cBHc/FKrH97ZoCwAJ

            jglick Jesse Glick
            svanoort Sam Van Oort
            Votes:
            2 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated:
              Resolved: