[JENKINS-47429] Absolutely atrocious user lookup performance when many legacy users

Type: Bug
Resolution: Fixed
Priority: Major
Component/s: core
Labels:
- 2.89.3-fixed
- performance
Environment:
core 2.60.x, Git plugin, and user viewing changesets when a lot of users exist.

Similar Issues:
Powered by SuggestiMate

Show
Epic Link:
Performance

The performance of user lookup when many users with legacy config files exist has been identified as the cause of a master's CPU usage spiking to the point of unresponsiveness.

Stack trace:

at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.isFile(File.java:882)
at hudson.model.User$3.accept(User.java:697)
at java.io.File.listFiles(File.java:1291)
at hudson.model.User.getLegacyConfigFilesFor(User.java:694)
at hudson.model.User.getOrCreate(User.java:437)
at hudson.model.User.getById(User.java:535)
at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1086)
at hudson.model.User.get(User.java:405)
at hudson.model.User.get(User.java:374)
at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379)
at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448)

The root cause of the problem is that for every user we're doing an O(n) scan through the directories in that folder looking for legacy user config files:

 private static final File[] getLegacyConfigFilesFor(final String id) {
        return getRootDir().listFiles(new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return pathname.isDirectory() && new File(pathname, "config.xml").isFile() && idStrategy().equals(
                        pathname.getName(), id);
            }
        });
    }

Code here: https://github.com/jenkinsci/jenkins/blob/2d2101215dc39dfcb03279e3cb8898b8b9e5bc5f/core/src/main/java/hudson/model/User.java#L680

The likely best fix is either to be able to check for a single file (if case sensitive mode is on) or to cache the list of entries in that folder so we don't need to do a listFiles and linear-time scan through all entries with a FileFilter.

is duplicated by

JENKINS-23281 Jenkins.getUser(...) is debilitatingly slow with a large number of users

Closed

relates to

JENKINS-24317 Nullpointer Exception on build in version 1.576 can't see output and fails

Resolved

JENKINS-43936 Performance issues caused by User.getOrCreate

Resolved

links to

CloudBees Internal OSS-2524

CloudBees Internal OSS-2527

core PR 2251

(1 links to)

Sam Van Oort created issue - 2017-10-13 16:50

Jesse Glick added a comment - 2017-10-13 16:53

What is a “modern Jenkins core”?

Recheck after ~~JENKINS-45737~~ in 2.71+; information about performance of this class predating that change is not really useful now.

Jesse Glick added a comment - 2017-10-13 16:53 What is a “modern Jenkins core”? Recheck after JENKINS-45737 in 2.71+; information about performance of this class predating that change is not really useful now.

Sam Van Oort made changes - 2017-10-13 22:31

Environment

Original: modern Jenkins core, Git plugin, and user viewing changesets when a lot of users exist.

New: core 2.60.x, Git plugin, and user viewing changesets when a lot of users exist.

Sam Van Oort added a comment - 2017-10-13 23:14

jglick I think that change may actually make the issue worse because we're still triggering the search through all legacy config files. Except now we're triggering it up front, when the AllUsers.scanAll initializer runs – which calls getOrCreate for every id, and if a proper config folder does not exist, a scan through all legacy folders is triggered.

This seems like it becomes O(n^2) in the number of legacy users – although thankfully it ought to only run once, after which migration of legacy users will be complete.

I am going to downgrade this to "major" though because at least we have all the pain put up-front for users. And I'm going to hope that I've missed something obvious and some people aren't going to get bitten after all.

Sam Van Oort added a comment - 2017-10-13 23:14 jglick I think that change may actually make the issue worse because we're still triggering the search through all legacy config files. Except now we're triggering it up front, when the AllUsers.scanAll initializer runs – which calls getOrCreate for every id, and if a proper config folder does not exist, a scan through all legacy folders is triggered. This seems like it becomes O(n^2) in the number of legacy users – although thankfully it ought to only run once, after which migration of legacy users will be complete. I am going to downgrade this to "major" though because at least we have all the pain put up-front for users. And I'm going to hope that I've missed something obvious and some people aren't going to get bitten after all.

Sam Van Oort made changes - 2017-10-13 23:14

Priority

Original: Critical [ 2 ]

New: Major [ 3 ]

Ryan Campbell added a comment - 2017-10-17 15:51

The code has significantly changed in 2.71 so we need evidence as to whether this is still an issue or not.

Ryan Campbell added a comment - 2017-10-17 15:51 The code has significantly changed in 2.71 so we need evidence as to whether this is still an issue or not.

Sam Van Oort added a comment - 2017-10-17 15:56

recampbell I've reviewed the modified code in 2.71 (that was the basis for the preceding comment). It doesn't touch the key problem path here – please see details in that comment though.

Sam Van Oort added a comment - 2017-10-17 15:56 recampbell I've reviewed the modified code in 2.71 (that was the basis for the preceding comment). It doesn't touch the key problem path here – please see details in that comment though.

Jesse Glick added a comment - 2017-10-24 18:20

Right the recent modifications would not make the code in question any faster, but IIUC they would make it be run during startup and only once, which might make this much less of an issue in practice (TBD).

Jesse Glick added a comment - 2017-10-24 18:20 Right the recent modifications would not make the code in question any faster, but IIUC they would make it be run during startup and only once, which might make this much less of an issue in practice (TBD).

Jesse Glick added a comment - 2017-10-26 14:03

Similar observed in 2.73.2, which has the getAll change:

"Handling GET /blue/rest/organizations/jenkins/pipelines/…/pipelines/…/branches/…/runs/2/ from … : RequestHandlerThread[#…]" … runnable …
   java.lang.Thread.State: RUNNABLE
	at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
	at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
	at java.io.File.isDirectory(File.java:849)
	at hudson.model.User$3.accept(User.java:688)
	at java.io.File.listFiles(File.java:1291)
	at hudson.model.User.getLegacyConfigFilesFor(User.java:685)
	at hudson.model.User.getOrCreate(User.java:443)
	at hudson.model.User.getById(User.java:541)
	at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1107)
	at hudson.model.User.get(User.java:411)
	at hudson.model.User.get(User.java:380)
	at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379)
	at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448)
	at io.jenkins.blueocean.service.embedded.rest.ChangeSetResource.getAuthor(ChangeSetResource.java:45)
	at …
	at io.jenkins.blueocean.commons.stapler.export.MethodProperty.getValue(MethodProperty.java:72)
	at io.jenkins.blueocean.commons.stapler.export.ExportInterceptor$1.getValue(ExportInterceptor.java:43)
	at io.jenkins.blueocean.commons.stapler.Export$BlueOceanExportInterceptor.getValue(Export.java:167)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:136)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:314)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeBuffered(Property.java:178)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:239)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:172)
	at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:157)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224)
	at io.jenkins.blueocean.commons.stapler.export.Model.writeTo(Model.java:199)
	at io.jenkins.blueocean.commons.stapler.Export.writeOne(Export.java:148)
	at io.jenkins.blueocean.commons.stapler.Export.serveExposedBean(Export.java:139)
	at io.jenkins.blueocean.commons.stapler.Export.doJson(Export.java:79)
	at io.jenkins.blueocean.commons.stapler.TreeResponse$Processor$1.generateResponse(TreeResponse.java:48)
	at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124)
	at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69)
	at …

There are multiple things going on here. First, Blue Ocean is using Stapler export to write out changelog information. Probably this is intentional (the information is slated to be consumed in the UI somewhere), as opposed to the common antipattern of external tools asking for /jobs/*/*/api/json (without tree) and then discarding most of the result. This particular instance had set -Dcom.cloudbees.workflow.rest.external.ChangeSetExt.resolveCommitAuthors=false, which would have bypassed the problem for pipeline-stage-view, but perhaps there is no equivalent in Blue Ocean.

Next there is the fact that GitChangeSet is doing a kind of reverse lookup, from User.fullName to User.id, which forces use of the relatively expensive CanonicalIdResolver. I tried to suppress this at one point, since it crops up all the time, but it turns out that this behavior is intentional and part of the expected behavior of the plugin.

Then we have the specific behaviors of ID resolvers. FullNameIdResolver should I guess have been improved in 2.71 by making getAll be fast. DefaultUserCanonicalIdResolver is trivial. The problem here is UserIDCanonicalIdResolver, which has two potential performance issues. One is the SECURITY-243 defense, which was left enabled in this instance. The other, visible in the stack trace, is getById, which is supposed to be a relatively cheap operation, but is in fact still doing a full filesystem search of $JENKINS_HOME/users/*/config.xml for each newly cached User.

So there are two problems visible in User. First, getLegacyConfigFilesFor is poorly optimized:

pathname.isDirectory() && new File(pathname, "config.xml").isFile()

is doing two stats when the single stat

new File(pathname, "config.xml").isFile()

would produce the same result (if a child file exists obviously the parent must be a directory); and IdStrategy.equals could be run first, avoiding any system calls unless and until there is a candidate match.

More broadly, it may be the case that after the getAll change in 2.71, the whole getLegacyConfigFilesFor check in getOrCreate could just be deleted: if there were some legacy config file lying around, presumably scanAll would have already found it at startup, adding the user to the in-memory cache. It looks like UserTest.migration is testing this logic but needs to be confirmed.

Jesse Glick added a comment - 2017-10-26 14:03 Similar observed in 2.73.2, which has the getAll change: "Handling GET /blue/rest/organizations/jenkins/pipelines/…/pipelines/…/branches/…/runs/2/ from … : RequestHandlerThread[#…]" … runnable … java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) at java.io.File.isDirectory(File.java:849) at hudson.model.User$3.accept(User.java:688) at java.io.File.listFiles(File.java:1291) at hudson.model.User.getLegacyConfigFilesFor(User.java:685) at hudson.model.User.getOrCreate(User.java:443) at hudson.model.User.getById(User.java:541) at hudson.model.User$UserIDCanonicalIdResolver.resolveCanonicalId(User.java:1107) at hudson.model.User.get(User.java:411) at hudson.model.User.get(User.java:380) at hudson.plugins.git.GitChangeSet.findOrCreateUser(GitChangeSet.java:379) at hudson.plugins.git.GitChangeSet.getAuthor(GitChangeSet.java:448) at io.jenkins.blueocean.service.embedded.rest.ChangeSetResource.getAuthor(ChangeSetResource.java:45) at … at io.jenkins.blueocean.commons.stapler.export.MethodProperty.getValue(MethodProperty.java:72) at io.jenkins.blueocean.commons.stapler.export.ExportInterceptor$1.getValue(ExportInterceptor.java:43) at io.jenkins.blueocean.commons.stapler.Export$BlueOceanExportInterceptor.getValue(Export.java:167) at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:136) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:314) at io.jenkins.blueocean.commons.stapler.export.Property.writeBuffered(Property.java:178) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:239) at io.jenkins.blueocean.commons.stapler.export.Property.writeValue(Property.java:172) at io.jenkins.blueocean.commons.stapler.export.Property.writeTo(Property.java:157) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:228) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Model.writeNestedObjectTo(Model.java:224) at io.jenkins.blueocean.commons.stapler.export.Model.writeTo(Model.java:199) at io.jenkins.blueocean.commons.stapler.Export.writeOne(Export.java:148) at io.jenkins.blueocean.commons.stapler.Export.serveExposedBean(Export.java:139) at io.jenkins.blueocean.commons.stapler.Export.doJson(Export.java:79) at io.jenkins.blueocean.commons.stapler.TreeResponse$Processor$1.generateResponse(TreeResponse.java:48) at org.kohsuke.stapler.HttpResponseRenderer$Default.handleHttpResponse(HttpResponseRenderer.java:124) at org.kohsuke.stapler.HttpResponseRenderer$Default.generateResponse(HttpResponseRenderer.java:69) at … There are multiple things going on here. First, Blue Ocean is using Stapler export to write out changelog information. Probably this is intentional (the information is slated to be consumed in the UI somewhere), as opposed to the common antipattern of external tools asking for /jobs/*/*/api/json (without tree ) and then discarding most of the result. This particular instance had set -Dcom.cloudbees.workflow.rest.external.ChangeSetExt.resolveCommitAuthors=false , which would have bypassed the problem for pipeline-stage-view , but perhaps there is no equivalent in Blue Ocean. Next there is the fact that GitChangeSet is doing a kind of reverse lookup, from User.fullName to User.id , which forces use of the relatively expensive CanonicalIdResolver . I tried to suppress this at one point, since it crops up all the time, but it turns out that this behavior is intentional and part of the expected behavior of the plugin. Then we have the specific behaviors of ID resolvers. FullNameIdResolver should I guess have been improved in 2.71 by making getAll be fast. DefaultUserCanonicalIdResolver is trivial. The problem here is UserIDCanonicalIdResolver , which has two potential performance issues. One is the SECURITY-243 defense, which was left enabled in this instance. The other, visible in the stack trace, is getById , which is supposed to be a relatively cheap operation, but is in fact still doing a full filesystem search of $JENKINS_HOME/users/*/config.xml for each newly cached User . So there are two problems visible in User . First, getLegacyConfigFilesFor is poorly optimized: pathname.isDirectory() && new File(pathname, "config.xml" ).isFile() is doing two stats when the single stat new File(pathname, "config.xml" ).isFile() would produce the same result (if a child file exists obviously the parent must be a directory); and IdStrategy.equals could be run first , avoiding any system calls unless and until there is a candidate match. More broadly, it may be the case that after the getAll change in 2.71, the whole getLegacyConfigFilesFor check in getOrCreate could just be deleted: if there were some legacy config file lying around, presumably scanAll would have already found it at startup, adding the user to the in-memory cache. It looks like UserTest.migration is testing this logic but needs to be confirmed.

Jesse Glick added a comment - 2017-10-26 14:06

Extending the scope to Blue Ocean since I feel there should at a minimum be some flag to disable potentially expensive changelog rendering; perhaps this UI should be populated lazily in a background thread, or displayed only upon some user gesture, or something.

Jesse Glick added a comment - 2017-10-26 14:06 Extending the scope to Blue Ocean since I feel there should at a minimum be some flag to disable potentially expensive changelog rendering; perhaps this UI should be populated lazily in a background thread, or displayed only upon some user gesture, or something.

Jesse Glick made changes - 2017-10-26 14:06

Component/s

New: blueocean-plugin [ 21481 ]

Assignee:: Jesse Glick

Reporter:: Sam Van Oort

Votes:: 2 Vote for this issue

Watchers:: 11 Start watching this issue

Created:: 2017-10-13 16:50

Updated:: 2018-02-12 12:01

Resolved:: 2017-11-28 13:31

Jenkins

Details

Description

Attachments

Issue Links

Activity

Collapse comment: Jesse Glick added a comment - 2017-10-13 16:53

Expand comment: Jesse Glick added a comment - 2017-10-13 16:53

Collapse comment: Sam Van Oort added a comment - 2017-10-13 23:14

Expand comment: Sam Van Oort added a comment - 2017-10-13 23:14

Collapse comment: Ryan Campbell added a comment - 2017-10-17 15:51

Expand comment: Ryan Campbell added a comment - 2017-10-17 15:51

Collapse comment: Sam Van Oort added a comment - 2017-10-17 15:56

Expand comment: Sam Van Oort added a comment - 2017-10-17 15:56

Collapse comment: Jesse Glick added a comment - 2017-10-24 18:20

Expand comment: Jesse Glick added a comment - 2017-10-24 18:20

Collapse comment: Jesse Glick added a comment - 2017-10-26 14:03

Expand comment: Jesse Glick added a comment - 2017-10-26 14:03

Collapse comment: Jesse Glick added a comment - 2017-10-26 14:06

Expand comment: Jesse Glick added a comment - 2017-10-26 14:06

People

Dates