Bug
Resolution: Fixed
Critical
Jenkins 2.184
For the last few weeks/months, all our Jenkins users have experienced a very slow web GUI after some time.
Situation:
- In a clean browser (no cache, cookies) Jenkins is very fast
- After some time (workday - 8 hours of active Jenkins use), Jenkins GUI starts to slow down:
Loading jobs takes 10+ seconds, static resources stay pending for a very long time, etc.
Jenkins just isn't workable for users at that time.
- Logging out + in again does not fix it for that user.
- Removing the ACEGI_SECURITY_HASHED_REMEMBER_ME_COOKIE cookie fixes everything for that user and makes Jenkins blazing fast again.
So, what happens with the ACEGI_SECURITY_HASHED_REMEMBER_ME_COOKIE?
Why does it cause the slowness after hours of use?
[update]
SECURITY-901 / CVE-2019-1003004 in Jenkins 2.150.2 introduced a security fix, but with a side effect that after some time (hours) the Jenkins GUI for that user starts to slow down to a crawl.
[JENKINS-56243] Jenkins GUI is slow -removing cookie fixes it (temporarily)
We are using Active Directory security realm using Active Directory Plugin version 2.12. The only additional configuration enabled is "Enable StartTLS".
I'll try to get back to you with the other information as well.
We experienced this same issue after upgrading to LTS 2.150.2 also using Active Directory. The problem was resolved after enabling cache under the security realm configuration. We thought this setting had been configured already - perhaps it was disabled by the upgrade to LTS 2.150.2 or just misconfiguration on our end.
"perhaps it was disabled by the upgrade to LTS 2.150.2"
Nothing was done around that during the security release, sorry.
hms_lig Try enabling the cache feature of the plugin, that will help you a lot I imagine.
wfollonier We are using built in user authentication with the built in access matrix. Nothing special whatsoever. Stock install via apt.
Could we get thread dumps for this while Jenkins is busy? https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
Here comes a thread dump taken while the GUI is slow. I logged in yesterday when leaving work (checked "keep me signed in") and shut down my computer. Arriving this morning, booting up my laptop and opening the Jenkins web UI, it is slow: not unusable, but extremely slow.
At the time the thread dump was recorded, that was the only request being handled (and it doesn't show problems). I suppose it's best in this case to take the thread dump another way (e.g. signal), while a slow request is being handled by Jenkins.
This thread dump was taken in one Chrome tab while another tab was busy loading another Jenkins page. The threadDump tab finished while the other tab was still busy. I could try the signalling method as well if this dump is not good enough; I just hurried to get the dump before reading your comment entirely and realizing you asked for a different method.
Interesting.
https://github.com/jenkinsci/jenkins/blob/1c9eb43283e7321ee4d3a0e1e9995453493ff04a/core/src/main/java/hudson/security/TokenBasedRememberMeServices2.java#L240 is new in the security fix.
With a slow security realm, this will affect not just existing user lookups e.g. from changelogs ( GitChangeSet.findOrCreateUser shows up 4 times in the thread dump), but also any other request. Four separate requests go through TokenBasedRememberMeServices2.retrieveAuthFromCookie – caching means only one request to the security realm, but if that's slow, all of them are handled slowly.
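To illustrate the point about caching and a slow realm, here is a minimal sketch. It is not Jenkins code: the class SlowRealmSketch, its methods, and the 200 ms delay are all made up for illustration. Several concurrent requests for the same user result in a single realm lookup, but every one of them still waits for that lookup to finish.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a slow security realm behind a user-details cache.
class SlowRealmSketch {
    // Counts how often the simulated realm is actually contacted.
    final AtomicInteger realmCalls = new AtomicInteger();
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final long realmDelayMillis;

    SlowRealmSketch(long realmDelayMillis) {
        this.realmDelayMillis = realmDelayMillis;
    }

    // Simulated remote lookup (e.g. a directory query); the delay is an assumption.
    private String lookupInRealm(String username) {
        realmCalls.incrementAndGet();
        try {
            Thread.sleep(realmDelayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "details-of-" + username;
    }

    // computeIfAbsent runs the lookup at most once per key; concurrent callers
    // for the same key block until that single lookup has finished.
    String loadUserDetails(String username) {
        return cache.computeIfAbsent(username, this::lookupInRealm);
    }

    // Fires `requests` concurrent lookups for one user and returns the
    // wall-clock time in milliseconds until all of them have completed.
    long simulateConcurrentRequests(int requests, String username) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    loadUserDetails(username);
                    done.countDown();
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        SlowRealmSketch realm = new SlowRealmSketch(200);
        long elapsed = realm.simulateConcurrentRequests(4, "alice");
        System.out.println("realm calls: " + realm.realmCalls.get() + ", elapsed ms: " + elapsed);
    }
}
```

With four concurrent requests the counter should stay at one, while the elapsed time is still about the full realm delay. That matches the pattern described above: fewer calls to the realm, but no faster for the requests that arrive while the lookup is in flight.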
The thread dump also indicates AD goes through referrals, which slows everything down further. I wonder whether the security realm config is just terrible. Make sure you're using caching, if available, and that you're contacting the global catalog (which is all I remember from working in an AD environment).
I don't know anything at all about AD, but we do have some performance issues related to it that have affected our initial login to Jenkins (the regular login takes 20 seconds), even before the upgrade that brought the current problems in this issue. Our IT department has some kind of plan for handling those issues down the road, but yes, we do have a slow security realm.
Thanks for the tips I'll look into it on my end....
...I enabled the cache and the GUI went from slow to normal speed without having to delete any cookie.
"is new in the security fix."
Not really, it was just "moved" from the superclass to that class in order to have the time-independent equals method; nothing else changed in the method.
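For readers unfamiliar with the term, a time-independent comparison checks two values without returning early at the first differing byte, so response time does not leak how much of a guessed signature was correct. The following is only a sketch of that idea using the standard MessageDigest.isEqual. The class and method names are hypothetical and this is not claimed to be the code in TokenBasedRememberMeServices2.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper illustrating a time-independent signature check.
class CookieSignatureCheck {
    // Returns true only when both signatures are non-null and byte-for-byte equal.
    // MessageDigest.isEqual is designed not to stop at the first mismatching byte.
    static boolean signaturesMatch(String expected, String presented) {
        if (expected == null || presented == null) {
            return false;
        }
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                presented.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(signaturesMatch("abc123", "abc123"));
        System.out.println(signaturesMatch("abc123", "abc124"));
    }
}
```

A comparison like this costs about the same as String.equals, which is consistent with the remark that moving the method should not by itself slow anything down.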
Hi all,
We are currently running Jenkins 2.164.1 LTS with the AD Plugin for authentication and we are facing the same issues. Enabling the cache has greatly improved the response time of all HTTP requests (static resources, XHR, ...) to Jenkins. Is it considered a workaround or a mandatory configuration when using this AD plugin?
I cannot find the 2.150.2 mentioned above, only 2.150-1.2:
https://pkg.jenkins.io/opensuse/jenkins-2.150-1.2.noarch.rpm
I just tried to downgrade to 2.149 and it did not go so well, as several plugins do not work now.
Our users are having serious trouble with slow Jenkins user experience. I hope this can be fixed soon.
I was unaware of the opensuse-stable builds. So these are the Jenkins LTS releases.
I tried to downgrade to the 2.150.1 LTS and it worked with our installed plugins.
The LTS upgrade guide seems to have a workaround for this bug
https://jenkins.io/doc/upgrade-guide/2.150/
"The LTS upgrade guide seems to have a workaround for this bug"
Different issue. You're not even logged in with that one.
So the workaround -Djenkins.security.seed.UserSeedProperty.disableUserSeed=true will not fix this issue?
It says it's a workaround for SECURITY-901, which, as mentioned in the description here, caused the regression.
Well, if it doesn't work, downgrading to Jenkins LTS 2.150.1 worked on our Test Jenkins, so we have that to fall back on.
The active-directory thing is a total red herring. We are affected by this, and we do not and never have used AD. The downgrade to 2.150.1 does work.
We had the same issue, the workaround https://jenkins.io/doc/upgrade-guide/2.150/ worked for us.
I upgraded our jenkins masters from 2.150.1 to 2.164.2 a few days ago. The next day, we were intermittently seeing long page loads. Today, I have consistently seen pages take 15-20 seconds to load. This is from an instance of chrome 71 on Linux; this browser has been open for a very long time (weeks, at least). I did a service jenkins restart on one of the masters, and page loads became fast again from that same client browser, even after logging back into jenkins. If I open a new browser (I opened an instance of Firefox), it sees fast page loads on the server that was just restarted, and on the servers where the jenkins process was not restarted, even after logging into jenkins. If I turn on the AD cache on the jenkins masters, page loads become fast on all machines, even from the long-running client browser. I am unconvinced that AD server performance was the root cause, since page loads are fast from a newly-opened web browser to a jenkins server with the AD cache still turned off.
So far, the only thing that has worked for us on Jenkins 2.150+ is disabling the Remember Me option in Global Security.
So, summary from the comments:
- With AD, Jenkins after 2.150.1 becomes slow over time with Remember Me on
- Without AD, Jenkins after 2.150.1 becomes slow over time with Remember Me on
- Disabling Remember Me in Global Security fixes the problem, no slowdown after several hours
- SECURITY-901 / CVE-2019-1003004 is the cause of the issue
@Jenkins developers:
Please look into the problem! Since people reported the same problem WITHOUT Active Directory, please do not blame AD or directory services for the issue.
Disabling Remember Me is a good workaround, but nothing more than a workaround!
henjovr - did you try enabling the cache in Jenkins' LDAP plugin config? That helped for us. See Stuart Rowe's comment, above: comment-363002
The LDAP cache was already enabled before the upgrade, and is still enabled after the upgrade from 2.150.1.
So no, that did not fix it unfortunately.
We have tried several workarounds. None of them works.
- Disable security
- LDAP (Logged-in users can do anything)
- LDAP (Anyone can do anything)
- Disable remember me
- Downgrade to Jenkins LTS 2.150.1
Our users still find the Jenkins UI slow: loading takes time and sometimes hangs.
Is there any update on this? We are using 2.174, and it is painfully slow. Clearing cookies can help for a short span of time.
Looking at the code added by SECURITY-901, the code for UserSeedProperty concerns me:
The RNG call RANDOM.generateSeed is effectively single threaded based on how SecureRandom works. I'm not a Jenkins expert, but it looks like the HttpSessionContextIntegrationFilter2 or AuthenticationProcessingFilter2 can cause creation of a User object with properties (via AllUsers). The UserSeedProperty instance then gets generated via the User properties constructor (causing a new RNG call). If that gets called a lot, that can be a severe bottleneck.
Similarly HudsonPrivateSecurityRealm also triggers the seed re-generation.
A test may be to disable the user seed property (note this exposes the issue that SECURITY-901 tries to fix) per this link
Set jenkins.security.seed.UserSeedProperty.disableUserSeed to true
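As mentioned earlier in the thread, this is a JVM system property, normally passed on the command line as -Djenkins.security.seed.UserSeedProperty.disableUserSeed=true. Below is a minimal sketch of how such a flag becomes visible to Java code. It uses plain Boolean.getBoolean as an assumption (Jenkins may read the property through its own helper), and the class name is hypothetical.

```java
// Hypothetical helper showing how a -D flag is seen from inside the JVM.
class DisableUserSeedFlag {
    // Property name as given in the comment above.
    static final String PROPERTY = "jenkins.security.seed.UserSeedProperty.disableUserSeed";

    // Boolean.getBoolean returns true only when the system property exists
    // and equals "true" (ignoring case); otherwise it returns false.
    static boolean isUserSeedDisabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + isUserSeedDisabled());
    }
}
```

Running it with and without the -D option on the java command line is a quick way to confirm the flag actually reaches the JVM, for example when the service wrapper or init script builds the command line.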
Hi,
I am using 2.176.1, and it is painfully slow. I changed the security settings to "Jenkins' own user database" and speed was good. Due to an abrupt reboot of Jenkins yesterday, it's not allowing any kind of authentication now. Can someone tell me the exact version which does not have the GUI slowness issue, please? I am trying to install Jenkins fresh on a new box, so I would rather install a version which has no known issues.
This issue is resolved after upgrading Jenkins from 2.176 to 2.179. Jenkins is working perfectly fine now.
I have been trying Jenkins 2.181 on our Jenkins Test server. Really fast loading of UI, not seeing any slowness.
Though the Jenkins Test server does not have as much build history, the real test would be to upgrade our Jenkins Production server.
Edit: Until I went to Manage Jenkins; then the page load just hangs while loading the icons.
Edit: Opening a multibranch project branch, the page seems to have loaded completely, but loading is still ongoing for some ~30 more seconds. Looked at what was loading and it seems to be prototype.js. There was a fix for this in JENKINS-49319, but it was reverted.
Firefox is having many more problems than browsers with the Chromium engine, so it seems to be a JavaScript problem that makes the loading last.
Edit:
All good things must come to an end: Now I am getting the same problem in Chrome. Several resources are not loading.
What I find very odd: if I log in to Jenkins (using LDAP), all loading problems go away. Why should logging in solve the loading slowness?
The last remaining loading is on a Multibranch Pipeline branch project. Developer Tools in my browser showed me what was still trying to load:
Name                                             Status  Type  Initiator          Size     Time
ajax                                             200     xhr   prototype.js:1585  310 B    15 ms
runs?since=%232&fullStages=true&_=1561103938220  200     xhr   jquery2.js:998     48.8 KB  18 ms
These two repeated several times for about 30+ seconds until the page was actually finished loading.
If there's contention in the SecureRandom instance, that could be causing issues. Let me see if I can reproduce any slowdowns with a benchmark test.
Thanks for verifying that this isn't AD-specific at least. I might be able to help figure this out.
So far from my testing, I'm not finding any slow code in seed renewal. Some basic JMH tests in this branch: https://github.com/jenkinsci/jenkins/compare/master...jvz:user-seed-perf-JENKINS-56243?expand=1
Right now, my hypothesis is that if a SecurityRealm is having any performance issues, multiple requests to load the same user's details could be piling up due to the remember me cookie validation check. The same happens in the session cookie itself. Basically, the reason why it was performing better before was because it wasn't validating authentication properly in the first place.
I'm working on some basic load tests to compare 2.150.1 and 2.150.2 to see if I can reproduce this idea. Based on the comments so far, it sounds like this should even be potentially reproducible using just the built-in user database. The JMH tests above only use an in-memory user database, so introducing lag in the calls to loadUserDetails() could be an interesting way to potentially test this as well.
I discovered that the remember me service bypasses the user details cache entirely. I've made a draft PR with this fixed: https://github.com/jenkinsci/jenkins/pull/4093
We have disabled the Remember Me option and still experience the slowness. The slowness only happens when users are not logged in.
I chatted with wfollonier earlier today, and we've found that the most likely culprit is that TokenBasedRememberMeServices2 does not cache the user seed property in their session. I'll submit a PR later to address this.
Looking forward to testing it out. Our developers are getting frustrated.
Incremental release available: https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/cli/2.184-rc28433.92d6063c40c3/
Still waiting for reviews before someone can merge it for the next weekly.
I can test the incremental release on our Test Jenkins instance. I dare not install it in production.
djviking as you said before, if the "Disable remember me" workaround was not working for you, do not expect this change to work either. It's "just" the correction of the root cause of this issue. From my PoV, with all the information you gave, you have another (unknown?) problem that is different from this one.
There is another issue I have been tracking that I think may be related to our slowness problem: JENKINS-49319
henjovr, shevek, hms_lig, idriver, could you provide additional information about your configuration? Especially the security realm that is used, in which version (if from plugin), with specific configuration as well (like AD/LDAP cache configuration).
In addition, when you are seeing such a performance problem, could you check the cookies of one of your requests (especially the number of cookies that are sent)?
From my PoV there is no huge performance impact from the REMEMBER_ME cookie, as the only addition there is a User.getById call, which does nothing with an external security realm.
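For anyone who wants to check the cookies sent with a request, the raw Cookie request header can be copied from the browser's developer tools and counted with something like the sketch below. The class name and the sample header value are made up for illustration. Only the ACEGI_SECURITY_HASHED_REMEMBER_ME_COOKIE name comes from this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a copied "Cookie" request header.
class CookieHeaderInspector {
    // Splits a raw Cookie header value ("a=1; b=2") into its cookie names.
    static List<String> cookieNames(String cookieHeader) {
        List<String> names = new ArrayList<>();
        if (cookieHeader == null || cookieHeader.trim().isEmpty()) {
            return names;
        }
        for (String pair : cookieHeader.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            names.add(eq < 0 ? trimmed : trimmed.substring(0, eq));
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample value invented for this sketch.
        String header = "JSESSIONID.abc=node01; ACEGI_SECURITY_HASHED_REMEMBER_ME_COOKIE=dXNlcg; screenResolution=1920x1080";
        List<String> names = cookieNames(header);
        System.out.println(names.size() + " cookies: " + names);
    }
}
```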