Type: Bug
Resolution: Fixed
Priority: Critical
Blue Ocean 1.2-beta1, Blue Ocean 1.2-beta2, Blue Ocean 1.4 - beta 3, Blue Ocean 1.5 - beta 1
(to investigate)
- Some people find the dashboard/pipeline screens slow when logged in compared with when not logged in (see the relevant comments below)
- See the comments and support bundles below for this, e.g. from bksaville. In some cases it is related to the number of runs.
- Some users see the activity screen as very slow when there is a large number of runs (this seems to be the more common and serious case...)
- The best example of this is https://ci.jenkins.io/blue/organizations/jenkins/Core%2Fjenkins/activity - note how long it takes to load compared with other pipelines there. You also see this if you open a run from a PR and then dismiss the result screen (as it goes back to activity); batmat reported this. That pipeline has a large number of branches and runs (other pipelines with fewer runs load faster).
— ORIGINAL TICKET —
I've noticed that the dashboard loads quickly when I'm not authenticated, but very slowly when I am.
(classic loads normally)
Deleting the config history like suggested in https://issues.jenkins-ci.org/browse/JENKINS-43208 did not work.
I have sent an HAR file via email to jamesdumay.
Jenkins version 2.46.3, BlueOcean 1.1.2
Attachments:
- plugins.txt (9 kB)
- haranalysis.png (118 kB)
Issue links:
- is duplicated by
  - JENKINS-48254 Very slow access of activity screen (Closed)
  - JENKINS-49347 BlueOcean unusably slow, maybe related to having a lot of builds stored in history (Closed)
- relates to
  - JENKINS-48254 Very slow access of activity screen (Closed)
[JENKINS-44995] Very slow activity/pipeline screen load (often when logged in)
schulzha it looks like most requests to the REST API for things that could be slow (listing pipelines, fetching favourites) are returning within a few milliseconds. The page load for Blue Ocean does not.
I have a few questions:
- Is Blue Ocean behind an HTTP proxy?
- Can you please generate a support bundle and email it to me? This will give me some indication of what plugins are running on your system, etc.
I am using the GitHub Authorization backend with a GitHub Enterprise instance.
Jenkins is behind an Apache2 reverse proxy for TLS, configured as described here: https://wiki.jenkins.io/display/JENKINS/Running+Jenkins+behind+Apache.
Apart from that, our corporate network uses a proxy to connect to the internet, but that shouldn't be involved in loading blue ocean.
I have sent you the support bundle.
Thanks!
Another thing: would you be able to take a thread dump of the server while the dashboard is loading slowly? That could give us some extra ideas of what's happening. Just a few thread dumps, taken every 5 seconds while it's loading, would be perfect.
I'm running into the same issue here. GitHub Auth backend, same as Hans.
schulzha I see you are using github-oauth 0.27, which is the same version we use on ci.blueocean.io without any trouble. Hmm, could you try accessing Jenkins without the proxy in front? I just want to be 100% sure that it's not interfering with requests.
bksaville are you using the github-oauth 0.27 too? Can you provide a list of your installed plugins attached as a text file to this issue please?
I've attached my list of installed plugins. We have a couple that are internal only, but the majority are on the latest or close to the latest versions. And yes, we are using version 0.27 as well. I should note that even though it reports Blue Ocean 1.1.2 as installed, I believe it hasn't been activated yet since I haven't restarted since upgrading the Blue Ocean plugins, so we're still on 1.1.1 as far as I know.
jamesdumay I actually cannot try without the proxy, since it does our TLS configuration and OAuth does not work without HTTPS. But it is just a plain Apache2 configured as the wiki describes, so it should not be a problem.
Hi schulzha,
I couldn't find the thread dump in your log files.
Could you try the following:
- Find the process id (pid) of Jenkins
- Try to load the slow page
- In a terminal, run `kill -3 <jenkins pid>`, where <jenkins pid> is the process ID. Run the command every 5 seconds while your request is being processed.
- Send me the server log using the secure drop link I offered previously.
Thanks,
James
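The `kill -3` step asks the JVM to write a thread dump to its standard output, which usually ends up in the Jenkins server log. Where sending signals is awkward, similar snapshots can be taken from inside a JVM with the standard `java.lang.management` API. A minimal sketch, assuming it runs inside the Jenkins JVM (for example from a plugin) so that Jenkins's own request threads appear; `PeriodicThreadDumps` is a hypothetical name, not part of Jenkins:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Jenkins): takes several thread dumps a fixed
// interval apart from inside the JVM, similar in spirit to repeated `kill -3 <pid>`.
final class PeriodicThreadDumps {
    private PeriodicThreadDumps() {}

    static List<String> capture(int count, long intervalMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            StringBuilder dump = new StringBuilder();
            // true, true: also report locked monitors and ownable synchronizers.
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                dump.append('"').append(info.getThreadName()).append("\" state=")
                    .append(info.getThreadState()).append('\n');
                // Print every frame; ThreadInfo.toString() truncates long stacks.
                for (StackTraceElement frame : info.getStackTrace()) {
                    dump.append("\tat ").append(frame).append('\n');
                }
                dump.append('\n');
            }
            dumps.add(dump.toString());
            if (i < count - 1) {
                Thread.sleep(intervalMillis); // e.g. 5000 ms, as suggested in this ticket
            }
        }
        return dumps;
    }
}
```

Calling `capture(6, 5000)` while the slow page loads would give roughly half a minute of snapshots; a thread that stays in the same frames across dumps (such as the `calcChangeSet` frames reported later in this ticket) is the likely hot spot.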
jamesdumay, I'm not sure what might have changed, but I haven't noticed this problem for quite a while now and I've been logged in most of the time. I would call this fixed for us, but I'm really not sure what changed, as I haven't updated any plugins or anything else that I remember.
bksaville I suspect it was some changes we made to test reporting that were causing the slowness. The good news is that we've done some profiling, identified performance issues and pushed a fix into Blue Ocean 1.2. We have a release going out on Tuesday next week that you can try. If it's still a problem, we can look into this further.
Sometime last week, after updating plugins and Jenkins, the issue came back for us.
Fortunately, I think I found the root cause of this issue, or at least part of it: I noticed that the autofavorite plugin went crazy on me and I had several hundred favorites. After I deleted them, Blue Ocean loads almost as fast as when unauthenticated.
I am experiencing this issue too. Using Jenkins 2.60.3 with BlueOcean 1.2.1.
Blue Ocean takes ages to load.
I am also experiencing the same issue with Jenkins 2.78 and Blue Ocean 1.2.1. My setup is also similar to Hans's (Apache, HTTPS, GitHub Authorization, GitHub Enterprise).
Using the Support Core plugin (https://wiki.jenkins.io/display/JENKINS/Support+Core+Plugin) and attaching a support bundle here can help diagnose this; otherwise there isn't much that can be done based on the information given.
oz123 sbabu if you could use the Support Core plugin above, that would help a lot and would point to what is taking all the time.
I do experience the same problem. I have attached the necessary support files.
support_2017-11-17_08.03.54.zip support_2017-11-17_08.04.02.zip
It happens for me when accessing a pipeline job (using the non-declarative version, called MardynPipeline) with around 1000 runs. When I access it while logged in, it takes roughly 12 seconds; when not logged in, the load is almost instantaneous.
(Jenkins 2.90, BlueOcean 1.3.2)
A pipeline job with 55 runs takes around 4 seconds to load using blue ocean.
---------------------
I got a considerable performance improvement once I deleted a couple of old builds (we had 30 stored until now). After reducing this to 7, the pipeline now loads much faster (around 1-2 seconds).
So I assume that accessing a large number of builds (and/or files) is the problem.
Besides that, I get a rather slow response when I open Blue Ocean for the first time in a while (see attachments): support_2017-11-17_11.26.08.zip
After saying that we hadn't seen it in a while, I've also been noticing this problem again recently. It takes much more than 10-12 seconds, though: more like 2-3 minutes to load initially, and then extremely fast after that. Loading the classic UI is immediate, so this must be something in Blue Ocean itself.
I will try to get a support bundle at some point here, but it may take me a bit.
thanks seckler bksaville - that is most useful.
So clearly there is some caching going on. HOWEVER, given that classic doesn't have this problem, Blue Ocean must be touching some data that it shouldn't, and given that the slowness seems to be proportional to the run history, it breaks paging of the data somehow. In the past we saw this with actions...
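The paging concern can be made concrete. A page request such as `start=0&limit=26` should only read the runs it returns, so its cost stays flat as the run history grows; if something touches every run first, the cost becomes proportional to the history. A minimal sketch of the two behaviours, assuming runs can be loaded by index; `RunPaging`, `pageLazily` and `pageEagerly` are hypothetical names, not Blue Ocean code:

```java
import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration (not Blue Ocean code): runs are identified by index and
// `loadRun` stands in for the expensive step of reading a run from disk.
final class RunPaging {
    private RunPaging() {}

    // Paged correctly: skip/limit happen before loading, so cost depends on `limit` only.
    static <R> List<R> pageLazily(int totalRuns, IntFunction<R> loadRun, int start, int limit) {
        return IntStream.range(0, totalRuns)
                .skip(start)
                .limit(limit)
                .mapToObj(loadRun)
                .collect(Collectors.toList());
    }

    // Broken paging: every run is loaded first, so cost grows with the whole history.
    static <R> List<R> pageEagerly(int totalRuns, IntFunction<R> loadRun, int start, int limit) {
        List<R> all = IntStream.range(0, totalRuns)
                .mapToObj(loadRun)
                .collect(Collectors.toList());
        int from = Math.min(start, all.size());
        int to = (int) Math.min((long) from + limit, all.size());
        return all.subList(from, to);
    }
}
```

With 1000 runs and a 26-run page, the lazy variant calls the loader 26 times and the eager one 1000 times, while both return the same page.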
Any support bundles help, at least to know if there is some plugin causing this...
The key is that we want test coverage for this if possible: a test case that creates 1000+ runs and possibly times loading them relative to classic, so that Blue Ocean isn't 10x slower... but this sounds error-prone test-wise, so we may need to think of other strategies to verify this...
seckler so - do you see it on the dashboard or activity/pipeline page only?
Hmm, I only see this on the dashboard and the activity side; branches and pull requests work fine. But to be fair, we don't have multibranch or PRs enabled, so these pages are normally blank.
On the activity side, this is not limited to normal pipeline builds; it also affects freestyle jobs.
Works fine:
- creating a new pipeline, etc., but no file access should happen there anyway.
- viewing specific builds of a pipeline
What I imagine is that all builds of a pipeline are read into the cache when a pipeline is loaded. As we use a network drive, which is not the fastest, this might be the underlying problem...
I should also note that we limit all of our builds to keep only 10 at a time. So a large number of runs shouldn't apply to us, unless a "large number" is 10.
hrm ok - so it isn't quite clear which problem we are talking about - other than logged in is sometimes slower...
Honestly, it could be quite a few things contributing to this. Setting up some kind of test for this against classic seems reasonable, although fragile.
Testing Notes:
- We should look at performance testing of Blue Ocean, especially in environments which have large numbers (hundreds? Over 1000?) of projects and/or branches.
- These performance tests could be designed to compare the same basic workflows between Classic and Blue Ocean...although that feels like it'll be a needy, fragile test.
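One way to make such a comparison less fragile is to compare relative rather than absolute times, using medians over several trials. A minimal sketch, assuming each workflow can be wrapped in a `Runnable`; `RelativeTiming` is hypothetical and not part of any Jenkins test harness:

```java
import java.util.Arrays;

// Hypothetical helper for a relative performance check: how many times slower is
// `candidate` than `baseline`? Medians over several trials damp one-off pauses (GC, I/O).
final class RelativeTiming {
    private RelativeTiming() {}

    static long medianNanos(Runnable task, int trials) {
        if (trials < 1) {
            throw new IllegalArgumentException("trials must be >= 1");
        }
        long[] samples = new long[trials];
        for (int i = 0; i < trials; i++) {
            long begin = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - begin;
        }
        Arrays.sort(samples);
        return samples[trials / 2]; // upper median when `trials` is even
    }

    // Ratio > 1 means the candidate is slower than the baseline.
    static double slowdown(Runnable baseline, Runnable candidate, int trials) {
        // One warm-up run each, so JIT compilation and cold caches hit both paths.
        baseline.run();
        candidate.run();
        long base = Math.max(1L, medianNanos(baseline, trials)); // avoid dividing by zero
        return (double) medianNanos(candidate, trials) / base;
    }
}
```

A test could then assert something like `slowdown(loadInClassic, loadInBlueOcean, 7) < 10.0`, where both `Runnable`s are hypothetical and would fetch the same 1000-run pipeline through each UI. The threshold is deliberately loose, because timing checks on shared CI hardware remain noisy.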
kshultz that would be most awesome - for extra points, if we could compare across versions (not easy, though), that could be good.
Dashboard loading, result page loading and activity screen are known problem areas.
bksaville seckler I am having trouble making sense of the support bundles just now ("slow requests" aren't showing up in them). If you are able to experiment, disabling the Blue Ocean JIRA plugin (new since 1.3) may make sense (although if you saw this with 1.2, disregard this).
EDIT: also if you could turn on "slow requests" as an option when you generate the support bundle, would be helpful.
vivek I am seeing some stack traces like this:
"Handling GET /mardyn/blue/rest/organizations/jenkins/pipelines/PIPE_NAME/runs/ from 0:0:0:0:0:0:0:1 : qtp997608398-19" id=19 (0x13) state=RUNNABLE cpu=92%
	at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
	at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
	at java.io.File.exists(File.java:819)
	at hudson.model.AbstractBuild.calcChangeSet(AbstractBuild.java:871)
	at hudson.model.AbstractBuild.getChangeSet(AbstractBuild.java:843)
	at hudson.model.AbstractBuild.getChangeSets(AbstractBuild.java:857)
	at jenkins.scm.RunWithSCM.hasParticipant(RunWithSCM.java:146)
which implies it is our friend calcChangeSet hurting things again. This is non-multibranch, so it isn't the multibranch activity calculation, just listing a page of runs (ci.jenkins.io DOES use multibranch on the core Jenkins project, and is very slow, but it may be the same thing... if we can get a support bundle from them too, we may see that). More investigation needed...
Let me check that the JIRA plugin isn't actually doing something dumb.
So kzantow has a ticket for the user info fetching that may be slowing down the logged in case.
This might just be something in my brain, I can't find a related ticket. However, I am fairly sure that the issue has to do with fetching one specific type of user data.
michaelneale, our problem definitely only occurs when I'm logged in. I don't believe other users see it when they are logged in; possibly only system administrators do. I would be getting a lot more complaints if either of those were not true, since we have hundreds of internal users who are not exactly forgiving.
ok bksaville good to know, so the not logged in users are ok (presumably). do you have a lot of favourites by any chance? A lot of runs? is it a specific pipeline that loads slow while others (with less runs) load fast?
OK I have removed that other ticket, this still seems primarily related to being logged in. bksaville do you have a support bundle - sorry can't seem to tell the files apart (if you do, can you link which one is yours in the comment?)
michaelneale for me the issue was definitely the favorites. The autofavorite is going crazy on me, which is why I also filed https://issues.jenkins-ci.org/browse/JENKINS-47214.
When I remove the favorites, the performance is about the same as when not logged in (though it still could be faster). I have also checked with some users that their performance is the same (they have no favorites).
michaelneale, I don't have any favorites; to be honest, I'm not even sure how to add them. We only keep 10 runs at a time for almost all of our jobs, so it wouldn't be a large number of runs (unless you count the build number being higher, as in 100-300, but I've never noticed this affecting load time, only whether I am logged in or not).
I will add a support bundle again - I didn't have slow requests in my last one, but I just can't do it today (on call until tomorrow afternoon).
bksaville there is a new version of the autofavorite plugin that allows you to turn off the auto favorite behaviour in the user profile settings. Look for 1.2.0 in a few hours - should be out soon.
michaelneale and jamesdumay, I really should have followed up on schulzha's problem earlier; that was precisely my problem as well. I just never configured anything with favorites and didn't realize there was an autofavorite process of any kind. It's nice that I'll be able to turn it off (although it now requires a newer Jenkins version... sigh, I'm on 2.73.2, just one point release short), but I'm worried that others will run into this as well and won't know. Is there a way to globally disable the autofavorite process?
We use another system to see the bigger picture of what's going on with all of our jobs, so favorites aren't useful to us at all.
OK, some more investigation, bksaville imeredith:
So it looks like the favourites problem has been solved by a release of the autofavorite plugin that allows people to opt out (individually or system-wide). You still need to remove the existing favourites, but I think with this the logged-in slowness case may be solved...
However, this does leave the general slowness around activity screens. Some new facts:
- ci.jenkins.io has this slowness on the core pipeline, but the number of runs doesn't seem too high, so it may not be related to the number of runs (others seem to be reporting something similar)
- ci.jenkins.io has a lot of branches on the core pipeline, so some of the slowness may be to do with that
- ci.blueocean.io also has a similar number of runs, and many branches, but doesn't seem slow at all, so there may be some plugin or configuration on ci.jenkins.io that changes its behavior
  - You can see this on this URL: https://ci.blueocean.io/blue/rest/organizations/jenkins/pipelines/blueocean/runs/?start=0&limit=26
    - loads pretty fast
  - However: https://ci.jenkins.io/blue/rest/organizations/jenkins/pipelines/Core/jenkins/runs/?start=0&limit=26 is really slow
- This seems to be a regression on 1.3 (but still not clear)
- If you reduce the number of rows returned on ci.jenkins.io, it is fast:
  - https://ci.jenkins.io/blue/rest/organizations/jenkins/pipelines/Core/jenkins/runs/?start=0&limit=5
  - So probably not related to pagination? Maybe it is just N branches * 26 rows each to be fetched?
  - Why is it so much faster for such a small difference in data?
kshultz do you happen to have a test data set that could work with 1.1, 1.2 and 1.3 to compare activity screen load times for a multibranch project (with a large number of branches, perhaps?), so we know whether this is a regression or not?
batmat is going to get a stacktrace while slowness is happening.
teilo has suggested it may be due to there being many unstable runs of a specific pipeline, which it has to (erroneously) load recursively (this may explain why it is bad in some cases but not others).
Both of them confirm that 1.3 seems to be generally worse.
michaelneale, apologies, I missed your question from a few days ago. The short answer is no, I don't have a particular dataset, but do have some ideas:
- We could pick from any number of repositories to fork in order to create a test data set. In JENKINS-45372, teilo was using Apache Maven. That one has 29 branches and would give a test instance plenty of "stuff to do," so to speak. But it's not hundreds of branches.
- We could create a test repo programmatically and test performance that way, perhaps as an offshoot of BitbucketServerTest. That might make things nicely self-contained, and would give us some additional Bitbucket coverage implicitly, which might be a welcome "side effect."
I thought I'd created an Improvement ticket for at least starting the ball rolling on performance testing of BO, but I don't see it. I'll do so later today.
After spending time investigating the activity API, I have hit a roadblock.
The performance issue occurs while the data is being serialised and sent to the client, as opposed to before serialisation, while we are generating the iterators of BlueRuns. In the past, when we have had performance issues, it was the pre-serialisation step that was problematic.
This problem only seems to affect the multibranch activity API, and only for some multibranch pipelines. On the Blue Ocean CI server we have 400-ish branches and many builds with no issue; however, the jenkinsci CI server has issues with fewer than 150 branches and probably fewer builds.
Because multibranch activity loads the 25 latest runs across all branches, it may be loading the last run from 25 different branches. kzantow pointed out that the slowness might be related to loading runs from so many branches, which would not be as obvious if, for example, the last 25 runs came from only 2 branches despite many more branches existing. This could explain the difference between what I see on the two CI servers.
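kzantow's point can be illustrated with a k-way merge, which is one plausible way to build a multibranch activity page (an assumption, not a description of Blue Ocean's actual code): to return the newest runs overall, the newest run of every branch has to be loaded before the first result is known. In the sketch below, a 25-run page over 150 branches that each have one run reads 150 runs, while two branches of 100 runs each need only 27 reads. `ActivityMerge`, `Run` and `Page` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of a multibranch activity page (an assumption, not Blue Ocean's code):
// each branch yields its runs newest-first; the page is the `limit` newest runs overall.
final class ActivityMerge {
    private ActivityMerge() {}

    static final class Run {
        final String branch;
        final long startTime;
        Run(String branch, long startTime) { this.branch = branch; this.startTime = startTime; }
    }

    static final class Page {
        final List<Run> runs;
        final int runsLoaded; // how many runs had to be read to build the page
        Page(List<Run> runs, int runsLoaded) { this.runs = runs; this.runsLoaded = runsLoaded; }
    }

    private static final class Head {
        final Run run;
        final Iterator<Run> rest;
        Head(Run run, Iterator<Run> rest) { this.run = run; this.rest = rest; }
    }

    static Page newestRuns(List<Iterator<Run>> branches, int limit) {
        // Max-heap keyed on start time, holding the current newest run of each branch.
        PriorityQueue<Head> heap = new PriorityQueue<>(
                Comparator.comparingLong((Head h) -> h.run.startTime).reversed());
        int loaded = 0;
        // Every non-empty branch must load its newest run before the first result is known.
        for (Iterator<Run> branch : branches) {
            if (branch.hasNext()) {
                heap.add(new Head(branch.next(), branch));
                loaded++;
            }
        }
        List<Run> page = new ArrayList<>();
        while (page.size() < limit && !heap.isEmpty()) {
            Head newest = heap.poll();
            page.add(newest.run);
            // Refill from the branch we just consumed.
            if (newest.rest.hasNext()) {
                heap.add(new Head(newest.rest.next(), newest.rest));
                loaded++;
            }
        }
        return new Page(page, loaded);
    }
}
```

Under this model the per-page cost grows with the number of branches rather than with the page size, which would fit a server with many active branches being slower than one whose recent runs sit in a few branches.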
I'm not sure what the best way forward is here. I see [at least] 3 alternatives:
- Something can be fixed in core or pipeline to make this faster assuming the issue is there.
- Add some caching to the activity API. Maybe use an H2 database or just a basic in-memory cache.
- Make the branches page the default page instead of activity for multibranch pipelines. This doesn't fix anything but maybe makes it less annoying in the default case.
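The second alternative could start as small as a time-bounded in-memory map. A minimal sketch, assuming pages are keyed by pipeline plus `start`/`limit` and that activity data up to one TTL old is acceptable; `ActivityPageCache` is hypothetical and does not invalidate entries when a run finishes, which a real cache would likely need:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical in-memory cache for activity pages (not Blue Ocean code). Keys such as
// "Core/jenkins?start=0&limit=26" are an assumption; entries expire after a fixed TTL.
final class ActivityPageCache<V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtNanos;
        Entry(T value, long expiresAtNanos) { this.value = value; this.expiresAtNanos = expiresAtNanos; }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock; // e.g. System::nanoTime; injectable for tests

    ActivityPageCache(long ttlNanos, LongSupplier clock) {
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    V get(String key, Function<String, V> loader) {
        long now = clock.getAsLong();
        // compute() runs atomically per key, so concurrent requests for the same page
        // trigger a single load; the loader must not modify this cache itself.
        Entry<V> entry = entries.compute(key, (k, old) ->
                old != null && now - old.expiresAtNanos < 0
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlNanos));
        return entry.value;
    }
}
```

Using a TTL rather than explicit invalidation keeps the sketch small, at the cost of the activity screen lagging behind new runs by up to the TTL.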
vivek in light of Ivan's comments above - do you have any more ideas? I still don't have concrete info that it is a regression as of 1.3, but it seems a bit slower. Perhaps it is time to bite the bullet and cache activity screen? (the logged in case should be solved by now BTW).
Bug identified and fixed. PR opened https://github.com/jenkinsci/blueocean-plugin/pull/1632.
Details on what was causing and fix:
Analyzing the HAR file showed that FavoriteStatePreloader was returning a large number of favorite jobs, which accounted for most of the time taken to load the dashboard. The bug was that `FavoriteContainer.iterator()` did not paginate. The fix adds pagination by default.
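The essence of the fix, a container whose default iterator returns one page rather than everything, can be sketched as follows. `PagedContainer` is hypothetical and only mirrors the described behaviour; it is not Blue Ocean's `FavoriteContainer`, and the default page size of 100 is the value quoted in this ticket:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical container (not Blue Ocean's API) whose no-argument iterator() returns only
// the default page, mirroring the behaviour described for the FavoriteContainer fix.
final class PagedContainer<T> implements Iterable<T> {
    static final int DEFAULT_PAGE_SIZE = 100; // page size quoted in this ticket

    private final List<T> items;

    PagedContainer(List<T> items) {
        this.items = items;
    }

    // Default view: the first page only, instead of every item.
    @Override
    public Iterator<T> iterator() {
        return iterator(0, DEFAULT_PAGE_SIZE);
    }

    // Explicit paging, e.g. start=100, limit=100 for the second page.
    Iterator<T> iterator(int start, int limit) {
        if (start < 0 || limit < 0) {
            throw new IllegalArgumentException("start and limit must be >= 0");
        }
        int from = Math.min(start, items.size());
        int to = (int) Math.min((long) from + limit, items.size());
        return items.subList(from, to).iterator();
    }
}
```

With 450 favorites, a plain for-each loop over the container yields 100 items, and `iterator(400, 100)` yields the remaining 50.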
I don't see a problem with that PR, but doesn't the page display all favorites? So I'm not sure how paging would fix that.
kzantow Currently, yes, it fetches all possible favorites. Container iterators are supposed to return the default page size. If you call the favorite API you get the default page size, so this makes it consistent.
Favorites are expensive, and displaying hundreds of favorites is rather painful for users, as many of them are auto-favorited. I think a new ticket should be opened to add UI pagination support ('Show more') for favorites as well. The current page size is 100; maybe it should be 26 like other things shown in the UI?
kzantow over to you as discussed. Listing the issues we discussed to fix as part of this improvement:
- Paginate the favorite list on the dashboard (my fix addresses that: the default list of favorites is now limited to the default page size of 100; possibly it should be 26 like other objects)
- Evaluate if top level pipeline object can include 'favorite' as boolean value without much impact
- Minimize 'item' object properties to include only what favorite UI needs (name of pipeline, favorited or not, commitId...)
- Fix the bug in the frontend where it calls the favorites API even though it has already received a pre-loaded list of favorites
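The second and third points in the list above can be sketched together: a pipeline entry trimmed to the fields the favorites UI uses, with `favorite` computed as a boolean. This assumes the user's favorites are available as a set of pipeline full names; `FavoriteSummaries` and `Summary` are hypothetical, and other fields mentioned above, such as a commit ID, are omitted:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the "minimal item" idea (not Blue Ocean's API): send only what
// the favorites UI needs, with `favorite` as a boolean on each pipeline entry.
final class FavoriteSummaries {
    private FavoriteSummaries() {}

    static final class Summary {
        final String fullName;
        final boolean favorite;
        Summary(String fullName, boolean favorite) { this.fullName = fullName; this.favorite = favorite; }
    }

    // With the user's favorites held in a hash-based Set, each flag is a
    // constant-time lookup, so adding it to a pipeline listing costs little.
    static List<Summary> summarize(List<String> pipelineNames, Set<String> userFavorites) {
        return pipelineNames.stream()
                .map(name -> new Summary(name, userFavorites.contains(name)))
                .collect(Collectors.toList());
    }
}
```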
sophistifunk as mentioned - worth taking a look at (I will take a look at test failures too)
schulzha what authentication backend are you using? LDAP? What plugin?
I received the HAR file - thank you