Type: Task
Resolution: Fixed
Priority: Minor
Evergreen - Milestone 1
Issue links:
- blocks: JENKINS-51008 Complete the PCT flow started in JENKINS-50540 for git plugin (In Progress)
- depends on: JENKINS-51205 Slow performance of /pluginManager/available rendering (Resolved)
- is related to: JENKINS-51205 Slow performance of /pluginManager/available rendering (Resolved)
[JENKINS-50790] GitPluginTest are failing in the ATH
It runs perfectly locally without using the ATH Docker container; trying with it now.
It fails for me locally when using the ATH Docker image to run the tests.
The problem seems to be a timeout while loading the available plugins page. I am going to debug locally with the Docker image to get an idea of what is happening, but judging from the last ATH runs this is not only related to the git plugin but to other plugins as well.
Confirmed that it fails when trying to install plugins via the @WithPlugins rule: after adding the MockUpdateCenter and clicking "Check now", it tries to visit the available plugins page, and that is where the timeout happens. The page seems to take too long to load; I still need to confirm that properly, but it looks like a performance issue.
Confirmed the performance issue: by increasing the load timeout the tests pass. I have also checked that loading the list of available plugins takes a long time, which is the reason for the timeout. I am now checking whether I can reduce that load time, as I do not like increasing timeouts. The fact that it works perfectly fine without the local container may indicate that extra resources are needed rather than a code change.
This is probably impacting other tests also
cc olivergondza
Right, a lot of the tests started to fail once the job moved from jenkins.ci.cloudbees.com to ci.jenkins.io (the ATH is run in a container now): https://ci.jenkins.io/job/Core/job/jenkins/job/stable-2.107/lastCompletedBuild/testReport/.
If you have the resources to investigate and fix it, that would be great. If not, I am OK with increasing the timeout to get a reliable build.
I will try to investigate and fix it if possible, and will increase the timeout only as a last resort, thanks!
Trying to use `ElasticTime.factor` to increase the timeouts without touching the Java code. If this works, I am going to create a PR for the ath-container script to make sure the tests in the container are run with the appropriate factor.
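For context, this is roughly how a global time-scaling factor of this kind works: every timeout in the harness is multiplied by a single factor supplied from outside the Java code. A minimal sketch, assuming the factor is read from a system property named like the one mentioned above (the class and method names here are illustrative, not the real ATH implementation):

```java
// Hypothetical sketch of a time-scaling helper similar in spirit to the ATH's ElasticTime.
// Assumption: the multiplier is supplied via the "ElasticTime.factor" system property.
public class ScaledTime {

    // Read the multiplier once; default to 1 (no scaling).
    private final double factor = Double.parseDouble(System.getProperty("ElasticTime.factor", "1"));

    /** A timeout of {@code t} seconds, scaled and expressed in milliseconds. */
    public long seconds(long t) {
        return (long) (t * 1000 * factor);
    }

    /** A timeout of {@code t} milliseconds, scaled. */
    public long milliseconds(long t) {
        return (long) (t * factor);
    }
}
```

Running the suite with something like `-DElasticTime.factor=3` would then stretch every wait threefold, which is why it can be wired into the ath-container script instead of changing Java code.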
Do we have an idea whether everything is slowed down or just the plugin page? That would suggest whether your approach is the correct one or not...
The ElasticTime.factor approach seems to fix things for me locally. Also, GitPluginTest#create_tag_for_build is failing just because there is no git user name and email defined inside the container.
I believe I can create a PR to solve this once I get rid of some urgent work, so probably nothing until tomorrow or the day after.
olivergondza You are right: for the moment the only slowness I have found is in the plugin management page, in the available section, so increasing all timeouts via ElasticTime.factor is probably not the best idea. I am going to temporarily increase just the page load timeout on that part and see if things still work; if they don't, I will go back to the ElasticTime.factor approach.
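In Selenium terms, a scoped override of that kind looks roughly like the sketch below. The actual change lives in FallbackConfig and PluginManager, as the commits below show; the timeout values and method name here are illustrative and use the Selenium 4 API:

```java
import java.time.Duration;
import org.openqa.selenium.WebDriver;

// Raise the page load timeout only around the slow navigation, then restore the
// stricter default so every other page keeps failing fast. Values are illustrative.
void visitAvailablePlugins(WebDriver driver, String jenkinsUrl) {
    Duration normal = Duration.ofSeconds(30);
    Duration relaxed = Duration.ofSeconds(240);

    driver.manage().timeouts().pageLoadTimeout(relaxed);
    try {
        driver.get(jenkinsUrl + "/pluginManager/available");
    } finally {
        driver.manage().timeouts().pageLoadTimeout(normal);
    }
}
```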
It seems the PR went fine and solved not only GitPluginTest but also others affected by the same problem. Waiting for the PR to be merged before closing this.
Code changed in jenkins
User: Raul Arabaolaza
Path:
src/main/java/org/jenkinsci/test/acceptance/FallbackConfig.java
src/main/java/org/jenkinsci/test/acceptance/po/PluginManager.java
http://jenkins-ci.org/commit/acceptance-test-harness/012364d1a5f6386fab49198c16d00e0cd309a7d9
Log:
JENKINS-50790 Temporarily override pageLoadTimeout for available plugins
This should fix most of GitPluginTest and some others that are failing due to the
available plugins taking too much time to load on ci.jenkins.io
Code changed in jenkins
User: Raul Arabaolaza
Path:
src/test/java/plugins/GitPluginTest.java
http://jenkins-ci.org/commit/acceptance-test-harness/81589c3da0b5788917dfc2faaaec78b89c4edf27
Log:
JENKINS-50790 Fix GitPluginTest#create_tag_for_build
The docker container where the ATH is run does not provide a global git
user or mail, so I added the Custom Name and Mail option to the test so
it can properly tag things
Code changed in jenkins
User: Raul Arabaolaza
Path:
src/main/java/org/jenkinsci/test/acceptance/FallbackConfig.java
src/main/java/org/jenkinsci/test/acceptance/po/PluginManager.java
http://jenkins-ci.org/commit/acceptance-test-harness/7ba7a3983ac1d09da5ded9ec134f205e7383782c
Log:
JENKINS-50790 Address feedback
Code changed in jenkins
User: Oliver Gondža
Path:
src/main/java/org/jenkinsci/test/acceptance/FallbackConfig.java
src/main/java/org/jenkinsci/test/acceptance/po/PluginManager.java
src/test/java/plugins/GitPluginTest.java
http://jenkins-ci.org/commit/acceptance-test-harness/dad333092159cb368efc2f9869572f0a05d255ac
Log:
Merge pull request #428 from raul-arabaolaza/JENKINS-50790
JENKINS-50790 Temporarily override pageLoadTimeout for available plugins
Compare: https://github.com/jenkinsci/acceptance-test-harness/compare/ab1fb4738cb0...dad333092159
Months ago I tried to fix the same problem by increasing the timeout, and it didn't solve all the timeout issues; see the PR [here|https://github.com/jenkinsci/acceptance-test-harness/pull/408/files]. I also see in your PR that, after increasing the timeout, there are still tests failing with the same error, so I think the root cause of the long page loading has not been determined and we haven't really tackled it. While investigating I realised that there is a JS error on the "Available" tab that might be preventing the page from loading and causing all these issues.
It could also be that the click on the "Available" tab happens before the "Advanced" tab has finished loading; it seems that webdriver.get returns as soon as the page fires the load event, even though the page hasn't actually finished loading.
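One common way to guard against that race is to wait explicitly for the element you are about to use, rather than relying on the load event alone. A minimal sketch with standard Selenium 4 waits (the selector and timeout are placeholders, not what the ATH actually does):

```java
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

// Wait until the "Available" tab is actually clickable before interacting with it,
// instead of trusting that driver.get() only returns once the page is fully usable.
void clickAvailableTab(WebDriver driver) {
    new WebDriverWait(driver, Duration.ofSeconds(120))           // illustrative timeout
            .until(ExpectedConditions.elementToBeClickable(
                    By.linkText("Available")))                   // placeholder selector
            .click();
}
```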
See here: it seems that the ATH is still flaky, I suspect due to load.
jglick As you are the one who did the original implementation of MockUpdateCenter, can you think of a reason why it could show a performance degradation under core >= 2.112 with heavy load?
The delay in rendering the available plugins page is obvious even when running locally (not in the ATH, just starting the war file) with no plugins installed. That rendering time (which gets worse under heavy load) is the root cause of this issue.
I have done a little experimentation on the load time of the available plugins. When running the ATH with the MockUpdateCenter, the call to get all available plugins oscillates between 25 and 56 seconds locally; without the MockUpdateCenter it takes less than 10 seconds.
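For reference, that kind of rough measurement can be reproduced with a few lines of plain Java (Java 11+ HttpClient). The URL is an assumption for a locally started instance, and authentication/CSRF handling is ignored:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Time a single request to the available-plugins page of a local Jenkins instance.
public class AvailablePluginsTiming {
    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/pluginManager/available")) // assumed local URL
                .GET()
                .build();

        long start = System.nanoTime();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("HTTP " + response.statusCode() + " in " + elapsedMs + " ms");
    }
}
```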
I am starting to believe that the only way to solve this (as I cannot keep increasing the page load timeout forever) is to use Groovy scripts to install plugins instead of doing it directly through the UI.
I am experimenting with skipping the available plugins page and just requesting the installation via a REST call; initial experiments seem promising.
There have been past attempts to load plugins via REST rather than the GUI, and that code is still there; you just need a switch to activate it. It does not work all that well, as the plugin manager has some subtle logic which is not easily replicated. I would rather just track down the cause of the delays and fix them. Run a profiler or whatever. Is the problem in the JavaScript or in the mock UC service itself?
What I can see is that the HTTP call to pluginManager/available performed by the UI when clicking on the "Available" tab is the one that takes too much time. My understanding is that the endpoint calls the mock UC under the covers; I have checked that same call against an instance run via java -jar (outside the ATH) and it takes much less time, ~10 seconds.
Maybe you are referring to this code path? I am not going that way: I am making a request to trigger the installation via the UC instead of using the buttons on the `Available Plugins Page`, and letting it handle dependencies and all that. I am not sure about versioning constraints at this point, though.
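For illustration, a request of that kind against the pluginManager/installNecessaryPlugins endpoint can look like the sketch below; it asks Jenkins to resolve and install the plugin (and its dependencies) itself. The plugin coordinates and URL are examples, and authentication/CSRF crumb handling is omitted:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Trigger a plugin installation through the update center instead of the "Available" tab.
public class InstallPluginViaRest {
    public static void main(String[] args) throws Exception {
        String xml = "<jenkins><install plugin=\"git@latest\"/></jenkins>"; // example plugin spec

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/pluginManager/installNecessaryPlugins")) // assumed local URL
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```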
So, after a conversation with jglick, it seems the problem is in core itself since 2.112: the addition of detached plugins causes an exponential explosion in a recursive call. He is working on a fix; in the meantime a temporary increase of the timeout should be enough to make everything stable again.
Not sure whether the fix in JENKINS-51205 is enough; I am still receiving email alerts about instability in git-plugin. However, ci.jenkins.io is down at the moment, so I cannot check the status.
It seems it worked. I am going to keep this open for a couple of days and close it if nothing new arises.
The last builds of git-plugin have no failures, so I am going to close this.
I have seen other environments where this has worked (with the exception of GitPluginTest#create_tag_for_build, which has been failing for a long time), so I am running this locally to see whether this is an infra problem or something related to the tests.