Bug
Resolution: Fixed
Critical
1.483+, also LTS updates with SECURITY-44
I'm looking for guidance on debugging slow page load times. Ever since we upgraded from 1.480 to 1.484 (and later versions), page loads have been very slow: sometimes upwards of 120 seconds to load nearly any page, whether a view tab, a job configuration, etc.
I've debugged many Java apps and I'm confounded. There are few, if any, full GCs occurring; regular GCs occur somewhat frequently but take 0.02 seconds or less. CPU utilization for the java executable is under 6%, and disk utilization is reasonable (we're averaging 90%+ idle time). The initial HTTP ack comes back immediately, but then we sometimes wait up to 2 minutes for the page response. Other times it loads reasonably fast (under 4 seconds). When it hangs, it hangs for all requests, much as if it were doing a full garbage collection. It feels like some kind of resource contention.
But I cannot find where the contention is; thread dumps look nominal (I've included one here as an example). Something is causing long page load times. We rolled back to 1.484 thinking it had something to do with the lazy-loading features, but clearly it does not. On 1.480 we did not have these issues. I could use some help figuring out what else to look at to identify why Jenkins is slow. There is little information available on common reasons for slow page load times, and the slowness makes for a terrible user experience with an otherwise fine tool.
[JENKINS-16474] Slow/hung web UI in 1.483+ (stuck in parseURI)
You mean 1.483: that release picked up winstone 0.9.10-jenkins-38. https://github.com/jenkinsci/winstone/commit/dcc2b4c847ba57da137cc4dd83af585bbfbea41f
Code changed in jenkins
User: Jesse Glick
Path:
README.html
contrib/README_jp.html
src/java/winstone/LocalStrings.properties
http://jenkins-ci.org/commit/winstone/ffe79e8789bfe42edf04b2dc88539942804b7e1d
Log:
Diagnosis of JENKINS-16474 complicated by the fact that documentation did not match revised default values.
–
You received this message because you are subscribed to the Google Groups "Jenkins Commits" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-commits+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
A characteristic thread dump seems to show ~20 threads (the new default maximum) stuck in parseURI:
```
"RequestHandlerThread[#…]" Id=… Group=main RUNNABLE (in native)
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.net.SocketInputStream.read(SocketInputStream.java:182)
    at winstone.WinstoneInputStream.read(WinstoneInputStream.java:49)
    at javax.servlet.ServletInputStream.readLine(ServletInputStream.java:27)
    at winstone.WinstoneInputStream.readLine(WinstoneInputStream.java:108)
    at winstone.HttpListener.parseURI(HttpListener.java:239)
    at winstone.RequestHandlerThread.run(RequestHandlerThread.java:75)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
```
It seems that Winstone may be receiving socket connections that never send any data. Setting a larger number of handler threads might merely delay the point at which this thread leak is encountered.
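A minimal sketch of the suspected mechanism, using plain java.net sockets rather than Winstone's actual HttpListener code (the class and method names are hypothetical). A handler thread that calls readLine() on a connection whose client never sends anything sits inside socketRead0, just like the parseURI frames in the dump above. The sketch assumes the keep-alive timeout is applied as a socket read timeout (SO_TIMEOUT); without it, the read would block indefinitely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not Winstone code: a handler blocked reading the
// request line from a client that connects but never sends any data.
public class IdleConnectionDemo {

    /**
     * Opens a silent client connection, tries to read its request line with
     * the given read timeout, and returns how long (ms) the read blocked.
     */
    public static long blockedMillis(int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             // The "client" connects and then stays silent.
             Socket silentClient = new Socket(loopback, server.getLocalPort());
             Socket handlerSide = server.accept()) {

            // Without this call, readLine() below would never return.
            handlerSide.setSoTimeout(readTimeoutMs);
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    handlerSide.getInputStream(), StandardCharsets.US_ASCII));

            long start = System.nanoTime();
            try {
                String line = in.readLine(); // analogous to parseURI reading the request line
                throw new IllegalStateException("read returned without timing out: " + line);
            } catch (SocketTimeoutException expected) {
                return (System.nanoTime() - start) / 1_000_000;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("handler thread was blocked for ~" + blockedMillis(500) + " ms");
    }
}
```

Each such connection holds a handler thread for the full timeout, which is why a long timeout combined with a small thread pool can exhaust the pool.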
The fix for SECURITY-44 included a general Winstone update, so 1.480.x LTS users would also be affected.
The symptoms resemble a SYN flood or a similar denial-of-service attack. The attached test simulates a basic attack in which clients try to read from the HTTP socket without sending any headers; it passes, but takes ~25 seconds while the “legitimate” request waits in the queue.
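The attached test is not reproduced here. The following is a hedged, self-contained approximation of the same scenario, assuming a fixed-size thread pool stands in for Winstone's bounded handler pool; the pool size, timeout, and all names are hypothetical, and the delays it produces are not the ~25 seconds reported above. Every handler thread is occupied by a silent client, so a well-formed request waits in the queue until a read timeout frees a thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of handler-pool saturation by silent connections.
public class PoolSaturationDemo {

    /** Handles one connection: read the request line (bounded by the timeout), then answer. */
    private static void handle(Socket s, int readTimeoutMs) {
        try (Socket socket = s) {
            socket.setSoTimeout(readTimeoutMs);
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.US_ASCII));
            if (in.readLine() != null) {
                OutputStream out = socket.getOutputStream();
                out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
            }
        } catch (SocketTimeoutException idle) {
            // Silent client: the thread goes back to the pool.
        } catch (IOException ignored) {
            // Other connection errors are irrelevant for this demo.
        }
    }

    /** Returns how long (ms) a legitimate request waited for its response. */
    public static long legitimateRequestMillis(int poolSize, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Socket> silentClients = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket accepted = server.accept();
                        pool.execute(() -> handle(accepted, readTimeoutMs));
                    }
                } catch (IOException closed) {
                    // Server socket closed: stop accepting.
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            // Occupy every handler thread with a client that never sends data.
            for (int i = 0; i < poolSize; i++) {
                silentClients.add(new Socket(loopback, server.getLocalPort()));
            }
            Thread.sleep(100); // let the acceptor hand them to the pool first

            long start = System.nanoTime();
            try (Socket legit = new Socket(loopback, server.getLocalPort())) {
                legit.setSoTimeout(10_000); // safety net for the demo itself
                OutputStream out = legit.getOutputStream();
                out.write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
                new BufferedReader(new InputStreamReader(
                        legit.getInputStream(), StandardCharsets.US_ASCII)).readLine();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            for (Socket c : silentClients) {
                try { c.close(); } catch (IOException ignored) { }
            }
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("legitimate request waited ~" + legitimateRequestMillis(2, 1000) + " ms");
    }
}
```

The queueing delay tracks the read timeout, which is consistent with the later decision to shorten the timeout rather than only raise the thread count.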
Users running Jenkins on an intranet have reported this, however, so a malicious attack is probably not the cause.
Though I have not yet managed to reproduce the problem, I speculate that the following would serve as a workaround (effective only within the current Jenkins session, so just for evaluation): go to /script and run
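```groovy
// Look up Winstone's KEEP_ALIVE_TIMEOUT field by reflection
f = winstone.HttpListener.getDeclaredField('KEEP_ALIVE_TIMEOUT')
f.accessible = true      // allow access even if the field is not public
println(f.get(null))     // print the current (static) value
f.set(null, 1000)        // lower the timeout to 1000
println(f.get(null))     // confirm the new value
```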
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
src/java/winstone/HttpListener.java
src/java/winstone/cmdline/Option.java
http://jenkins-ci.org/commit/winstone/3079a063cdd599ed60b0384a9695f38a7f379aba
Log:
[FIXED JENKINS-16474]
Based on the feedback from several users, easing this up a little bit, and also reducing the timeout since we see thread dumps where most threads are blocked at the parseURI line.
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
src/java/winstone/RequestHandlerThread.java
http://jenkins-ci.org/commit/winstone/a709000b18fea71d10391d9d20549e711e8820b4
Log:
JENKINS-16474
For better diagnosability, report the remote client IP address when blocking
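The commit message does not say how the address is reported. One plausible approach, consistent with the later advice that thread dumps "will now contain more information", is to rename the handler thread while it blocks so that a thread dump shows which client it is waiting on. The sketch below is an assumption for illustration, not the actual Winstone change; the method names and the name format are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: expose the remote client address in the thread name
// while a handler thread is blocked reading from that client.
public class ThreadNameDiagnostics {

    static String blockedName(String baseName, Socket socket) {
        return baseName + " waiting for request from " + socket.getInetAddress().getHostAddress();
    }

    /** Performs a blocking read with the thread renamed; restores the name afterwards. */
    static int readWithDiagnosticName(Socket socket) throws IOException {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(blockedName(original, socket));
        try {
            InputStream in = socket.getInputStream();
            return in.read(); // a thread dump taken now shows the client IP
        } finally {
            current.setName(original);
        }
    }

    /** Demo: observe the handler's name from outside while its read is blocked. */
    public static String observedNameWhileBlocked() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket handlerSide = server.accept()) {
            Thread handler = new Thread(() -> {
                try {
                    readWithDiagnosticName(handlerSide);
                } catch (IOException ignored) {
                    // demo only
                }
            }, "RequestHandlerThread[#1]");
            handler.start();

            // Poll the name, as a thread dump would, until the rename is visible.
            String seen = handler.getName();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (!seen.contains("waiting for request") && System.nanoTime() < deadline) {
                Thread.sleep(10);
                seen = handler.getName();
            }
            client.getOutputStream().write('G'); // unblock the handler
            handler.join();
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observedNameWhileBlocked());
    }
}
```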
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
changelog.html
war/pom.xml
http://jenkins-ci.org/commit/jenkins/4b1a95f23f19c57d8cf48ea0b1b30aaee541db27
Log:
[FIXED JENKINS-16474]
Fixed the HTTP request thread saturation problem with Winstone.
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
war/pom.xml
http://jenkins-ci.org/commit/jenkins/c01281bdfa4ee88c5860291ef6bdda899a242e7f
Log:
[FIXED JENKINS-16474]
Fixed the HTTP request thread saturation problem with Winstone.
(cherry picked from commit 4b1a95f23f19c57d8cf48ea0b1b30aaee541db27)
Conflicts:
changelog.html
Code changed in jenkins
User: Kohsuke Kawaguchi
Path:
war/pom.xml
http://jenkins-ci.org/commit/jenkins/6801cefc2bbb2ff827178affc6f9274efa4baf7a
Log:
[FIXED JENKINS-16474]
Fixed the HTTP request thread saturation problem with Winstone.
(cherry picked from commit 4b1a95f23f19c57d8cf48ea0b1b30aaee541db27)
Conflicts:
changelog.html
Integrated in jenkins_main_trunk #2283
[FIXED JENKINS-16474] (Revision 6801cefc2bbb2ff827178affc6f9274efa4baf7a)
Result = SUCCESS
kohsuke : 6801cefc2bbb2ff827178affc6f9274efa4baf7a
Files :
- war/pom.xml
Does anyone have any advice on how to reproduce this problem in 1.480.1? In large environments with a lot of real traffic, 1.480.1 appears to be experiencing this issue and becomes essentially unusable. However, in a smaller environment with much less traffic, I cannot get this defect to appear.
What I'm trying to do is prove a) that this is the issue we are seeing and not something else, and b) that it is indeed gone in 1.480.3. I've tried ApacheBench with a concurrency of 20+, and I've tried simulating traffic with 20+ tabs open with auto-refresh on...
This upgrade is a little riskier since the whole 're-keying' process will take effect, so I want to make sure we won't have to roll back... again...
Any advice would be greatly appreciated!
I am afraid I was never able to reproduce the problem in a test environment; the fix is based on a combination of speculation and feedback from some people I know to have encountered matching symptoms.
The fix was incomplete: it was intended to add an option --httpKeepAliveTimeout (or --httpsKeepAliveTimeout), but this was not done correctly.
Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/Main.java
http://jenkins-ci.org/commit/extras-executable-war/7d559b9e84634f3be06cddef6199042bdaf94982
Log:
JENKINS-16474 Synchronize --help with changed defaults in Winstone.
Compare: https://github.com/jenkinsci/extras-executable-war/compare/f73b889e4d8f...7d559b9e8463
Code changed in jenkins
User: Jesse Glick
Path:
src/java/winstone/cmdline/Option.java
http://jenkins-ci.org/commit/winstone/b2c2bff8d1e03c5d5fd46843b22d9126fd5e9c81
Log:
JENKINS-16474 Make --httpKeepAliveTimeout actually work.
Compare: https://github.com/jenkinsci/winstone/compare/78a00ae807d9...b2c2bff8d1e0
Code changed in jenkins
User: Jesse Glick
Path:
changelog.html
war/pom.xml
http://jenkins-ci.org/commit/jenkins/fa6c9ba35de4ec98f5487f296bab7ee1b7fb2da5
Log:
[FIXED JENKINS-16474] Winstone and executable WAR upgraded to actually support --httpKeepAliveTimeout.
Integrated in jenkins_main_trunk #2350
[FIXED JENKINS-16474] Winstone and executable WAR upgraded to actually support --httpKeepAliveTimeout. (Revision fa6c9ba35de4ec98f5487f296bab7ee1b7fb2da5)
Result = SUCCESS
Jesse Glick : fa6c9ba35de4ec98f5487f296bab7ee1b7fb2da5
Files :
- war/pom.xml
- changelog.html
Thanks Jesse, I'm unable to reproduce it as well.
Can anyone confirm if they were experiencing this issue, and now with this partial (or full) fix, they're not?
@jfigler: I was just pointed to this JIRA issue. We're running 1.480.1 and see very slow connections to the Jenkins UI whenever a non-trivial number of users are using Jenkins. So far (after running it for an afternoon), the Groovy script from the earlier comment of Feb 7 seems to make the problem go away.
@jglick: Which LTS version will contain this fix? And is there a way to change the httpKeepAliveTimeout for Winstone more permanently, in 1.480.1?
Thanks for your work on this!
@marnix_klooster: the latest fix is in 1.506 and therefore also 1.509.1 LTS; --httpKeepAliveTimeout may be passed on the command line (see --help for options).
If anyone encountering this issue is using an Apache proxy, JENKINS-10524 suggests Proxy-nokeepalive, though that is probably unrelated.
Hi,
I believe I am seeing similar behaviour with Jenkins 1.512.
Loading pages is sometimes fine, but at other times it is really slow: sometimes it takes several minutes to load just a few pages, or any page at all!
@wael your issue may not have anything to do with this fix. Please use a separate ticket (using JIRA links as needed) unless you can confirm at the code level that this fix did not do what it claimed to do.
Also please try the new command-line tuning parameters introduced by the fix (--help for details), and if a thread dump confirms that many threads are still stuck in parseURI, attach that thread dump since it will now contain more information.
We have found that if the Jenkins master contains a large number of jobs and the user is only allowed to access a subset of them, the UI takes a long time to load. If the user is granted read access to everything, it loads very quickly. We suspect (though cannot confirm) that the Matrix Authorization plugin is not working as efficiently as it could.
We believe the following is happening:
1. The plugin checks every job against every LDAP group the user is a member of.
2. The plugin then checks every job against the specific LDAP user.
3. The plugin may be making individual calls to an LDAP server for each job.
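To make the suspected cost concrete, here is a hypothetical model (the directory class, job class, and caching strategy are illustrative assumptions, not the Matrix Authorization plugin's code or API). If the user's groups are looked up once per job, filtering N jobs costs N directory round trips; resolving the groups once per request costs one, with the same visible result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the Matrix Authorization plugin's code.
public class PermissionCheckCost {

    /** Stand-in for an LDAP server that counts the lookups it serves. */
    static final class CountingDirectory {
        private final Map<String, Set<String>> groupsByUser;
        int lookups = 0;

        CountingDirectory(Map<String, Set<String>> groupsByUser) {
            this.groupsByUser = groupsByUser;
        }

        Set<String> groupsOf(String user) {
            lookups++; // one call models one round trip to the directory server
            Set<String> groups = groupsByUser.get(user);
            return groups == null ? Collections.<String>emptySet() : groups;
        }
    }

    /** A job readable by the listed principals (user names or group names). */
    static final class Job {
        final String name;
        final Set<String> readers;

        Job(String name, Set<String> readers) {
            this.name = name;
            this.readers = readers;
        }
    }

    static boolean canRead(String user, Set<String> groups, Job job) {
        if (job.readers.contains(user)) return true;
        for (String group : groups) {
            if (job.readers.contains(group)) return true;
        }
        return false;
    }

    /** Naive filter: re-resolves the user's groups for every job. */
    static List<String> visibleNaive(String user, List<Job> jobs, CountingDirectory dir) {
        List<String> visible = new ArrayList<>();
        for (Job job : jobs) {
            if (canRead(user, dir.groupsOf(user), job)) visible.add(job.name);
        }
        return visible;
    }

    /** Cached filter: resolves the groups once, then checks every job locally. */
    static List<String> visibleCached(String user, List<Job> jobs, CountingDirectory dir) {
        Set<String> groups = dir.groupsOf(user);
        List<String> visible = new ArrayList<>();
        for (Job job : jobs) {
            if (canRead(user, groups, job)) visible.add(job.name);
        }
        return visible;
    }

    /** Returns {naiveLookups, cachedLookups} for a user who can see every other job. */
    public static int[] lookupCounts(int jobCount) {
        Map<String, Set<String>> directory = new HashMap<>();
        directory.put("alice", new HashSet<>(Arrays.asList("dev", "qa")));
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < jobCount; i++) {
            // Even-numbered jobs are readable by "dev", odd-numbered ones by "ops".
            jobs.add(new Job("job-" + i, Collections.singleton(i % 2 == 0 ? "dev" : "ops")));
        }
        CountingDirectory naiveDir = new CountingDirectory(directory);
        CountingDirectory cachedDir = new CountingDirectory(directory);
        List<String> naive = visibleNaive("alice", jobs, naiveDir);
        List<String> cached = visibleCached("alice", jobs, cachedDir);
        if (!naive.equals(cached)) throw new IllegalStateException("filters disagree");
        return new int[] {naiveDir.lookups, cachedDir.lookups};
    }

    public static void main(String[] args) {
        int[] counts = lookupCounts(1000);
        System.out.println("naive: " + counts[0] + " lookups, cached: " + counts[1] + " lookup");
    }
}
```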
FYI, we found a change note in Jenkins 1.482 indicating that the number of worker threads was reduced from 1000 to 10, and we traced our issue to this change. We were able to resolve it by raising the thread count used by the Winstone container: we set the minimum to 50 threads and the maximum to 500.
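The minimum/maximum thread settings described above map naturally onto a standard java.util.concurrent.ThreadPoolExecutor. This sketch is an illustration under that assumption only: Winstone uses its own BoundedExecutorService (visible in the thread dump earlier), and its command-line option names are not shown here, so check --help. A SynchronousQueue is used so the pool grows past the minimum instead of queueing; with an unbounded queue, ThreadPoolExecutor never grows beyond its core size.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sizing sketch; not Winstone's executor.
public class HandlerPoolSizing {

    /** Keeps minThreads alive and grows on demand up to maxThreads. */
    static ThreadPoolExecutor newHandlerPool(int minThreads, int maxThreads) {
        return new ThreadPoolExecutor(
                minThreads, maxThreads,
                60, TimeUnit.SECONDS,              // threads above the minimum retire after 60 s idle
                new SynchronousQueue<Runnable>()); // hand tasks off directly; never queue them
    }

    /**
     * Submits `concurrent` tasks that all block at once (like handlers waiting
     * on slow clients) and returns the resulting number of pool threads.
     */
    public static int poolSizeUnderLoad(int minThreads, int maxThreads, int concurrent) {
        ThreadPoolExecutor pool = newHandlerPool(minThreads, maxThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < concurrent; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stay busy until the demo releases all tasks
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads in use: " + poolSizeUnderLoad(50, 500, 60));
    }
}
```

With this configuration, more than maxThreads simultaneously busy tasks would make execute() throw RejectedExecutionException, so the maximum is a hard ceiling rather than a queue length.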