Type: Bug
Resolution: Duplicate
Priority: Major
Symptoms:
The Jenkins UI periodically becomes very slow and sometimes returns HTTP 504. Restarting the Jenkins service helps, but only for a short time. How often the problem occurs depends on the number of users using the UI. There are no unusual log entries; the Jenkins web UI response time simply becomes critical (nginx error: "upstream timed out (110: Connection timed out) while reading response header from upstream").
Average CPU load is below 10% and memory usage is about 500 MB (heap size about 2.5 GB), but the number of active threads reaches critical values (max=38), and these peaks coincide with the UI problems.
Graphs from the Monitoring plugin are in the attachments.
The Monitoring plugin also has a "Current requests" view, and there I found the cause of the problem: many pending requests for the same URLs, such as "/job/MY-JOB/changes GET" and "/job/MY-JOB/BUILD-ID/wfapi/changesets?=1471452733792 ajax GET".
I tried to kill these requests (to free the threads) using the "Kill" button in the Monitoring plugin, and after that I found helpful log records.
The second record shows that Jenkins calls loadUserByUsername, which is strange, because I configured the Jenkins search query to look users up by uid (login), as the official configuration guide describes, not by "FirstName LastName".
Then I checked the LDAP server logs and discovered high CPU usage, a high load average, and tons of incorrect search queries from Jenkins in slapd.log:
Aug 17 17:36:14 ldap-server slapd[1183]: conn=249907 fd=26 ACCEPT from IP=5.2.1.1:43078 (IP=0.0.0.0:636)
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 fd=26 TLS established tls_ssf=256 ssf=256
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" method=128
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" mech=SIMPLE ssf=0
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 RESULT tag=97 err=0 text=
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=2 UNBIND
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 fd=26 closed
Aug 17 17:36:15 ldap-server slapd[1183]: conn=249911 fd=25 ACCEPT from IP=5.2.1.1:43132 (IP=0.0.0.0:636)
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 fd=25 TLS established tls_ssf=256 ssf=256
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" method=128
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" mech=SIMPLE ssf=0
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 RESULT tag=97 err=0 text=
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=2 UNBIND
Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 fd=25 closed
....
More than 37,000 incorrect searches per day, for my user alone.
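For anyone who wants to check their own slapd.log for the same pattern, here is a rough sketch of how such searches could be counted. It assumes the log format shown in the excerpt above (a SRCH line followed by a SEARCH RESULT line with the same conn/op pair); the function name and the sample data are made up for illustration and are not part of Jenkins or OpenLDAP.

```python
import re
from collections import Counter

# Matches a SRCH line: captures connection id, operation id and the search filter.
SRCH_RE = re.compile(r'conn=(\d+) op=(\d+) SRCH .*filter="([^"]*)"')
# Matches a SEARCH RESULT line: captures connection id, operation id and entry count.
RESULT_RE = re.compile(r'conn=(\d+) op=(\d+) SEARCH RESULT .*nentries=(\d+)')


def count_empty_searches(lines):
    """Return a Counter mapping each search filter to its number of empty results."""
    pending = {}       # (conn, op) -> filter still waiting for its result line
    empty = Counter()
    for line in lines:
        match = SRCH_RE.search(line)
        if match:
            pending[(match.group(1), match.group(2))] = match.group(3)
            continue
        match = RESULT_RE.search(line)
        if match:
            search_filter = pending.pop((match.group(1), match.group(2)), None)
            if search_filter is not None and match.group(3) == "0":
                empty[search_filter] += 1
    return empty


# Sample lines copied from the log excerpt in this report.
SAMPLE_LINES = [
    'Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 RESULT tag=97 err=0 text=',
    'Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"',
    'Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=',
    'Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"',
    'Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=',
]

if __name__ == "__main__":
    for search_filter, count in count_empty_searches(SAMPLE_LINES).most_common():
        print(count, search_filter)
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES; filters with a space in the uid value and a high count would point at the same issue.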
Problem:
It seems that Jenkins checks user privileges before executing (some?) requests, and when it builds the LDAP search query it uses the full name ("FirstName LastName") instead of the ID/login/uid -> LDAP can't find anything -> empty result -> Jenkins tries once more to verify privileges -> loop -> busy Jenkins threads/workers/executors -> HTTP 504
Workarounds:
- Not great, but possible: use a large LDAP cache (in the Jenkins "Configure Global Security" settings). It didn't fix the problem, but it may reduce its impact (not 100% sure).
- Closer to a fix: use a custom LDAP search query (in the Jenkins "Configure Global Security" settings), something like:
(|(uid={0})(cn={0}))
(don't forget to add the 'cn' attribute to the LDAP index)
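To illustrate why the second workaround helps, here is a small sketch of how the filter template expands, assuming the {0} placeholder is replaced with whatever name Jenkins passes to the lookup. The helper names are hypothetical and this is not the Jenkins implementation; the escaping follows the special characters listed in RFC 4515.

```python
def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def expand_filter(template: str, name: str) -> str:
    """Replace every {0} placeholder in the template with the escaped name."""
    return template.replace("{0}", escape_filter_value(name))


if __name__ == "__main__":
    # With the default-style filter, a full name is looked up as a uid and matches nothing.
    print(expand_filter("uid={0}", "firstname lastname"))
    # With the workaround filter, the same full name can also match the cn attribute.
    print(expand_filter("(|(uid={0})(cn={0}))", "firstname lastname"))
```

With the combined filter, a lookup by login still matches on uid, while a lookup by "FirstName LastName" can match on cn instead of returning an empty result, provided cn actually holds the full name in the directory.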