JENKINS-37489

Jenkins UI slow page load (big TTFB) - possible LDAP issue

      Symptoms:

      Periodically the Jenkins UI becomes very slow and sometimes returns 504. Restarting the Jenkins service helps, but only for a short time. How often the problem occurs depends on the number of users using the UI. There is nothing unusual in the logs; the Jenkins web UI response time simply becomes critical (nginx errors: "upstream timed out (110: Connection timed out) while reading response header from upstream").
      Average CPU load is < 10%, memory usage is ~500 MB (heap size ~2.5 GB), and Active Threads reaches critical values (max=38) at the same times as the UI problems.
      Graphs (from the Monitoring plugin) are in the attachments.

      The Monitoring plugin also has a "Current requests" view, and there I found the cause of the problem: lots of pending requests for the same URLs, such as "/job/MY-JOB/changes GET" and "/job/MY-JOB/BUILD-ID/wfapi/changesets?_=1471452733792 ajax GET".
      I tried to kill these requests (to free threads) using the "Kill" button in the Monitoring plugin, and after that I found helpful log records.

      The second record shows that Jenkins calls loadUserByUsername, which is strange, because I configured the Jenkins search query to look users up by uid (login), according to the official configuration guide, and not by "FirstName LastName".

      Then I checked the LDAP server logs and discovered high CPU usage, high load average, and tons of incorrect search queries from Jenkins in slapd.log:

      Aug 17 17:36:14 ldap-server slapd[1183]: conn=249907 fd=26 ACCEPT from IP=5.2.1.1:43078 (IP=0.0.0.0:636)
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 fd=26 TLS established tls_ssf=256 ssf=256
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" method=128
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" mech=SIMPLE ssf=0
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=0 RESULT tag=97 err=0 text=
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=2 UNBIND
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 fd=26 closed
      Aug 17 17:36:15 ldap-server slapd[1183]: conn=249911 fd=25 ACCEPT from IP=5.2.1.1:43132 (IP=0.0.0.0:636)
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 fd=25 TLS established tls_ssf=256 ssf=256
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" method=128
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 BIND dn="cn=jenkins-user,ou=system,dc=example,dc=com" mech=SIMPLE ssf=0
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=0 RESULT tag=97 err=0 text=
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=2 UNBIND
      Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 fd=25 closed
      ....
      

      More than 37000 incorrect searches/day, just for my user.
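      A count like the one above can be reproduced from slapd.log with a short script. The following is a minimal sketch, assuming the log line layout shown in the excerpt; the function name `count_empty_searches` and the regular expressions are illustrative and not part of any slapd tooling. It pairs each SRCH line with its SEARCH RESULT line by (conn, op) and counts the filters that returned no entries.

```python
import re
from collections import Counter

# Patterns assume the slapd log layout shown in the excerpt above.
SRCH_RE = re.compile(r'conn=(\d+) op=(\d+) SRCH .*filter="([^"]*)"')
RESULT_RE = re.compile(r'conn=(\d+) op=(\d+) SEARCH RESULT .*nentries=(\d+)')


def count_empty_searches(lines):
    """Return a Counter mapping each search filter to its number of empty results."""
    pending = {}       # (conn, op) -> filter of a search still waiting for its result
    empty = Counter()
    for line in lines:
        match = SRCH_RE.search(line)
        if match:
            pending[(match.group(1), match.group(2))] = match.group(3)
            continue
        match = RESULT_RE.search(line)
        if match:
            search_filter = pending.pop((match.group(1), match.group(2)), None)
            if search_filter is not None and match.group(3) == "0":
                empty[search_filter] += 1
    return empty


# Two search/result pairs taken from the excerpt above.
sample_lines = [
    'Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"',
    'Aug 17 17:36:15 ldap-server slapd[1183]: conn=249907 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=',
    'Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SRCH base="ou=people,dc=example,dc=com" scope=2 deref=3 filter="(uid=firstname lastname)"',
    'Aug 17 17:36:16 ldap-server slapd[1183]: conn=249911 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=',
]
empty_by_filter = count_empty_searches(sample_lines)
```

      To run it against a whole file, pass an open file object instead of the list, for example `count_empty_searches(open("slapd.log"))`; the path here is only a placeholder.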

      Problem:

      It seems that Jenkins checks user privileges before executing (some?) requests, and when it builds the LDAP search query it uses the "Full Name"/username instead of the ID/login/uid -> LDAP can't find anything -> empty result -> Jenkins tries once more to verify privileges -> loop -> busy Jenkins threads/workers/executors -> HTTP 504

      Workarounds:
      1. Not great, but possible: use a big cache for LDAP (in the Jenkins "Configure Global Security" preferences). It didn't fix the problem, but it may minimize its impact (not 100% sure).
      2. Closer to a fix: use a custom LDAP search query (in the Jenkins "Configure Global Security" preferences), something like:
        (|(uid={0})(cn={0}))

        (don't forget to add the 'cn' attribute to the LDAP index)
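      To see why the second workaround matches where the plain uid query does not, it helps to expand the `{0}` placeholder by hand. The sketch below is only an illustration and not Jenkins or LDAP plugin code: `render_filter` and `escape_filter_value` are hypothetical helpers, and the template `(uid={0})` is assumed from the filter seen in slapd.log. Values are escaped as RFC 4515 requires for filter strings.

```python
def escape_filter_value(value):
    """Escape the characters that RFC 4515 reserves in LDAP filter values."""
    escaped = []
    for ch in value:
        if ch in '\\*()\0':
            escaped.append('\\%02x' % ord(ch))  # e.g. '*' becomes '\2a'
        else:
            escaped.append(ch)
    return ''.join(escaped)


def render_filter(template, username):
    """Substitute the escaped username for every {0} placeholder in the template."""
    return template.replace("{0}", escape_filter_value(username))


# The lookup arrives with the full name rather than the login.
uid_only = render_filter("(uid={0})", "firstname lastname")
uid_or_cn = render_filter("(|(uid={0})(cn={0}))", "firstname lastname")

print(uid_only)   # (uid=firstname lastname)
print(uid_or_cn)  # (|(uid=firstname lastname)(cn=firstname lastname))
```

      The first filter is the one from slapd.log and finds nothing when no entry has that uid. The second can still match an entry whose cn holds the full name, which is why cn should be indexed.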

        Attachments:
        1. activeThreads.png (32 kB)
        2. usedMemory.png (44 kB)
        3. cpu.png (18 kB)

          [JENKINS-37489] Jenkins UI slow page load (big TTFB) - possible LDAP issue

          Andrii Melnyk created issue -
          Andrii Melnyk made changes -
          Environment Original: jenkins 2.7.2
          jenkins LDAP plugin 1.12
          New: jenkins 2.7.2 from http://pkg.jenkins-ci.org/debian-stable
          jenkins LDAP plugin 1.12
          Ubuntu 16.04
          Andrii Melnyk made changes -
          Description Original: same text as the description above; in workaround 2 the note about adding cn to the LDAP index was worded slightly differently.
          New: the description above.
          Links in the description markup:
          Monitoring plugin: https://wiki.jenkins-ci.org/display/JENKINS/Monitoring
          helpful log records: https://gist.github.com/gips0n/94dec48afcc61c1960f8db2a8e60a43e
          official configuration guide: https://wiki.jenkins-ci.org/display/JENKINS/LDAP+Plugin#LDAPPlugin-Configuration
          Daniel Beck made changes -
          Resolution New: Duplicate [ 3 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

            kohsuke Kohsuke Kawaguchi
            amelnyk Andrii Melnyk