  Jenkins / JENKINS-63737

Upgrading Active Directory Plugin to 2.17: OutOfMemoryError


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Labels:
    • Environment:
      OS version: CentOS 7.8
      Jenkins version: 2.249.1
      Plugin version: Active Directory 2.17
      OpenJDK version: 1.8.0_262
    • Released As:
      active-directory-2.18

      Description

      After upgrading the Active Directory plugin from 2.16 to 2.17 on three separate Jenkins servers, each server soon started giving login errors and then became unresponsive via the UI (nginx reporting HTTP 502).

      The log contained the following:

      2020-09-18 13:47:43.269+0000 [id=1418]  WARNING h.i.i.InstallUncaughtExceptionHandler#handleException: Caught unhandled exception with ID cb4cc28e-5c87-4536-9e45-a191e1512e0e
      java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:717)
              at com.sun.jndi.ldap.Connection.<init>(Connection.java:244)
              at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:137)
              at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1609)
              at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2749)
              at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:319)
              at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:192)
              at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:151)
              at hudson.plugins.active_directory.ActiveDirectorySecurityRealm$DescriptorImpl.bind(ActiveDirectorySecurityRealm.java:623)
              at hudson.plugins.active_directory.ActiveDirectorySecurityRealm$DescriptorImpl.bind(ActiveDirectorySecurityRealm.java:554)
              at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider$1.call(ActiveDirectoryUnixAuthenticationProvider.java:353)
              at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider$1.call(ActiveDirectoryUnixAuthenticationProvider.java:336)
              at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
              at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
      Caused: com.google.common.util.concurrent.ExecutionError
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
              at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
              at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
              at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:336)
              at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:299)
              at hudson.plugins.active_directory.ActiveDirectoryUnixAuthenticationProvider.retrieveUser(ActiveDirectoryUnixAuthenticationProvider.java:225)
              at hudson.plugins.active_directory.ActiveDirectorySecurityRealm.authenticate(ActiveDirectorySecurityRealm.java:831)
              at hudson.security.AbstractPasswordBasedSecurityRealm.doAuthenticate(AbstractPasswordBasedSecurityRealm.java:72)
              at hudson.security.AbstractPasswordBasedSecurityRealm.access$000(AbstractPasswordBasedSecurityRealm.java:31)
              at hudson.security.AbstractPasswordBasedSecurityRealm$Authenticator.retrieveUser(AbstractPasswordBasedSecurityRealm.java:106)
              at org.acegisecurity.providers.dao.AbstractUserDetailsAuthenticationProvider.authenticate(AbstractUserDetailsAuthenticationProvider.java:122)
              at org.acegisecurity.providers.ProviderManager.doAuthentication(ProviderManager.java:200)
              at org.acegisecurity.AbstractAuthenticationManager.authenticate(AbstractAuthenticationManager.java:47)
              at org.acegisecurity.ui.webapp.AuthenticationProcessingFilter.attemptAuthentication(AuthenticationProcessingFilter.java:74)
              at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:252)
              at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
              at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
              at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
              at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
              at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
              at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
              at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
              at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
              at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:51)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
              at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
              at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
              at jenkins.security.SuspiciousRequestFilter.doFilter(SuspiciousRequestFilter.java:36)
              at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1618)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:549)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
              at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
              at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1369)
              at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:489)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
              at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1284)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
              at org.eclipse.jetty.server.Server.handle(Server.java:501)
              at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
              at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
              at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
              at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:272)
              at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
              at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
              at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
              at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
              at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
              at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
              at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
              at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
              at java.lang.Thread.run(Thread.java:748)
      

      Given the references to Active Directory in the stack trace, I rolled back the AD plugin from 2.17 to 2.16. Once this was done, all three Jenkins servers worked fine again.

      This is what JavaMelody Monitoring shows for Threads Count:

        Attachments

          Issue Links

            Activity

            Jan Sochna added a comment - edited

            I can confirm this has happened in our environment as well. Approximately 3 hours after the upgrade we had over 500 threads with names starting with "ActiveDirectory.updateUserCache" and the server collapsed. After reverting the plugin, we had just 4 such threads after the same amount of time.

            The attached chart demonstrates the change:

            • The drop in the middle was a restart.
            • The leveling-off at 10 AM happened after enabling "removeIrrelevantGroups" without a restart (the cache of 1000 items for 10 minutes remained the same).
            • The last drop is the revert.

            Félix Belzunce Arcos added a comment -

            Thanks for reporting it.

            I am able to reproduce it.

            I debugged it and found that in ActiveDirectorySecurityRealm#getAuthenticationProvider we create a new object each time ActiveDirectorySecurityRealm#authenticate is reached (so each time you perform a login); this is producing a thread leak.

            We are working on a fix.
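            The leak pattern described above can be sketched as follows. This is a minimal illustration with hypothetical names (`leakyProvider`, `cachedProvider`), not the plugin's actual code: creating a new provider object (each carrying its own thread pool, which is never shut down) on every login leaks threads, while lazily caching a single instance does not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ProviderLeakSketch {
    private ExecutorService cached; // created once, reused for the realm's lifetime

    // Leaky variant: a fresh pool per login that nobody ever shuts down.
    ExecutorService leakyProvider() {
        return Executors.newFixedThreadPool(4);
    }

    // Fixed variant: reuse one pool across all logins.
    synchronized ExecutorService cachedProvider() {
        if (cached == null) {
            cached = Executors.newFixedThreadPool(4);
        }
        return cached;
    }

    public static void main(String[] args) {
        ProviderLeakSketch realm = new ProviderLeakSketch();
        ExecutorService a = realm.leakyProvider();
        ExecutorService b = realm.leakyProvider();
        System.out.println(a == b); // false: every "login" gets its own pool
        System.out.println(realm.cachedProvider() == realm.cachedProvider()); // true
        a.shutdown();
        b.shutdown();
        realm.cachedProvider().shutdown();
    }
}
```

            With the leaky variant, thread count grows with every authentication; the fix is to make the provider a cached singleton per security realm.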

            Félix Belzunce Arcos added a comment - edited

            Jesse Glick To reproduce the issue:

            1 - Clone the repository https://github.com/fbelzunc/docker-samba-ad-dc
            2 - docker build -t samba-ad-dc .
            3 - docker run --rm -e "SAMBA_DOMAIN=SAMDOM" -e "SAMBA_REALM=SAMDOM.EXAMPLE.COM" -e "ROOT_PASSWORD=ia4uV1EeKait" -e "SAMBA_ADMIN_PASSWORD=ia4uV1EeKait" --name dc1 --dns 127.0.0.1 -d -p 53:53 -p 53:53/udp -p 389:389 -p 88:88 -p 135:135 -p 139:139 -p 138:138 -p 445:445 -p 464:464 -p 3268:3268 -p 3269:3269 samba-ad-dc
            4 - Configure the plugin in the way below - don't forget to use the Token Groups lookup strategy, as shown in the image.
            5 - Install the monitoring plugin.
            6 - Log in as gogo / ia4uV1EeKait, then as gogo2 / ia4uV1EeKait, ...
            7 - Use your IDE or take a thread dump. You will see updateUserCache threads that never finish, because there is a thread pool executor per provider instance. In the monitoring plugin, if you list the threads you can see that each login creates a new thread. You can go beyond the limit of 4-8 threads per thread executor because multiple instances are being created.
            8 - In the plugin: mvn hpi:run, and set a breakpoint in ActiveDirectorySecurityRealm#getAuthenticationProvider to see all of this.

            Jesse Glick added a comment -

            Can reproduce an issue from JenkinsRule:

            dynamicSetUp();
            while (true) {
                j.createWebClient().login("Fred", "ia4uV1EeKait");
            }
            

            then

            watch 'jstack … | fgrep updateUserCache | wc -l'
            

            grows endlessly.
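            The same check can be done from inside the JVM instead of shelling out to jstack. This is a hypothetical helper (not part of the plugin or JenkinsRule) that counts live threads by name prefix, the in-process analogue of `jstack … | fgrep updateUserCache | wc -l`:

```java
public class ThreadCounter {
    // Count live threads whose name starts with the given prefix.
    static long countByPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate one leaked worker with the thread-name prefix seen in the logs.
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ignored) {
            }
        }, "ActiveDirectory.updateUserCache-demo");
        t.start();
        System.out.println(countByPrefix("ActiveDirectory.updateUserCache") >= 1); // true
        t.interrupt();
        t.join();
    }
}
```

            Polling this counter in a loop during the login loop above would show the same unbounded growth.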

            Jesse Glick added a comment -

            I hope to see this fix released soon, but if you want to pretest: https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/plugins/active-directory/2.18-rc655.8e5430593be9/active-directory-2.18-rc655.8e5430593be9.hpi
            Henryk Paluch added a comment -

            Probably off-topic, but also important: AD caching seems to be broken in 2.17:

            • When I call any REST API (with a user token), AD is searched twice on every request (and a new updateUserCache thread is created on each request).

            I guess it may be related to the above problem: because a new instance is created on every request, the cache is effectively never used.

            I temporarily solved both issues by downgrading the AD plugin to 2.16.
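            The broken-caching behaviour described above can be sketched without the plugin. This minimal illustration substitutes a ConcurrentHashMap and a counter for the plugin's Guava cache and LDAP search (all names here are hypothetical): if a fresh cache is created per request, every request pays the expensive lookup; a shared cache pays it once per user.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheReuseSketch {
    static final AtomicInteger lookups = new AtomicInteger();

    // Stands in for an expensive LDAP search against AD.
    static String expensiveAdLookup(String user) {
        lookups.incrementAndGet();
        return "details-for-" + user;
    }

    public static void main(String[] args) {
        // Broken pattern: a fresh cache per request, so every request hits AD.
        for (int i = 0; i < 3; i++) {
            Map<String, String> perRequestCache = new ConcurrentHashMap<>();
            perRequestCache.computeIfAbsent("gogo", CacheReuseSketch::expensiveAdLookup);
        }
        System.out.println(lookups.get()); // 3: the cache is never reused

        // Fixed pattern: one shared cache, so AD is hit once per user.
        lookups.set(0);
        Map<String, String> shared = new ConcurrentHashMap<>();
        for (int i = 0; i < 3; i++) {
            shared.computeIfAbsent("gogo", CacheReuseSketch::expensiveAdLookup);
        }
        System.out.println(lookups.get()); // 1
    }
}
```

            This matches the observation that AD was queried on every REST call in 2.17 and only once per user after the fix.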

            Jesse Glick added a comment -

            Henryk Paluch I hope 2.18 addresses the caching regression; please verify.

            Henryk Paluch added a comment -

            Thanks Jesse Glick!

            Both issues are resolved after upgrading to 2.18:

            1. Jenkins thread count is now stable (around 100 threads for a few days).
            2. REST API latency dropped back to around 20 ms most of the time. I use something like /queue/api/json?tree=$(urlencode 'items[id,why,inQueueSince,task[name]]') to get the number of jobs stuck in the queue for Telegraf.

              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              msymons Mark Symons
              Votes:
              1
              Watchers:
              6

                Dates

                Created:
                Updated:
                Resolved: