Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-20051

Winstone goes down on 1.509.4 (due to LDAP issues?)

      We have experienced a strange issue with Jenkins 1.509.4. Web interface has gone offline after several days.

      • "RMI TCP Connection" thread is not present in the thread dump
      • "RMI Scheduler" and "RMI accept" exist in the thread
      • There is no info in jenkins.log
      • There's no BLOCKED threads
      • Jenkins internal jobs work well, only the web interface is not available

      After analysis of logs, seems that issue has been caused by "Too many open files" error:

      Oct 14, 2013 5:27:48 PM winstone.Logger logInternal
      SEVERE: Error during HTTP listener init or shutdown
      java.net.SocketException: Too many open files
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
      at java.net.ServerSocket.implAccept(ServerSocket.java:530)
      at java.net.ServerSocket.accept(ServerSocket.java:498)
      at winstone.HttpListener.run(HttpListener.java:139)
      at java.lang.Thread.run(Thread.java:724)

      According to logs, there is a huge number of LDAP errors

      WARNING: Failed to search LDAP for username=USER_NAME
      org.acegisecurity.ldap.LdapDataAccessException: Failed to obtain InitialDirContext due to unexpected exception; nested exception is javax.naming.ServiceUnavailableException: ru20waddc.internal.synopsys.com:389; socket closed
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.connect(DefaultInitialDirContextFactory.java:193)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.newInitialDirContext(DefaultInitialDirContextFactory.java:261)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.newInitialDirContext(DefaultInitialDirContextFactory.java:241)
      at org.acegisecurity.ldap.LdapTemplate.execute(LdapTemplate.java:123)
      at org.acegisecurity.ldap.LdapTemplate.searchForSingleEntry(LdapTemplate.java:246)
      at org.acegisecurity.ldap.search.FilterBasedLdapUserSearch.searchForUser(FilterBasedLdapUserSearch.java:119)
      at hudson.security.LDAPSecurityRealm$LDAPUserDetailsService.loadUserByUsername(LDAPSecurityRealm.java:591)
      at hudson.security.LDAPSecurityRealm$LDAPUserDetailsService.loadUserByUsername(LDAPSecurityRealm.java:553)
      at org.acegisecurity.ui.rememberme.TokenBasedRememberMeServices.loadUserDetails(TokenBasedRememberMeServices.java:308)
      at org.acegisecurity.ui.rememberme.TokenBasedRememberMeServices.autoLogin(TokenBasedRememberMeServices.java:218)
      at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:104)
      ...
      at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:46)
      at winstone.FilterConfiguration.execute(FilterConfiguration.java:194)
      at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:366)
      at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:81)
      at winstone.FilterConfiguration.execute(FilterConfiguration.java:194)
      at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:366)
      at winstone.RequestDispatcher.forward(RequestDispatcher.java:331)
      at winstone.RequestHandlerThread.processRequest(RequestHandlerThread.java:227)
      at winstone.RequestHandlerThread.run(RequestHandlerThread.java:150)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)
      Caused by: javax.naming.ServiceUnavailableException: LDAP_ADDRESS; socket closed
      at com.sun.jndi.ldap.Connection.readReply(Connection.java:454)
      at com.sun.jndi.ldap.LdapClient.ldapBind(LdapClient.java:364)
      at com.sun.jndi.ldap.LdapClient.authenticate(LdapClient.java:213)
      at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2740)
      at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:316)
      at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:193)
      at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:211)
      at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154)
      at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84)
      at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
      at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:307)
      at javax.naming.InitialContext.init(InitialContext.java:242)
      at javax.naming.InitialContext.<init>(InitialContext.java:216)
      at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101)
      at org.acegisecurity.ldap.DefaultInitialDirContextFactory.connect(DefaultInitialDirContextFactory.java:180)
      ... 39 more

          [JENKINS-20051] Winstone goes down on 1.509.4 (due to LDAP issues?)

          Oleg Nenashev added a comment -

          Fixed logs

          Oleg Nenashev added a comment - Fixed logs

          Oleg Nenashev added a comment -

          Added files from monitoring

          Oleg Nenashev added a comment - Added files from monitoring

          Oleg Nenashev added a comment -

          Unfortunately, issue has been reproduced on our server.
          As I see, the issue's origin was in LDAP server, which somehow caused hanging of LDAP requests till the timeout ()

          • Seems that each request spawns an additional unnamed thread, which opens a new connection to LDAP reads info from it
          • Due to LDAP service, this IO hangs
          • In case of huge number of requests to LDAP, number of opened files and threads goes beyond limits
          • No surprise that some threads in Jenkins collapse in such situations

          Partial workaround:

          • always use caching in LDAP

          BTW, it would be great to limit a number of parallel requests to LDAP

          Spawned LDAP authentication thread:

          "Thread-30370" daemon prio=10 tid=0x00002aaae0001000 nid=0x2849 runnable [0x000000005fb1c000]
          java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:152)
          at java.net.SocketInputStream.read(SocketInputStream.java:122)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
          at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:334)

          • locked <0x000000072b1de300> (a java.io.BufferedInputStream)
            at com.sun.jndi.ldap.Connection.run(Connection.java:853)
            at java.lang.Thread.run(Thread.java:724)

          Oleg Nenashev added a comment - Unfortunately, issue has been reproduced on our server. As I see, the issue's origin was in LDAP server, which somehow caused hanging of LDAP requests till the timeout () Seems that each request spawns an additional unnamed thread, which opens a new connection to LDAP reads info from it Due to LDAP service, this IO hangs In case of huge number of requests to LDAP, number of opened files and threads goes beyond limits No surprise that some threads in Jenkins collapse in such situations Partial workaround: always use caching in LDAP BTW, it would be great to limit a number of parallel requests to LDAP Spawned LDAP authentication thread: "Thread-30370" daemon prio=10 tid=0x00002aaae0001000 nid=0x2849 runnable [0x000000005fb1c000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) locked <0x000000072b1de300> (a java.io.BufferedInputStream) at com.sun.jndi.ldap.Connection.run(Connection.java:853) at java.lang.Thread.run(Thread.java:724)

          Oleg Nenashev added a comment -

          Reassigned issue to Stephen (issue has been caused by LDAP)

          Oleg Nenashev added a comment - Reassigned issue to Stephen (issue has been caused by LDAP)

          Oleg Nenashev added a comment -

          Workaround exists, so issue is not a blocker

          Oleg Nenashev added a comment - Workaround exists, so issue is not a blocker

          Daniel Beck added a comment -

          Seems to be LDAP alone; assuming core only due to use of former 'www' component.

          Daniel Beck added a comment - Seems to be LDAP alone; assuming core only due to use of former 'www' component.

          I have not heard of this issue for some time... anyway no point in me being the assignee

          Stephen Connolly added a comment - I have not heard of this issue for some time... anyway no point in me being the assignee

            Unassigned Unassigned
            oleg_nenashev Oleg Nenashev
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: