Jenkins / JENKINS-9882

Jenkins runs out of file descriptors (winstone problem)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Labels: None
    • Environment: Debian 5, sun-java-jdk (1.6.0_22)
      Jenkins version: 1.414-SNAPSHOT

      Running Jenkins with the embedded Winstone server for a long time
      under constant load conditions causes file descriptor and thread
      leakage.

      What happens:

      After running for about a day, the following appears in the Jenkins
      log file:

      [Winstone 2011/05/27 07:35:03] - WARNING: Request handler pool limit exceeded - waiting for retry

      and a bit later (this starts repeating):

      [Winstone 2011/05/27 07:43:25] - WARNING: Request handler pool limit exceeded - waiting for retry
      [Winstone 2011/05/27 07:43:26] - ERROR: Request ignored because there were no more request handlers available in the pool
      [Winstone 2011/05/27 07:43:36] - WARNING: Request handler pool limit exceeded - waiting for retry
      [Winstone 2011/05/27 07:43:37] - ERROR: Request ignored because there were no more request handlers available in the pool

      Jenkins then stops handling requests successfully - intermittently
      at first, but eventually failing almost all requests.

      Using VisualVM I can see that there are a thousand
      RequestHandlerThread threads in the waiting state, and that over
      1200 file descriptors are currently in use.

      I think the requests start failing because Winstone has this limit:

      private int MAX_REQUEST_HANDLERS_IN_POOL = 1000;

      as it doesn't seem to be running out of available fds (apparently 8192
      is the maximum in this setup).

      When I restart Jenkins, I can watch a slow buildup of threads and
      open file descriptors:

      • 10 minutes after restart: 136 live threads, 256 fds used
      • 20 minutes: 150 threads, 271 fds
      • 30 minutes: 161 threads, 280 fds
      • 110 minutes: 255 threads, 376 fds
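      A buildup like this can also be probed from inside the JVM itself. A
      minimal sketch (my own, not part of Jenkins or Winstone; Linux-only,
      since it reads /proc/self/fd):

```java
import java.io.File;

public class LeakProbe {
    public static void main(String[] args) {
        // On Linux, /proc/self/fd contains one entry per file descriptor
        // currently open in this process.
        File[] fds = new File("/proc/self/fd").listFiles();
        int fdCount = (fds == null) ? -1 : fds.length;

        // Rough count of live threads in the current thread group.
        int threads = Thread.activeCount();

        System.out.println("threads=" + threads + " fds=" + fdCount);
    }
}
```

      Logging this periodically shows the same trend as the VisualVM
      numbers above without attaching a profiler.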

      I've looked at the repository version of Winstone, and there seems
      to be a race condition in the handling of the request handler pool.

      When a request is received by ObjectPool.handleRequest, it looks for
      an available request handler from unusedRequestHandlerThreads and
      calls commenceRequestHandling on the available thread.

      commenceRequestHandling in turn does this.notifyAll() to wake up the
      thread. So far so good. However, when the thread has finished
      processing the request, it calls
      this.objectPool.releaseRequestHandler(this) and then waits. I think
      there's a race condition here, since the object-pool caller (CALL)
      and the request handler thread (RH) can interleave like this:

      1. RH (in RequestHandler.run): this.objectPool.releaseRequestHandler(this)
      2. RH (in ObjectPool.releaseRequestHandler): this.unusedRequestHandlerThreads.add(rh)
      3. CALL (in ObjectPool.handleRequest): take RH from unusedRequestHandlerThreads
      4. CALL (in ObjectPool.handleRequest): rh.commenceRequestHandling(socket, listener);
      5. CALL (in RequestHandler.commenceRequestHandling): this.notifyAll()
      6. RH (in RequestHandler.run): this.wait()

      Since the notify is lost (there are no waiters yet), this.wait() in
      the last step hangs forever. This leaks a file descriptor, since the
      socket handed over for processing is never reclaimed, and threads
      are effectively lost as Winstone then creates more RequestHandlers.
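      The classic fix for this lost-notification pattern is to guard the
      wait with a state variable re-checked in a loop, so a notify that
      arrives before the wait is not lost. A hypothetical simplification
      of the handoff (class and method names are mine, not Winstone's):

```java
class Handler {
    private final Object lock = new Object();
    private Runnable pending; // request handed to this thread, if any

    // CALL side: hand over a request and wake the handler thread.
    void commence(Runnable request) {
        synchronized (lock) {
            pending = request;   // record the handoff before notifying
            lock.notifyAll();
        }
    }

    // RH side: wait until a request has been handed over.
    Runnable awaitRequest() throws InterruptedException {
        synchronized (lock) {
            while (pending == null) { // if commence() already ran,
                lock.wait();          // we never block: no notify is lost
            }
            Runnable r = pending;
            pending = null;
            return r;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Handler h = new Handler();
        // Simulate the racy ordering: the handoff happens *before* the
        // handler starts waiting. With the guard flag this is safe.
        h.commence(() -> System.out.println("request processed"));
        h.awaitRequest().run(); // prints "request processed"
    }
}
```

      Because the handoff is recorded in a field under the same lock, the
      order of commence() and awaitRequest() no longer matters.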

      Now, this is of course a Winstone problem, but its development seems
      to be d-e-a-d, at least judging by its bug tracker. As long as this
      problem affects Jenkins, I'd still classify it as a Jenkins problem
      too.

      I've put this into the winstone tracker too: https://sourceforge.net/tracker/?func=detail&aid=3308285&group_id=98922&atid=622497

      Workaround: use Tomcat instead of the embedded Winstone (that's what
      I'm doing now).

          [JENKINS-9882] Jenkins runs out of file descriptors (winstone problem)

          Anoop Karollil added a comment -

          We are using Jetty, not Tomcat, but we would definitely like to
          have Winstone, as it integrates better - Winstone makes
          upgrading much easier than Jetty.

          James Nord added a comment -

          I haven't seen those log messages in jenkins.log but I do see
          [Winstone 2012/02/22 14:32:12] - Error during HTTP listener init or shutdown
          java.net.SocketException: Too many open files
          at java.net.PlainSocketImpl.socketAccept(Native Method)
          at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:396)
          at java.net.ServerSocket.implAccept(ServerSocket.java:522)
          at java.net.ServerSocket.accept(ServerSocket.java:490)
          at winstone.HttpListener.run(HttpListener.java:134)
          at java.lang.Thread.run(Thread.java:722)

          After which the listener is no longer available: netstat -anpt |
          grep java | grep LISTEN shows that the listening socket has been
          closed. (The HTTPS socket, which gets very few hits, is still
          alive.)

          Possibly related - but possibly not either...

          Happens with 1.424.2.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/java/winstone/BoundedExecutorService.java
          src/java/winstone/ObjectPool.java
          src/java/winstone/RequestHandlerThread.java
          src/test/winstone/BoundedExecutorServiceTest.java
          http://jenkins-ci.org/commit/winstone/ae511339839f2de25d9d9500496e6a8f5771742f
          Log:
          [FIXED JENKINS-9882] Use Java5 ExecutorService for thread management.

          It is a better implementation of the thread pool, except one catch
          about how it prefers queueing over adding more threads.
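          The "one catch" mentioned here is standard java.util.concurrent
          behavior: ThreadPoolExecutor only creates threads beyond
          corePoolSize when the work queue rejects the offered task, so
          with an unbounded queue the pool never grows past its core
          size. A small demonstration (mine, not from the Winstone patch):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueVsThreads {
    public static void main(String[] args) throws InterruptedException {
        // core=1, max=4, unbounded queue: tasks beyond the first are
        // queued instead of spawning threads up to the maximum.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 4, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try { block.await(); } catch (InterruptedException e) {}
            });
        }
        Thread.sleep(100); // let the pool settle
        System.out.println("pool size: " + pool.getPoolSize()); // prints "pool size: 1"
        block.countDown();
        pool.shutdown();
    }
}
```

          An executor that wants "grow first, queue later" behavior has to
          work around this, which is presumably what the
          BoundedExecutorService added in this commit does.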

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          war/pom.xml
          http://jenkins-ci.org/commit/jenkins/8eb264ceb1bfdea332009499a06b9faa89dc03e9
          Log:
          [FIXED JENKINS-9882] integrated a newer version of Winstone with the fix.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #1590
          [FIXED JENKINS-9882] integrated a newer version of Winstone with the fix. (Revision 8eb264ceb1bfdea332009499a06b9faa89dc03e9)

          Result = UNSTABLE
          Kohsuke Kawaguchi : 8eb264ceb1bfdea332009499a06b9faa89dc03e9
          Files :

          • war/pom.xml
          • changelog.html

          Uwe Stuehler added a comment - edited

          It seems to me that this bug might not have been completely
          fixed, or that we are seeing a different but similar issue.

          After about a day of normal operation, we see HTTP and AJP
          connections being accept()ed, after which the accepting thread
          goes back to poll()...

          Then nothing happens with the new connection - not a single
          read() or anything. New threads aren't created either. We're at
          exactly 200 RequestHandlerThread threads in state RUNNABLE.

          Is 200 a fixed limit on the number of request handler threads?

          Edit: added stack trace

          All RequestHandlerThread threads have this exact same backtrace:

          "RequestHandlerThread[#871]" daemon prio=10 tid=0x00007fee8527a800 nid=0xf2c runnable [0x00007fee7d493000]
             java.lang.Thread.State: RUNNABLE
                  at java.net.SocketInputStream.socketRead0(Native Method)
                  at java.net.SocketInputStream.read(SocketInputStream.java:129)
                  at java.io.DataInputStream.readFully(DataInputStream.java:178)
                  at java.io.DataInputStream.readFully(DataInputStream.java:152)
                  at winstone.ajp13.Ajp13IncomingPacket.<init>(Ajp13IncomingPacket.java:60)
                  at winstone.ajp13.Ajp13Listener.allocateRequestResponse(Ajp13Listener.java:170)
                  at winstone.RequestHandlerThread.run(RequestHandlerThread.java:67)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
                  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                  at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:662)
          

          Uwe Stuehler added a comment -

          FWIW: The number 200 is indeed the thread pool size limit.

          Marc Günther added a comment -

          Uwe Stuehler wrote:
          > Is 200 a fixed limit on the number of request handler threads?

          It is. From winstone/ObjectPool.java:35:

              private int MAX_REQUEST_HANDLERS_IN_POOL = 200;

          Uwe Stuehler added a comment -

          New issue opened: JENKINS-13275

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          war/pom.xml
          http://jenkins-ci.org/commit/jenkins/5a28f6a7f7b3debf3ea2ac459603af0d8086395e
          Log:
          [FIXED JENKINS-9882] integrated a newer version of Winstone with the fix.
          (cherry picked from commit 8eb264ceb1bfdea332009499a06b9faa89dc03e9)

          Conflicts:

          changelog.html

            Assignee: Unassigned
            Reporter: Santeri Paavolainen (santtu)
            Votes: 2
            Watchers: 5
