Type: Bug
Resolution: Fixed
Priority: Major
Labels: None
Environment: Debian 5, sun-java-jdk (1.6.0_22)
Jenkins version: 1.414-SNAPSHOT
Running Jenkins with the embedded Winstone server for a long time
under constant load conditions causes file descriptor and thread
leakage.
What happens:
After running for about a day, the following appears in the Jenkins
log file:
[Winstone 2011/05/27 07:35:03] - WARNING: Request handler pool limit exceeded - waiting for retry
and a bit later (this starts repeating):
[Winstone 2011/05/27 07:43:25] - WARNING: Request handler pool limit exceeded - waiting for retry
[Winstone 2011/05/27 07:43:26] - ERROR: Request ignored because there were no more request handlers available in the pool
[Winstone 2011/05/27 07:43:36] - WARNING: Request handler pool limit exceeded - waiting for retry
[Winstone 2011/05/27 07:43:37] - ERROR: Request ignored because there were no more request handlers available in the pool
Jenkins then stops handling requests successfully: intermittently at
first, but eventually failing almost all requests.
Using VisualVM I can see that there are a thousand RequestHandlerThread
threads in the wait state, and that over 1200 file descriptors are
currently in use.
I think the requests start failing because Winstone has this limit:
private int MAX_REQUEST_HANDLERS_IN_POOL = 1000;
as it doesn't seem to be running out of available fds (apparently 8192
is the maximum in this setup).
When I restart Jenkins, I can observe a slow buildup of threads and
used file descriptors:
- 10 minutes after restart: 136 live threads, 256 fds used
- 20 minutes: 150 threads, 271 fds
- 30 minutes: 161 threads, 280 fds
- 110 minutes: 255 threads, 376 fds
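For what it's worth, the counters above came from VisualVM, but the same numbers can be sampled from inside the JVM with the standard JMX beans. Here's a minimal sketch (LeakWatch is a made-up name; the UnixOperatingSystemMXBean cast only works on Sun/Oracle JVMs on Unix, which matches this setup):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import java.lang.management.ThreadMXBean;

    import com.sun.management.UnixOperatingSystemMXBean;

    // Sketch: log live thread and open fd counts to watch the buildup.
    public class LeakWatch {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                long fds = -1;
                if (os instanceof UnixOperatingSystemMXBean) {
                    // Sun/Oracle JVMs on Unix expose the open fd count here.
                    fds = ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
                }
                System.out.println(threads.getThreadCount()
                        + " live threads, " + fds + " fds used");
                Thread.sleep(10 * 60 * 1000L); // sample every 10 minutes
            }
        }
    }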
I've looked at the repository version of Winstone, and from the code
there seems to be a race condition in the handling of the request
handler pool.
When a request is received by ObjectPool.handleRequest, it takes an
available request handler from unusedRequestHandlerThreads and calls
commenceRequestHandling on that thread. commenceRequestHandling in
turn does this.notifyAll() to wake up the thread. So far so good.
However, when the thread has finished processing the request, it calls
this.objectPool.releaseRequestHandler(this) and then waits. I think
there's a race condition here, since the object pool caller (CALL) and
the request handler thread (RH) can interleave like this:
- RH (in RequestHandler.run): this.objectPool.releaseRequestHandler(this)
- RH (in ObjectPool.releaseRequestHandler): this.unusedRequestHandlerThreads.add(rh)
- CALL (in ObjectPool.handleRequest): take RH from unusedRequestHandlerThreads
- CALL (in ObjectPool.handleRequest): rh.commenceRequestHandling(socket, listener);
- CALL (in RequestHandler.commenceRequestHandling): this.notifyAll()
- RH (in RequestHandler.run): this.wait()
Since the notify is lost (there were no waiters yet), this.wait() in
the last step will hang forever. This leaks a file descriptor, since
the socket handed over for processing is never reclaimed, and threads
are effectively lost as Winstone will then create more RequestHandlers.
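To make the lost wakeup concrete, here is a minimal, self-contained sketch of the same release-then-wait pattern (hypothetical names, NOT Winstone's actual code), together with the standard fix: wait in a loop on a condition that the notifier sets while holding the same monitor, so a notifyAll() that arrives before the wait is not lost.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical reproduction of the pattern described above.
    public class LostWakeupDemo {

        static class Handler extends Thread {
            private final BlockingQueue<Handler> pool;
            private Object socket; // guarded by 'this'

            Handler(BlockingQueue<Handler> pool) {
                this.pool = pool;
            }

            // CALL side: the dispatcher hands over a socket and wakes the handler.
            synchronized void commenceRequestHandling(Object s) {
                this.socket = s;
                notifyAll();
            }

            public void run() {
                try {
                    while (true) {
                        pool.put(this); // RH: release back into the pool first...
                        synchronized (this) {
                            // ...then wait. A bare wait() here is the bug: if the
                            // dispatcher took this handler from the pool and called
                            // notifyAll() before we got here, the wakeup is lost and
                            // this thread blocks forever. Guarding the wait with a
                            // condition set under the same monitor closes the window:
                            while (socket == null) {
                                wait();
                            }
                            // ... handle 'socket' here ...
                            socket = null;
                        }
                    }
                } catch (InterruptedException e) {
                    // shutting down
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Handler> pool = new LinkedBlockingQueue<Handler>();
            Handler handler = new Handler(pool);
            handler.start();
            for (int i = 0; i < 100000; i++) {
                Handler idle = pool.take(); // CALL: take an idle handler...
                idle.commenceRequestHandling(new Object()); // ...and wake it
            }
            handler.interrupt();
        }
    }

With a bare this.wait() instead of the guarded loop, this demo would eventually hang exactly as described above; the condition check makes a pre-wait notify harmless.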
Now, this is of course a Winstone problem, but its development seems
to be d-e-a-d, at least judging by its bug tracker. As long as this
problem affects Jenkins, I'd still classify it as a Jenkins problem too.
I've put this into the winstone tracker too: https://sourceforge.net/tracker/?func=detail&aid=3308285&group_id=98922&atid=622497
Workaround: Use Tomcat, not the embedded Winstone (that's what I'm doing now).