-
Patch
-
Resolution: Done
-
Major
-
None
-
Platform: All, OS: All
After upgrading recently to a version of Hudson that includes the Hudson CLI, we
started seeing catastrophic heap exhaustion when the "groovysh" command was
used. Our Hudson servers are configured to use only the HTTP-based communication
method for the CLI, not the ad hoc TCP port method.
Digging around a bit, it turns out to be related to read timeouts on the Channel
reader. In our case, the HTTP connection timeout was set to 20 seconds, so after
that time we would see the CLI get disconnected from Hudson. The server logs
show an exception like this:
2009-10-02 12:50:59,786 ERROR [hudson.remoting.Channel] I/O error in channel
HTTP full-duplex channel 552692ad-f7ab-4ed2-8eb6-5be614fdcefb
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:746)
at
org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:776)
at
org.apache.coyote.http11.filters.ChunkedInputFilter.readBytes(ChunkedInputFilter.java:243)
at
org.apache.coyote.http11.filters.ChunkedInputFilter.parseChunkHeader(ChunkedInputFilter.java:273)
at
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:132)
at
org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:705)
at org.apache.coyote.Request.doRead(Request.java:428)
at
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:372)
at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:317)
at
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:105)
at
java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
at
java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
at
java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:802)
This happens in the Channel reader thread, but in the pool thread that is
serving the HTTP request, the Groovysh repl loop continues, continuing to look
for new commands from the user. Each time it does this, it ends up getting down
to Channel.send where an exception is thrown to indicate that the channel is closed.
In the case of writes/flush, this exception is swallowed by the PrintStream and
PrintWriter wrappers at various levels of the output chain (because they swallow
ALL IOExceptions, per their contract). In the case of reads, the IOException
ends up being throw up to Groovy's ShellRunner.work where it is caught as a
Throwable and passed to the configured errorHook.
The problem is that the errorHook then tries to write the error message out to
stderr. This fails, but because of the behavior of PrintStream and PrintWriter,
the IOException from Channel.send doesn't get thrown. This allows the loop in
ShellRunner.work to continue, calling readLine again to repeat the cycle.
While this is happening, each attempt to read from the ProxyInputStream creates
a Chunk request that gets put into Channel.pendingCalls. Since these requests
will never be serviced they just sit there, chewing up the heap until it is all
gone.
I have a patch that fixes this, inasmuch as it can be fixed. Instead of relying
on the default errorHook, it uses a custom errorHook that watches for
ChannelClosedException. If it sees it, it rethrows it, otherwise it calls the
default error hook. This causes the exception to propagate out of
ShellRunner.work and eventually causes the GroovyshCommand to terminate with a
failure.
Once I fixed that, however, I noticed that FullDuplexHttpChannel and
ProxyInputStream$Chunk instances were still being left on the heap. I tracked
that down to a shutdown hook being added in Groovy's RecordCommand. Since that
hook object has a reference to the Groovysh instance, it transitively holds onto
the Channel via IO.in, IO.out, and IO.err.
My fix for this second problem may be more controversial; I decided to remove
the "load", "save", and "record" commands. Besides the shutdown hook, the reason
for this is that these three commands are meant to interact with local files,
but in the case of the Hudson/Groovy CLI these files will actually be looked for
on the server, not the client. I could see having usable variants of "load",
"save", and "record" in the Hudson/Groovy CLI, but I don't think the stock ones
fit the bill.
And last but not least, my patch adds a predefined binding for the "hudson"
variable to hudson.model.Hudson.getInstance() and adds a predefined import for
"hudson.model.*".
I know this potentially three separate issues, so please let me know if I need
to split up the patch.