Status: Resolved
Resolution: Cannot Reproduce
Jenkins 1.609, Docker plugin 0.8, SSH slaves 1.9
We have recently been seeing what looks like a memory leak in our Jenkins instance. Used heap space fills slowly over the course of a week or two, and the process eventually locks up and becomes unresponsive. The only way to recover from this state is to restart Jenkins, which is not ideal.
Most of our builds use the Jenkins Docker plugin, which creates a fresh Jenkins slave for each job run. I think this is the source of the memory leak.
I took a heap dump from Jenkins while it was still running, with most of the heap already consumed (probably a few days from locking up). The Eclipse Memory Analyzer Tool found this:
510 instances of "hudson.remoting.Channel", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x8f805698" occupy 927,118,648 (83.35%) bytes. These instances are referenced from one instance of "java.util.WeakHashMap$Entry", loaded by "<system class loader>"
org.eclipse.jetty.webapp.WebAppClassLoader @ 0x8f805698
It looks like the hudson.remoting.Channel objects may not be getting cleaned up. I assume each one of these refers to a build slave that has now been destroyed? ~500 jobs executed in a few days sounds about right to me. I've attached a screenshot of the "Common Path to the accumulation point". I'm not comfortable attaching the whole heap dump but I'm happy to provide any more information if it's useful.
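For anyone who wants to capture a comparable heap dump, here is a minimal sketch using the JDK's `HotSpotDiagnosticMXBean`. The class name `HeapDumper` and the default file name are hypothetical and not part of Jenkins or any plugin, and the code dumps the JVM it runs in, so it would have to execute inside the Jenkins process to be useful there.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for illustration; not part of Jenkins or any plugin. */
final class HeapDumper {
    private HeapDumper() {}

    /**
     * Writes a heap dump of the current JVM to outputFile.
     * The target file must not already exist, and recent JDKs
     * require the name to end in ".hprof".
     * liveOnly = true restricts the dump to reachable objects.
     */
    static void dump(String outputFile, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default name; pass a path as the first argument to override it.
        String target = args.length > 0 ? args[0] : "jenkins-heap.hprof";
        dump(target, true);
        System.out.println("Heap dump written to " + target);
    }
}
```

From outside the process, the JDK's `jmap -dump:live,format=b,file=<file> <pid>` produces the same kind of file, which can then be opened in the Memory Analyzer Tool.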
JENKINS-28844 The curious case of the Channel memory cycles
Sorry, the screenshot seems to be minimised and I can't read the text on it. Could you verify that you attached a 1:1 screenshot?
I can verify that the screenshot is at 1:1 scale, but it is a large image - 1565 x 1259 - so your browser might minimise it to fit it on screen.
When I told my browser to "zoom in", the text was clear and readable.
|Field||Original Value||New Value|
|Resolution|| ||Cannot Reproduce [ 5 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
Possible duplicate of
This issue duplicates
|Workflow||JNJira [ 163269 ]||JNJira + In-Review [ 197154 ]|
There are many reasons something could be leaking. Detailed analysis of the heap (root reference paths) is needed before you could even pick a component to assign to.
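One pattern worth ruling out when root reference paths run through a `WeakHashMap`, as in the report above: the map holds its values strongly, so a value that refers back to its own key keeps the entry alive indefinitely. The sketch below only illustrates that general behaviour. `Payload` is a hypothetical stand-in, and nothing here is confirmed to be what happens with `hudson.remoting.Channel` in this issue.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Illustration only; the names here are hypothetical and unrelated to Jenkins code. */
final class WeakMapRetention {
    private WeakMapRetention() {}

    /** Stand-in for a large per-slave object that keeps a strong reference to its key. */
    static final class Payload {
        final Object owner;
        final byte[] data = new byte[1024];

        Payload(Object owner) {
            this.owner = owner;
        }
    }

    /** Adds one entry whose value points back at its key, then drops the local key reference. */
    static Map<Object, Payload> buildSelfReferencingMap() {
        Map<Object, Payload> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, new Payload(key));
        // The local 'key' goes out of scope on return, but the chain
        // map -> entry -> value -> key keeps the key strongly reachable.
        return map;
    }

    public static void main(String[] args) {
        Map<Object, Payload> map = buildSelfReferencingMap();
        System.gc(); // only a hint to the JVM
        System.out.println("entries still held: " + map.size());
    }
}
```

In a heap dump this shows up as a path from the map entry through the value back to the key, which is the kind of cycle the root reference path analysis would need to confirm or exclude.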