Status: Resolved
Resolution: Cannot Reproduce
Jenkins 1.609, Docker plugin 0.8, SSH slaves 1.9
We've recently been seeing what looks like a memory leak in our Jenkins instance. Used heap space fills slowly over the course of a week or two, and the process eventually locks up and becomes unresponsive. The only way to recover from this state is to restart Jenkins, which is not ideal.
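For anyone trying to confirm the same slow growth, here is a minimal sketch of sampling heap usage with the standard `java.lang.management` API. The class name `HeapTrend` and the output format are made up for illustration, and the code reports on whichever JVM it runs in, so it would have to run inside the Jenkins process (or be adapted to a remote JMX connection) to say anything about Jenkins itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper: logs one heap-usage sample for the current JVM.
public class HeapTrend {

    // Fraction of the maximum heap in use, or -1 when the maximum is undefined.
    static double usedFraction(long used, long max) {
        return max > 0 ? (double) used / max : -1.0;
    }

    // One sample line: timestamp, used bytes, max bytes, used fraction.
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double fraction = usedFraction(heap.getUsed(), heap.getMax());
        return String.format(Locale.ROOT, "%d used=%d max=%d fraction=%.3f",
                System.currentTimeMillis(), heap.getUsed(), heap.getMax(), fraction);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Samples taken at regular intervals and compared right after full GCs would show whether the retained heap keeps climbing over days, rather than just fluctuating between collections.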
Most of our builds use the Jenkins Docker plugin, which creates a fresh Jenkins slave for each job run. I suspect this is the source of the memory leak.
I took a heap dump from Jenkins while it was still running, with most of the heap already consumed (probably a few days from locking up). The Eclipse Memory Analyzer Tool found this:
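The report does not say how the dump was taken. As one possible approach, here is a sketch that writes an `.hprof` file from inside a HotSpot JVM using `HotSpotDiagnosticMXBean`. The class name `HeapDumper` and the default file name are assumptions for illustration. It dumps the JVM it runs in, so it only helps if the code can be executed inside the target process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: dumps the heap of the current HotSpot JVM to a file.
public class HeapDumper {

    // liveOnly = true dumps only reachable objects, which keeps the file smaller.
    // dumpHeap fails with an IOException if the target file already exists.
    static void dump(String file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        // "heap.hprof" is an arbitrary default chosen for this sketch.
        dump(args.length > 0 ? args[0] : "heap.hprof", true);
    }
}
```

The resulting file can be opened in the Eclipse Memory Analyzer Tool in the same way as a dump produced by other means.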
510 instances of "hudson.remoting.Channel", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x8f805698" occupy 927,118,648 (83.35%) bytes. These instances are referenced from one instance of "java.util.WeakHashMap$Entry", loaded by "<system class loader>"
It looks like the hudson.remoting.Channel objects may not be getting cleaned up. I assume each one of these refers to a build slave that has now been destroyed? ~500 jobs executed in a few days sounds about right to me. I've attached a screenshot of the "Common Path to the accumulation point". I'm not comfortable attaching the whole heap dump but I'm happy to provide any more information if it's useful.
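As background on how objects behind a `WeakHashMap` entry can stay alive, here is a small sketch of a well-known pitfall: `WeakHashMap` holds its values strongly, so a value that references its own key keeps the entry from ever being cleared. This illustrates the general mechanism only and is not a diagnosis of this heap dump. The `Session` class and the method names are invented for the example.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Illustration only: a WeakHashMap entry that can never be collected.
public class WeakMapCycle {

    // Hypothetical value type that keeps a strong reference back to its key.
    static final class Session {
        final Object owner;
        Session(Object owner) { this.owner = owner; }
    }

    // Builds a map whose single value strongly references its own key.
    static Map<Object, Session> registryWithCycle() {
        Map<Object, Session> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, new Session(key)); // map -> entry -> value -> key
        return map;                     // the local 'key' reference is gone after return
    }

    // Requests GC a few times, then reports how many entries remain.
    static int sizeAfterGc(Map<?, ?> map) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // The entry survives because the key is still strongly reachable via its value.
        System.out.println("entries left: " + sizeAfterGc(registryWithCycle()));
    }
}
```

In a heap dump, following the path from the `WeakHashMap$Entry` to its value and checking whether anything on that path leads back to the key is one way to test for this pattern.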
This issue duplicates:
JENKINS-28844 The curious case of the Channel memory cycles

|Field||Original Value||New Value|
|Resolution|| ||Cannot Reproduce [ 5 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Workflow||JNJira [ 163269 ]||JNJira + In-Review [ 197154 ]|