Type: Bug
Resolution: Fixed
Priority: Major
Labels: None
Released As: workflow-cps 2.69
Setup
Jenkins v2.89.3 / v2.89.4 LTS
Pipeline API Plugin: 2.26
Pipeline Nodes and Processes Plugin: 2.19
Durable Task Plugin: 1.18
Pipeline Job Plugin: 2.17
Pipeline Shared Groovy Libraries Plugin: 2.9
Pipeline Supporting APIs Plugin: 2.18
Pipeline Groovy Plugin: 2.45
Script Security Plugin: 1.41
Pipeline Default Speed/Durability Level: Performance-Optimized
"Do not allow the pipeline to resume if the master restarts": Enabled on all jobs
Problem
Shortly after upgrading from Jenkins 2.60.3 to Jenkins 2.89.3 LTS, we experienced a java.lang.OutOfMemoryError: Metaspace error. I'm unsure if this is a bug or if increased Java Metaspace usage is to be expected in the newer version of Jenkins and the pipeline plugins.
Details
Jenkins v2.60.3
In Jenkins 2.60.3 we used the memory parameters below and the Java nonheap space (Metaspace) stayed between 350 MB and 550 MB. We ran for months without restarting Jenkins.
-XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxMetaspaceSize=1g -XX:MetaspaceSize=256M -Xms4g -Xmx8g -Dgroovy.use.classvalue=true
Jenkins v2.89.3
After upgrading to Jenkins 2.89.3 and upgrading all of the plugins, we used the Java memory settings below. We removed the "groovy.use.classvalue=true" parameter based on the recommendation here: https://support.cloudbees.com/hc/en-us/articles/115001505172-Jenkins-hangs-due-to-Groovy-ClassInfo-deadlock. We saw the Java nonheap memory fluctuate between 250 MB and 750 MB, slowly trending upwards. After a few weeks, nonheap memory was at 1 GB and Jenkins no longer responded to requests. The log file was full of java.lang.OutOfMemoryError: Metaspace errors.
-XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxMetaspaceSize=1g -XX:MetaspaceSize=256M -Xms4g -Xmx8g
Debugging
Our Jenkins masters run a large number of pipeline jobs, and I noticed that all of our Jenkins 2.89.4 masters show higher Java nonheap memory usage than before the upgrade. We disabled pipeline durability but still see high memory usage. Standard Java heap memory usage looks fine.
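To track this, the non-heap pools and the loaded class count can be read from the Jenkins Script Console using the standard java.lang.management beans. A minimal Groovy sketch (my own illustration, not taken from the report):
import java.lang.management.ManagementFactory
import java.lang.management.MemoryType

// Print each non-heap pool (Metaspace, Compressed Class Space, Code Cache) in MB
ManagementFactory.memoryPoolMXBeans
    .findAll { it.type == MemoryType.NON_HEAP }
    .each { pool -> println "${pool.name}: ${pool.usage.used.intdiv(1024 * 1024)} MB used" }

// Total loaded classes gives a rough idea of Metaspace pressure
println "Loaded classes: ${ManagementFactory.classLoadingMXBean.loadedClassCount}"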
Example 1
Jenkins Master 1 was completely idle and I performed a System.gc() to force a full garbage collection. Afterwards I saw that the nonheap memory was still over 700 MB. Below are some of the Java details I collected on that installation just after the full GC.
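The forced collection and the re-check of non-heap usage can both be done from the Script Console; a small Groovy sketch of that step (illustration only):
import java.lang.management.ManagementFactory

def nonHeapMb = { ManagementFactory.memoryMXBean.nonHeapMemoryUsage.used.intdiv(1024 * 1024) }
println "non-heap before GC: ${nonHeapMb()} MB"
System.gc()   // request a full GC, as described above
println "non-heap after GC:  ${nonHeapMb()} MB"
The VM.native_memory summary below is the output of the jcmd diagnostic command of the same name and requires the JVM to be started with -XX:NativeMemoryTracking=summary (or detail).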
VM.native_memory summary
Native Memory Tracking:
Total: reserved=11014208KB, committed=5551020KB
Java Heap (reserved=8388608KB, committed=4194304KB)
(mmap: reserved=8388608KB, committed=4194304KB)
Class (reserved=1686933KB, committed=707989KB)
(classes #48326)
(malloc=13717KB #208203)
(mmap: reserved=1673216KB, committed=694272KB)
Thread (reserved=93951KB, committed=93951KB)
(thread #92)
(stack: reserved=93548KB, committed=93548KB)
(malloc=296KB #463)
(arena=107KB #182)
Code (reserved=273528KB, committed=155620KB)
(malloc=23928KB #24978)
(mmap: reserved=249600KB, committed=131692KB)
GC (reserved=412351KB, committed=256703KB)
(malloc=68287KB #693569)
(mmap: reserved=344064KB, committed=188416KB)
Compiler (reserved=406KB, committed=406KB)
(malloc=276KB #2009)
(arena=131KB #3)
Internal (reserved=88791KB, committed=88791KB)
(malloc=88759KB #184270)
(mmap: reserved=32KB, committed=32KB)
Symbol (reserved=30516KB, committed=30516KB)
(malloc=27279KB #301740)
(arena=3236KB #1)
Native Memory Tracking (reserved=22549KB, committed=22549KB)
(malloc=348KB #5361)
(tracking overhead=22201KB)
Arena Chunk (reserved=190KB, committed=190KB)
(malloc=190KB)
Unknown (reserved=16384KB, committed=0KB)
(mmap: reserved=16384KB, committed=0KB)
GC.class_histogram
num #instances #bytes class name
----------------------------------------------
1: 490462 84832616 [C
2: 2552835 40845360 com.cloudbees.groovy.cps.impl.ConstantBlock
3: 930699 37227960 com.cloudbees.groovy.cps.impl.FunctionCallBlock
4: 1493734 35849616 com.cloudbees.groovy.cps.impl.SourceLocation
5: 883507 33258176 [Ljava.lang.Object;
6: 179552 30097544 [B
7: 922229 29511328 java.util.HashMap$Node
8: 1151159 27386104 [Lcom.cloudbees.groovy.cps.Block;
9: 947492 22739808 java.lang.String
10: 790957 18982968 com.cloudbees.groovy.cps.impl.LocalVariableBlock
11: 213822 13097984 [Ljava.util.HashMap$Node;
12: 519301 12463224 com.cloudbees.groovy.cps.impl.SequenceBlock
13: 452808 10867392 java.util.ArrayList
14: 320616 10259712 com.cloudbees.groovy.cps.impl.PropertyAccessBlock
15: 250810 10032400 com.google.common.cache.LocalCache$WeakEntry
16: 168578 9440368 org.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry
17: 260734 8343488 java.util.concurrent.locks.ReentrantLock$NonfairSync
18: 250147 7394208 [Lhudson.model.Action;
19: 142590 6844320 java.util.HashMap
20: 139363 6689424 org.jenkinsci.plugins.workflow.cps.nodes.StepAtomNode
21: 264178 6340272 com.google.common.collect.SingletonImmutableList
22: 195095 6243040 com.cloudbees.groovy.cps.impl.AssignmentBlock
23: 49477 6237064 java.lang.Class
24: 253041 6072984 java.util.concurrent.CopyOnWriteArrayList
25: 250681 6016344 org.jenkinsci.plugins.workflow.support.storage.BulkFlowNodeStorage$Tag
GC.class_stats
We use the Pipeline Shared Groovy Libraries Plugin and noticed a large number of loaded classes with names similar to those below; each of our shared library classes (org.xxxxxxxx.scripts.xxxxxxx) had about 400 references (see the Script Console sketch after this list).
script15202683643871464209649
sun.reflect.GeneratedMethodAccessor2903
org.xxxxxxxx.scripts.xxxxxxx
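To get a similar per-class count without a full heap dump, one option is to walk the Groovy runtime's retained class metadata from the Script Console. This is only a sketch and assumes the Groovy 2.4.x runtime used by these plugin versions exposes org.codehaus.groovy.reflection.ClassInfo.getAllClassInfo():
import org.codehaus.groovy.reflection.ClassInfo

// Count retained class-metadata entries grouped by class name,
// to spot shared-library or script classes that are loaded repeatedly.
def counts = [:].withDefault { 0 }
ClassInfo.getAllClassInfo().each { info ->
    counts[info.theClass.name] += 1
}
counts.sort { -it.value }
      .take(25)
      .each { name, n -> println "${n}\t${name}" }
This ClassInfo map is likely the same structure that appears as org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[] in the heap-dump output in Example 3 below.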
Example 2
I ran a Java heap dump through an analyzer program; below are the top consumers it reported.
Class Name | Objects | Shallow Heap | Retained Heap |
---|---|---|---|
org.jenkinsci.plugins.workflow.support.actions.LogActionImpl | 976,720 | 23,441,280 | >= 343,262,256 |
org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl | 735,961 | 17,663,064 | >= 305,348,632 |
Example 3
I ran a different Java heap dump through the analyzer and below are the results. Again, this was after a full GC.
Classes = 35k
Objects = 47M
Class loaders = 4.6k
GC Roots = 4.3k
952,652 instances of "org.jenkinsci.plugins.workflow.support.actions.LogActionImpl", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c1746598" occupy 342,684,616 (20.83%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"
717,629 instances of "org.jenkinsci.plugins.workflow.cps.actions.ArgumentsActionImpl", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c17d9770" occupy 296,727,528 (18.04%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"
293 instances of "org.jenkinsci.plugins.workflow.cps.CpsFlowExecution", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0x5c17d9770" occupy 210,909,800 (12.82%) bytes. These instances are referenced from one instance of "org.codehaus.groovy.util.AbstractConcurrentMapBase$Segment[]", loaded by "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x5c0000000"
Package | Retained Heap | Retained Heap, % | Top Dominators |
---|---|---|---|
workflow | 1,340,511,168 | 81.50% | 5,043,559 |
Workaround
We once again added "-Dgroovy.use.classvalue=true" to the Java startup parameters and have noticed a reduction in Java nonheap memory usage. With the parameter we see the memory increase to around 750 MB, but when the server becomes idle the Java nonheap memory drops back under 300 MB. Without the parameter we don't see the same reduction. One of our Jenkins masters that doesn't have the parameter started on 3/2/18 at 0.2 GB and is currently (3/16/18) at 0.76 GB; the graph shows a slow climb without much fluctuation. According to the CloudBees article, this parameter is not recommended for Jenkins 2.89, but it does seem to help lower Java nonheap memory usage. Since usage increases by over 200 MB a week, I should know shortly whether some type of garbage collection reduces it on this second Jenkins master, which is nearing the 1 GB max. Java nonheap memory usage dropped by only 1 MB when I manually performed a System.gc().
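For reference, the resulting startup line with the workaround re-applied would look like the one below; this simply combines the Jenkins 2.89.3 flags quoted earlier with the re-added property and is shown here for convenience rather than copied from our configuration.
-XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:MaxMetaspaceSize=1g -XX:MetaspaceSize=256M -Xms4g -Xmx8g -Dgroovy.use.classvalue=true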