-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
Jenkins 2.187
Linux
openjdk version "1.8.0_222"
-
-
workflow-api 2.37
Our large Jenkins instance had a CPU usage spike last weekend.
After some digging, we found that all `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` threads tended to get stuck in `java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry` with various stacks, which seems to be the consequence of a ThreadLocal leak:
"org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#12]: checking <redacted> on i-0db24f1a6b300ea1f" #3845 daemon prio=5 os_prio=0 tid=0x00007f2dd8bd2800 nid=0x30bb runnable [0x00007f2d2a926000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) at java.lang.ThreadLocal.remove(ThreadLocal.java:222) at hudson.remoting.Channel.setCurrent(Channel.java:1787) at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:328) at hudson.remoting.Channel.call(Channel.java:957) at hudson.FilePath.act(FilePath.java:1072) at hudson.FilePath.act(FilePath.java:1061) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:209) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:532) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:489) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
Digged the content of the ThreadLocalMap of each of the live threads of the `ScheduledThreadPoolExecutor` and got something similar:
Thread 3841 org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8]: checking <redacted> on i-04a6c09bbdc596bbd ThreadLocalMap size: 65536 Elements count per key type: [org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream$1:20704, java.lang.ThreadLocal:47, hudson.remoting.Channel$ThreadLastIoId:34, null:42550, org.codehaus.groovy.runtime.NullObject:2200, java.lang.ThreadLocal$SuppliedThreadLocal:1]
It looks like the following ThreadLocal is never cleared since it gets attached to reused threads from a thread pool:
- links to
[JENKINS-58899] ThreadLocal leak causing high CPU usage with DurableTaskStep threads
Description |
Original:
Our large Jenkins instance had a CPU usage spike last weekend. After some digging, we found that all `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` threads tended to get stuck in `java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry` with various stacks, which seems to be the consequence of a ThreadLocal leak: {noformat} "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#12]: checking <redacted> on i-0db24f1a6b300ea1f" #3845 daemon prio=5 os_prio=0 tid=0x00007f2dd8bd2800 nid=0x30bb runnable [0x00007f2d2a926000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) at java.lang.ThreadLocal.remove(ThreadLocal.java:222) at hudson.remoting.Channel.setCurrent(Channel.java:1787) at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:328) at hudson.remoting.Channel.call(Channel.java:957) at hudson.FilePath.act(FilePath.java:1072) at hudson.FilePath.act(FilePath.java:1061) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:209) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:532) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:489) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){noformat} Digged the content of the ThreadLocalMap of each of the live threads of the `ScheduledThreadPoolExecutor` and got something similar: {noformat} Thread 3841 org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8]: checking <redacted> on i-04a6c09bbdc596bbd Map size: 65536 [org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream$1:20704, java.lang.ThreadLocal:47, hudson.remoting.Channel$ThreadLastIoId:34, null:42550, org.codehaus.groovy.runtime.NullObject:2200, java.lang.ThreadLocal$SuppliedThreadLocal:1]{noformat} It looks like the following ThreadLocal is never cleared since it gets attached to reused threads from a thread pool: https://github.com/jenkinsci/workflow-api-plugin/blob/907cf64feb2f38e93aebbedf87946b0235c3dc93/src/main/java/org/jenkinsci/plugins/workflow/log/DelayBufferedOutputStream.java#L123 |
New:
Our large Jenkins instance had a CPU usage spike last weekend. After some digging, we found that all `org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep` threads tended to get stuck in `java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry` with various stacks, which seems to be the consequence of a ThreadLocal leak: {noformat} "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#12]: checking <redacted> on i-0db24f1a6b300ea1f" #3845 daemon prio=5 os_prio=0 tid=0x00007f2dd8bd2800 nid=0x30bb runnable [0x00007f2d2a926000] java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(ThreadLocal.java:617) at java.lang.ThreadLocal$ThreadLocalMap.remove(ThreadLocal.java:499) at java.lang.ThreadLocal$ThreadLocalMap.access$200(ThreadLocal.java:298) at java.lang.ThreadLocal.remove(ThreadLocal.java:222) at hudson.remoting.Channel.setCurrent(Channel.java:1787) at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:328) at hudson.remoting.Channel.call(Channel.java:957) at hudson.FilePath.act(FilePath.java:1072) at hudson.FilePath.act(FilePath.java:1061) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:209) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:532) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:489) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){noformat} Digged the content of the ThreadLocalMap of each of the live threads of the `ScheduledThreadPoolExecutor` and got something similar: {noformat} Thread 3841 org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#8]: checking <redacted> on i-04a6c09bbdc596bbd ThreadLocalMap size: 65536 Elements count per key type: [org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream$1:20704, java.lang.ThreadLocal:47, hudson.remoting.Channel$ThreadLastIoId:34, null:42550, org.codehaus.groovy.runtime.NullObject:2200, java.lang.ThreadLocal$SuppliedThreadLocal:1]{noformat} It looks like the following ThreadLocal is never cleared since it gets attached to reused threads from a thread pool: [https://github.com/jenkinsci/workflow-api-plugin/blob/907cf64feb2f38e93aebbedf87946b0235c3dc93/src/main/java/org/jenkinsci/plugins/workflow/log/DelayBufferedOutputStream.java#L123] |
Summary | Original: ThreadLocal leak causing high CPU usage | New: ThreadLocal leak causing high CPU usage with DurableTaskStep threads |
Assignee | New: Devin Nusbaum [ dnusbaum ] |
Status | Original: Open [ 1 ] | New: In Progress [ 3 ] |
Status | Original: In Progress [ 3 ] | New: In Review [ 10005 ] |
Remote Link | New: This issue links to "jenkinsci/workflow-api-plugin#100 (Web Link)" [ 23404 ] |
Resolution | New: Fixed [ 1 ] | |
Status | Original: In Review [ 10005 ] | New: Fixed but Unreleased [ 10203 ] |
Released As | New: workflow-api 2.37 | |
Status | Original: Fixed but Unreleased [ 10203 ] | New: Resolved [ 5 ] |