[JENKINS-64843] Memory Leak: hudson.model.Hudson.labels (java.util.concurrent.ConcurrentHashMap)

      Since the beginning of the year, we have been facing Jenkins 2 master slowdowns. From the GC logs, we suspect a memory leak.

      As can be seen from the GC graphs, the Old Gen memory increased by more than 800 MB in two weeks.

      To confirm this memory leak suspicion, we took a heap dump.

      The heap dump shows that 51% of the Old Gen is used to store the attribute hudson.model.Hudson.labels, a ConcurrentHashMap containing 3517 labels, one for each job launched since the JVM was started.

      The ConcurrentHashMap containing the labels does not appear to remove entries after a job has run.

      This issue seems to have been revealed because the workload on Jenkins has increased a lot in recent months.
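
      A minimal Groovy sketch for the Jenkins Script Console to watch this cache grow (assuming admin access; note that getLabels() may filter out labels that no longer match any node or cloud, so it can under-count compared to the heap dump):

      import jenkins.model.Jenkins

      // Jenkins Script Console sketch (Groovy), assuming admin access.
      // getLabels() exposes the cached Label objects; it may skip labels with no
      // matching node or cloud, so the map seen in the heap dump can be larger.
      def labels = Jenkins.get().getLabels()
      println "cached label entries: ${labels.size()}"
      // sample a few entries and how many nodes currently match each one
      labels.take(20).each { l ->
          println "${l.name} -> nodes=${l.nodes.size()}"
      }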

       

       


          Billy created issue -

          Raihaan Shouhell added a comment -

          Has the number of nodes on your system increased? When did you start experiencing this issue?

          Billy added a comment - edited

          Hi Raihaan! Thanks for your comment.

          Our Jenkins uses dynamic nodes with the Kubernetes plugin: https://plugins.jenkins.io/kubernetes/ , so with the increasing load on Jenkins, the number of nodes must have increased as well.

          We use version 1.27.3 of the Kubernetes plugin.

          We started experiencing this issue at the beginning of the year, but we believe it was already present and the load was simply insufficient to reveal the memory leak.

          Raihaan Shouhell added a comment -

          Hey Billy, I've looked into core to figure out how this bit works. This particular map is actually trimmed every time a node is added, removed, or updated, and also every 5 minutes.
          The map is essentially a memory-saving technique: it caches labels based on the expressions that are valid.
          For example, assume a node with labels linux, java and ubuntu spawns from your kube cluster: linux&&java results in one entry, ubuntu&&java results in another, linux&&ubuntu in a third, all three together in yet another, and each label on its own as well. Once these labels are used they won't ever be removed, because the kube cluster can still provision more nodes for them. That might be the case you are running into and what causes the high memory usage.
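
          A minimal Script Console sketch of that caching behaviour (the expressions below are hypothetical, just to show that each distinct expression used ends up as its own cached entry):

          import jenkins.model.Jenkins

          // Script Console sketch; the label expressions are hypothetical examples.
          // getLabel(expr) parses a label expression and caches the resulting Label
          // object, so every distinct expression used by a job adds one entry, and
          // it stays cached while any node or cloud can still serve it.
          def j = Jenkins.get()
          def l1 = j.getLabel('linux && java')           // one entry
          def l2 = j.getLabel('ubuntu && java')          // another entry
          def l3 = j.getLabel('linux && ubuntu && java') // yet another
          [l1, l2, l3].each { l ->
              println "${l.name} currently matches ${l.nodes.size()} node(s)"
          }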

          How many nodes are there on your system?
          Are there many different node configurations that can be spawned?
          Does each node configuration have a ton of labels attached to them?


          Billy added a comment - edited

          How many nodes are there on your system?

          With the Kubernetes plugin, we have as many nodes as there are pods, which is as many as there are jobs executed: approximately 500 builds per day.

          Are there many different node configurations that can be spawned?

          All node configurations are done with the Kubernetes plugin, for example:

          def label = "random-${UUID.randomUUID().toString()}"
          podTemplate(name: label, label: label, nodeSelector: 'indus.lyra-network.com/maven-cache=true', containers: [
              containerTemplate(name: 'maven', image: 'maven/maven33:latest', privileged: true, ttyEnabled: true, command: 'cat', alwaysPullImage: true),
              containerTemplate(name: 'jnlp', image: 'openshift/jenkins-slave-base-centos7', args: '${computer.jnlpmac} ${computer.name}', alwaysPullImage: true)
          ]) {
              node(label) {
                  stage('Checkout scm') {
                      echo "$label"
                  }
              }
          }

           
          Does each node configuration have a ton of labels attached to them?

          Each node created with the podTemplate has a unique label to avoid conflicts across builds, as explained in the documentation: https://github.com/jenkinsci/kubernetes-plugin#pod-and-container-template-configuration

          I assume that if, instead of manually forcing the node label, we use the POD_LABEL value as described in the documentation (https://github.com/jenkinsci/kubernetes-plugin#pipeline-support), we will have the same memory issue, because POD_LABEL is an automatically generated unique value.

          def name = "random-${UUID.randomUUID().toString()}"
          podTemplate(name: name, nodeSelector: 'indus.lyra-network.com/maven-cache=true', containers: [
              containerTemplate(name: 'maven', image: 'maven/maven33:latest', privileged: true, ttyEnabled: true, command: 'cat', alwaysPullImage: true),
              containerTemplate(name: 'jnlp', image: 'openshift/jenkins-slave-base-centos7', args: '${computer.jnlpmac} ${computer.name}', alwaysPullImage: true)
          ]) {
              node(POD_LABEL) {
                  stage('Checkout scm') {
                      echo "$POD_LABEL"
                  }
              }
          }

          When these labels are used they won't ever be removed because the kube cluster can still provision more of them. 

          I don't understand whether the memory leak is caused by the Kubernetes plugin or by our usage of it. I have the feeling that the dynamic nodes created with the Kubernetes plugin are not compatible with the label philosophy of Jenkins core.


          Raihaan Shouhell added a comment -

          500 builds a day seems like pretty normal usage; I don't see why there should be 3000 entries.

          The random string will not be kept permanently (sorry, I was not aware of how the podTemplates operate): the random string is not tied to a podTemplate directly but to a specific node, so once the node is deleted it should be removed.

          I'm curious what is actually in that particular map (the actual values in the key/value fields, especially the keys).
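
          Not an official diagnostic, but a Script Console sketch along these lines could show what is sitting in that map (again assuming admin access; getLabels() may hide entries that no longer match any node or cloud, even though they still sit in the underlying ConcurrentHashMap):

          import jenkins.model.Jenkins

          // Script Console sketch: list each cached label, how many nodes currently
          // match it, and whether any cloud (e.g. the Kubernetes cloud) could still
          // provision it.
          Jenkins.get().labels.each { l ->
              println "${l.name} -> nodes=${l.nodes.size()}, clouds=${l.clouds.size()}"
          }
          println "total labels listed: ${Jenkins.get().labels.size()}"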


          Billy added a comment -

          As explained before, assuming the labels map should remove a node's label after the node is deleted, the 3500 entries shown by the heap dump indicate that the labels map does not behave as expected.

          I did not find any fix for this issue in the Jenkins release notes after 2.249.2; maybe you have more information on this.

          Could you confirm whether this memory leak is caused by an issue in Jenkins core or in the Kubernetes plugin?


          Raihaan Shouhell added a comment -

          Hey Billy,

          I took a look at the Kubernetes plugin and it seems that there was a bug with template deletion that is present in 1.27.3: https://github.com/jenkinsci/kubernetes-plugin/pull/880

          This might explain why the labels are not removed.
          Raihaan Shouhell made changes -
          Assignee New: Raihaan Shouhell [ raihaan ]

          Raihaan Shouhell added a comment -

          htbthach I believe your issue should go away if you update the Kubernetes plugin to 1.27.5, or perhaps just to 1.29.0.
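
          A quick way to confirm which version is actually installed before and after the upgrade (Script Console sketch; 'kubernetes' is the plugin's short name):

          import jenkins.model.Jenkins

          // Script Console sketch: print the installed Kubernetes plugin version.
          def plugin = Jenkins.get().pluginManager.getPlugin('kubernetes')
          println(plugin != null ? "kubernetes plugin ${plugin.version}" : "kubernetes plugin not installed")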
