[JENKINS-64843] Memory Leak: hudson.model.Hudson.labels (java.util.concurrent.ConcurrentHashMap)

      Since the beginning of the year, we have been facing Jenkins 2 master slowdowns. From the GC logs, we suspect a memory leak.

      As can be seen from the GC graphs, the Old Gen memory increased by more than 800 MB in two weeks.

      To confirm this memory leak suspicion, we took a heap dump.

      The heap dump shows that 51% of the Old Gen is used to store the attribute hudson.model.Hudson.labels, a ConcurrentHashMap containing 3517 labels, one for each job launched since the JVM was started.

      The ConcurrentHashMap containing the labels does not appear to remove entries after a job has run.

      This issue seems to have been revealed because the workload on Jenkins has increased a lot in recent months.
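
      A minimal Groovy sketch for the Jenkins Script Console to watch this cache grow (assuming admin access; note that getLabels() may filter out labels that no longer match any node or cloud, so it can under-count compared to the heap dump):

      import jenkins.model.Jenkins

      // Jenkins Script Console sketch (Groovy), assuming admin access.
      // getLabels() exposes the cached Label objects; it may skip labels with no
      // matching node or cloud, so the map seen in the heap dump can be larger.
      def labels = Jenkins.get().getLabels()
      println "cached label entries: ${labels.size()}"
      // sample a few entries and how many nodes currently match each one
      labels.take(20).each { l ->
          println "${l.name} -> nodes=${l.nodes.size()}"
      }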

       

       


          Billy created issue -

          Raihaan Shouhell added a comment -

          Has the number of nodes on your system increased? When did you start experiencing this issue?

          Billy added a comment - edited

          Hi Raihaan! Thanks for your comment.

          Our Jenkins uses dynamic nodes with the Kubernetes plugin: https://plugins.jenkins.io/kubernetes/ , so with the increasing load on Jenkins, the number of nodes must have increased as well.

          We use version 1.27.3 of the Kubernetes plugin.

          We started experiencing this issue at the beginning of the year, but we believe it was already present and the load was simply insufficient to reveal the memory leak.

          Raihaan Shouhell added a comment -

          Hey Billy, I've looked into core to figure out how this bit works. This particular map is actually trimmed every time a node is added, removed, or updated, and also every 5 minutes.
          The map is essentially a memory-saving technique: it caches labels based on the expressions that are valid.
          For example, assume a node with labels linux, java and ubuntu spawns from your kube cluster: linux&&java results in one entry, ubuntu&&java results in another, linux&&ubuntu in a third, all three together in yet another, and each label on its own as well. Once these labels are used they won't ever be removed, because the kube cluster can still provision more nodes for them. That might be the case you are running into and what causes the high memory usage.
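
          A minimal Script Console sketch of that caching behaviour (the expressions below are hypothetical, just to show that each distinct expression used ends up as its own cached entry):

          import jenkins.model.Jenkins

          // Script Console sketch; the label expressions are hypothetical examples.
          // getLabel(expr) parses a label expression and caches the resulting Label
          // object, so every distinct expression used by a job adds one entry, and
          // it stays cached while any node or cloud can still serve it.
          def j = Jenkins.get()
          def l1 = j.getLabel('linux && java')           // one entry
          def l2 = j.getLabel('ubuntu && java')          // another entry
          def l3 = j.getLabel('linux && ubuntu && java') // yet another
          [l1, l2, l3].each { l ->
              println "${l.name} currently matches ${l.nodes.size()} node(s)"
          }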

          How many nodes are there on your system?
          Are there many different node configurations that can be spawned?
          Does each node configuration have a ton of labels attached to them?


          Billy added a comment - edited

          How many nodes are there on your system?

          With the Kubernetes plugin, we have as many nodes as there are pods, which is as many as there are jobs executed: approximately 500 builds per day.

          Are there many different node configurations that can be spawned?

          All node configurations are done with the Kubernetes plugin, for example:

          def label = "random-${UUID.randomUUID().toString()}"
          podTemplate(name: label, label: label, nodeSelector: 'indus.lyra-network.com/maven-cache=true', containers: [
              containerTemplate(name: 'maven', image: 'maven/maven33:latest', privileged: true, ttyEnabled: true, command: 'cat', alwaysPullImage: true),
              containerTemplate(name: 'jnlp', image: 'openshift/jenkins-slave-base-centos7', args: '${computer.jnlpmac} ${computer.name}', alwaysPullImage: true)
          ]) {
              node(label) {
                  stage('Checkout scm') {
                      echo "$label"
                  }
              }
          }

           
          Does each node configuration have a ton of labels attached to them?

          Each node created with the podTemplate has a unique label to avoid conflicts across builds, as explained in the documentation: https://github.com/jenkinsci/kubernetes-plugin#pod-and-container-template-configuration

          I assume that if, instead of manually forcing the node label, we use the POD_LABEL value as described in the documentation (https://github.com/jenkinsci/kubernetes-plugin#pipeline-support), we will have the same memory issue, because POD_LABEL is an automatically generated unique value.

          def name = "random-${UUID.randomUUID().toString()}"
          podTemplate(name: name, nodeSelector: 'indus.lyra-network.com/maven-cache=true', containers: [
              containerTemplate(name: 'maven', image: 'maven/maven33:latest', privileged: true, ttyEnabled: true, command: 'cat', alwaysPullImage: true),
              containerTemplate(name: 'jnlp', image: 'openshift/jenkins-slave-base-centos7', args: '${computer.jnlpmac} ${computer.name}', alwaysPullImage: true)
          ]) {
              node(POD_LABEL) {
                  stage('Checkout scm') {
                      echo "$POD_LABEL"
                  }
              }
          }

          When these labels are used they won't ever be removed because the kube cluster can still provision more of them. 

          I don't understand whether the memory leak is caused by the Kubernetes plugin or by our usage of it. I have the feeling that the dynamic nodes created with the Kubernetes plugin are not compatible with the label philosophy of Jenkins core.


          Raihaan Shouhell added a comment -

          500 builds a day seems like pretty normal usage; I don't see why there should be 3000 entries.

          The random string will not be kept permanently (sorry, I was not aware of how the podTemplates operate): the random string is not tied to a podTemplate directly but to a specific node, so once the node is deleted it should be removed.

          I'm curious what is actually in that particular map (the actual values in the key/value fields, especially the keys).
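
          Not an official diagnostic, but a Script Console sketch along these lines could show what is sitting in that map (again assuming admin access; getLabels() may hide entries that no longer match any node or cloud, even though they still sit in the underlying ConcurrentHashMap):

          import jenkins.model.Jenkins

          // Script Console sketch: list each cached label, how many nodes currently
          // match it, and whether any cloud (e.g. the Kubernetes cloud) could still
          // provision it.
          Jenkins.get().labels.each { l ->
              println "${l.name} -> nodes=${l.nodes.size()}, clouds=${l.clouds.size()}"
          }
          println "total labels listed: ${Jenkins.get().labels.size()}"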


          Billy added a comment -

          As explained before, assuming the labels map should remove a node's label after the node is deleted, the 3500 entries shown by the heap dump indicate that the labels map does not behave as expected.

          I did not find any fix for this issue in the Jenkins release notes after 2.249.2; maybe you have more information on this.

          Could you confirm whether this memory leak is caused by an issue in Jenkins core or in the Kubernetes plugin?


          Raihaan Shouhell added a comment -

          Hey Billy,

          I took a look at the Kubernetes plugin and it seems that there was a bug with template deletion that is present in 1.27.3: https://github.com/jenkinsci/kubernetes-plugin/pull/880

          This might explain why the labels are not removed.
          Raihaan Shouhell made changes -
          Assignee New: Raihaan Shouhell [ raihaan ]

          Raihaan Shouhell added a comment -

          htbthach I believe your issue should go away if you update the Kubernetes plugin to 1.27.5, or perhaps just to 1.29.0.
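
          A quick way to confirm which version is actually installed before and after the upgrade (Script Console sketch; 'kubernetes' is the plugin's short name):

          import jenkins.model.Jenkins

          // Script Console sketch: print the installed Kubernetes plugin version.
          def plugin = Jenkins.get().pluginManager.getPlugin('kubernetes')
          println(plugin != null ? "kubernetes plugin ${plugin.version}" : "kubernetes plugin not installed")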
