
Kubernetes plugin does not respect Container Cap

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Jenkins 2.7.1
      Kubernetes plugin 0.8

      The Kubernetes plugin frequently creates more concurrent slaves than the number set in the "Container Cap" field.
      It does not respect the "Max number of instances" setting in the Pod Template either.

          [JENKINS-38260] Kubernetes plugin does not respect Container Cap

          Tomasz Bienkowski added a comment -

          Hello, I have the same problem, although in my case it's with:

          • 2.32.1 of Jenkins,
          • 0.10 of the plugin.

          I have attached the log from the Jenkins master instance. 

          Thanks in advance for any help.

          jenkins-master.log


          Tomasz Bienkowski added a comment -

          I don't know how the fabric8 Kubernetes client works, but looking at the source code of the "addProvisionedSlave" method here:

          https://github.com/jenkinsci/kubernetes-plugin/blob/kubernetes-0.10/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java

          I was wondering if this could be a race condition. Perhaps the fabric8 client does not return a pod until it is actually ready (started). If so, this might explain why the Container Cap is not respected. The sequence of events could be like this:

          1. Jenkins wants to create a slave pod. The container cap is not exceeded.
          2. The pod is being deployed to Kubernetes (it is starting).
          3. Jenkins wants to create another pod, so it asks the fabric8 client whether any slave pods are running.
          4. The fabric8 client responds that there are no pods (because the pod from step 2 is still being deployed and is not running yet).
          5. Jenkins creates another pod, effectively exceeding the Container Cap setting.

          Is this reasonable?
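          To make the suspected check-then-act race concrete, here is a minimal, self-contained sketch (hypothetical names, not the plugin's actual code): two concurrent provisioning requests both query the "cluster" before the first pod becomes visible, so both pass the check and a cap of 1 is exceeded.

            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.atomic.AtomicInteger;

            public class CapRaceDemo {
                // Stands in for what the Kubernetes API reports: a pod only shows up once it is running.
                static final AtomicInteger visiblePods = new AtomicInteger(0);
                static final int CONTAINER_CAP = 1;

                static void provisionIfBelowCap(String requester) throws InterruptedException {
                    // Steps 3/4: ask the "cluster" how many slave pods it can see right now.
                    if (visiblePods.get() < CONTAINER_CAP) {
                        // Step 2: the pod is deploying and not yet counted by the API...
                        Thread.sleep(2000);                    // simulated startup delay
                        visiblePods.incrementAndGet();         // ...only now does it become visible
                        System.out.println(requester + " started a pod, visible=" + visiblePods.get());
                    } else {
                        System.out.println(requester + " respected the cap");
                    }
                }

                public static void main(String[] args) {
                    ExecutorService pool = Executors.newFixedThreadPool(2);
                    // Two provisioning requests arrive at the same time: both see 0 visible pods,
                    // so both launch and CONTAINER_CAP = 1 is exceeded.
                    pool.submit(() -> { provisionIfBelowCap("request-1"); return null; });
                    pool.submit(() -> { provisionIfBelowCap("request-2"); return null; });
                    pool.shutdown();
                }
            }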


          Carlos Sanchez added a comment -

          OK, so there may be times when the container cap is not honored, when multiple pods are started at the same time.


          Tomasz Bienkowski added a comment -

          Unfortunately this can easily lead to exhaustion of the available hardware resources on the Kubernetes cluster. I have seen a Kubernetes cluster destabilized because of this, as the pods allocated to a physical node start to consume more memory than is physically available.


          Stefan Bieler added a comment - edited

          Hi all,

          we are facing the exact same effects as described by tb when a job triggers multiple others at the exact same time. If the workload is high enough, almost every time at least 6-7 new nodes are created, ignoring our container and template cap settings. This might only be an issue on small-scale clusters (we have only 3 instances for scheduling, each with 4 GB RAM), where spawning 6-7 pods at once, each consuming nearly 2 GB, is kind of a neck-breaker. And it seems to me as if there are never more than 6-7 pods created, although our job queue can reach up to 30 entries.

          When looking at the code, it is obvious that the capping logic only works as long as we can trust what the Kubernetes API (resp. the fabric8 client implementation) returns. But the plugin code should know better: it knows how many slave-creation requests were already submitted. Wouldn't it be a better solution to take an internal counter into consideration (initialized the first time by asking the cluster how many matching nodes exist), instead of asking the remote cluster over and over again? Maybe just a stupid idea, but better than nothing (a sketch follows below).

          Regards,

          Stefan
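          A rough sketch of that idea (hypothetical class, not the actual plugin code): keep an in-flight counter next to the count returned by the cluster, reserve a slot before a slave-creation request is submitted, and release it once the agent is online or the launch has failed.

            import java.util.concurrent.atomic.AtomicInteger;

            /** Illustrative only: a cap check that also counts agents already requested but not yet visible. */
            class CappedProvisioner {
                private final int containerCap;
                private final AtomicInteger inFlight = new AtomicInteger(0);

                CappedProvisioner(int containerCap) {
                    this.containerCap = containerCap;
                }

                /** @param podsReportedByApi what the cluster query (e.g. via the fabric8 client) returned */
                boolean tryReserveSlot(int podsReportedByApi) {
                    while (true) {
                        int current = inFlight.get();
                        if (podsReportedByApi + current >= containerCap) {
                            return false;                          // cap reached, do not provision
                        }
                        if (inFlight.compareAndSet(current, current + 1)) {
                            return true;                           // slot reserved before the async launch
                        }
                    }
                }

                /** Call once the agent is up (and therefore counted by the API) or the launch failed. */
                void releaseSlot() {
                    inFlight.decrementAndGet();
                }
            }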

           


          Martin Sander added a comment -

          I was also able to reproduce it. Just start (container cap) * 2 jobs at the same time.


          Martin Sander added a comment - edited

          I think the race condition arises because, although the check in addProvisionedSlave is made from a synchronized method, the actual provisioning is done from a thread pool, i.e. asynchronously:

          https://github.com/jenkinsci/kubernetes-plugin/blob/kubernetes-0.11/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java#L554-L555

              r.add(new NodeProvisioner.PlannedNode(t.getDisplayName(),
                      Computer.threadPoolForRemoting.submit(new ProvisioningCallback(this, t, label)), 1));
          

          So you probably have to do the check again in a loop inside the provisioning thread, again from a synchronized method...
          Synchronizing again in the provisioning thread would of course not work, because that way you could only ever spin up a single Jenkins slave at once - including waiting for it to become available.
          I guess one could come up with a smart way to count how many nodes are currently scheduled to be spun up but not yet available, but I don't know enough of the Jenkins internals to know how to achieve that...
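          One possible way to combine the synchronized check with the asynchronous launch, as a sketch only (it reuses the hypothetical CappedProvisioner sketched above and plain java.util.concurrent types instead of the real Jenkins classes): reserve capacity before handing the work to the thread pool, and release it in a finally block so that a failed launch does not leak a slot.

            import java.util.concurrent.Callable;
            import java.util.concurrent.ExecutorService;
            import java.util.concurrent.Executors;
            import java.util.concurrent.Future;

            class AsyncCapExample {
                private final ExecutorService pool = Executors.newCachedThreadPool();
                private final CappedProvisioner cap = new CappedProvisioner(10);   // container cap of 10

                /** Returns a future for the new agent, or null if the cap (including in-flight launches) is reached. */
                Future<String> planAgent(int podsReportedByApi, Callable<String> launchPod) {
                    if (!cap.tryReserveSlot(podsReportedByApi)) {
                        return null;                       // would exceed the container cap
                    }
                    return pool.submit(() -> {
                        try {
                            return launchPod.call();       // slow part: creates the pod and waits for the agent
                        } finally {
                            cap.releaseSlot();             // by now the pod is visible to the API (or the launch failed)
                        }
                    });
                }
            }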


          NAVIN ILANGO added a comment -

          I have the same problem, and this issue eats up all the resources in my namespace when there is a good amount of load.


          Carlos Sanchez added a comment -

          tb when you say "exhaustion of the available hardware resources": that should not happen if you set the correct memory and CPU requests and limits.
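          For reference, a hedged example of what such requests and limits look like when built with the fabric8 model classes discussed above (the container name, image and values are purely illustrative):

            import io.fabric8.kubernetes.api.model.Container;
            import io.fabric8.kubernetes.api.model.ContainerBuilder;
            import io.fabric8.kubernetes.api.model.Quantity;
            import io.fabric8.kubernetes.api.model.ResourceRequirements;
            import io.fabric8.kubernetes.api.model.ResourceRequirementsBuilder;

            class AgentResources {
                static Container agentContainerWithLimits() {
                    // Requests reserve capacity on the node; limits cap what the agent may actually consume.
                    ResourceRequirements resources = new ResourceRequirementsBuilder()
                            .addToRequests("cpu", new Quantity("500m"))
                            .addToRequests("memory", new Quantity("1Gi"))
                            .addToLimits("cpu", new Quantity("1"))
                            .addToLimits("memory", new Quantity("2Gi"))
                            .build();

                    return new ContainerBuilder()
                            .withName("jnlp")                       // hypothetical agent container name
                            .withImage("jenkinsci/jnlp-slave")      // illustrative image
                            .withResources(resources)
                            .build();
                }
            }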

          It seems that the problem is with concurrent agent start requests, when excessWorkload is > 1. So instead of adding agents one by one, we can make addProvisionedSlave start n (= excessWorkload, up to the cap limit) and then return the number of agents created before continuing the iteration through the templates.
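          A sketch of that batching logic (hypothetical method, not the actual change): compute how much headroom remains under both the global container cap and the per-template instance cap, and start at most that many agents in one pass. For example, agentsToStart(5, 10, 8, 4, 3) returns 1.

            class BatchProvisioning {
                /** How many agents to start in one pass; illustrative only. */
                static int agentsToStart(int excessWorkload,
                                         int containerCap, int podsForCloud,
                                         int templateCap, int podsForTemplate) {
                    int globalHeadroom = Math.max(0, containerCap - podsForCloud);
                    int templateHeadroom = Math.max(0, templateCap - podsForTemplate);
                    return Math.min(excessWorkload, Math.min(globalHeadroom, templateHeadroom));
                }
            }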


          Carlos Sanchez added a comment -

          This should have been fixed by JENKINS-47501.


            Assignee: Carlos Sanchez (csanchez)
            Reporter: Jonathan Rogers (jrogers)
            Votes: 9
            Watchers: 11
