Jenkins / JENKINS-50210

Kubernetes agent provisioned regardless of container cap


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Component: kubernetes-plugin
    • Labels: None
    • Environment:
      kubernetes v1.6.1
      Jenkins: v2.89.4 & v2.89.2
      kubernetes-plugin: v1.3.2 & v1.3.1

      Hi,
      My teams and I are currently facing an issue with the provisioning of slaves by the kubernetes-plugin using podTemplate: the number of slaves always goes higher than the limit we try to set.

      We need to run many slaves in parallel (tens to hundreds, depending on the job), all doing the same task, so we create them from the same pod template.
      Those slaves run a heavy CPU workload, so we need to limit the CPU allotted to each of them, and also set a "container cap" to limit how many of them run at the same time (because we cannot lower the per-slave CPU limit too much).
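For reference, the behavior we expect from a container cap can be written as a small check. This is a minimal Python sketch, not the kubernetes-plugin's code: the function name `pods_to_provision` and the idea that a strict cap should also count pods that were requested but have not started yet (`pending`) are our own assumptions about what "cap" ought to mean.

```python
def pods_to_provision(cap: int, running: int, pending: int, queued: int) -> int:
    """Return how many new agent pods may be requested right now.

    Hypothetical helper: a strict cap counts both pods that are already
    running and pods that were requested but have not started yet.
    """
    free_slots = max(0, cap - running - pending)
    return min(queued, free_slots)


if __name__ == "__main__":
    # Container cap of 7, nothing running yet, 22 parallel branches waiting.
    print(pods_to_provision(cap=7, running=0, pending=0, queued=22))
    # 5 running and 2 still starting: the cap is reached, nothing new is requested.
    print(pods_to_provision(cap=7, running=5, pending=2, queued=15))
    # 3 pods of the first batch have finished: only 3 slots are free.
    print(pods_to_provision(cap=7, running=4, pending=0, queued=15))
```

Under this reading, the number of slaves alive at any moment can never exceed the cap, which is not what we observe.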

      For our first runs, we set a container cap of 7 to see what would happen.

      That is where the problems arose. First, we saw that the cap was not strictly respected: sometimes 8 or 9 slaves were running at the same time. This is not critical; we can work around it as long as we keep it in mind.

      Here is the real issue I want to address:
      We found that after our first batch of ~7 finishes, Jenkins provisions a large number of slaves all at once, without regard for the container cap.
      To illustrate this, I created a simple scenario that reproduces the issue: 22 podTemplates running in parallel, each of which only sleeps for a certain time, with a container cap of 7. What happened was that only after the first 7 had terminated COMPLETELY (even though 2 or 3 of them finished earlier) did another batch start, and it resulted in 14 containers running at the same time, with one remaining "in queue". (With more podTemplates, all the remaining ones stay in the queue.)
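One way to see how the numbers in this scenario could come about is a toy model in which the cap is checked against a stale count. This is purely a hypothesis for illustration, not a description of the plugin's internals: we assume that two provisioning passes run back to back after the first batch has terminated, and that each pass only counts pods that already exist, ignoring the ones a previous pass requested. `simulate_passes` and its parameters are hypothetical.

```python
def simulate_passes(cap: int, queued: int, passes: int, count_in_flight: bool):
    """Run several provisioning passes back to back, before any new pod starts.

    Returns (pods requested in total, branches still waiting in the queue).
    Toy model only: count_in_flight=False mimics a cap check that ignores
    pods requested by earlier passes that are not yet visible as running.
    """
    running = 0      # the first batch has fully terminated
    in_flight = 0    # requested by earlier passes, not yet started
    for _ in range(passes):
        counted = running + (in_flight if count_in_flight else 0)
        new = min(queued, max(0, cap - counted))
        in_flight += new
        queued -= new
    return in_flight, queued


# 22 branches with a cap of 7: the first 7 have run, 15 are left in the queue.
print(simulate_passes(cap=7, queued=15, passes=2, count_in_flight=False))
print(simulate_passes(cap=7, queued=15, passes=2, count_in_flight=True))
```

With the stale count, the model requests 14 pods and leaves 1 branch queued, the same split we observed; counting in-flight pods keeps it at 7 with 8 queued. Whether anything like this actually happens inside the plugin is what we would like to find out.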

      The same behavior occurs immediately if we increase the container cap while the build is running.

      We would like to know whether this is a known issue (I tried to find a similar one here and elsewhere), or whether you have an idea of what is happening and why.

      I have attached the scripts we quickly put together to reproduce the issue, along with parts of the logs from my Jenkins pod (I trimmed them a bit, as they contained more than 5000 duplicated lines), in case they help.
      If you need further details, feel free to ask; we will do our best to answer as soon as possible.

      Assignee: Carlos Sanchez (csanchez)
      Reporter: Damien Viniere (g_dviniere)
