Bug
Resolution: Fixed
Major
None
We recently upgraded our Jenkins instances from kubernetes-plugin 1.27.8 to 1.30.1. We have a concurrency limit of 2 set on our Kubernetes cloud. As expected, two agents can spawn concurrently after Jenkins startup, and this correct behavior continues for some time. But after a random number of days, only a single agent is spawned at a time. Concurrent jobs wait in the queue until that single agent is torn down. It is as if the concurrency limit were set to 1. Increasing the limit to 3 lets the plugin spawn 2 agents concurrently.
I've created a FINEST logger on org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits and noticed that when the issue happens, the Kubernetes global limit never goes back to 0/2. It stays at 1/2 even when no job is running (or at 1/3 if I increase the concurrency limit to 3).
My guess is that https://github.com/jenkinsci/kubernetes-plugin/pull/939 introduced a race condition and that, with specific timing, the global count https://github.com/jenkinsci/kubernetes-plugin/pull/939/files#diff-4877a6b83daf403574dc28dca505926a6b3ad326b84f891f278a4424a68f4b84R103 is not properly decremented.
I'll try to write a test case that reproduces the issue.
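For reference, here is a rough sketch of the kind of stress test I have in mind. It does not use the plugin's code: `GlobalLimitSketch` and its `register`/`unregister`/`stress` methods are hypothetical stand-ins I made up to illustrate the invariant, namely that once every registered slot has been released, the global count must be back at 0. A real test would have to drive KubernetesProvisioningLimits itself, and its actual API may differ from what is assumed here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a global provisioning limit; NOT the plugin's actual class.
final class GlobalLimitSketch {
    private final int limit;
    private int count; // guarded by "this"

    GlobalLimitSketch(int limit) {
        this.limit = limit;
    }

    /** Reserves one slot; returns false when the limit is already reached. */
    synchronized boolean register() {
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }

    /** Releases one slot; never drops below zero. */
    synchronized void unregister() {
        if (count > 0) {
            count--;
        }
    }

    synchronized int count() {
        return count;
    }

    /** Runs many register/unregister pairs in parallel and returns the final count. */
    static int stress(int threads, int iterations, int limit) throws InterruptedException {
        GlobalLimitSketch limits = new GlobalLimitSketch(limit);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at once to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    // Only release a slot that was actually reserved.
                    if (limits.register()) {
                        limits.unregister();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
            pool.shutdownNow();
            throw new IllegalStateException("stress run did not finish in time");
        }
        return limits.count();
    }
}
```

Calling `GlobalLimitSketch.stress(8, 10000, 2)` should return 0 because both methods are synchronized and every successful `register()` is paired with an `unregister()`. If the same pattern applied to the real limits class ends with a non-zero count, that would be consistent with the stuck 1/2 I see in the logs.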