Bug
Resolution: Unresolved
Major
3546.v6103d89542d6
After a Jenkins restart, some of our Jenkins instances get stuck and cannot spawn any new Kubernetes agents. As we had already run into issues with the KubernetesProvisioningLimits class (see https://issues.jenkins.io/browse/JENKINS-66484 and https://github.com/jenkinsci/kubernetes-plugin/pull/1028), I re-created a logger for {{org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits}} and, indeed, the stuck instances were showing "impossible" data, e.g.:
Nov 10, 2021 2:57:09 AM FINEST org.csanchez.jenkins.plugins.kubernetes.KubernetesProvisioningLimits kubernetes global limit reached: 32/4. Cannot add 1 more!
My best guess is that the KubernetesProvisioningLimits initialization phase has a race condition, specifically when the instance is restarted with jobs still in the queue. It seems that KubernetesSlave instances are created for the queued items before the KubernetesProvisioningLimits#init method is invoked.
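To make the suspected ordering problem concrete, here is a minimal, self-contained sketch. It is not the plugin's code: the {{Limits}} class, the {{lazyInit}} flag, and the {{replay}} helper are hypothetical stand-ins, and it assumes that init seeds the counter from the agents that already exist. It only shows how a counter can end up above its limit when agents are registered before init runs, and how initializing on first use would avoid that under the same assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class LimitsInitSketch {

    /** Hypothetical tracker; NOT the plugin's KubernetesProvisioningLimits. */
    static class Limits {
        private final int globalLimit;
        private final List<String> nodes; // stands in for the controller's node list
        private final boolean lazyInit;   // true = initialize on first use
        private boolean initialized;
        private int count;

        Limits(int globalLimit, List<String> nodes, boolean lazyInit) {
            this.globalLimit = globalLimit;
            this.nodes = nodes;
            this.lazyInit = lazyInit;
        }

        /** Seeds the counter from agents that already exist (assumed behavior). */
        synchronized void init() {
            if (lazyInit && initialized) {
                return; // already seeded on first use, nothing to add
            }
            count += nodes.size();
            initialized = true;
        }

        /** Reserves one slot, or returns false when the limit is reached. */
        synchronized boolean register() {
            if (lazyInit) {
                init(); // guarantee the counter is seeded before it is used
            }
            if (count + 1 > globalLimit) {
                return false;
            }
            count++;
            return true;
        }

        synchronized int count() {
            return count;
        }
    }

    /** Replays the suspected ordering: queued items provision agents, init() fires afterwards. */
    static int replay(boolean lazyInit, int queuedItems, int globalLimit) {
        List<String> nodes = new ArrayList<>();
        Limits limits = new Limits(globalLimit, nodes, lazyInit);
        for (int i = 0; i < queuedItems; i++) {
            if (limits.register()) {
                nodes.add("agent-" + i);
            }
        }
        limits.init(); // the startup hook runs late
        return limits.count();
    }

    public static void main(String[] args) {
        System.out.println("late init: " + replay(false, 3, 4) + "/4");
        System.out.println("lazy init: " + replay(true, 3, 4) + "/4");
    }
}
```

With three queued items and a limit of 4, the late-init replay ends at 6/4 because the three agents are counted once by register and again by init, while the lazy-init replay stays at 3/4. Whether the real plugin double-counts in this way is an assumption, not something the log line above proves.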
Ping! This issue randomly affects ~250 Eclipse projects and blocks their CI instance at https://ci.eclipse.org.
See also: https://bugs.eclipse.org/bugs/show_bug.cgi?id=577166