Type: Bug
Resolution: Unresolved
Priority: Major
Labels: None
Environment:
Jenkins 2.298
kubernetes 1.30.0
kubernetes-client-api 5.4.1
Short version:
After restarting a Jenkins instance, it can no longer provision Kubernetes-based agents; after a second restart, provisioning works again.
Long version:
We saw this a couple of weeks ago, and now it has happened again. It affects several of the Jenkins instances we run; one of them has the versions listed in the "environment" field, others are a few versions older. We also run many other Jenkins instances that don't have this issue.
We restart the Jenkins instances every Saturday. Before the restart, Kubernetes-based dynamic agents were working correctly. After the Saturday restart they stopped working: builds could not start new agents. Today (Monday) we restarted Jenkins again, and now agents work perfectly.
Before the restart I tried to recreate the cluster config from scratch, but it didn't fix it.
Clicking the "Test connection" button in the cloud config responded with success.
The Kubernetes cluster is otherwise healthy; it is happily running pods of other systems.
Related output of failing builds:
[Pipeline] Start of Pipeline
[Pipeline] echo
Bringing up containers [jnlp:[ttyEnabled:false, image:jenkins-jnlp-slave:linux, alwaysPullImage:true, resourceRequestCpu:0.5, resourceLimitCpu:2, resourceRequestMemory:512Mi, resourceLimitMemory:2Gi]]
[Pipeline] echo
This is the overriden podTemplate, to collect slave info to Grafeas
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] echo
This is the overriden node jenkins-istvans-test-5, to collect slave hostname to Grafeas
[Pipeline] node
Still waiting to schedule task
All nodes of label ‘jenkins-istvans-test-5’ are offline
(...it is hanging here, nothing else is happening...)
System logs (org.csanchez.jenkins.plugins.kubernetes = ALL)
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision In provisioning : []
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision In provisioning : []
(...nothing else related to the job...)
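In case it helps anyone checking their own logs for the same symptom, here is a rough Python sketch that counts, per label, how often a provisioning round reports excess workload and then ends with an empty "In provisioning" list. It relies only on the message shapes visible in the log above; the function names (split_entries, stalled_labels) are made up for this example, and the script only spots the pattern, it does not explain the cause.

```python
import re
from collections import Counter

# Start of each entry as seen above: "Jun 21, 2021 12:29:50 PM FINE <logger> <method>".
ENTRY_START = re.compile(
    r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M [A-Z]+ \S+ \S+"
)
# Message shapes copied from the log excerpt; other plugin versions may word them differently.
EXCESS = re.compile(r'Label "([^"]+)" excess workload: (\d+), executors: (\d+)')
EMPTY_PROVISIONING = "In provisioning : []"


def split_entries(log_text):
    """Split log text into entries, whether or not the line breaks survived."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    return [log_text[s:e].strip() for s, e in zip(starts, ends)]


def stalled_labels(log_text):
    """Count, per label, rounds with excess workload but nothing in provisioning."""
    stalled = Counter()
    current_label = None
    for entry in split_entries(log_text):
        match = EXCESS.search(entry)
        if match:
            # Track the label only when there is workload waiting for an agent.
            current_label = match.group(1) if int(match.group(2)) > 0 else None
        elif EMPTY_PROVISIONING in entry and current_label is not None:
            stalled[current_label] += 1
            current_label = None
    return dict(stalled)
```

Calling stalled_labels() on the text of a saved system log returns a dictionary mapping each affected label to the number of such rounds; an empty dictionary means the pattern was not found.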
I'm not sure what other information would be useful; please let me know if there is anything else I can gather when it happens again.
Do you have "concurrency limit" set on your cloud or on podTemplates?
If that's the case, it may be similar to my issue https://issues.jenkins.io/browse/JENKINS-66484. In our case it mostly happens while Jenkins is running, but it has also happened at restart.