After Jenkins restart, Kubernetes agents cannot be provisioned



      Short version:

      After a Jenkins instance is restarted, it can no longer provision Kubernetes-based agents; restarting it a second time restores provisioning.

      Long version:

      We saw this happen a couple of weeks ago, and now it has happened again. It affects several of the Jenkins instances we run: one of them runs the versions listed in the "environment" fields, and some run versions a few releases older. We also run many other Jenkins instances that do not have this issue.

      We restart the Jenkins instances every Saturday. Before the Saturday restart, Kubernetes-based dynamic agents were working correctly. After it, they stopped working: builds were unable to start new agents. Today (Monday) we restarted Jenkins again, and the agents work perfectly once more.

      Before the Monday restart, I recreated the cluster configuration from scratch, but that did not fix the problem.

      Clicking the "Test connection" button in the cloud configuration reported success.

      The Kubernetes cluster is otherwise healthy and is running pods of other systems without problems.

      Related output of failing builds:

      [Pipeline] Start of Pipeline
      [Pipeline] echo
      Bringing up containers [jnlp:[ttyEnabled:false, image:jenkins-jnlp-slave:linux, alwaysPullImage:true, resourceRequestCpu:0.5, resourceLimitCpu:2, resourceRequestMemory:512Mi, resourceLimitMemory:2Gi]]
      [Pipeline] echo
      This is the overriden podTemplate, to collect slave info to Grafeas
      [Pipeline] podTemplate
      [Pipeline] {
      [Pipeline] withEnv
      [Pipeline] {
      [Pipeline] echo
      This is the overriden node jenkins-istvans-test-5, to collect slave hostname to Grafeas
      [Pipeline] node
      Still waiting to schedule task
      All nodes of label ‘jenkins-istvans-test-5’ are offline
      
      (...it is hanging here, nothing else is happening...)
      

      System logs (org.csanchez.jenkins.plugins.kubernetes = ALL)

      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      In provisioning : []
      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
      Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
      In provisioning : []
      
      (...nothing else related to the job...)
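
      The pattern in this excerpt (a label repeatedly reporting excess workload with zero executors while "In provisioning" stays empty) can be checked mechanically on a larger log. Below is a minimal sketch in Python, assuming the message formats are exactly those shown above; the function name `find_stalled_labels`, the `min_repeats` threshold, and the `SAMPLE` constant are hypothetical helpers for illustration and not part of the Kubernetes plugin. It only flags the symptom and does not diagnose the cause.

```python
import re
from collections import Counter

# Message formats copied from the FINE log excerpt in this report (assumed stable).
WORKLOAD_RE = re.compile(
    r'Label "(?P<label>[^"]+)" excess workload: (?P<excess>\d+), executors: (?P<executors>\d+)'
)
IN_PROVISIONING_RE = re.compile(r"In provisioning : \[(?P<items>.*)\]")

# Shortened copy of the excerpt above, used as example input.
SAMPLE = """\
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
In provisioning : []
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Label "jenkins-istvans-test-5" excess workload: 1, executors: 0
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
Template for label "jenkins-istvans-test-5": jenkins-istvans-test-5-p61lh
Jun 21, 2021 12:29:50 PM FINE org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
In provisioning : []
"""


def find_stalled_labels(log_text, min_repeats=2):
    """Return labels that show excess workload with zero executors, followed
    by an empty "In provisioning" list, at least `min_repeats` times."""
    stalled = Counter()
    current_label = None  # label of the most recent unmet workload line, if any
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        workload = WORKLOAD_RE.search(line)
        if workload:
            unmet = int(workload.group("excess")) > 0 and int(workload.group("executors")) == 0
            current_label = workload.group("label") if unmet else None
            continue
        provisioning = IN_PROVISIONING_RE.search(line)
        if provisioning and current_label is not None:
            # An empty list means no agent is being provisioned for the label.
            if provisioning.group("items").strip() == "":
                stalled[current_label] += 1
            current_label = None
    return sorted(label for label, count in stalled.items() if count >= min_repeats)


if __name__ == "__main__":
    print(find_stalled_labels(SAMPLE))
```

      The threshold of two repeats is an arbitrary choice to avoid flagging a label that is merely waiting for its first provisioning round; it may need tuning against a real log.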

      I'm not sure what other information would be useful. Please let me know if there is anything else I can gather when it happens again.

       

       

            Assignee:
            Unassigned
            Reporter:
            Istvan Szekeres
            Archiver:
            Jenkins Service Account
