
Sometimes the Kubernetes plugin just stops creating agents.

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Kubernetes 1.23, Ubuntu 20.04, Jenkins 2.401.2

      Sometimes we notice that the Kubernetes plugin stops creating agents.

      We regularly delete old agents that have been running for a while. When new jobs start, the plugin creates new agents up to the max limit. But for some reason this creation sometimes stops and we are stuck with a limited number of agents.

          [JENKINS-71796] Sometimes the Kubernetes plugin just stops creating agents.

          Sigi Kiermayer added a comment -

          dionj is that a response from the kubernetes api or did you get that output by describing a pending pod?

          Dion added a comment (edited)

          siegfried describing the pending pod.

           

          I managed to sanitize all the verbose logs, removing the extraneous information and focusing on the single pod: sanitized_reproduction.log

          I'll boil it down further to what I think are the main points here:

          Node successfully provisioned and Pod created:

          2023-10-10 09:54:20.576+0000 [id=34]  INFO  h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning REDACTED from REDACTED with 1 executors. Remaining excess workload: -0
          2023-10-10 09:54:20.576+0000 [id=34]  FINER hudson.slaves.NodeProvisioner#update: Provisioning strategy hudson.slaves.NodeProvisioner$StandardStrategyImpl@3981c444 declared provisioning complete
          2023-10-10 09:54:20.581+0000 [id=162189]  FINEST  o.c.j.p.k.KubernetesCloud#connect: Building connection to Kubernetes REDACTED URL null namespace REDACTED
          2023-10-10 09:54:20.581+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesCloud#connect: Connected to Kubernetes REDACTED URL https://172.20.0.1:443/ namespace REDACTED
          2023-10-10 09:54:20.587+0000 [id=34]  INFO  hudson.slaves.NodeProvisioner#update: REDACTED provisioning successfully completed. We have now 4 computer(s)
          2023-10-10 09:54:20.589+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesLauncher#launch: Creating Pod: REDACTED REDACTED/REDACTED
          (100x) 2023-10-10 09:54:20.601+0000 [id=34] FINER  hudson.slaves.NodeProvisioner#update: ran update on REDACTED in 0ms
          2023-10-10 09:54:20.850+0000 [id=162189]  INFO  o.c.j.p.k.KubernetesLauncher#launch: Created Pod: REDACTED REDACTED/REDACTED

          At this point I checked Kubernetes and could see that the pod is in a Pending status, but it is no longer retrying: the container is in a "Waiting" state with reason `CreateContainerConfigError` after failing to pull a secret that does not exist in the namespace.
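          For anyone trying to reproduce this state, here is a hedged sketch (not the original job; all names and the YAML body are placeholders): a pipeline podTemplate whose agent container reads an environment variable from a Secret that does not exist in the agent namespace. The pod is admitted, but its container stays in Waiting/CreateContainerConfigError, so the agent never connects back to Jenkins.

          // Hedged reproduction sketch, placeholder names throughout.
          podTemplate(yaml: '''
              apiVersion: v1
              kind: Pod
              spec:
                containers:
                - name: jnlp
                  env:
                  - name: SOME_VALUE
                    valueFrom:
                      secretKeyRef:
                        name: does-not-exist   # hypothetical Secret missing from the namespace
                        key: value
              '''.stripIndent()) {
            node(POD_LABEL) {
              sh 'echo never reached'          // the agent never comes online, so this never runs
            }
          }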

          The pod is deleted after exceeding the 5 minute connection timeout:

           

          2023-10-10 10:00:42.387+0000 [id=167889] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent REDACTED
          2023-10-10 10:00:42.402+0000 [id=167962] INFO o.internal.platform.Platform#log: --> DELETE https://172.20.0.1/api/v1/namespaces/REDACTED/pods/REDACTED h2
          2023-10-10 10:00:42.402+0000 [id=167962] INFO o.internal.platform.Platform#log: Authorization: Bearer REDACTED
          2023-10-10 10:00:42.402+0000 [id=167962] INFO o.internal.platform.Platform#log: User-Agent: fabric8-kubernetes-client/6.4.1
          2023-10-10 10:00:42.402+0000 [id=167962] INFO o.internal.platform.Platform#log: Content-Type: application/json; charset=utf-8
          2023-10-10 10:00:42.402+0000 [id=167962] INFO o.internal.platform.Platform#log: Content-Length: 75
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log: Host: 172.20.0.1
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log: Connection: Keep-Alive
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log: Accept-Encoding: gzip
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log:
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log: {"apiVersion":"v1","kind":"DeleteOptions","propagationPolicy":"Background"}
          2023-10-10 10:00:42.403+0000 [id=167962] INFO o.internal.platform.Platform#log: --> END DELETE (75-byte body)
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: <-- 200 https://172.20.0.1/api/v1/namespaces/REDACTED/pods/REDACTED (54ms)
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: audit-id: acf7caf0-7f7d-44be-951f-fbf599cbde5c
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: cache-control: no-cache, private
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: content-type: application/json
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: x-kubernetes-pf-flowschema-uid: dec317e9-e558-46c2-bfb7-ce848aaccd93
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: x-kubernetes-pf-prioritylevel-uid: d25f04bc-e0f0-446b-bff0-210a6f4bd563
          2023-10-10 10:00:42.457+0000 [id=167962] INFO o.internal.platform.Platform#log: date: Tue, 10 Oct 2023 10:00:42 GMT
          2023-10-10 10:00:42.458+0000 [id=167962] INFO o.internal.platform.Platform#log: <-- END HTTP (16645-byte body)
          2023-10-10 10:00:42.459+0000 [id=167889] INFO o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent REDACTED/REDACTED
          2023-10-10 10:00:42.459+0000 [id=167889] INFO o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer REDACTED
          2023-10-10 10:00:42.461+0000 [id=165373] INFO j.s.DefaultJnlpSlaveReceiver#channelClosed: Jetty (winstone)-165373 for REDACTED terminated: java.nio.channels.ClosedChannelException 

          The Jenkins build console log does not indicate anything at this point; it just continues to wait.
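          As an aside, it is not obvious from these logs which setting governs that 5-minute window. Assuming it is the pod template's agent connect timeout (this is an assumption; it can also be set on the cloud configuration), it can be declared explicitly per pipeline template, for example:

          // Hedged sketch: slaveConnectTimeout is in seconds; label and steps are placeholders.
          podTemplate(slaveConnectTimeout: 300) {
            node(POD_LABEL) {
              sh 'true'
            }
          }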

          During this time, several builds tried to run and got caught in the queue.

          And then the 15 minute read timeout is finally hit:

          2023-10-10 10:11:00.852+0000 [id=162189]  FINER o.c.j.p.k.KubernetesLauncher#launch: Removing Jenkins node: REDACTED
          2023-10-10 10:11:00.852+0000 [id=162189]  INFO  o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent REDACTED
          2023-10-10 10:11:00.852+0000 [id=162189]  FINEST  o.c.j.p.k.KubernetesCloud#connect: Building connection to Kubernetes REDACTED URL null namespace REDACTED
          2023-10-10 10:11:00.852+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesFactoryAdapter#createClient: Autoconfiguring Kubernetes client
          2023-10-10 10:11:00.852+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesFactoryAdapter#createClient: Creating Kubernetes client: KubernetesFactoryAdapter [serviceAddress=null, namespace=REDACTED, caCertData=null, credentials=null, skipTlsVerify=false, connectTimeout=5, readTimeout=15]
          2023-10-10 10:11:00.852+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesFactoryAdapter#createClient: Proxy Settings for Cloud: false
          2023-10-10 10:11:00.859+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesClientProvider#createClient: Created new Kubernetes client: REDACTED io.fabric8.kubernetes.client.impl.KubernetesClientImpl@67588030
          2023-10-10 10:11:00.859+0000 [id=162189]  FINE  o.c.j.p.k.KubernetesCloud#connect: Connected to Kubernetes REDACTED URL https://172.20.0.1:443/ namespace REDACTED
          2023-10-10 10:11:00.859+0000 [id=162189]  SEVERE  o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: REDACTED
          2023-10-10 10:11:00.859+0000 [id=162189]  INFO  hudson.slaves.AbstractCloudSlave#terminate: FATAL: Computer for agent is null: REDACTED 

          After this, the logs explode and all the queued builds get launched.


          Amit added a comment (edited)

          dionj   siegfried 

          Experiencing the same problem.

          • Kubernetes: 1.27
          • Jenkins version: 2.414.2
          • Kubernetes plugin: 4054.v2da_8e2794884
          • Kubernetes Client API: 6.8.1-224.vd388fca_4db_3b_

           

          I managed to capture logs with the following logger set to 'ALL' (one way to enable this from the script console is sketched below the list):

          • io.fabric8.kubernetes
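          For reference, a minimal sketch of raising that logger at runtime from the Jenkins script console; the logger name is the only part taken from above, and a custom log recorder under Manage Jenkins > System Log achieves the same thing more durably:

          import java.util.logging.Level
          import java.util.logging.Logger

          // Raise the fabric8 client logger to ALL. java.util.logging holds loggers
          // weakly, so a log recorder configured in the UI is the more durable option.
          Logger.getLogger('io.fabric8.kubernetes').level = Level.ALL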

           
          I was able to locate the exact stack trace that kicks in when the job is stuck at the stage below:

          Still waiting to schedule task
          All nodes of label ‘REDACTED’ are offline

           
          Based on the trace below, it looks like the dispatcher was shut down:
           

          Trying to configure client from Kubernetes config...
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryKubeConfig
          Did not find Kubernetes config at: [/var/jenkins_home/.kube/config]. Ignoring.
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryServiceAccount
          Trying to configure client from service account...
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryServiceAccount
          Found service account host and port: 172.20.0.1:443
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryServiceAccount
          Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt}].
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryServiceAccount
          Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryNamespaceFromPath
          Trying to configure client namespace from Kubernetes service account namespace path...
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.Config tryNamespaceFromPath
          Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.utils.HttpClientUtils getHttpClientFactory
          Using httpclient io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory factory
          Oct 20, 2023 8:35:30 PM FINE io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl close
          Shutting down dispatcher okhttp3.Dispatcher@2effffe9 at the following call stack: 
          	at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.close(OkHttpClientImpl.java:255)
          	at io.fabric8.kubernetes.client.impl.BaseClient.close(BaseClient.java:139)
          	at org.csanchez.jenkins.plugins.kubernetes.PodTemplateUtils.parseFromYaml(PodTemplateUtils.java:627)
          	at org.csanchez.jenkins.plugins.kubernetes.PodTemplateUtils.validateYamlContainerNames(PodTemplateUtils.java:683)
          	at org.csanchez.jenkins.plugins.kubernetes.PodTemplateUtils.validateYamlContainerNames(PodTemplateUtils.java:673)
          	at org.csanchez.jenkins.plugins.kubernetes.pipeline.PodTemplateStepExecution.start(PodTemplateStepExecution.java:145)
          	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:323)
          	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
          	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
          	at jdk.internal.reflect.GeneratedMethodAccessor1666.invoke(Unknown Source)
          	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
          	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
          	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
          	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
          	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
          	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
          	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
          	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
          	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
          	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:163)
          	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:148)
          	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
          	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
          	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
          	at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
          	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
          	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
          	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:85)
          	at jdk.internal.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
          	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
          	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
          	at com.cloudbees.groovy.cps.impl.ClosureBlock.eval(ClosureBlock.java:46)
          	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
          	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
          	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
          	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
          	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
          	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
          	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
          	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
          	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
          	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
          	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
          	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
          	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
          	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
          	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
          	at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
          	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          	at java.base/java.lang.Thread.run(Thread.java:829)
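          To make the "dispatcher was shut down" reading concrete, here is an illustration only, not the plugin's actual code (the URL is a placeholder): OkHttp clients derived via newBuilder() share one Dispatcher and its executor, so a close() that shuts that executor down also breaks requests made through the sibling client.

          import okhttp3.*

          def shared  = new OkHttpClient()            // owns the Dispatcher and its ExecutorService
          def derived = shared.newBuilder().build()   // shares the same Dispatcher as 'shared'

          // Roughly the effect of the close() seen in the trace above:
          derived.dispatcher().executorService().shutdown()

          // Later asynchronous calls through either client are rejected by the
          // shut-down executor and fail immediately instead of reaching the API server.
          shared.newCall(new Request.Builder().url('https://172.20.0.1/version').build())
                .enqueue(new Callback() {
                    void onFailure(Call call, IOException e)      { println "rejected: $e" }
                    void onResponse(Call call, Response response) { println response.code() }
                })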
          
          

           
           
           The pods are provisioned only after 15 minutes, and I see the following statements for the resync:
           
           

          Oct 20, 2023 9:01:07 PM FINE io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer start
          Ready to run resync and reflector for v1/namespaces/tooling-jenkins/pods with resync 0
          Oct 20, 2023 9:01:07 PM FINE io.fabric8.kubernetes.client.informers.impl.DefaultSharedIndexInformer scheduleResync
          Resync skipped due to 0 full resync period for v1/namespaces/tooling-jenkins/pods
          Oct 20, 2023 9:01:07 PM FINEST io.fabric8.kubernetes.client.http.HttpLoggingInterceptor$HttpLogger logStart
          


          Yael added a comment -

          Also experiencing this issue.

          • GKE version: 1.27.3-gke.100
          • Jenkins version: 2.4111
          • Kubernetes plugin: 4054.v2da_8e2794884
          • Kubernetes Client API plugin: 6.8.1-224.vd388fca_4db_3b_

          Occasionally, Jenkins will time out after attempting to create the pod, and the console log looks like:

           Still waiting to schedule task
          All nodes of label ‘X’ are offline
          

           Or:

           Still waiting to schedule task
           `Jenkins` does not have label `X`
          

          where X is the label name.

          After some time, Jenkins eventually times out or we have to abort.


          B added a comment -

          We also saw this issue when trying to update from version 3937.vd7b_82db_e347b_ to kubernetes:4054.v2da_8e2794884, so I think the issue was introduced somewhere in between.

          Seeing similar issues to dionj, where pods are just not created in the k8s API for up to 15-20 minutes in some cases. It seems to be somewhat sporadic, but might happen when Jenkins is under higher load?


          Robyn added a comment -

          We are also experiencing this issue. In our case, the majority of our pods would not even get created. We found that pods using yamlMergeStrategy hit the issue most of the time.
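          For context, a hedged sketch of the kind of template meant here (template name, image, and steps are placeholders, not the actual configuration from this report): a pipeline podTemplate that merges its own YAML into an inherited template via yamlMergeStrategy.

          podTemplate(
              inheritFrom: 'base-template',        // assumed parent template name
              yamlMergeStrategy: merge(),          // merge this YAML into the inherited pod spec
              yaml: '''
                  apiVersion: v1
                  kind: Pod
                  spec:
                    containers:
                    - name: build
                      image: REDACTED
                  '''.stripIndent()
          ) {
            node(POD_LABEL) {
              container('build') {
                sh 'make'
              }
            }
          }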
          We had upgraded to LTS 2.414.3 with the following plugins:

          kubernetes:4054.v2da_8e2794884
          kubernetes-client-api:6.8.1-224.vd388fca_4db_3b_
          kubernetes-credentials:0.11
          snakeyaml-api:2.2-111.vc6598e30cc65

          The only way we were able to get back into a working state was to downgrade the plugins to the following versions:

          kubernetes:4007.v633279962016
          kubernetes-client-api:6.4.1-215.v2ed17097a_8e9
          kubernetes-credentials:0.10.0
          snakeyaml-api:1.33-95.va_b_a_e3e47b_fa_4


          Robyn added a comment (edited)

          I was wondering if there are any updates here. This is preventing us from updating a bunch of plugins we use, which need to be upgraded due to security issues as well as other issues.


          Sigi Kiermayer added a comment -

          At least from our side, while we have seen this issue very sporadically, it is not blocking us in any way.

          Ofir added a comment -

          I agree with rsndv; this issue is blocking us from upgrading Jenkins (2.387.2) and its plugins.

          From my side, the issue occurred when Jenkins was loaded (running 100-300 concurrent jobs) and consumed lots of memory.


          Ofir added a comment -

          Hi guys,

          Following some tests and core dumps we took when we faced the issue, we noticed that Jenkins tried to execute jobs on offline/non-existent pods (JNLP agents), which we suspect is the root cause of this issue.

          In this beta release https://plugins.jenkins.io/kubernetes/#plugin-content-garbage-collection-beta a new garbage-collection mechanism was implemented to clean up "left behind" agents, which may resolve this issue.

          Has anyone had a chance to test it?


            Assignee: Unassigned
            Reporter: Lars Berntzon (bildrulle)
            Votes: 10
            Watchers: 15