
KubernetesClientException: not ready after 5000 MILLISECONDS

    • Bug
    • Resolution: Fixed
    • Blocker
    • kubernetes-plugin
    • None
    • Prod
    • Blue Ocean - Candidates
    • 3690.va_9ddf6635481

      We have 4 Jenkins servers in 4 AKS clusters.

      All of a sudden, Jenkins agent pods started failing with the error below. Some pods work and some fail; this happens 1-2 times out of 4-5 attempts.

      AKS version: 1.20.13

      The clusters run different versions (listed below). I can reproduce this error on all of them.

      AKS-1:

      • kubernetes:1.30.1
      • kubernetes-client-api:5.10.1-171.vaa0774fb8c20
      • kubernetes-credentials:0.8.0

      AKS-2:

      • kubernetes:1.31.3
      • kubernetes-client-api:5.11.2-182.v0f1cf4c5904e
      • kubernetes-credentials:0.9.0

      AKS-3:

      • kubernetes:1.30.1
      • kubernetes-client-api:5.10.1-171.vaa0774fb8c20
      • kubernetes-credentials:0.8.0

      AKS-4:

      • kubernetes:1.31.3
      • workflow-job:1145.v7f2433caa07f
      • workflow-aggregator:2.6
        21:00:49  io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS
        21:00:49  	at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:176)
        21:00:49  	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:322)
        21:00:49  	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:84)
        21:00:49  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:427)
        21:00:49  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:344)
        21:00:49  	at hudson.Launcher$ProcStarter.start(Launcher.java:507)
        21:00:49  	at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:176)
        21:00:49  	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:132)
        21:00:49  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:324)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:319)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:193)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
        21:00:49  	at jdk.internal.reflect.GeneratedMethodAccessor546.invoke(Unknown Source)
        21:00:49  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        21:00:49  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        21:00:49  	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
        21:00:49  	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
        21:00:49  	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
        21:00:49  	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
        21:00:49  	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
        21:00:49  	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
        21:00:49  	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:163)
        21:00:49  	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
        21:00:49  	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:158)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:161)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:165)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
        21:00:49  	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
        21:00:49  	at WorkflowScript.run(WorkflowScript:63)
        21:00:49  	at __cps.transform__(Native Method)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
        21:00:49  	at jdk.internal.reflect.GeneratedMethodAccessor286.invoke(Unknown Source)
        21:00:49  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        21:00:49  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:107)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
        21:00:49  	at jdk.internal.reflect.GeneratedMethodAccessor286.invoke(Unknown Source)
        21:00:49  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        21:00:49  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:89)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
        21:00:49  	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
        21:00:49  	at jdk.internal.reflect.GeneratedMethodAccessor286.invoke(Unknown Source)
        21:00:49  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        21:00:49  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
        21:00:49  	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
        21:00:49  	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
        21:00:49  	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
        21:00:49  	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
        21:00:49  	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
        21:00:49  	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
        21:00:49  	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:402)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:314)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:278)
        21:00:49  	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
        21:00:49  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        21:00:49  	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
        21:00:49  	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        21:00:49  	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        21:00:49  	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        21:00:49  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        21:00:49  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        21:00:49  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        21:00:49  	at java.base/java.lang.Thread.run(Thread.java:829)
        21:00:49  [Bitbucket] Notifying commit build result
        21:00:50  [Bitbucket] Build result notified
        21:00:50  Finished: FAILURE

          [JENKINS-67664] KubernetesClientException: not ready after 5000 MILLISECONDS

          Artem Chernenko added a comment -

          allan_burdajewicz Thanks for your work. I will try this hpi in my project and let you know after the week if I am still facing this issue.

          Alessandro Vozza added a comment -

          allan_burdajewicz Thank you very much for your PR. We have been using the 3680 version for a few days now and haven't seen any of those infamous 5000ms errors. You single-handedly solved a problem that had been afflicting us for weeks and seriously hampering our work. Much appreciated!

          Artem Chernenko added a comment -

          allan_burdajewicz We haven't seen the issue so far. Thank you. We will let you know if we face it again.

          Adam Placzek added a comment -

          allan_burdajewicz The error is still there, but the exponential backoff allows the pipeline to retry, continue, and finish successfully. Great stuff!
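
For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of retrying with exponential backoff. It is an illustration only, not the plugin's actual code (the plugin is written in Java). The names `TransientExecError` and `run_with_backoff`, and the default delays and attempt count, are assumptions made up for this example.

```python
import random
import time


class TransientExecError(Exception):
    """Hypothetical stand-in for a sporadic exec failure such as the 5000 ms timeout."""


def run_with_backoff(action, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each transient failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientExecError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay grows as 1x, 2x, 4x, ... of base_delay, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter keeps parallel builds from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the behaviour can be checked without actually waiting. The error stays visible in the log on each failed attempt, which matches the observation above that the error "is still there" while the build goes on to succeed.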

          SparkC added a comment -

          aplaczek The error is still there with the latest Jenkins version and the latest plugin versions. I have tried all options.

          tag: "2.346.2"

          installPlugins:

          • kubernetes:3663.v1c1e0ec5b_650
          • kubernetes-client-api:5.12.2-193.v26a_6078f65a_9
            06:29:10  io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS
            06:29:10  	at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:181)
            06:29:10  	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:332)
            06:29:10  	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:85)
            06:29:10  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:425)
            06:29:10  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:328)
            06:29:10  	at hudson.Launcher$ProcStarter.start(Launcher.java:509) 


          Jonathan Hardison added a comment -

          Just an fyi, also posted on the PR for this.

          We have deployed the incremental build from PR on our environment, so far no issues observed.

          Jenkins version: `2.346.2`

          Kubernetes plugin version: `3680.va_31c13cda_9b_5`

          Alessandro Vozza added a comment -

          We see a recurrence of `Failed to start websocket connection: io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS`, this time with the addition of the following message:

          Caused by: java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
              at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
              at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)

          We are still running the 3680 version. We run long-running multi-pipeline builds nightly. We saw the error once last week, but last night it was fine, so it really is a hit-or-miss bug again.
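
Because the failure is intermittent, it can help to scan nightly console logs for the two signatures quoted in this thread. The following is a rough sketch under that assumption. The function name `looks_like_exec_flake` and the choice of substrings are hypothetical and not part of any Jenkins or fabric8 API.

```python
# Substrings taken from the log lines quoted in this thread.
EXEC_FLAKE_SIGNATURES = (
    "KubernetesClientException: not ready after",
    "Expected HTTP 101 response but was '500 Internal Server Error'",
)


def looks_like_exec_flake(log_text):
    """Return True if the console log contains one of the known exec-failure signatures."""
    return any(signature in log_text for signature in EXEC_FLAKE_SIGNATURES)
```

A plain substring match is used rather than a regular expression so the check stays easy to audit. It would need updating if the message wording changes between client versions.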

          Olexandr Shamin added a comment (edited) -

          Does this problem still exist? We are still using the single-container solution.

          Allan BURDAJEWICZ added a comment -

          oshamin The sporadic 500s and timeouts can still happen, yes. This is by design, due to the fragility of the exec API that the kubernetes plugin uses when launching commands in non-jnlp containers. However, a retry mechanism has been implemented to improve the stability of builds, and that should be suitable in most cases. The single-container solution still has the benefit of relying less on the K8s REST API.
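
As a back-of-the-envelope check on why retries are "suitable in most cases", the sketch below estimates how often a step would still fail after several attempts. It assumes each attempt fails independently with the same probability, which is a simplification; real failures may be correlated (for example, when the API server is overloaded). The per-attempt rates of 0.2 and 0.5 are taken from the reporter's "1-2 times out of 4-5 attempts", and the function name is made up for this example.

```python
def failure_probability(p_single, attempts):
    """Probability that every one of `attempts` independent tries fails."""
    return p_single ** attempts


# Per-attempt failure rates bracketing the rate given in the original report.
for p in (0.2, 0.5):
    for attempts in (1, 3, 5):
        chance = failure_probability(p, attempts)
        print(f"p={p:.1f} attempts={attempts} chance_all_fail={chance:.4f}")
```

Under these assumptions, even at a 50% per-attempt failure rate, five attempts leave roughly a 3% chance that the step fails outright.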

          Olexandr Shamin added a comment -

          allan_burdajewicz, thank you for the feedback.

            allan_burdajewicz Allan BURDAJEWICZ
            rohithg534 SparkC
            Votes: 15
            Watchers: 38
