JENKINS-64905

Pipeline jobs fail to connect to agents after jenkins controller restart

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • kubernetes-plugin
    • None

      Environment

      Jenkins 2.280

      Kubernetes plugin 1.29.0

      Helm chart installation using the official chart (https://github.com/jenkinsci/helm-charts).

      Symptoms

      After Jenkins controller restarts, Pipeline jobs cannot run on agent nodes inside Kubernetes. 

      java.lang.IllegalStateException: Not expecting pod template to be null at this point
      	at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave.getTemplate(KubernetesSlave.java:92)
      	at org.csanchez.jenkins.plugins.kubernetes.pipeline.SecretsMasker$Factory.secretsOf(SecretsMasker.java:144)
      	at org.csanchez.jenkins.plugins.kubernetes.pipeline.SecretsMasker$Factory.get(SecretsMasker.java:122)
      	at org.csanchez.jenkins.plugins.kubernetes.pipeline.SecretsMasker$Factory.get(SecretsMasker.java:94)
      	at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:94)
      	at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:139)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:135)
      	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.getListener(DefaultStepContext.java:127)
      	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:79)
      	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:258)
      	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:193)
      	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
      	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:48)
      	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
      	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
      	at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.evaluateStage(ModelInterpreter.groovy:240)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.evaluateSequentialStages(ModelInterpreter.groovy:172)
      	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2030)
      	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2015)
      	at com.cloudbees.groovy.cps.CpsDefaultGroovyMethods.each(CpsDefaultGroovyMethods:2056)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.evaluateSequentialStages(ModelInterpreter.groovy:157)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.call(ModelInterpreter.groovy:84)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.toolsBlock(ModelInterpreter.groovy:544)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.toolsBlock(ModelInterpreter.groovy:543)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.call(ModelInterpreter.groovy:83)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.inWrappers(ModelInterpreter.groovy:613)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.inWrappers(ModelInterpreter.groovy:612)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.call(ModelInterpreter.groovy:79)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.withEnvBlock(ModelInterpreter.groovy:443)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.withEnvBlock(ModelInterpreter.groovy:442)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.call(ModelInterpreter.groovy:78)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.withCredentialsBlock(ModelInterpreter.groovy:481)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.withCredentialsBlock(ModelInterpreter.groovy:480)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.call(ModelInterpreter.groovy:77)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.ModelInterpreter.inDeclarativeAgent(ModelInterpreter.groovy:590)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.impl.AnyScript.run(AnyScript.groovy:43)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.CheckoutScript.checkoutAndRun(CheckoutScript.groovy:64)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.CheckoutScript.doCheckout(CheckoutScript.groovy:40)
      	at org.jenkinsci.plugins.pipeline.modeldefinition.agent.impl.LabelScript.run(LabelScript.groovy:43)
      	at ___cps.transform___(Native Method)
      	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
      	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
      	at sun.reflect.GeneratedMethodAccessor305.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
      	at com.cloudbees.groovy.cps.impl.ClosureBlock.eval(ClosureBlock.java:46)
      	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
      	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
      	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
      	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
      	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
      	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:400)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:312)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:276)
      	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
      	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      During the restart, agent pods disconnect but reconnect quickly once the controller pod comes back online. The agent log shows:

       java.net.ConnectException: Connection refused (Connection refused)                                                                                                                                                
           at java.net.PlainSocketImpl.socketConnect(Native Method)                                                                                                                                                      
           at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)                                                                                                                               
           at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)                                                                                                                        
           at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)                                                                                                                                 
           at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)                                                                                                                                                 
           at java.net.Socket.connect(Socket.java:607)                                                                                                                                                                   
           at java.net.Socket.connect(Socket.java:556)                                                                                                                                                                   
           at sun.net.NetworkClient.doConnect(NetworkClient.java:180)                                                                                                                                                    
           at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)                                                                                                                                                
           at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)                                                                                                                                                
           at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)                                                                                                                                                    
           at sun.net.www.http.HttpClient.New(HttpClient.java:339)                                                                                                                                                       
           at sun.net.www.http.HttpClient.New(HttpClient.java:357)                                                                                                                                                       
           at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)                                                                                                                  
           at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)                                                                                                                     
           at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)                                                                                                                      
           at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)                                                                                                                            
           at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570)                                                                                                                   
           at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)                                                                                                                    
           at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)                                                                                                                                     
           at hudson.remoting.Engine.runWebSocket(Engine.java:640)                                                                                                                                                       
           at hudson.remoting.Engine.run(Engine.java:470)                                                                                                                                                                
       Feb 19, 2021 12:46:05 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect                                                                                               
       INFO: Restarting agent via jenkins.slaves.restarter.UnixSlaveRestarter@54fd030b                                                                                                                                   
       Feb 19, 2021 12:46:08 PM hudson.remoting.jnlp.Main createEngine                                                                                                                                                   
       INFO: Setting up agent: agente-f394d                                                                                                                                                                              
       Feb 19, 2021 12:46:08 PM hudson.remoting.jnlp.Main$CuiListener <init>                                                                                                                                             
       INFO: Jenkins agent is running in headless mode.                                                                                                                                                                  
       Feb 19, 2021 12:46:08 PM hudson.remoting.Engine startEngine                                                                                                                                                       
       INFO: Using Remoting version: 4.6                                                                                                                                                                                 
       Feb 19, 2021 12:46:08 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir                                                                                                                           
       INFO: Using /home/jenkins/remoting as a remoting work directory                                                                                                                                                   
       Feb 19, 2021 12:46:08 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging                                                                                                                                
       INFO: Both error and output logs will be printed to /home/jenkins/remoting                                                                                                                                        
       Feb 19, 2021 12:46:09 PM hudson.remoting.jnlp.Main$CuiListener status                                                                                                                                             
       INFO: WebSocket connection open                                                                                                                                                                                   
       Feb 19, 2021 12:46:09 PM hudson.remoting.jnlp.Main$CuiListener status                                                                                                                                             
       INFO: Connected                                                                                                                                                                                                   
      
      

      At this point, however, every Pipeline job fails with the error below (full stack trace above):

      java.lang.IllegalStateException: Not expecting pod template to be null at this point
      

      The only workaround I have found is to restart the agent pods. The agent logs show no further errors or messages.

      The issue only affects Pipeline jobs. Freestyle jobs still run normally after the Jenkins controller restarts.

      I also noticed that this happens with pod templates whose idleMinutes is greater than zero, that is, with agents that linger for a while before they are gracefully removed.

      Steps to Reproduce

      Configure a cloud with the Kubernetes Plugin. Create and use a pod template with idleMinutes (Time in minutes to retain agent when idle) greater than zero.

      1. Run a Pipeline job successfully.
      2. Restart the Jenkins controller.
      3. Wait for the agent pod to reconnect.
      4. Re-run the same pipeline job.


          Chris Nelson added a comment (edited):

          We have been seeing this issue as well. Do you use the configuration-as-code plugin to configure your instance?

          I think I've traced the cause back to that. The sequence is:

          1. Jenkins starts for the first time; the CasC plugin runs and configures the Kubernetes plugin and our pod template.
          2. Builds work fine.
          3. Jenkins restarts; the CasC plugin runs again and reconfigures the Kubernetes plugin and our pod template, changing the pod template ID when it recreates it.
          4. The existing agents now hold a pod template ID that no longer matches, and this issue occurs.
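The mismatch described in this comment can be illustrated with a small toy model. This is only a sketch of the suspected mechanism, not the Kubernetes plugin's actual code: the `PodTemplate`, `Agent`, and `Controller` classes, and the assumption that a fresh random ID is generated each time the template is recreated, are hypothetical and exist only for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PodTemplate:
    name: str
    # Assumption: a new random ID is generated every time the template is (re)created.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Agent:
    name: str
    template_id: str  # ID recorded when the agent pod was provisioned


class Controller:
    """Toy stand-in for a controller whose templates are rebuilt from configuration."""

    def __init__(self) -> None:
        self.templates: Dict[str, PodTemplate] = {}

    def apply_configuration(self, template_name: str) -> PodTemplate:
        # Models configuration being re-applied on startup: the old template
        # is discarded and a new one (with a new ID) takes its place.
        self.templates.clear()
        template = PodTemplate(template_name)
        self.templates[template.id] = template
        return template

    def find_template(self, agent: Agent) -> Optional[PodTemplate]:
        return self.templates.get(agent.template_id)

    def run_pipeline_step(self, agent: Agent) -> str:
        template = self.find_template(agent)
        if template is None:
            # Mirrors the message from the stack trace in this issue.
            raise RuntimeError("Not expecting pod template to be null at this point")
        return f"ran on {agent.name} using template {template.name}"


if __name__ == "__main__":
    controller = Controller()
    template = controller.apply_configuration("default")
    agent = Agent("agent-1", template.id)
    print(controller.run_pipeline_step(agent))  # works before the restart

    controller.apply_configuration("default")  # simulated restart
    try:
        controller.run_pipeline_step(agent)
    except RuntimeError as err:
        print(f"after restart: {err}")
```

In this model the lingering agent still holds the old ID, so the lookup fails until the agent is replaced by one provisioned from the recreated template. That matches the observation that restarting the agent pods clears the error.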

          Julio Morimoto added a comment:

          Yes, we have a new Jenkins instance running in a Kubernetes cluster, installed with the official Helm chart.

          https://github.com/jenkinsci/helm-charts

          We are using JCasC to load every possible configuration from our values.yaml. The pod template is created by the chart. With the JCasC plugin involved, I'm not sure whether this can be properly fixed in the kubernetes-plugin or in the configuration-as-code-plugin. Either way, there is a clear conflict in the overall design as implemented.

          Julio Morimoto added a comment:

          As of now, we are using a couple of workarounds:

          • Configure the agent pod templates with a short idleMinutes, say 30 or 60 minutes. This hopefully keeps agent pods reasonably fresh for the current Jenkins instance.
          • Manually issue a kubectl command that deletes the agent pods after a helm upgrade restarts the Jenkins pod.
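The second workaround could be scripted along the following lines. This is a minimal sketch, not a command taken from the thread: the namespace `jenkins` and the label selector `app=jenkins-agent` are placeholders, and the labels actually carried by your agent pods should be checked first (for example with `kubectl get pods --show-labels`).

```python
import subprocess
from typing import List


def build_delete_command(namespace: str, selector: str) -> List[str]:
    """Build a kubectl command that deletes every pod matching a label selector."""
    return [
        "kubectl", "delete", "pods",
        "--namespace", namespace,
        "--selector", selector,
    ]


def delete_agent_pods(namespace: str, selector: str) -> None:
    # check=True raises CalledProcessError if kubectl exits with a non-zero status.
    subprocess.run(build_delete_command(namespace, selector), check=True)


if __name__ == "__main__":
    # Placeholder values; replace them with the namespace and labels of your agents.
    print(" ".join(build_delete_command("jenkins", "app=jenkins-agent")))
```

Building the argument list separately from running it makes it easy to print and review the command before running it against a cluster, for instance right after a helm upgrade.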

          Julio Morimoto added a comment:

          This seems like a duplicate of JENKINS-64628. I'm trying to offer a PR to the Helm chart for a smoother workaround than restarting agent pods on every controller restart.

            Assignee: Unassigned
            Reporter: Julio Morimoto (juliohm)
            Votes: 0
            Watchers: 2