
Kubernetes nodes sometimes reconnect after jobs complete, when configured to never reconnect

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: kubernetes-plugin
    • Labels: None
    • Environment: Kubernetes 1.20.11 on Oracle Cloud (OCI), Kubernetes plugin 1.30.4 on Jenkins LTS 2.289.3 on JDK 11.0.12 (both agent and master)

      Using Kubernetes 1.20.11, I am seeing jobs run successfully on Kubernetes agents; however, agents launched by the Kubernetes plugin intermittently attempt to reconnect to the master after the job completes.

      Happy Path case:  (note the 'Disabled agent engine reconnects' line)

      ... job runs fine
       Nov 10, 2021 12:13:22 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
       INFO: Disabled agent engine reconnects.
       Nov 10, 2021 12:13:22 PM hudson.remoting.jnlp.Main$CuiListener status
       INFO: Terminated
       [INFO tini (1)] Spawned child process 'docker-entrypoint.sh' with pid '7'
       [INFO tini (1)] Main child exited with signal (with signal 'Terminated')
       ...
       k8s terminates Pod (releasing resources), and deletes Pod
      

      Sad path case: (note there is no 'Disabled agent engine reconnects' line)

       ... job runs fine
       Nov 10, 2021 12:37:44 PM hudson.remoting.jnlp.Main$CuiListener status
       INFO: Terminated
       Nov 10, 2021 12:37:54 PM hudson.remoting.jnlp.Main$CuiListener status
       INFO: Performing onReconnect operation.
       Nov 10, 2021 12:37:54 PM hudson.remoting.jnlp.Main$CuiListener status
       INFO: onReconnect operation failed.
       ...
       (agent tries to reconnect to master; master rejects agent; agent exits with an error)
       ...
       k8s terminates Pod (releasing resources), but marks it as status Completed
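
To sort a batch of saved agent logs into the two cases above, a minimal sketch like the following may help. It only looks for the two message strings quoted in the excerpts; the function name and the 'unknown' fallback are illustrative assumptions, not part of the plugin.

```python
# Message fragments copied from the agent log excerpts above.
DISABLED_MARKER = "Disabled agent engine reconnects"
RECONNECT_MARKER = "Performing onReconnect operation"


def classify_agent_log(log_text: str) -> str:
    """Return 'happy', 'sad', or 'unknown' for one agent pod's log text."""
    # A reconnect attempt is the defining symptom, so check it first.
    if RECONNECT_MARKER in log_text:
        return "sad"
    if DISABLED_MARKER in log_text:
        return "happy"
    # Neither marker present: the log may be truncated or from another case.
    return "unknown"
```

Feeding it the output of `kubectl logs` for each finished agent pod should give a rough count of how often the reconnect path is taken.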
       

          [JENKINS-67111] Kubernetes nodes sometimes reconnect after jobs complete, when configured to never reconnect

          Steve Roth added a comment -

          forgot to mention:

          • the K8s cloud in Jenkins is configured with Pod Retention: Never
          • for the sad-path case, the attempt to reconnect to the master somehow causes the K8s provider (Oracle OKE in this case) to leave the pod in the 'Completed' state rather than deleting the pod. 
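
For anyone trying to reproduce this, the leftover pods can be listed from the JSON that `kubectl get pods -n <namespace> -o json` prints. The sketch below is an assumption-laden helper, not something the plugin provides: it treats any pod whose `status.phase` is `Succeeded` or `Failed` as a leftover, on the assumption that pods shown as 'Completed' normally report phase `Succeeded`, and it includes `Failed` in case the agent exits non-zero.

```python
import json
from typing import List

# Pod phases that mean the pod has stopped running but still exists.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def leftover_pods(pod_list_json: str) -> List[str]:
    """Return names of pods in a terminal phase from `kubectl get pods -o json` output."""
    doc = json.loads(pod_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") in TERMINAL_PHASES
    ]
```

With Pod Retention set to Never, this list would be expected to stay empty for agent pods; any names it returns are candidates for the sad-path case.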


          Steve Roth added a comment -

          This seems to occur in roughly one in four builds (more than 100 times a day in our case). So on the plus side, it should be straightforward to determine when this is fixed.

          Is there any debug logging that would be helpful to enable on the agent or master side?


          Steve Roth added a comment - edited

          From the Jenkins master perspective, I see the following differences:

          Happy Path case: (no reconnect attempt after job completes, ends up with a deleted pod)

          Nov 12, 2021 8:31:19 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent cicd2-test-9t27v
          
          Nov 12, 2021 8:31:19 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave deleteSlavePod
          INFO: Terminated Kubernetes instance for agent jenkins/cicd2-test-9t27v
          Terminated Kubernetes instance for agent jenkins/cicd2-test-9t27v
          
          Nov 12, 2021 8:31:19 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer cicd2-test-9t27v
          Disconnected computer cicd2-test-9t27v
          
          Nov 12, 2021 8:31:19 AM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          INFO: Computer.threadPoolForRemoting [#421985] for cicd2-test-9t27v terminated: java.nio.channels.ClosedChannelException
          

          Sad Path case (with agent reconnect attempt, ends up resulting in a pod in completed state):

          Nov 12, 2021 8:47:36 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Terminating Kubernetes instance for agent cicd2-e3-x8jws
          
          Nov 12, 2021 8:47:36 AM jenkins.slaves.DefaultJnlpSlaveReceiver channelClosed
          INFO: Computer.threadPoolForRemoting [#422256] for cicd2-e3-x8jws terminated: java.nio.channels.ClosedChannelException
          
          Nov 12, 2021 8:47:46 AM hudson.TcpSlaveAgentListener$ConnectionHandler run
          INFO: Accepted JNLP4-connect connection #108,959 from /100.105.81.176:51928
          
          Nov 12, 2021 8:47:46 AM org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer onRecv
          INFO: [JNLP4-connect connection from oke-c226o5bkfaa-nzf7dodowea-scnnsb5of4q-108.<redacted>/100.105.81.176:51928] Refusing headers from remote: Unknown client name: cicd2-e3-x8jws
          

          So on the happy path case, I see these lines logged, which I do not see logged on the sad-path case:

          Nov 12, 2021 8:31:19 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave deleteSlavePod
          INFO: Terminated Kubernetes instance for agent jenkins/cicd2-test-9t27v
          Terminated Kubernetes instance for agent jenkins/cicd2-test-9t27v
          
          Nov 12, 2021 8:31:19 AM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
          INFO: Disconnected computer cicd2-test-9t27v
          Disconnected computer cicd2-test-9t27v
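
The difference described above can be checked mechanically across a larger master log. The following is a rough sketch that relies only on the two message formats quoted in these excerpts ('Terminating Kubernetes instance for agent NAME' and 'Terminated Kubernetes instance for agent NAMESPACE/NAME'); the function name is hypothetical.

```python
import re
from typing import List

# "Terminating ..." is logged by _terminate; "Terminated ..." by deleteSlavePod.
TERMINATING = re.compile(r"Terminating Kubernetes instance for agent (\S+)")
# The namespace prefix (e.g. "jenkins/") is optional and dropped from the capture.
TERMINATED = re.compile(r"Terminated Kubernetes instance for agent (?:\S+/)?(\S+)")


def agents_without_pod_delete(master_log: str) -> List[str]:
    """Return agents whose termination started but never logged a pod deletion."""
    started = set(TERMINATING.findall(master_log))
    finished = set(TERMINATED.findall(master_log))
    return sorted(started - finished)
```

Run against the two excerpts above, this would flag only the sad-path agent, since its log never reaches the deleteSlavePod message.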
          

           


            Assignee: Unassigned
            Reporter: Steve Roth (srothco)