• Bug
    • Resolution: Unresolved
    • Critical
    • kubernetes-plugin
    • None
    • JENKINS: 2.303.2
      Kubernetes Plugin: 1.30.3

      We have a use case where we need to set an initContainer to do some operations before the actual job container starts.

      The initContainer operation would take around 20-30 mins.

      If I add slaveConnectTimeout and set it to 3600, the Kubernetes plugin still terminates the agent after about 6 minutes and creates a new pod.

      Are we setting it correctly?

      Pipeline Spec:

      podTemplate(cloud:'kubernetes', 
      slaveConnectTimeout: '3600', 
      yaml : """
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          some-label: some-label-value
      spec:
        initContainers:
        - name: init-sleep
          image: ubuntu
          command: ['bash', '-c', 'while true; do echo "hello world"; sleep 10; done']
        containers:
        - name: job-container
          image: ubuntu:latest
          command:
          - cat
          restartPolicy: Never
          backoffLimit: 4
          tty: true
        nodeSelector:
          kubernetes.io/os: linux
      """  ) {
        node(POD_LABEL) {
          stage('work') {
              container('job-container') {
                  sh 'apt update'
              }
          }
        }
      }
      
      

      Logs:

      18:17:24  Created Pod: kubernetes dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4
      18:17:24  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4][Scheduled] Successfully assigned dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4 to node-1
      18:17:25  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4][Pulling] Pulling image "ubuntu"
      18:17:26  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4][Pulled] Successfully pulled image "ubuntu"
      18:17:26  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4][Created] Created container init-sleep
      18:17:27  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4][Started] Started container init-sleep
      18:17:34  Still waiting to schedule task
      18:17:34  'init-build-26-bt2j2-ktzkh-1w9j4' is offline
      18:23:34  Created Pod: kubernetes dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk
      18:23:34  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk][Scheduled] Successfully assigned dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk to node-10
      18:23:35  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk][Pulling] Pulling image "ubuntu"
      18:23:36  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk][Pulled] Successfully pulled image "ubuntu"
      18:23:36  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk][Created] Created container init-sleep
      18:23:36  [Normal][dynamic-updater/init-build-26-bt2j2-ktzkh-0d7pk][Started] Started container init-sleep
      

      Jenkins Instance Logs:

      2021-10-13 01:17:24.724+0000 [id=323]   INFO    hudson.slaves.NodeProvisioner#update: init-build-26-bt2j2-ktzkh-1w9j4 provisioning successfully completed. We have now 3 computer(s)
      2021-10-13 01:17:24.818+0000 [id=322]   INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes dynamic-updater/init-build-26-bt2j2-ktzkh-1w9j4
      2021-10-13 01:23:25.196+0000 [id=383]   INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent init-build-26-bt2j2-ktzkh-1w9j4
      2021-10-13 01:23:25.202+0000 [id=383]   INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer init-build-26-bt2j2-ktzkh-1w9j4
      Disconnected computer init-build-26-bt2j2-ktzkh-1w9j4
      
      

      We expect that the pod should not be re-created until the time specified by slaveConnectTimeout has elapsed.

       

      Is there something we missed? What triggers "terminate" for the pod?

       

       

       

          [JENKINS-66886] slaveConnectTimeout not honored

          Kalle Niemitalo added a comment

          Seems the default slaveConnectTimeout is 1000 seconds, i.e. over 16 minutes, so your 6-minute timeout is something else.

          https://github.com/jenkinsci/kubernetes-plugin/blob/17ab0ad0f19d01ec99f154f08ea6f7f28f47b3df/src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplate.java#L67-L71

          Hitesh Kulkarni added a comment (edited)

          Thanks, Kalle, for your comment. We tried overriding that parameter too, by passing the org.csanchez.jenkins.plugins.kubernetes.PodTemplate.connectionTimeout system property to the Jenkins WAR process. This is what the process command line looks like:

           

          java -Duser.home=/var/jenkins -Duser.timezone=America/Los_Angeles -XX:MaxRAMPercentage=70.0 -Djenkins.install.runSetupWizard=false -Dorg.csanchez.jenkins.plugins.kubernetes.PodTemplate.connectionTimeout=60000 -Djenkins.model.Jenkins.slaveAgentPort=50000 -jar /usr/share/jenkins/jenkins.war --prefix=/dynamic-updater-test 
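
          As a side note on the -D flag above, here is a minimal sketch of how a Java system property with a default is typically resolved. It is not the plugin's code: the class name ConnectTimeoutProperty and the method resolveSeconds are hypothetical, and the assumptions are that the plugin reads this property with Integer.getInteger and that the value is in seconds, as the earlier comment about the 1000-second default suggests.

```java
// Sketch only: illustrates how a -D system property with a default is typically read in Java.
// The class and method names are hypothetical and do not come from the plugin.
final class ConnectTimeoutProperty {
    // Property name taken from the command line above.
    static final String PROPERTY =
            "org.csanchez.jenkins.plugins.kubernetes.PodTemplate.connectionTimeout";

    // Default of 1000 taken from the earlier comment; treating the unit as seconds is an assumption.
    static final int DEFAULT_SECONDS = 1000;

    static int resolveSeconds() {
        // Integer.getInteger returns the default when the property is unset or is not a valid integer.
        return Integer.getInteger(PROPERTY, DEFAULT_SECONDS);
    }

    public static void main(String[] args) {
        int seconds = resolveSeconds();
        System.out.println(PROPERTY + " = " + seconds + " s (" + (seconds / 60) + " min)");
    }
}
```

          Under those assumptions, -D...connectionTimeout=60000 would mean 60000 seconds (1000 minutes) rather than 60 seconds, so this setting alone would not shorten the wait to 6 minutes.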
          

           


          Andreas Müller added a comment (edited)

          To me, the problem seems to be here:

          https://github.com/jenkinsci/kubernetes-plugin/blob/aed016b8d357d9a90b1d84361b32e659f606fd90/src/main/java/org/csanchez/jenkins/plugins/kubernetes/AllContainersRunningPodWatcher.java#L93

          return periodicAwait(10, System.currentTimeMillis(), Math.min(10000L, Math.max(remaining / 10, 1000L)), remaining);
          

          As I read it, this makes 10 attempts, and the min/max clamping means the wait per attempt cannot be longer than 10000 ms. 10 * 10000 ms = 100 seconds of timeout. If your timeout is longer than that, it is not honored.

          So the default timeout of 1000 seconds was effectively decreased to 100.
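
          The arithmetic in this comment can be checked with a small sketch. It models only the reading given above, not the plugin's actual control flow: the interval expression is copied from the quoted line, while the class name EffectiveTimeout, the method names, and the assumption that exactly 10 waits of one interval each bound the total are illustrative.

```java
// Sketch only: models the arithmetic described in the comment above, not the plugin's actual code.
final class EffectiveTimeout {
    // First argument of the quoted periodicAwait call, assumed here to bound the number of waits.
    static final int ATTEMPTS = 10;

    // Same expression as in the quoted line: remaining / 10, clamped to the range [1000 ms, 10000 ms].
    static long intervalMillis(long remainingMillis) {
        return Math.min(10000L, Math.max(remainingMillis / 10, 1000L));
    }

    // Assumed upper bound on total waiting: ATTEMPTS waits of one interval each,
    // and never more than the configured timeout itself.
    static long effectiveTimeoutMillis(long configuredMillis) {
        return Math.min(ATTEMPTS * intervalMillis(configuredMillis), configuredMillis);
    }

    public static void main(String[] args) {
        long[] configuredSeconds = {50, 100, 1000, 3600};
        for (long s : configuredSeconds) {
            long effective = effectiveTimeoutMillis(s * 1000L) / 1000L;
            System.out.println("configured " + s + " s -> effective " + effective + " s");
        }
    }
}
```

          Under this model, any configured timeout up to 100 seconds is honored, while both the 1000-second default and the 3600 seconds from the report are capped at 100 seconds.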

          The bug was introduced on 10/15/21 with commit f95a604462fd7723ba8246b748c83dc90d65a9e3.

          vlatombe, could you take an urgent look at this? Our agents start very slowly, so it is very urgent for us to get this fixed.

           


          Andreas Müller added a comment - added PR https://github.com/jenkinsci/kubernetes-plugin/pull/1094

            vlatombe Vincent Latombe
            hkulkar Hitesh Kulkarni