  1. Jenkins
  2. JENKINS-67196

Pods are terminated after ~110s and ignore PodTemplate.connectionTimeout when containers start


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • kubernetes-plugin
    • None
    • Jenkins 2.303.3 running on GKE 1.20
      Kubernetes plugin >= 1.30.5

      The issue started right after upgrading from 1.30.4 and occurs with every later version (1.30.5 through the latest, 1.30.10).

      Log shows:

       

      2021-11-20 13:42:17.978+0000 [id=108]   INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/meta-mbdn9-9j8d3
      2021-11-20 13:44:08.228+0000 [id=108]   WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: meta-mbdn9-9j8d3, template=PodTemplate{id='a54377cc-f22f-4767-9571-f9abf713d15f', name='meta-mbdn9', namespace='jenkins', idleMinutes=5, label='meta', serviceAccount='jenkins', nodeSelector='node_pool=build-pool', containers=[ContainerTemplate{name='gcloud', image='gcr.io/google.com/cloudsdktool/cloud-sdk:debian_component_based', command='cat', ttyEnabled=true}], annotations=[PodAnnotation{key='buildUrl', value='http://jenkins-ui:8080/job/k8s/job/reclaim-volumes/2675/'}, PodAnnotation{key='runUrl', value='job/k8s/job/reclaim-volumes/2675/'}]}
      io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [1000000] milliseconds for [Pod] with name:[meta-mbdn9-9j8d3] in namespace [jenkins].
              at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:96)
              at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:169)
              at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:293)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:829)
      2021-11-20 13:44:08.230+0000 [id=108]   INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent meta-mbdn9-9j8d3
      Terminated Kubernetes instance for agent jenkins/meta-mbdn9-9j8d3

      This happens for every pod whose containers do not start within about 110 seconds: the log shows the plugin terminating the pod after ~110 seconds. The error message is also wrong. It says "Timed out waiting for [1000000] milliseconds", but the plugin did not wait the 1000 seconds it should have.
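
The ~110 second figure can be checked against the two timestamps in the log excerpt above ("Created Pod" and "Error in provisioning"). A minimal sketch in Java; the class and method names are illustrative only and are not part of the plugin:

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper (not plugin code): elapsed time between two Jenkins log timestamps.
class LogElapsed {
    // Matches timestamps such as "2021-11-20 13:42:17.978+0000".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSZ");

    static long elapsedMillis(String start, String end) {
        OffsetDateTime from = OffsetDateTime.parse(start, FORMAT);
        OffsetDateTime to = OffsetDateTime.parse(end, FORMAT);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        long ms = elapsedMillis("2021-11-20 13:42:17.978+0000",   // "Created Pod"
                                "2021-11-20 13:44:08.228+0000");  // "Error in provisioning"
        // Prints "110250 ms", far short of the 1000000 ms named in the exception.
        System.out.println(ms + " ms");
    }
}
```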

       

      I believe the issue comes from this commit: https://github.com/jenkinsci/kubernetes-plugin/commit/f95a604462fd7723ba8246b748c83dc90d65a9e3

      I think these changed lines don't do the right thing:

      - return periodicAwait(10, System.currentTimeMillis(), Math.max(remaining / 10, 1000L), remaining); 
      + // Retry with 10% of the remaining time, with a min of 1s and a max of 10s 
      + return periodicAwait(10, System.currentTimeMillis(), Math.min(10000L, Math.max(remaining / 10, 1000L)), remaining);
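
To see what the change does to the numbers, both expressions from the diff can be evaluated for the configured timeout of 1000000 ms. The sketch below is not plugin code. The body of periodicAwait is not shown in this issue, so the total-wait calculation rests on an assumption: that its first argument is a retry count and that each attempt waits at most the given interval. All class and method names here are hypothetical.

```java
// Illustrative sketch (not plugin code) comparing the two interval expressions from the diff.
class RetryIntervalSketch {
    // Expression before the change: 10% of the remaining time, with a minimum of 1s.
    static long oldInterval(long remaining) {
        return Math.max(remaining / 10, 1000L);
    }

    // Expression after the change: the same value, but capped at 10s.
    static long newInterval(long remaining) {
        return Math.min(10000L, Math.max(remaining / 10, 1000L));
    }

    // ASSUMPTION: periodicAwait makes one initial attempt plus `retries` retries,
    // each waiting at most `interval` milliseconds. This is not confirmed by the issue.
    static long maxTotalWait(int retries, long interval) {
        return (retries + 1) * interval;
    }

    public static void main(String[] args) {
        long remaining = 1_000_000L; // timeout reported in the exception message
        System.out.println(oldInterval(remaining));                   // 100000
        System.out.println(newInterval(remaining));                   // 10000
        System.out.println(maxTotalWait(10, newInterval(remaining))); // 110000
    }
}
```

If that assumption holds, the 10s cap bounds the total wait at about 110 seconds regardless of the configured timeout, which would match the observed termination time. With the old expression, the same number of attempts covers the full 1000 seconds.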

       

      I reverted that line, and my pods now behave as they did in 1.30.4: agents are provisioned correctly.

            Assignee: Vincent Latombe (vlatombe)
            Reporter: Stefan (stefans)