
Jenkins kubernetes plugin creating multiple pod requests in a short span

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • kubernetes-plugin
    • None

      Kubernetes Environment - 1.14.7

      Jenkins Kubernetes Plugin - 1.31.3

      After updating the plugin from 1.31.2 to 1.31.3, we noticed that the plugin often creates two or more pods where one was requested when it interacts with Kubernetes.

      This happens often, but sometimes the plugin creates a single pod as requested and moves on. In the example screenshot below, you can see similar pods being created 30 or more seconds apart (see the timestamps). What could be causing this?

      We use both pod templates defined in the Configure Clouds section and pod manifest YAML that we generate dynamically based on labels. The problem occurs in both cases.

      We also noticed that pods are retained for 5 minutes or so before terminating. We double-checked that the pod retention policy is set to 'never' at the global level.

      Thanks for your time.

          [JENKINS-67951] Jenkins kubernetes plugin creating multiple pod requests in a short span

          peng wu added a comment -

          We have a similar problem with Kubernetes Environment - 1.12.10 + Jenkins Kubernetes Plugin - 1.31.3. Normally we get one pod per job, but on two occasions each job created more than 100 pods, spaced from 30 seconds up to 5 minutes apart. Once the plugin gets into this mode, after a few hours the Kubernetes cluster is hosed with thousands of exited containers per worker node. We then have to restore the Kubernetes worker nodes from backups to get rid of the exited containers quickly, and also clear the job queues in Jenkins.

          It would be desirable to be able to configure the maximum number of pods each job can create and have the job fail once that limit is reached. It would also be desirable to be able to configure more time between pod creations.


          Krishna Chaitanya Edimadakala added a comment -

          Slowness in the Kubernetes API aggravates this problem. Once the API server is tuned to handle more requests, the issue occurs less often; however, I still see it once in a while. This suggests that the plugin does not recognize the response from the API (or gets no response), retries, and does not notice that it already has a pod executor.
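The hypothesis above (a lost or slow API response followed by a blind retry) can be illustrated with a small simulation. This is only a sketch of the suspected failure mode, not the plugin's actual code: `FakeCluster`, `naive_retry`, and `ensure_pod` are hypothetical names, and the in-memory cluster stands in for the real Kubernetes API.

```python
class FakeCluster:
    """In-memory stand-in for a Kubernetes API (hypothetical, not a real client)."""

    def __init__(self, drop_responses=0):
        self.pods = []                      # list of (build_id, pod_name)
        self.drop_responses = drop_responses

    def create_pod(self, build_id):
        name = f"agent-{build_id}-{len(self.pods)}"
        self.pods.append((build_id, name))  # the pod is created server-side...
        if self.drop_responses > 0:
            self.drop_responses -= 1
            raise TimeoutError("no response from API")  # ...but the caller never hears back
        return name

    def find_pod(self, build_id):
        for owner, name in self.pods:
            if owner == build_id:
                return name
        return None


def naive_retry(cluster, build_id, attempts=3):
    """Retry on timeout without checking for an existing pod (suspected behavior)."""
    for _ in range(attempts):
        try:
            return cluster.create_pod(build_id)
        except TimeoutError:
            continue
    return None


def ensure_pod(cluster, build_id, attempts=3):
    """Look for an existing pod for this build before every create attempt."""
    for _ in range(attempts):
        existing = cluster.find_pod(build_id)
        if existing is not None:
            return existing
        try:
            return cluster.create_pod(build_id)
        except TimeoutError:
            continue
    return cluster.find_pod(build_id)


slow = FakeCluster(drop_responses=2)
naive_retry(slow, "job-1")
duplicates = len(slow.pods)       # one pod per attempt

careful = FakeCluster(drop_responses=2)
ensure_pod(careful, "job-1")
single = len(careful.pods)        # the first pod is found and reused
```

In this simulation the blind retry leaves one pod per attempt, while the lookup-first variant reuses the pod whose creation response was lost. Whether the plugin's provisioning path behaves like either function is an assumption that would need to be checked against its source.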

          peng wu added a comment -

          I wonder whether we can add the following enhancements:

          • Introduce a retry limit: fail a build job once its pod has been deleted and recreated a configured number of times. Currently the number of retries is unlimited.
          • Wait progressively longer before recreating a build pod, similar to Kubernetes CrashLoopBackOff.

          I would also like to see the design document covering how build pods are deleted and recreated. We have seen a build pod deleted and recreated as often as every 20s.
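The two enhancements proposed above could look roughly like the following. This is a minimal sketch under stated assumptions, not an existing plugin feature: `backoff_delays`, `provision_with_limit`, `RetryLimitExceeded`, and the `create_pod` callback are hypothetical names, and the default delays (10s, doubling, capped at 5 minutes) are chosen to resemble CrashLoopBackOff.

```python
from typing import Callable, Iterator


class RetryLimitExceeded(Exception):
    """Raised when a build pod has been recreated too many times."""


def backoff_delays(max_retries: int, base_seconds: float = 10.0,
                   cap_seconds: float = 300.0) -> Iterator[float]:
    """Yield the wait before each pod recreation, doubling up to a cap."""
    delay = base_seconds
    for _ in range(max_retries):
        yield min(delay, cap_seconds)
        delay *= 2


def provision_with_limit(create_pod: Callable[[], bool], max_retries: int = 5,
                         sleep: Callable[[float], None] = lambda s: None) -> int:
    """Call create_pod() once, then retry with backoff up to max_retries times.

    Returns the number of retries that were needed. Raises RetryLimitExceeded
    so the build job fails instead of recreating pods forever.
    """
    if create_pod():
        return 0
    for attempt, delay in enumerate(backoff_delays(max_retries), start=1):
        sleep(delay)  # pass time.sleep here for real waiting
        if create_pod():
            return attempt
    raise RetryLimitExceeded(f"pod not usable after {max_retries} retries")


delays = list(backoff_delays(7))
```

With the defaults, a pod that keeps failing is recreated at most five times, with waits of 10, 20, 40, 80, and 160 seconds, after which the job fails. Both the limit and the delays are parameters so that they could be exposed as configuration.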


            Assignee: Unassigned
            Reporter: Krishna Chaitanya Edimadakala (krishnavit)