[JENKINS-67951] Jenkins kubernetes plugin creating multiple pod requests in a short span

Type: Bug
Resolution: Unresolved
Priority: Major
Component/s: kubernetes-plugin
Labels: None

Kubernetes Environment - 1.14.7

Jenkins Kubernetes Plugin - 1.31.3

After updating the plugin from 1.31.2 to 1.31.3, we noticed that the plugin creates two or more pods for a single request when it interacts with Kubernetes.

This happens often, but sometimes the plugin creates a single pod as requested and moves on. In the screenshot below, you can see similar pods being created over a span of 30 seconds or more (see the timestamps). What could be causing this?

We use both pod templates defined in the Configure Clouds section and pod manifest YAML that we generate dynamically based on labels. The problem occurs in both cases.

We also noticed that the pod is retained for about 5 minutes before terminating. We double-checked that the pod retention policy is set to 'never' at the global level.

Thanks for your time.

image-2022-03-04-12-51-48-686.png
304 kB
2022-03-04 11:51

peng wu added a comment - 2022-03-16 03:31

We have a similar problem with Kubernetes 1.12.10 and Jenkins Kubernetes Plugin 1.31.3. Normally we get one pod per job, but on two occasions each job created more than 100 pods, spaced from a minimum of 30 seconds up to 5 minutes apart. Once the plugin gets into this mode, after a few hours the Kubernetes cluster is hosed with thousands of exited containers per worker node. We have to restore the Kubernetes worker nodes from backups to get rid of the exited containers quickly, and also clear the job queues in Jenkins.

It would be desirable to be able to configure the maximum number of pods each job can create and have the job fail once that limit is reached. It would also be desirable to be able to configure a longer delay between pods.

Krishna Chaitanya Edimadakala added a comment - 2022-04-11 09:16

Slowness in the Kubernetes API aggravates this problem. Once the API server is tuned to handle more API requests, the issue occurs less often; however, I still see it once in a while. This suggests that the plugin fails to recognize the response from the API (or gets no response), retries, and does not see that it already has a pod executor.
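To illustrate the failure mode suspected above, here is a minimal sketch of a retry that checks for an already-created pod before creating another one. It is not the plugin's actual logic. `FakeCluster`, `ensure_agent_pod`, and the `request-id` label are hypothetical names invented for this example, and the in-memory class stands in for the real Kubernetes API.

```python
class FakeCluster:
    """In-memory stand-in for the Kubernetes API (hypothetical, for illustration only)."""

    def __init__(self, lose_first_response=False):
        self.pods = {}  # pod name -> labels
        self._lose_next_response = lose_first_response

    def create_pod(self, name, labels):
        # The pod is created server-side even if the response never arrives.
        self.pods[name] = dict(labels)
        if self._lose_next_response:
            self._lose_next_response = False
            raise TimeoutError("response from API server was lost")

    def find_pods(self, label, value):
        return [name for name, labels in self.pods.items() if labels.get(label) == value]


def ensure_agent_pod(cluster, request_id, attempts=3):
    """Create one pod per request, reusing a pod left behind by an earlier attempt."""
    for attempt in range(attempts):
        # Look for a pod from a previous attempt before creating a new one.
        existing = cluster.find_pods("request-id", request_id)
        if existing:
            return existing[0]
        name = f"agent-{request_id}-{attempt}"
        try:
            cluster.create_pod(name, {"request-id": request_id})
            return name
        except TimeoutError:
            continue  # Outcome unknown: re-check on the next pass instead of blindly recreating.
    raise RuntimeError(f"could not provision a pod for request {request_id}")


cluster = FakeCluster(lose_first_response=True)
pod_name = ensure_agent_pod(cluster, "abc")
print(pod_name, len(cluster.pods))
```

In this sketch the first create call succeeds on the server but the caller sees a timeout. Because the retry looks up the request label first, it finds the existing pod and the cluster ends up with a single pod rather than two.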

peng wu added a comment - 2022-04-11 18:46

I am wondering whether we can add the following enhancements:

Introduce a retry limit: fail the build job once its pod has been deleted and recreated a configured number of times. Currently retries are unlimited.
Add a progressively longer delay before recreating a build pod, similar to Kubernetes CrashLoopBackOff.

I would also like to see the design document on how build pods are deleted and recreated. We have seen a build pod deleted and recreated as often as every 20 seconds.
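The two proposed enhancements could look roughly like the sketch below. It is only an illustration of the request, not an existing plugin setting. `PodRetryPolicy`, `RetryLimitExceeded`, and all parameter names are hypothetical, and the default values (10 second base, doubling, 5 minute cap) are assumptions chosen to resemble the Kubernetes CrashLoopBackOff schedule.

```python
from dataclasses import dataclass


class RetryLimitExceeded(Exception):
    """Raised when a build has used up its allowed pod creations (hypothetical)."""


@dataclass
class PodRetryPolicy:
    max_attempts: int = 5       # assumed limit on pod creations per build
    base_delay_s: float = 10.0  # assumed delay before the first recreation
    max_delay_s: float = 300.0  # assumed cap on the delay

    def delay_before(self, attempt: int) -> float:
        """Return the wait in seconds before pod creation number `attempt` (1-based)."""
        if attempt < 1:
            raise ValueError("attempt numbers start at 1")
        if attempt > self.max_attempts:
            raise RetryLimitExceeded(
                f"pod recreated {self.max_attempts} times; failing the build"
            )
        if attempt == 1:
            return 0.0  # the first pod is created immediately
        # Double the delay for each recreation, up to the cap.
        return min(self.base_delay_s * 2 ** (attempt - 2), self.max_delay_s)


policy = PodRetryPolicy()
delays = [policy.delay_before(n) for n in range(1, policy.max_attempts + 1)]
print(delays)
```

With these assumed defaults, a build would create at most five pods and then fail, instead of recreating its pod indefinitely every 20 to 30 seconds.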
