Type: Bug
Resolution: Unresolved
Priority: Major
Labels: None
Environment: Recent Jenkins/plugin running on Linux, Kubernetes on Linux, various versions
Affects Version: 3578.vb_9a_92ea_9845a_
When a build creates a pod on Kubernetes and Jenkins cannot verify the pod, Jenkins deletes it and creates a new one every 10s. We are not aware of any configuration parameter that controls this pace. This can flood the Kubernetes cluster with an enormous number of pod creations and deletions. We would like the build to fail after a number of failures instead of deleting/creating pods forever; at minimum, we would like each new pod to wait progressively longer, similar to Kubernetes' CrashLoopBackOff (see the sketch below).
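For illustration only, here is a minimal sketch (plain Java, not plugin code; the class name, delays, and attempt cap are all made up) of the kind of pacing we have in mind:

{code:java}
import java.time.Duration;

// Hypothetical backoff policy: double the wait between pod re-creations,
// cap the delay, and fail the build after a fixed number of attempts,
// similar in spirit to Kubernetes' CrashLoopBackOff.
class PodRetryBackoff {
    private static final Duration BASE_DELAY = Duration.ofSeconds(10);
    private static final Duration MAX_DELAY = Duration.ofMinutes(5);
    private static final int MAX_ATTEMPTS = 10; // hypothetical cap

    /** Delay before the given attempt (0-based): 10s, 20s, 40s, ... up to MAX_DELAY. */
    static Duration delayFor(int attempt) {
        long seconds = BASE_DELAY.getSeconds() << Math.min(attempt, 30);
        return seconds >= MAX_DELAY.getSeconds() ? MAX_DELAY : Duration.ofSeconds(seconds);
    }

    /** After MAX_ATTEMPTS, stop recreating pods and fail the build instead. */
    static boolean shouldFailBuild(int attempt) {
        return attempt >= MAX_ATTEMPTS;
    }
}
{code}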
In production, we hit situations where Kubernetes could not report the pod status within the time Jenkins expected, and the resulting flood of pod creation/deletion left each node holding more than 8,000 deleted containers while running over the pod-count limit. Clearing that would have taken hours even with the Jenkins feed turned off, so we eventually restored the nodes from backup. Although this bug is not considered the root cause of the slow responses, it caused a "pod storm" that brought the Kubernetes cluster to its knees and required this drastic node restore.
In testing, we hit a situation where the connection to Kubernetes did not support WebSocket, so Jenkins could not read the pod status via what appears to be a "watch" on the pod; requests with a path similar to the following failed in the Kubernetes ingress log: '/api/v1/namespaces/<ns>/pods?<podname>&allowWatchBookmarks=true&watch=true'
This started the pod creation/deletion loop. In the slightly obfuscated console log attached, the log line "Still waiting to schedule task" appears around the time of the failed watch request in the k8s ingress log shown above, and the build keeps recreating the pod every 10s until it is aborted manually.
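For context, the plugin talks to the cluster through the fabric8 kubernetes-client, and a pod watch of roughly this shape is what produces such a request (a hedged sketch, not the plugin's actual code; the namespace and pod name are invented):

{code:java}
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class PodWatchSketch {
    public static void main(String[] args) throws InterruptedException {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // A watch like this produces the "?watch=true" request seen in the
            // ingress log; the client upgrades it to a WebSocket, so an ingress
            // that cannot upgrade the connection breaks the watch.
            client.pods().inNamespace("build-ns").withName("agent-pod")
                  .watch(new Watcher<Pod>() {
                      @Override
                      public void eventReceived(Action action, Pod pod) {
                          System.out.println(action + ": " + pod.getStatus().getPhase());
                      }

                      @Override
                      public void onClose(WatcherException cause) {
                          System.out.println("watch closed: " + cause);
                      }
                  });
            Thread.sleep(60_000); // keep the watch open for a minute
        }
    }
}
{code}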
is related to: JENKINS-66822 Jenkins is trying to create an agent pod forever (Open)
relates to: JENKINS-47615 failing container in pod triggers multiple restarts (Open)
Hello, I had this issue as well. I was working on it in my free time and reached this part of the code; I would like to share what I found:
https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesProvisioningLimits.java#L91
There is an option, podTemplate.getInstanceCap(), that in theory can be set and could stop the massive pod creation. If not set, it defaults to Integer.MAX_VALUE, which is a very big value. I have not found a way to set this value on the pod template yet, but I will report back if I find anything new (see the sketch after the links for how I read the check):
https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplate.java#L328
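For anyone following along, this is roughly how I read the check at the first link (a simplified sketch from reading the code, not the actual plugin source):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified reading of the provisioning-limit check: a count per pod
// template is compared against the template's instance cap, and with the
// default cap of Integer.MAX_VALUE the check effectively never trips,
// so nothing stops the create/delete loop.
class ProvisioningLimitsSketch {
    private final Map<String, Integer> podTemplateCounts = new HashMap<>();

    /** Returns false when provisioning would exceed the template's cap. */
    synchronized boolean register(String templateId, int instanceCap, int numExecutors) {
        int current = podTemplateCounts.getOrDefault(templateId, 0);
        if (current + numExecutors > instanceCap) {
            return false; // cap reached: do not provision another pod
        }
        podTemplateCounts.put(templateId, current + numExecutors);
        return true;
    }
}
{code}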