-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Recent Jenkins/plugin running on Linux, Kubernetes on Linux, various versions
-
-
3578.vb_9a_92ea_9845a_
When a build creates a pod on Kubernetes and Jenkins cannot verify the pod, Jenkins deletes it and creates a new one every 10s. We are not aware of any configuration parameter that controls this pace. This can overwhelm the Kubernetes cluster with a huge number of pod creations and deletions. We would like the build to fail after a set number of failures instead of deleting and recreating pods forever. At the very least, each new pod attempt should wait progressively longer, similar to the Kubernetes CrashLoopBackOff.
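To make the request concrete, here is a minimal sketch of the behavior we are asking for: a capped number of attempts with a wait that doubles after each failure. It is not the plugin's code or API. The names `provision_with_backoff`, `create_pod`, and `ProvisioningFailed` are hypothetical, and the defaults (10s initial wait, doubling, 300s cap) are assumptions chosen to mirror the Kubernetes CrashLoopBackOff schedule.

```python
import time
from typing import Callable


class ProvisioningFailed(Exception):
    """Raised when the pod could not be verified within the attempt limit."""


def provision_with_backoff(
    create_pod: Callable[[], bool],   # hypothetical: returns True once the pod is verified
    max_attempts: int = 6,            # assumed limit; the build fails after this many tries
    initial: float = 10.0,            # assumed first wait, matching today's 10s pace
    factor: float = 2.0,              # assumed multiplier applied after each failure
    cap: float = 300.0,               # assumed upper bound on a single wait, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to create a pod, waiting longer after each failure.

    Returns the number of attempts used, or raises ProvisioningFailed
    instead of looping forever.
    """
    delay = initial
    for attempt in range(1, max_attempts + 1):
        if create_pod():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt, never longer than the cap.
            sleep(min(delay, cap))
            delay *= factor
    raise ProvisioningFailed(f"gave up after {max_attempts} attempts")
```

With these assumed defaults, a pod that never verifies would produce six creations over roughly five minutes and then fail the build, rather than a new pod every 10s until someone aborts manually.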
In production, we had situations where Kubernetes could not report the pod status within the time Jenkins expected. The resulting flood of pod creations and deletions left each node holding more than 8000 deleted containers while running over its pod count limit. Clearing them would have taken hours even with the Jenkins feed turned off, so we eventually restored the nodes from backup. Although this bug is not considered the root cause of the slow responses, it caused a "pod storm" that brought the Kubernetes cluster to its knees and required this drastic node restore.
In testing, we had a situation where the connection to Kubernetes did not support WebSocket, so Jenkins could not read the pod status via what appears to be a "watch" on the pod. The request failed on a path similar to the following in the Kubernetes ingress log: '/api/v1/namespaces/<ns>/pods?<podname>&allowWatchBookmarks=true&watch=true'
This started the pod creation/deletion loop. In the slightly obfuscated console log attached, the log line "Still waiting to schedule task" appears around the time of the failed watch request shown above in the k8s ingress log, and the build recreates the pod every 10s until it is aborted manually.
- is related to
  JENKINS-66822 Jenkins is trying to create an agent pod forever (Open)
- relates to
  JENKINS-47615 failing container in pod triggers multiple restarts (Open)
Wrong link, my bad; the other PR was too old. This is the correct ticket that implemented this, and it seems to be a very new feature: https://issues.jenkins.io/browse/JENKINS-66822