Jenkins / JENKINS-68409

Kubernetes plugin can create a new pod every 10s when something goes wrong


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component/s: kubernetes-plugin
    • Labels: None
    • Environment: recent Jenkins/plugin versions running on Linux; Kubernetes on Linux, various versions
    • 3578.vb_9a_92ea_9845a_

When a build creates a pod on Kubernetes and Jenkins cannot verify the pod, Jenkins deletes it and creates a new one every 10s. We are not aware of any configuration parameter that controls this pace. The loop can overwhelm the Kubernetes cluster with a huge number of pod creations and deletions. We would like the build to fail after a set number of failures instead of deleting and recreating pods forever. At a minimum, we would like each new pod to wait progressively longer before being created, similar to the Kubernetes crash-loop backoff (CrashLoopBackOff).
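The behavior requested above (fail after a bounded number of attempts, and otherwise wait progressively longer between pods) can be sketched as a small retry policy. This is a minimal sketch under stated assumptions, not the kubernetes-plugin's actual code or API: the names `PodRetryPolicy`, `initial_delay_s`, `max_delay_s`, `max_attempts` and `attempts_within` are hypothetical. The 10s starting delay matches the interval observed in this report, the 300s cap mirrors the five-minute ceiling Kubernetes documents for CrashLoopBackOff, and the limit of 10 attempts is an arbitrary example value.

```python
"""Hypothetical retry policy for agent pod creation (NOT the kubernetes-plugin API)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class PodRetryPolicy:
    initial_delay_s: float = 10.0  # assumption: first wait, same as the 10s cadence seen here
    max_delay_s: float = 300.0     # assumption: cap mirroring the 5-minute CrashLoopBackOff ceiling
    max_attempts: int = 10         # assumption: arbitrary limit before the build should fail

    def delay_before(self, attempt: int) -> float:
        """Seconds to wait before pod attempt number `attempt` (1-based).

        Attempt 1 starts immediately; each later attempt doubles the previous
        wait until max_delay_s is reached (capped exponential backoff).
        """
        if attempt < 1:
            raise ValueError("attempt must be >= 1")
        if attempt == 1:
            return 0.0
        # Clamp the exponent so very large attempt numbers cannot overflow.
        exponent = min(attempt - 2, 30)
        return min(self.initial_delay_s * 2 ** exponent, self.max_delay_s)

    def should_give_up(self, failed_attempts: int) -> bool:
        """True once `failed_attempts` pods have failed, i.e. the build should fail."""
        return failed_attempts >= self.max_attempts


def attempts_within(policy: PodRetryPolicy, window_s: float) -> int:
    """Count pod attempts started within `window_s` seconds, ignoring max_attempts."""
    elapsed, started = 0.0, 0
    while True:
        elapsed += policy.delay_before(started + 1)
        if elapsed > window_s:
            return started
        started += 1


if __name__ == "__main__":
    # A fixed 10s interval is the same policy with the cap equal to the first delay.
    fixed = PodRetryPolicy(initial_delay_s=10.0, max_delay_s=10.0)
    backoff = PodRetryPolicy()
    print(attempts_within(fixed, 3600))    # 361
    print(attempts_within(backoff, 3600))  # 16
    print(backoff.should_give_up(10))      # True
```

Under these assumptions, a stuck build would start 361 pods in its first hour at a fixed 10s interval, while the capped backoff alone would cut that to 16; with `max_attempts=10`, the build would instead fail after its 10th failed pod.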

In production, we had situations where Kubernetes could not report the pod status within the time Jenkins expected. The resulting flood of pod creations and deletions left each node holding more than 8000 deleted containers while running over its pod count limit. Clearing that backlog would have taken hours even with the Jenkins feed turned off, so we eventually restored the nodes from backup. Although this bug is not considered the root cause of the slow responses, it caused a "pod storm" that brought the Kubernetes cluster to its knees and forced this drastic node restore.

In testing, the connection to Kubernetes did not support WebSocket, so Jenkins could not read the pod status via what appears to be a "watch" on the pod. The Kubernetes ingress log showed the request failing on a path similar to the following: '/api/v1/namespaces/<ns>/pods?<podname>&allowWatchBookmarks=true&watch=true'

This started the pod creation/deletion loop. In the attached (slightly obfuscated) console log, the line "Still waiting to schedule task" appears around the time of the failed watch request shown in the ingress log above, and the build keeps recreating the pod every 10s until it is aborted manually.

People: gabriel cuadros (gabocuadros), peng wu (wu105)
Votes: 1
Watchers: 3
