JENKINS-63976

Kubernetes plugin cancels build task when resource quota is temporarily exceeded

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • kubernetes-plugin
    • kubernetes 1.27.2 (Jenkins Plugin)
      Jenkins 2.249.1
      OKD 3.11 (Kubernetes v1.11.0+d4cacc0)
    • 1.28.6

      Since the update to kubernetes plugin version 1.27.2 we've found that build tasks are canceled and terminated with "FAILURE" status when the namespace's ResourceQuota is temporarily exceeded.

      Steps to reproduce:

      • Create a resource quota in the namespace where the kubernetes plugin creates pods
      • Manually run a pod that uses up most or all of the quota, e.g. with
        kubectl run quotauserpod --command --attach=false -i --image=alpine:latest --requests='cpu=2,memory=100Mi' --limits='cpu=2,memory=100Mi' -- sh -c 'cat'
        
      • Run a Jenkins job using the kubernetes plugin that has resource requests and limits that would fit into the quota if no other pods were running
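
      The quota from the first step could look roughly like the manifest below. This is a sketch only: the quota name, namespace, and CPU values are taken from the error message quoted in this report, while the file name quota-s.yaml is an assumption, and the real quota may contain further limits (for example memory).

```shell
# Write a ResourceQuota manifest matching the values in the logged error message.
# The file name quota-s.yaml is hypothetical; name, namespace, and limits come from the log.
cat > quota-s.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-s
  namespace: podnamespace
spec:
  hard:
    requests.cpu: 2500m
    limits.cpu: 2500m
EOF

# Apply it to the cluster (needs cluster access, so it is left commented out here):
# kubectl apply -f quota-s.yaml
```

      With this quota in place, the manually started pod (cpu=2) leaves only 500m CPU, so a build pod that requests more than that is rejected until the first pod is gone.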

      Expected behaviour: Jenkins retries creating the pod until it succeeds

      Actual behaviour: Jenkins cancels the job, sets its status to "FAILURE" and logs the following error message:

      14:56:12  ERROR: Unable to create pod podnamespace/buildpodname.
      14:56:12  Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/podnamespace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "buildpodname" is forbidden: exceeded quota: quota-s, requested: limits.cpu=2500m,requests.cpu=2, used: limits.cpu=2,requests.cpu=2, limited: limits.cpu=2500m,requests.cpu=2500m.

       

      I have downgraded the kubernetes plugin to 1.26.4 and have been able to verify that the scheduling works as expected, i.e. Jenkins retries pod creation until it succeeds.

      This is a major issue for us since we run Jenkins as a service for our developers and can't control job scheduling to ensure that all parallel running jobs don't exceed the resource quota. Moreover, some job runtimes vary widely, which would make scheduling even more challenging.

      I believe the new behaviour was introduced in commit 6e7c0c374ac53244aabc1a9008aed250c5937ac0 (plugin v1.27.1).
      It seems that with this change all HTTP 4xx errors lead to the behaviour described above, including the error returned when a namespace's resource quota has been exceeded.
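
      To illustrate the distinction I would expect, here is a minimal shell sketch. It is not the plugin's actual (Java) code: the function names is_quota_exceeded and retry_on_quota are hypothetical, and matching on the text "exceeded quota" is an assumption based on the error message above. The idea is that a quota rejection is treated as temporary and retried, while any other failure is reported immediately.

```shell
# Hypothetical helper: succeeds when an error text looks like a quota rejection.
# Matching on "exceeded quota" is an assumption based on the logged message.
is_quota_exceeded() {
  case "$1" in
    *"exceeded quota"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: retry_on_quota MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND only when it fails with a quota message; any other failure,
# or running out of attempts, returns the command's exit status.
retry_on_quota() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Capture stdout and stderr together; record the exit status either way.
    output=$("$@" 2>&1) && status=0 || status=$?
    if [ "$status" -eq 0 ]; then
      printf '%s\n' "$output"
      return 0
    fi
    if ! is_quota_exceeded "$output" || [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$output" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example call (buildpod.yaml is a hypothetical manifest; needs cluster access):
# retry_on_quota 30 10 kubectl create -f buildpod.yaml
```

      The attempt limit is only there to keep the sketch from looping forever; the behaviour in 1.26.4 appeared to be retrying until the pod could be created.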

            Assignee: Unassigned
            Reporter: Dominik Landtwing (dlandtwing)
            Votes: 6
            Watchers: 10
