
Kubernetes plugin cancels build task when resource quota is temporarily exceeded

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: kubernetes-plugin
    • Environment: kubernetes 1.27.2 (Jenkins Plugin), Jenkins 2.249.1, OKD 3.11 (Kubernetes v1.11.0+d4cacc0)
    • Fix Version: 1.28.6

      Since updating to kubernetes plugin version 1.27.2, we've found that build tasks are cancelled and terminated with "FAILURE" status when the namespace's ResourceQuota is temporarily exceeded.

      Steps to reproduce:

      • Create a resource quota in the namespace where the kubernetes plugin creates pods
      • Manually run a pod that uses up most or all of the quota, e.g. with
        kubectl run quotauserpod --command --attach=false -i --image=alpine:latest --requests='cpu=2,memory=100Mi' --limits='cpu=2,memory=100Mi' -- sh -c 'cat'
        
      • Run a Jenkins job using the kubernetes plugin that has resource requests and limits that would fit into the quota if no other pods were running
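
      The quota from the first step can be expressed as a manifest like the following. The name (quota-s) and namespace (podnamespace) are taken from the error log further down, and the CPU values match the "limited: limits.cpu=2500m,requests.cpu=2500m" it reports; the exact manifest used is an assumption for illustration:

      ```yaml
      # Sketch of a ResourceQuota consistent with the error message in this
      # report; name and namespace come from the log, the rest is assumed.
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: quota-s
        namespace: podnamespace
      spec:
        hard:
          requests.cpu: "2500m"
          limits.cpu: "2500m"
      ```

      Applied with `kubectl apply -f quota.yaml`, the manually started quotauserpod then consumes 2000m of the 2500m cap, so the build pod's request can no longer fit.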

      Expected behaviour: Jenkins retries creating the pod until it succeeds

      Actual behaviour: Jenkins cancels the job, sets its status to "FAILURE" and logs the following error message:

      14:56:12  ERROR: Unable to create pod podnamespace/buildpodname.
      14:56:12  Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/podnamespace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "buildpodname" is forbidden: exceeded quota: quota-s, requested: limits.cpu=2500m,requests.cpu=2, used: limits.cpu=2,requests.cpu=2, limited: limits.cpu=2500m,requests.cpu=2500m.


      I have downgraded the kubernetes plugin to 1.26.4 and verified that scheduling works as expected, i.e. Jenkins retries pod creation until it succeeds.

      This is a major issue for us since we run Jenkins as a service for our developers and cannot control job scheduling to ensure that concurrently running jobs stay within the resource quota. Moreover, some job runtimes vary widely, which would make such scheduling even more challenging.

      I believe the new behaviour was introduced in commit 6e7c0c374ac53244aabc1a9008aed250c5937ac0 (plugin v1.27.1).
      With this change, all HTTP 4xx errors appear to lead to the behaviour described above, including the 403 returned when a namespace's resource quota has been exceeded.
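
      The distinction being argued for here can be sketched as follows. This is not the plugin's actual code; the class and method names are invented, and the assumption is that a 403 whose message reports an exceeded quota (or a 409 conflict) is transient and worth retrying, while other 4xx responses are permanent failures:

      ```java
      // Hypothetical sketch of classifying a failed pod-creation request.
      // Names are illustrative, not taken from the kubernetes-plugin source.
      public class PodCreateRetrySketch {

          // A 403 whose message reports an exceeded ResourceQuota is transient:
          // capacity frees up once other build pods terminate. Other 4xx
          // responses (bad request, genuinely missing permissions) are permanent.
          static boolean shouldRetry(int httpStatus, String message) {
              if (httpStatus == 409) {
                  return true; // conflict, e.g. a same-named pod still terminating
              }
              return httpStatus == 403
                      && message != null
                      && message.contains("exceeded quota");
          }

          public static void main(String[] args) {
              System.out.println(shouldRetry(403,
                  "pods \"buildpodname\" is forbidden: exceeded quota: quota-s")); // transient
              System.out.println(shouldRetry(403,
                  "User cannot create pods in namespace"));                        // permanent
          }
      }
      ```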

          [JENKINS-63976] Kubernetes plugin cancels build task when resource quota is temporarily exceeded

          Egor Ermakov added a comment -

          dlandtwing, thank you for creating this issue. We have just arrived at the same conclusion - the behaviour of scheduling dynamic agents on k8s has changed.

          It affects us in the same way - we do not want to give unrestricted resources to the Jenkins namespace, so we have to impose quotas, but due to the dynamic nature of the tasks the quota is sometimes exceeded, and it is reasonable to expect a retry. Once the other jobs finish and their pods are destroyed, new pods can be scheduled.

          Vincent Latombe added a comment - Caused by https://github.com/jenkinsci/kubernetes-plugin/pull/825

          Valentin Delaye added a comment -

          Hi,

          We have the same use case as kenota. We are using a multi-tenant Kubernetes cluster and every team has its own quota for scheduling builds. The user experience is very poor when a build fails without a meaningful message.

          The previous behaviour was better: retrying until some resources are freed (i.e. other builds finish).

          Thanks

          Charles K added a comment - edited

          I submitted a PR which restores the behaviour prior to v1.27.1 (retry rather than fail if quota exceeded / conflict): https://github.com/jenkinsci/kubernetes-plugin/pull/930

          It was merged recently, so the change should be in a near-future release of the plugin, or you can install a build from the recent master branch output now.

            Assignee: Unassigned
            Reporter: dlandtwing Dominik Landtwing
            Votes: 6
            Watchers: 10