JENKINS-64802

Kubernetes declarative agents cause race conditions that result in random build failures or an instance limit that is not respected

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: kubernetes-plugin
    • None

      Jenkins version: 2.263.3
      Kubernetes plugin version: 1.28.5

      We try to reuse pod templates across many repositories, and mostly across branches of the same repository.
      We found that we can achieve this by assigning labels to a pod template and defining an agent selector based on those labels. We also try to reuse pod instances between builds to reduce the time needed to start a new pod, as we have auto-scaled work groups.

      In our global configuration we have pod templates with labels such as jdk11.build or jdk11.release. Each pod template has its own concurrency limit, which lets us cap the number of concurrent PR builds and still leave room to run releases without waiting.
      We use both the scripted and the declarative syntax to select build agents, e.g.
      Scripted syntax:

      podTemplate {
        node("jdk11.build"){
            container("jdk"){
                 ...
            }
        }
      }

      This option works fine and we have no issues with it. It always finds the correct pod template.

      For the declarative syntax we have tried many different approaches, but each one has its own issues:

      1.  Only label
        agent {
          kubernetes{
            label "jdk11.build"
          }
        }
        

        With this variant the job randomly fails to find the global template and uses a dynamic pod template instead, which leads to unpredictable behavior.
        Cause:
        The declarative syntax triggers PodTemplateStep execution with the label used in the pipeline, which creates a dynamic pod template with that same label.
        See:
        https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/PodTemplateStepExecution.java
        Result:

        • we have two pod templates with the same label
        • one or the other is selected at random
        • when the temporary pod template is selected, the build fails because it is based on the default pod template
      2. Label with inheritFrom
        agent {
          kubernetes{
            label "jdk11.build"
            inheritFrom "jdk11-build"
          }
        }

        Result:

        • we have two pod templates
        • the pod template is selected at random
        • the concurrent build limit is not respected: the templates are dynamic and each has its own limit (the labels of dynamic pod templates get a random suffix)
        • reusing a pod between jobs fails with an NPE (the pod template is not expected to be null at this point) once the dynamic pod template has been removed
      3. Only inheritFrom
        agent {
          kubernetes{
            inheritFrom "jdk11-build"
          }
        }

        Result:

        • the concurrent build limit is not respected because each template gets labels based on the job name
        • there is no way to reuse pod instances between builds
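
      A minimal pipeline for comparing the three variants might look like the sketch below. It assumes the global jdk11.build template defines a container named jdk, as in the scripted example; the stage name and the shell command are illustrative only. If the dynamic pod template is picked instead of the global one, the container step would be expected to fail, since the default template does not define a jdk container.

      ```groovy
      pipeline {
          agent {
              kubernetes {
                  // Variant 1; swap in the agent block of variant 2 or 3 to compare
                  label 'jdk11.build'
              }
          }
          stages {
              stage('Check agent') { // illustrative stage name
                  steps {
                      // Requires the jdk container from the global pod template
                      container('jdk') {
                          // Illustrative command; any step that needs the JDK will do
                          sh 'java -version'
                      }
                  }
              }
          }
      }
      ```

      Running several builds of such a job concurrently is one way to observe which template each build picks and whether the configured limit holds.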

      Note:
      Declarative syntax such as:

      agent {
        label "jdk11.build"
      }

      works exactly the same as

      agent {
        kubernetes{
          label "jdk11.build"
        }
      }

            Assignee: Unassigned
            Reporter: Jaroslaw (jaro)
            Votes: 0
            Watchers: 1