Jenkins / JENKINS-73349

Terminate dynamic template agents quicker when pipeline is aborted

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: kubernetes-plugin
    • None

      When a pipeline schedules a pod agent that is not immediately schedulable (due to node selectors, available resources, ...), the pod stays Pending and the Jenkins node is created but remains suspended, waiting for the agent to connect.
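To illustrate why the pod in the reproduction stays Pending, here is a simplified model of the Kubernetes nodeSelector rule: a pod can only be placed on a node whose labels contain every key/value pair of the selector. This is a sketch, not scheduler code: resource requests, taints and affinity are ignored, and the class name and the example node labels are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Simplified model of Kubernetes nodeSelector semantics: a node is eligible
// only if every key/value pair of the selector appears, with the same value,
// in the node's labels. Resources, taints and affinity are NOT modelled.
final class NodeSelectorCheck {

    static boolean matches(Map<String, String> nodeSelector, Map<String, String> nodeLabels) {
        for (Map.Entry<String, String> entry : nodeSelector.entrySet()) {
            // A missing label yields null, which never equals the required value.
            if (!entry.getValue().equals(nodeLabels.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // True if at least one node satisfies the selector.
    static boolean schedulable(Map<String, String> nodeSelector, List<Map<String, String>> nodes) {
        return nodes.stream().anyMatch(labels -> matches(nodeSelector, labels));
    }

    public static void main(String[] args) {
        // Hypothetical cluster: no node carries dedicated=doesnotexist.
        List<Map<String, String>> nodes = List.of(
                Map.of("kubernetes.io/os", "linux"),
                Map.of("kubernetes.io/os", "linux", "dedicated", "builds"));
        Map<String, String> selector = Map.of("dedicated", "doesnotexist");
        System.out.println(schedulable(selector, nodes)); // false: the pod stays Pending
    }
}
```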

      If the pipeline is then aborted, the Jenkins agent is not terminated right away. It is only terminated once either the slaveConnectTimeout (Timeout in seconds for Jenkins connection, which defaults to 1000 seconds since https://github.com/jenkinsci/kubernetes-plugin/commit/252f8d1e2cf3ed71f3a2c3694eff08b7fa1004c7 / https://github.com/jenkinsci/kubernetes-plugin/releases/tag/kubernetes-1.29.6) or the agent retentionTimeout (Container Cleanup Timeout, which defaults to 5 minutes) kicks in.

      • If the agent cannot be scheduled, it never connects and never becomes idle, so the retentionTimeout does not apply. The agent is terminated only after the slaveConnectTimeout.
      • If the agent is eventually scheduled, it connects and the retentionTimeout eventually terminates it.
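A minimal sketch of the two cases above, assuming the defaults quoted in this issue (1000 s slaveConnectTimeout, 5 min retentionTimeout) and assuming that an agent which connects after its build was aborted has no work and is idle immediately. The PendingAgentCleanup class is hypothetical and deliberately ignores any other retention settings the plugin may have.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical, simplified model of the cleanup delay described in this issue.
// Only the two timeouts mentioned above are modelled.
final class PendingAgentCleanup {

    // Defaults quoted in the issue.
    static final Duration SLAVE_CONNECT_TIMEOUT = Duration.ofSeconds(1000);
    static final Duration RETENTION_TIMEOUT = Duration.ofMinutes(5);

    /**
     * Delay, measured from agent creation, until the agent is removed when the
     * requesting build has already been aborted.
     *
     * @param connectedAfter empty if the pod is never scheduled, otherwise the
     *                       delay between agent creation and agent connection
     */
    static Duration terminationDelay(Optional<Duration> connectedAfter) {
        if (connectedAfter.isEmpty()
                || connectedAfter.get().compareTo(SLAVE_CONNECT_TIMEOUT) > 0) {
            // Never connected (in time), so never idle: only the connection timeout applies.
            return SLAVE_CONNECT_TIMEOUT;
        }
        // Connected with nothing to run: idle at once, then the retention timeout runs.
        return connectedAfter.get().plus(RETENTION_TIMEOUT);
    }

    public static void main(String[] args) {
        // Reproduction scenario: the nodeSelector never matches.
        System.out.println(terminationDelay(Optional.empty()).getSeconds()); // 1000
        // Pod scheduled 2 minutes after creation: 120 s + 300 s.
        System.out.println(terminationDelay(Optional.of(Duration.ofMinutes(2))).getSeconds()); // 420
    }
}
```

In both cases the delay is driven by timeouts rather than by the state of the build, which is what this issue proposes to change.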

      This issue is about improving this behavior by trying to terminate the agent earlier when the requesting pipeline build is aborted or completed, especially for dynamic pod templates that have a reference to the build.
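One possible shape for the improvement, sketched under assumptions: each agent provisioned from a dynamic pod template is registered against the build that requested it, and all agents of a build are terminated as soon as that build is aborted or completes. BuildScopedAgents, register, onBuildFinished and the terminator callback are hypothetical names, not kubernetes-plugin APIs; in Jenkins the trigger would more likely be a run listener, and the terminator would delete the pod and remove the node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: track which build requested each dynamic-template agent
// and terminate those agents when the build finishes, instead of waiting for
// slaveConnectTimeout or retentionTimeout. Not a kubernetes-plugin API.
final class BuildScopedAgents {

    private final Map<String, List<String>> agentsByBuild = new HashMap<>();
    private final Consumer<String> terminator; // e.g. delete the pod, remove the node

    BuildScopedAgents(Consumer<String> terminator) {
        this.terminator = terminator;
    }

    // Called when an agent is provisioned from a pod template that references a build.
    synchronized void register(String buildId, String agentName) {
        agentsByBuild.computeIfAbsent(buildId, k -> new ArrayList<>()).add(agentName);
    }

    // Called when the build is aborted or completes; returns the agents terminated.
    List<String> onBuildFinished(String buildId) {
        List<String> agents;
        synchronized (this) {
            agents = agentsByBuild.remove(buildId);
        }
        if (agents == null) {
            return List.of();
        }
        // Terminate outside the lock so slow API calls do not block registrations.
        agents.forEach(terminator);
        return agents;
    }

    public static void main(String[] args) {
        List<String> terminated = new ArrayList<>();
        BuildScopedAgents registry = new BuildScopedAgents(terminated::add);
        registry.register("my-pipeline#1", "my-pipeline-1-abcde");
        registry.register("my-pipeline#2", "my-pipeline-2-fghij");
        registry.onBuildFinished("my-pipeline#1"); // build #1 aborted
        System.out.println(terminated); // [my-pipeline-1-abcde]
    }
}
```

Removing the build's entry before terminating makes a second completion notification for the same build a no-op.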

      Reproduce

      • Run a pipeline like the following:
      // Uses Declarative syntax to run commands inside a container.
      pipeline {
          agent {
              kubernetes {
                  cloud 'local'
                  yaml '''
      apiVersion: v1
      kind: Pod
      spec:
        nodeSelector:
          dedicated: doesnotexist
      '''
              }
          }
          stages {
              stage('Main') {
                  steps {
                      sh 'echo "OK"'
                  }
              }
          }
      }
      
      • Abort the pipeline after a few seconds.
      • Notice that the pod / agent hangs around for a while before being deleted. With the default configuration, this takes more than 16 minutes (1000 seconds after the agent is created).


          Allan BURDAJEWICZ created the issue.
          Allan BURDAJEWICZ changed the Summary from "Terminate dynamic template agents when pipeline is aborted" to "Terminate dynamic template agents quicker when pipeline is aborted".
          Allan BURDAJEWICZ added a remote link: "CloudBees Internal Issue (Web Link)" [30422].

            Assignee: Unassigned
            Reporter: Allan BURDAJEWICZ (allan_burdajewicz)
            Votes: 0
            Watchers: 1