JENKINS-73476

parallel stages on different nodes scale sh commands poorly


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major

      We have a primary build pipeline that uses the workflow-cps parallel step to run around 50 pods at the same time (the same behavior also occurs with static Linux agents, not just pods/containers), and it is heavily dependent on sh steps.
      As more agents are added to the parallel step, sh commands take longer to complete when they are wrapped in a Groovy script block.

      I've simplified the pipeline to make it easier to reproduce:

      pipeline {
          agent none
          environment {
              POD_YAML = """
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pod
      spec:
        containers:
          - name: test-build
            image: ubuntu
            resources:
              requests:
                memory: 4000Mi
                cpu: 4
              limits:
                memory: 4000Mi
                cpu: 4
            command: ['sleep']
            args: ['6h']
            tty: true
              """
          }
          stages {
              stage('Build and Test') {
                  parallel {
                      
                      stage('Stage1') {
                          agent {
                              kubernetes {
                                  yaml env.POD_YAML
                              }
                          }
                          steps {
                              container('test-build') {
                                  script {
                                      for (int i = 0; i < 30; i++) {
                                          sh 'cat /etc/hosts'
                                      }
                                  }
                              }
                          }
                      }
                      
       
                      stage('Stage2') {
                          agent {
                              kubernetes {
                                  yaml env.POD_YAML
                              }
                          }
                          steps {
                              container('test-build') {
                                  script {
                                      for (int i = 0; i < 30; i++) {
                                          sh 'cat /etc/hosts'
                                      }
                                  }
                              }
                          }
                      }
                      
                      
                      stage('Stage3') {
                          agent {
                              kubernetes {
                                  yaml env.POD_YAML
                              }
                          }
                          steps {
                              container('test-build') {
                                  script {
                                      for (int i = 0; i < 30; i++) {
                                          sh 'cat /etc/hosts'
                                      }
                                  }
                              }
                          }
                      }
                      
                      // ... you can keep adding identical stages here
                      
                  }
              }
          }
      } 
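
      For reference, the same reproduction can also be generated without copy-pasting stages by building the branches in a loop in a scripted pipeline. This is only a minimal sketch, assuming the kubernetes plugin's podTemplate/POD_LABEL scripted syntax and a shortened copy of the pod spec above; the branch count of 35 simply mirrors the largest case in the results below.

      // Scripted-pipeline sketch: generate N identical parallel branches in a loop.
      def podYaml = '''
      apiVersion: v1
      kind: Pod
      spec:
        containers:
          - name: test-build
            image: ubuntu
            command: ['sleep']
            args: ['6h']
            tty: true
      '''

      def branches = [:]
      for (int n = 1; n <= 35; n++) {
          def idx = n                      // copy the loop variable for the closure
          branches['Stage' + idx] = {
              podTemplate(yaml: podYaml) { // kubernetes plugin scripted syntax
                  node(POD_LABEL) {
                      container('test-build') {
                          for (int i = 0; i < 30; i++) {
                              sh 'cat /etc/hosts'
                          }
                      }
                  }
              }
          }
      }
      parallel branches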


      My results are as follows:
      7 stages: ~50 seconds for each agent to finish
      17 stages: ~120 seconds for each agent to finish
      35 stages: ~270 seconds for each agent to finish

      If I leave only one stage in the parallel block and instead run the same build 50 times concurrently, each build finishes very quickly (around 15 seconds).
      So I don't think this is a general load issue; rather, the parallel step seems to be throttling the sh responses in some way, making them hang (maybe the CpsFlowExecution thread?).
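
      For comparison (just a sketch, not something measured above), each branch can also run the same 30 commands inside a single sh step, which takes the per-iteration Groovy/CPS step scheduling out of the measurement. It would be a drop-in replacement for the stages above:

      stage('Stage1') {
          agent {
              kubernetes {
                  yaml env.POD_YAML
              }
          }
          steps {
              container('test-build') {
                  // One durable task per branch instead of 30 separate sh steps.
                  sh '''
                      for i in $(seq 1 30); do
                          cat /etc/hosts
                      done
                  '''
              }
          }
      }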

      P.S.: "Default Speed / Durability Level" is already set to performance-optimized.
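
      (For completeness, the performance-optimized durability level can also be requested per pipeline through the Declarative options block, independent of the global setting; a minimal sketch:)

      options {
          // Per-pipeline equivalent of the global performance-optimized setting.
          durabilityHint('PERFORMANCE_OPTIMIZED')
      }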

            Assignee: Unassigned
            Reporter: Niko (simpleniko)
            Votes: 1
            Watchers: 3