JENKINS-50429: Shell commands are really slower than before


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: Jenkins 2.111, workflow-durable-task-step 2.19, kubernetes-1.6.0

    Description

      We have recently updated our Jenkins installation from 2.101 to 2.111, including every plugin related to Pipeline.

      Since this update, every shell («sh») invocation is much slower than before. A shell invocation used to take a few milliseconds; it now takes seconds.

      So jobs that were taking 1:30 minutes are now taking up to 25:00 minutes.

      We are trying to figure out which plugin is responsible.

      Attachments

        Issue Links

          Activity

            oleg_nenashev Oleg Nenashev added a comment -

            Could it be related to changes in Cheetah, svanoort? https://jenkins.io/blog/2018/02/22/cheetah/

            pascallap, it would definitely be great to have more information about your scripts and use cases to triage the issue.


            pascallap Pascal Laporte added a comment -

            We have not tried to pinpoint the exact version of the component that is causing us trouble. We don't have the feeling that Cheetah is really involved in the slowness. We have tried older Jenkins core versions (2.101 and 2.93), but it was impossible for us to have the exact same plugin versions that we had before.

            To be fair, we were outdated on two plugins since 2.93: the kubernetes plugin and durable-task.

            We run most of our jobs in a Kubernetes pod.

            Here is a reference build run:

            node("slavenode"){   // <-- This is a K8s pod
              stage('checkout'){
                container("jnlp"){
                  deleteDir()
                  sh "touch A"
                  sh "chmod a+x A"
                  sh "ls -la"
                  stash name: 'artefact', includes: "A"
                  deleteDir()
                  unstash 'artefact'
                  sh "ls -la" 
               }
              }
            }

            The stage was reported to run in 4~5 seconds (so it doesn't include spawning a new pod).

            With the new version we get a good 13~15 seconds.

            svanoort Sam Van Oort added a comment -

            pascallap A couple questions to gather more info:

            1. Does the slowdown only appear when using a container, or is it present with traditional build agents too?
            2. Does your master show high CPU during this operation?
            3. Can you grab a thread dump from the master while this is running? This will help identify deadlocks or CPU-expensive methods.

            Thanks!

            pascallap Pascal Laporte added a comment - edited

            svanoort A couple of answers for you:

            1. Using a traditional agent (SSH) gets a 1-second run time, so yes, it seems to happen only in containers.
            2. For this precise example we don't see higher or alarming CPU usage on the master node.
            3. Attached to the issue is now a thread dump of the master node.

            Extra info: the master node is also a Kubernetes pod (container); running the same job on the master takes 1 second.


            pascallap Pascal Laporte added a comment -

            I have set up a new local Kubernetes and have pinpointed the problem to the Kubernetes plugin.

            We have been able to identify the exact version of the plugin that introduces the slowness.

            Everything works within 3 seconds (an acceptable time) with version 1.2 of the Kubernetes plugin; with version 1.2.1 and up we get the ultra-slow steps.

            svanoort Sam Van Oort added a comment -

            That sounds like an issue with the Kubernetes plugin then, so I'm going to reassign this.

            My suspicion is that this has to do with how logging is handled in some fashion.


            pascallap Pascal Laporte added a comment -

            I have retried with the latest version of Jenkins (1.124) and the latest version of the Kubernetes plugin (1.6.3) on a local setup; I'm getting 10 seconds for the sample job.

            Taking the same installation, if I install Kubernetes plugin 1.2, I get a 2-second result.

            csanchez Do you have any idea? I don't like the idea of being stuck on a specific version of a specific plugin.


            csanchez Carlos Sanchez added a comment -

            I get this using container:

            19:28:57 [slow] Running shell script
            19:29:00 + touch A
            [Pipeline] sh
            19:29:00 [slow] Running shell script
            19:29:02 + chmod a+x A
            [Pipeline] sh
            19:29:02 [slow] Running shell script
            19:29:05 + ls -la
            19:29:05 total 8
            19:29:05 drwxr-xr-x    2 jenkins  jenkins       4096 May 25 19:29 .
            19:29:05 drwxr-xr-x    4 jenkins  jenkins       4096 May 25 19:28 ..
            19:29:05 -rwxr-xr-x    1 jenkins  jenkins          0 May 25 19:29 A
            [Pipeline] stash
            19:29:06 Stashed 1 file(s)
            [Pipeline] deleteDir
            [Pipeline] unstash
            [Pipeline] sh
            19:29:06 [slow] Running shell script
            19:29:08 + ls -la
            19:29:08 total 8
            19:29:08 drwxr-xr-x    2 jenkins  jenkins       4096 May 25 19:29 .
            19:29:08 drwxr-xr-x    4 jenkins  jenkins       4096 May 25 19:29 ..
            19:29:08 -rwxr-xr-x    1 jenkins  jenkins          0 May 25 19:29 A
            

            and this without:

            19:30:47 [slow] Running shell script
            19:30:47 + touch A
            [Pipeline] sh
            19:30:47 [slow] Running shell script
            19:30:48 + chmod a+x A
            [Pipeline] sh
            19:30:48 [slow] Running shell script
            19:30:48 + ls -la
            19:30:48 total 8
            19:30:48 drwxr-xr-x    2 jenkins  jenkins       4096 May 25 19:30 .
            19:30:48 drwxr-xr-x    4 jenkins  jenkins       4096 May 25 19:30 ..
            19:30:48 -rwxr-xr-x    1 jenkins  jenkins          0 May 25 19:30 A
            [Pipeline] stash
            19:30:48 Stashed 1 file(s)
            [Pipeline] deleteDir
            [Pipeline] unstash
            [Pipeline] sh
            19:30:49 [slow] Running shell script
            19:30:49 + ls -la
            19:30:49 total 8
            19:30:49 drwxr-xr-x    2 jenkins  jenkins       4096 May 25 19:30 .
            19:30:49 drwxr-xr-x    4 jenkins  jenkins       4096 May 25 19:30 ..
            19:30:49 -rwxr-xr-x    1 jenkins  jenkins          0 May 25 19:30 A
            

            Are you using container in your 1.2 k8s setup?

            Obviously there is some overhead when running container, because it goes through websockets and the Kubernetes API.
            Were you able to pinpoint the issue between 1.2 and 1.2.1?
            https://github.com/jenkinsci/kubernetes-plugin/blob/master/CHANGELOG.md
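
            For reference, a rough sketch of what the container step does under the hood, based on the fabric8 kubernetes-client DSL the plugin builds on. The namespace, pod and container names below are placeholders, not anything taken from this issue:

            import io.fabric8.kubernetes.client.DefaultKubernetesClient;
            import io.fabric8.kubernetes.client.KubernetesClient;
            import io.fabric8.kubernetes.client.dsl.ExecWatch;

            import java.nio.charset.StandardCharsets;

            public class ContainerExecSketch {
                public static void main(String[] args) throws Exception {
                    try (KubernetesClient client = new DefaultKubernetesClient()) {
                        // Each sh step run inside container() is tunnelled through a
                        // websocket exec against the Kubernetes API, roughly like this.
                        ExecWatch watch = client.pods()
                                .inNamespace("jenkins")      // placeholder namespace
                                .withName("my-build-pod")    // placeholder pod name
                                .inContainer("jnlp")         // placeholder container name
                                .redirectingInput()
                                .writingOutput(System.out)
                                .writingError(System.err)
                                .exec("sh");
                        // Environment exports and the command itself are piped over the websocket.
                        watch.getInput().write("echo hello from the container\n"
                                .getBytes(StandardCharsets.UTF_8));
                        watch.getInput().flush();
                        Thread.sleep(2000);                  // give the remote shell time to respond
                        watch.close();
                    }
                }
            }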

            pascallap Pascal Laporte added a comment - edited

            Version 1.2.1 is probably the worst: we are getting an 18-second score.

            We are running every test within containers. (Running on the master or on SSH slaves is super fast, >1 sec.)

            I have added a visual of the stage timing.


            csanchez Carlos Sanchez added a comment -

            I mean, do you use the container step? In your example it is not needed, as things run by default in the jnlp container.


            pascallap Pascal Laporte added a comment -

            This is only a generic test that is easy to reproduce.

            In our real pipeline we use multiple pods/nodes and containers. Checkout is generally done in the jnlp container, and all the other work is done within the other containers.
            Still, if we want to execute steps within a container (other than jnlp) we have to invoke the container step.


            csanchez Carlos Sanchez added a comment -

            OK, I can pinpoint the issue with this pipeline:

            def label = "slow-${UUID.randomUUID().toString()}"
            podTemplate(label: label) {
            
                timestamps {
                    node(label){   // <-- This is a K8s pod
                        stage('60 calls outside container'){
                            for (i = 0; i <60; i++) {
                                sh "sleep 1"
                            }
                        }
                        stage('60 calls'){
                            container("jnlp"){
                                for (i = 0; i <60; i++) {
                                    sh "sleep 1"
                                }
                            }
                        }
                        stage('1 call'){
                            container("jnlp"){
                                sh "sleep 60"
                            }
                        }
                    }
                }
            }
            
            

            In 1.2 it takes the same time to run inside the container step as outside.

            In the latest one it takes almost 3 times as long.

            csanchez Carlos Sanchez added a comment - edited

            In 1.3 it takes 1min 28s / 2min 15s
            In 1.2.1, 1min 31s / 2min 55s

            So I guess something happened between 1.2 and 1.2.1:
            https://github.com/jenkinsci/kubernetes-plugin/compare/kubernetes-1.2...jenkinsci:kubernetes-1.2.1

            Maybe the env var management code.


            varditn Vardit Natanson added a comment -

            We are experiencing the same issue since upgrading the Kubernetes plugin from version 1.1.2 to 1.6 and later.

            After downgrading the plugin back to 1.1.2 on Jenkins 2.121.2, the time is again 3 times faster than with the newer version.

            I ran the same test csanchez ran, just adding another stage with 1 call outside a container:

            Build #2 ran on Jenkins 2.121.2 with Kubernetes plugin 1.10.1 and build #3 ran with Kubernetes plugin 1.1.2.
            Is there any solution for this issue? We would like to continue using the newer plugin version.

            guyshaanan Guy Shaanan added a comment -

            We are having similar issues, but not with Kubernetes.

            Our setup: Jenkins master (2.89.2) + local slaves (on vCenter) + 1 slave on a remote vCenter (within the corporate network but on another continent).

            SSH plugin version 2.4.

            A dead simple job (some "mkdir" and "touch" with 'sh') takes < 1 second on the local slaves but ~5 seconds on the remote one.

            Using the pipeline native functions (like dir and writeFile) on that remote slave takes < 1 second.

            node("remote") {
                timestamps {
                    cleanWs()
                        
                        sh("mkdir -p a; touch a/1.txt a/2.txt a/3.txt")
                        
                        echo "------------------------"
                        
                        sh("mkdir -p b; touch b/1.txt b/2.txt b/3.txt")
                        
                        echo "------------------------"
                        
                        sh("mkdir -p c; touch c/1.txt c/2.txt c/3.txt")
                        
                        cleanWs()
                        
                        echo "--------------------------"
                        
                        sh("""
                        mkdir -p a; touch a/1.txt a/2.txt a/3.txt
                        mkdir -p b; touch b/1.txt b/2.txt b/3.txt
                        mkdir -p c; touch c/1.txt c/2.txt c/3.txt
                        """)
                        
                        echo "--------------------------"
                        
                        sh("")
                        sh("")
                        sh("")
                        
                        echo "-------------------------"
                        echo "[start] use groovy native writeFile"
                        dir("d") {
                            writeFile(file: "d/1.txt", text: '')
                            writeFile(file: "d/2.txt", text: '')
                            writeFile(file: "d/3.txt", text: '')
                        }
                        echo "[end] use groovy native writeFile"
                        
                        echo "-------------------------"
                        
                        echo "[start] use groovy native writeFile"
                        dir("e") {
                            writeFile(file: "e/1.txt", text: '')
                            writeFile(file: "e/2.txt", text: '')
                            writeFile(file: "e/3.txt", text: '')
                        }
                        echo "[end] use groovy native writeFile"
                }
            }
            

            The execution on the remote slave:

            [Pipeline] properties
            [Pipeline] node
            Running on remote in /home/jenkins/workspace/tmp-job
            [Pipeline] {
            [Pipeline] timestamps
            [Pipeline] {
            [Pipeline] cleanWs
            08:43:31 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
            [Pipeline] sh
            08:43:32 [tmp-job] Running shell script
            08:43:32 + mkdir -p a
            08:43:32 + touch a/1.txt a/2.txt a/3.txt
            [Pipeline] echo
            08:43:33 ------------------------
            [Pipeline] sh
            08:43:33 [tmp-job] Running shell script
            08:43:34 + mkdir -p b
            08:43:34 + touch b/1.txt b/2.txt b/3.txt
            [Pipeline] echo
            08:43:34 ------------------------
            [Pipeline] sh
            08:43:35 [tmp-job] Running shell script
            08:43:36 + mkdir -p c
            08:43:36 + touch c/1.txt c/2.txt c/3.txt
            [Pipeline] cleanWs
            08:43:36 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
            [Pipeline] echo
            08:43:36 --------------------------
            [Pipeline] sh
            08:43:37 [tmp-job] Running shell script
            08:43:38 + mkdir -p a
            08:43:38 + touch a/1.txt a/2.txt a/3.txt
            08:43:38 + mkdir -p b
            08:43:38 + touch b/1.txt b/2.txt b/3.txt
            08:43:38 + mkdir -p c
            08:43:38 + touch c/1.txt c/2.txt c/3.txt
            [Pipeline] echo
            08:43:38 --------------------------
            [Pipeline] sh
            08:43:38 Warning: was asked to run an empty script
            08:43:39 [tmp-job] Running shell script
            [Pipeline] sh
            08:43:40 Warning: was asked to run an empty script
            08:43:41 [tmp-job] Running shell script
            [Pipeline] sh
            08:43:42 Warning: was asked to run an empty script
            08:43:43 [tmp-job] Running shell script
            [Pipeline] echo
            08:43:43 -------------------------
            [Pipeline] echo
            08:43:43 [start] use groovy native writeFile
            [Pipeline] dir
            08:43:43 Running in /home/jenkins/workspace/tmp-job/d
            [Pipeline] {
            [Pipeline] writeFile
            [Pipeline] writeFile
            [Pipeline] writeFile
            [Pipeline] }
            [Pipeline] // dir
            [Pipeline] echo
            08:43:44 [end] use groovy native writeFile
            [Pipeline] echo
            08:43:44 -------------------------
            [Pipeline] echo
            08:43:44 [start] use groovy native writeFile
            [Pipeline] dir
            08:43:44 Running in /home/jenkins/workspace/tmp-job/e
            [Pipeline] {
            [Pipeline] writeFile
            [Pipeline] writeFile
            [Pipeline] writeFile
            [Pipeline] }
            [Pipeline] // dir
            [Pipeline] echo
            08:43:44 [end] use groovy native writeFile
            [Pipeline] }
            [Pipeline] // timestamps
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] End of Pipeline
            Finished: SUCCESS
            

            On larger jobs we sometimes even see an overhead of 100%.

            I will be happy to provide more info if needed.

            Thanks.


            pascallap Pascal Laporte added a comment -

            csanchez Do you have any progress on this issue?

            We would like to upgrade our Kubernetes plugin, but we won't be able to until the performance issue with the «container» step is fixed.

            Thanks for your help


            pascallap Pascal Laporte added a comment -

            We have done some investigation; it seems that the environment variable injection into the shell is indeed the culprit here.

            However, the problem seems deeper than that and involves the fabric8 Kubernetes Watchexec PipeInput, which is really slow.

            Still, if the watch channel could be kept open for every step within a container block and the envVars were sent only a single time, that could probably improve performance quite a lot.
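
            A rough sketch of that suggestion, assuming a single ExecWatch per container session could be reused across sh steps (which is not how the plugin currently works); the guard set and method name are illustrative only:

            // Hypothetical guard: write the export block only for the first command sent
            // over a given (reused) ExecWatch; later commands in the same shell session
            // would still see the variables.
            private final Set<ExecWatch> environmentSent =
                    Collections.newSetFromMap(new WeakHashMap<ExecWatch, Boolean>());

            private void ensureEnvironment(EnvVars vars, ExecWatch watch) throws IOException {
                if (environmentSent.add(watch)) {
                    setupEnvironmentVariable(vars, watch); // the plugin's existing per-variable export logic
                }
            }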

            mrjgreen Joseph Green added a comment -

            We are also experiencing this issue, and have spent time recreating the test pipeline to highlight the overhead of commands run inside containers; the overhead is around 3-4 seconds. Our pipeline runs many shell commands, so this time adds up to something quite significant.

            As others have suggested, the regression seems to be between v1.2.0 and v1.2.1. 

            pascallap your suggestion seems sensible... do you know of any existing effort towards a solution for the Watchexec PipeInput performance?

            pascallap Pascal Laporte added a comment - edited

            The only reference that we found during our timeboxed investigation was:

            https://github.com/fabric8io/kubernetes-client/issues/1008

            Since we don't use Jenkins variable injection in pods (we use the «env.VARIABLE» style), we have ended up commenting out https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecDecorator.java#L342-L364

            Not the finest fix, but it solves our problem for the moment.

            csanchez Carlos Sanchez added a comment -

            Removing the lines https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecDecorator.java#L342-L364 I get the same times with and without container, so that is something to look into.

            csanchez Carlos Sanchez added a comment -

            The cause is all the environment variables being sent in https://github.com/jenkinsci/kubernetes-plugin/blob/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecDecorator.java#L398,

            but it doesn't matter whether they are sent to the container together or separately; the times are the same.

            I'm not sure whether we can avoid sending them, or send them only once.


            pascallap Pascal Laporte added a comment -

            We have already tried sending a single WatchExec input that combines all the env variables.

            It didn't change the total elapsed time of the job much.


            csanchez Carlos Sanchez added a comment -

            Can't you just collapse your shell commands?

            sh """
            cmd1
            cmd2
            """ 
            mrjgreen Joseph Green added a comment - edited

            If it's writing the env variables that's causing the problem, can we try doing something like this?

            private void setupEnvironmentVariable(EnvVars vars, ExecWatch watch) throws IOException {
              // Build one export block and write it with a single call,
              // instead of one write per variable.
              StringBuilder envVars = new StringBuilder();
              for (Map.Entry<String, String> entry : vars.entrySet()) {
                // Check that the key is a valid shell identifier.
                if (entry.getKey().matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
                  envVars.append(String.format(
                      "export %s='%s'%s",
                      entry.getKey(),
                      entry.getValue().replace("'", "'\\''"),
                      NEWLINE
                  ));
                }
              }

              watch.getInput().write(envVars.toString().getBytes(StandardCharsets.UTF_8));
            }
            

            I've created a PR with an initial attempt at this. We need to address any failing tests as a result of this change. https://github.com/jenkinsci/kubernetes-plugin/pull/390

            mrjgreen Joseph Green added a comment -

            Additionally, this post (also mentioned earlier in this thread), https://stackoverflow.com/questions/28617175/did-i-find-a-bug-in-java-io-pipedinputstream, seems to suggest that `OutputStream.write()` can hang for up to a second and that calling `OutputStream.flush()` can help with this.
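
            A small, self-contained demo of that behaviour using only JDK classes: the reader sits in PipedInputStream.read() and polls roughly once per second, so a plain write() can leave it sleeping for most of a second, while flush() wakes it immediately.

            import java.io.PipedInputStream;
            import java.io.PipedOutputStream;

            public class PipeLatencyDemo {
                public static void main(String[] args) throws Exception {
                    for (boolean flush : new boolean[] {false, true}) {
                        PipedOutputStream out = new PipedOutputStream();
                        PipedInputStream in = new PipedInputStream(out);

                        Thread reader = new Thread(() -> {
                            try {
                                in.read(); // blocks, re-checking for data roughly once per second
                            } catch (Exception ignored) {
                            }
                        });
                        reader.start();
                        Thread.sleep(200); // let the reader block inside read()

                        long start = System.nanoTime();
                        out.write('x');
                        if (flush) {
                            out.flush(); // notifies the waiting reader immediately
                        }
                        reader.join();
                        System.out.printf("flush=%s: read completed after %d ms%n",
                                flush, (System.nanoTime() - start) / 1_000_000);
                        out.close();
                        in.close();
                    }
                }
            }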


            csanchez Carlos Sanchez added a comment -

            Check the above comments: writing one string or multiple ones takes the same amount of time.

            mrjgreen Joseph Green added a comment -

            Yeah I saw those comments. I will test the example PR later and post my findings. I just wanted to make sure we have some concrete record of this having been tested and discussed.

            It seems that there is an issue in this area (as you found that commenting out this block sped up the pipeline), and this seemed like the natural place to start the investigation. If this proves to make no difference, I will look into other causes.

            Could the issues discussed in this SO post be a reasonable place to look next? https://stackoverflow.com/questions/28617175/did-i-find-a-bug-in-java-io-pipedinputstream

            mrjgreen Joseph Green added a comment -

            I've tested combining all the env vars into a single command and that makes no difference.

            My test command was this:

            def label = "slow-${UUID.randomUUID().toString()}"
            
            podTemplate(label: label) {
              node(label){ 
                  stage('20 calls outside container'){
                      for (i = 0; i <20; i++) {
                          sh "date"
                      }
                  }
                  stage('20 calls inside container'){
                      container("jnlp"){
                          for (i = 0; i <20; i++) {
                              sh "date"
                          }
                      }
                  }
              }
            }
            

            Results were pretty consistently:

            Outside container: ~12 seconds
            Inside container: ~1min 10 seconds

            Individual sh calls for the `date` command were:
            Outside container: ~400ms
            Inside container: ~3 seconds

            Commenting out all the env variable setting only reduced the individual container time to 1s... This seems to tally with the Stack Overflow post reporting the piped input stream bug (https://github.com/karianna/jdk8_tl/blob/master/jdk/src/share/classes/java/io/PipedInputStream.java#L274)?

            I tried combining the env variables into a single set before sending them, which reduced the time spent inside the container to 30 seconds.

            There's a PR for that here... https://github.com/jenkinsci/kubernetes-plugin/pull/393

            clabu609 Claes Buckwalter added a comment - edited

            I am trying to understand what you guys think is the root cause of the issue. mrjgreen are you saying that commenting out the env variable setting only reduced the container time by 1s, or reduced it down to 1s?

            I think my team is facing the same issue. When comparing an almost identical Pipeline running on Docker Swarm vs Kubernetes, we find that in our setup it consistently takes 1s on Docker Swarm and 37s on Kubernetes. Each sh build step takes about 12s. 

            We are using:

            • Jenkins 2.138.4
            • Kubernetes plugin 1.13.5
            • Kubernetes 1.11.2
            • Docker plugin 1.1.5

            Kubernetes build (uses JNLP agent):

            pipeline {
              agent {
                kubernetes {
                  label "${Utils.getTimestamp()}"
                  inheritFrom 'k8s-build'
                  containerTemplate {
                    name 'build'
                    image "aado-docker-releases.${ARTIFACTORY_URL}/build_centos7:latest"
                    alwaysPullImage true
                    workingDir '/home/jenkins'
                    ttyEnabled true
                    command 'cat'
                    args ''
                  }
                }
              }  
              options {
                timestamps()
                disableConcurrentBuilds()
                buildDiscarder(logRotator(daysToKeepStr: '1', artifactDaysToKeepStr: '2'))
                timeout(time: 5, unit: 'MINUTES')
              }
              stages {
                stage('shell steps') {
                  steps{
                    container('build') {
                      sh 'date --iso-8601=ns && pwd'
                      sh "date --iso-8601=ns && whoami"
                      sh '''
                          date --iso-8601=ns
                          pwd
                          whoami
                          date --iso-8601=ns
                          '''
                    }
                  }
                }
              }
            }
            

            [Pipeline] {
            [Pipeline] container
            [Pipeline] {
            [Pipeline] timestamps
            [Pipeline] {
            [Pipeline] timeout
            21:11:03 Timeout set to expire in 5 min 0 sec
            [Pipeline] {
            [Pipeline] stage
            [Pipeline] { (shell steps)
            [Pipeline] container
            [Pipeline] {
            [Pipeline] sh
            21:11:16 + date --iso-8601=ns
            21:11:16 2019-02-05T21:11:16,397400542+0000
            21:11:16 + pwd
            21:11:16 /home/jenkins/workspace/CTO/DevOps/sandbox/DO-5721/test-sh-k8s
            [Pipeline] sh
            21:11:29 + date --iso-8601=ns
            21:11:29 2019-02-05T21:11:28,906771174+0000
            21:11:29 + whoami
            21:11:29 root
            [Pipeline] sh
            21:11:41 + date --iso-8601=ns
            21:11:41 2019-02-05T21:11:41,347653982+0000
            21:11:41 + pwd
            21:11:41 /home/jenkins/workspace/CTO/DevOps/sandbox/DO-5721/test-sh-k8s
            21:11:41 + whoami
            21:11:41 root
            21:11:41 + date --iso-8601=ns
            21:11:41 2019-02-05T21:11:41,350677503+0000
            [Pipeline] }
            [Pipeline] // container
            [Pipeline] }
            [Pipeline] // stage
            [Pipeline] }
            [Pipeline] // timeout
            [Pipeline] }
            [Pipeline] // timestamps
            [Pipeline] }
            [Pipeline] // container
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] }
            [Pipeline] // podTemplate
            [Pipeline] End of Pipeline
            Finished: SUCCESS

            Docker build (uses SSH agent):

            pipeline {
              agent { label 'build_rhel7_jdk1.7_maven3_latest_candidate' }
              options {
                timestamps()
                disableConcurrentBuilds()
                buildDiscarder(logRotator(daysToKeepStr: '1', artifactDaysToKeepStr: '2'))
                timeout(time: 5, unit: 'MINUTES')
              }
              stages {
                stage('shell steps') {
                  steps{
                      sh 'date --iso-8601=ns && pwd'
                      sh "date --iso-8601=ns && whoami"
                      sh '''
                          date --iso-8601=ns
                          pwd
                          whoami
                          date --iso-8601=ns
                          '''
                  }
                }
              }
            }
            

            [Pipeline] {
            [Pipeline] timestamps
            [Pipeline] {
            [Pipeline] timeout
            21:20:14 Timeout set to expire in 5 min 0 sec
            [Pipeline] {
            [Pipeline] stage
            [Pipeline] { (shell steps)
            [Pipeline] sh
            21:20:14 + date --iso-8601=ns
            21:20:14 2019-02-06T02:50:14,454876315+0530
            21:20:14 + pwd
            21:20:14 /home/jenkins/workspace/CTO/DevOps/sandbox/DO-5721/test-sh-docker
            [Pipeline] sh
            21:20:15 + date --iso-8601=ns
            21:20:15 2019-02-06T02:50:14,796162448+0530
            21:20:15 + whoami
            21:20:15 jenkins
            [Pipeline] sh
            21:20:15 + date --iso-8601=ns
            21:20:15 2019-02-06T02:50:15,089597643+0530
            21:20:15 + pwd
            21:20:15 /home/jenkins/workspace/CTO/DevOps/sandbox/DO-5721/test-sh-docker
            21:20:15 + whoami
            21:20:15 jenkins
            21:20:15 + date --iso-8601=ns
            21:20:15 2019-02-06T02:50:15,091610478+0530
            [Pipeline] }
            [Pipeline] // stage
            [Pipeline] }
            [Pipeline] // timeout
            [Pipeline] }
            [Pipeline] // timestamps
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] End of Pipeline
            Finished: SUCCESS
            
            fabricepipart Fabrice Pipart added a comment - edited

            We are facing the same issue described here and we don't know exactly what to do. The issue was discovered almost a year ago and there is still no workaround proposed. Could we at least raise the severity of the issue? Such a performance drop looks like a showstopper to me.

            I guess the reporter did comment out the environment variable setting as mentioned earlier. I am afraid I will have to do the same. We need to use a version > 1.2 because of the security concerns, but we can't use it because of the performance concerns.

            I did not have the time to dive into the technical details but could we at least propose an option to deactivate this part of the code that slows down the execution? Would it be acceptable as a PR?

            clabu609 Claes Buckwalter added a comment - edited

            mrjgreen, you wrote:

            Commenting out all the env variable setting only reduced the individual container time to 1s... This seems to tally with the stack overflow post reporting the piped input stream bug.

            I think you are saying that:

            • Removing the environment variable code introduced in 1.2.1 brings the time it takes to run a trivial sh step down to 1s. I see that change was merged in December. Is it in the latest plugin release?
            • We cannot get below 1s because of the 1s wait in java.io.PipedInputStream.

            Was it really ever faster than 1s to execute a sh step with the k8s plugin? I assume that for each sh step, the plugin has to do the equivalent of a "kubectl exec" to run the shell command in the container. What is the normal overhead of doing that?

            If we think it can be faster, then it looks like we have to fix it in the fabric8io library. This is also discussed in the pull request mentioned above. Does JenkinsX use the same fabric8io library for its k8s communication and does JenkinsX have the same performance bottleneck? Maybe jstrachan (formerly at fabric8io, now at CloudBees) can point us in the right direction for a fix? 

            pyieh Pierson Yieh added a comment -

            We have isolated the cause of the slowdown to the 1s wait in java.io.PipedOutputStream's write() method. The symptom is that whenever the process tries to write to the buffer (namely all the export environment variable statements), a number of the write() calls block for 1+ seconds because the buffer is full. Our solution was, instead of having the main thread write, to have it delegate the write calls to asynchronous writer threads (each one in charge of writing an export statement to the buffer), and then ensure all the writer threads have finished at the end. This dramatically reduced our overhead per `sh` call from 3-4 seconds down to less than 1 second.

            We're currently in the process of refining it and then will submit a formal PR with our changes, but if there are any comments or suggestions please let us know.
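
            A rough sketch of that approach, in the style of the setupEnvironmentVariable snippet earlier in this thread; the executor, helper name and exception handling are illustrative only, not the actual PR:

            // Illustrative only: delegate each export statement to its own writer task so
            // a blocked PipedOutputStream.write() does not stall the main thread, then
            // wait for all writers before running the actual command. Assumes the same
            // EnvVars/ExecWatch types and NEWLINE constant used by the plugin.
            private void setupEnvironmentVariablesAsync(EnvVars vars, ExecWatch watch)
                    throws IOException, InterruptedException {
                ExecutorService writers = Executors.newCachedThreadPool();
                List<Future<?>> pending = new ArrayList<>();
                for (Map.Entry<String, String> entry : vars.entrySet()) {
                    if (!entry.getKey().matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
                        continue; // skip keys that are not valid shell identifiers
                    }
                    String export = String.format("export %s='%s'%s",
                            entry.getKey(), entry.getValue().replace("'", "'\\''"), NEWLINE);
                    pending.add(writers.submit(() -> {
                        // Each export is one write of one byte array from its own task.
                        watch.getInput().write(export.getBytes(StandardCharsets.UTF_8));
                        return null;
                    }));
                }
                try {
                    for (Future<?> f : pending) {
                        f.get(); // surfaces any IOException thrown by a writer task
                    }
                } catch (ExecutionException e) {
                    throw new IOException("Failed to write environment variables", e.getCause());
                } finally {
                    writers.shutdown();
                }
            }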

            Also a note: we noticed that the slow `sh` behavior only occurred when called within a `container` block, not when the `sh` calls simply used the default container. However, even using the same container as the default container produced slow `sh` calls. Example:

            pipeline {
               agent {
                  kubernetes {
                     label "pod-name"
                     defaultContainer "jnlp"
                     yaml """
                        apiVersion: v1
                        kind: Pod
                        spec:
                           containers:
                           - name: jnlp
                           ...
                     """
                  }
               }
               stages {
                  stage("Loop in Default") {
                     steps {
                        script {
                            for (i = 0; i < 10; i++) {
                              sh "which jq"
                            }
                        }
                     }
                  }
                  stage("Loop in JNLP") {
                     steps {
                        container("jnlp") {
                           script {
                              for (i = 0; i < 10; i++) {
                                 sh "which jq"
                              }
                           }
                        }
                     }
                  }
               }
            }
            
            fabricepipart Fabrice Pipart added a comment - edited

            Thanks a lot for those last two inputs, clabu609 and pyieh; this gives me hope again. I am currently trying to see whether any of your hints can be turned into a PR that would solve my issues.
            pyieh Your solution sounds quite complex if it is multithreaded (playing with buffer sizes looks easier, if it is as effective). Would that mean that if I have 100 environment variables to pass, the plugin would spawn 100 threads?

            pyieh Pierson Yieh added a comment - edited

            fabricepipart That is correct (100 exports = 100 threads). We considered doing some calculations to clump smaller export statements together into a single writer thread, but felt the gain wouldn't outweigh the effort of doing the calculations. Realistically, most of these threads end as soon as their write is done, so not all of them would be running at once. The kubernetes-plugin already has ExecutorService classes that could manage the threads, so we're currently exploring that.
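
            A rough Groovy sketch of that idea, for illustration only (this is not the plugin's actual code, and the method and variable names are made up):

                def writeExports(OutputStream out, List<String> exports) {
                    // one writer task per export statement, as described above
                    def pool = java.util.concurrent.Executors.newCachedThreadPool()
                    def futures = exports.collect { statement ->
                        pool.submit({
                            synchronized (out) {               // keep each statement contiguous in the pipe
                                out.write((statement + '\n').getBytes('UTF-8'))
                                out.flush()
                            }
                        } as Runnable)
                    }
                    futures.each { it.get() }                  // wait until every export has been written
                    pool.shutdown()
                }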

            We also considered changing the buffer size; the two problems with this are:

            • The number of export statements (i.e. the amount of data you're trying to write) depends on many factors (e.g. global environment variables set on the Jenkins master, environment variables declared in the pipeline, etc.), so there is no compelling argument for any particular larger buffer size when the required size can vary so much.
            • We don't know whether we even have control over the buffer size. We have not fully explored this option, but the buffer size seems to be set on the Jenkins / CloudBees side, so changing it would require a change to Jenkins core itself, and we wanted to limit our changes to the kubernetes-plugin.

            fabricepipart Fabrice Pipart added a comment -

            It is good that you keep exploring the thread solution. And you're right, the threads would only live for a very short time, but that is still more memory pressure on the master. It sounds good anyway if you managed to get good results going in that direction.

            On my side, I have been exploring the buffer-size solution. It seems feasible; I just did not manage to make it work before the end of my workday. I'll try again on Monday and will keep you updated.


            fabricepipart Fabrice Pipart added a comment -

            Good news! I managed to solve the issue with a fix. I tried to avoid using reflection but did not manage to get it working; that would probably have required too many changes in the plugin's code.
            In order to keep the change minimal, I used two lines of reflection:

                                // "sink" is the private field holding the PipedInputStream connected to this PipedOutputStream;
                                // its default buffer is only 1024 bytes, so swap in an 8 MiB one (FieldUtils is from commons-lang3)
                                Object sink = FieldUtils.readField(watch.getInput(), "sink", true);
                                FieldUtils.writeField(sink, "buffer", new byte[8192 * 1024], true);
            

            Not very pretty, honestly. But my execution used to take 1m43s and now takes 8s...

            mrjgreen Joseph Green added a comment -

            fabricepipart - This looks promising! Just for completeness, could you share more information about the fix?

            Which file are you applying this fix to? Is there a PR you could share (even if it's just a work in progress or PoC)?

            What test are you running that took 1m43s and now takes 8s? Could you share the reference code (or link to the comment above if it's one of the previous examples)?

            fabricepipart Fabrice Pipart added a comment - edited

            Here is the PR I proposed: https://github.com/jenkinsci/kubernetes-plugin/pull/425
            Feel free to comment and ask any related questions there. I gave some details; I hope they will answer all your questions.


            csanchez Carlos Sanchez added a comment -

            A proposed fix without reflection is at https://github.com/jenkinsci/kubernetes-plugin/pull/427
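
            Presumably "without reflection" here means sizing the pipe when it is created instead of patching the private buffer afterwards. A rough Groovy sketch of that idea (not the PR's actual code; the variable names and the size are made up for illustration):

                int pipeSize = 1024 * 1024                                       // example value only
                PipedInputStream commandSink = new PipedInputStream(pipeSize)    // the default would be 1024 bytes
                PipedOutputStream commandSource = new PipedOutputStream(commandSink)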


            pascallap Pascal Laporte added a comment -

            csanchez

            I'm very sad to report that it is still slow...

            Using the first example provided, we get a good 6~7 seconds (with the unreleased kubernetes-plugin 1.14.7). That is better than the 9~10 seconds of previous versions, but still slow.

            And we get 2~3 seconds without the environment variable injection.

            I will probably open a new issue, since the versions of Jenkins, Kubernetes, and the plugins have changed...


            csanchez Carlos Sanchez added a comment -

            Check the system property mentioned in the issue and increase the buffer size with it, then report back.
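
            For anyone who wants to experiment: a system property can be passed to the controller JVM with -D at startup, or set from the Script Console for a quick test. A Groovy sketch, where the property key is only a placeholder (take the exact name from PR #427):

                // Placeholder key: substitute the exact system property name defined in kubernetes-plugin PR #427
                String bufferSizeProperty = 'REPLACE.WITH.THE.PROPERTY.NAME.FROM.PR.427'
                System.setProperty(bufferSizeProperty, '4096')
                println System.getProperty(bufferSizeProperty)
                // If the plugin reads the property into a static field at class-load time, setting it at runtime
                // will not help; restart the controller with -D<property>=4096 instead.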


            pascallap Pascal Laporte added a comment -

            csanchez You are a hero.

            With 4096 we are starting to see the magic operate (2~3 sec)!

            Thanks a lot!

            kugel Thomas M added a comment -

            pascallap csanchez This issue is marked as resolved, but your last comments don't look like it?

            I'm facing a similar problem, but I'm not using any Kubernetes-related plugin. csanchez can you please suggest how I could quickly check whether the workaround helps in my case as well? (I don't know how to apply a hotfix to Java code in an existing Jenkins installation.)


            fabricepipart Fabrice Pipart added a comment -

            Hi! As far as I'm concerned, this issue has been solved. You can have a look at the Pull Request(s) mentioned above if you're interested in implementing a similar fix.

            kugel Thomas M added a comment -

            The PR applies to a specific plugin, but I can't pin my problem down to a specific plugin.

            https://github.com/jenkinsci/kubernetes-plugin/pull/425/files seems to use reflection to make a general change to the Java runtime. I would like to try that, but I don't know how to apply it. Do I have to develop a dummy plugin, or can I hook into the runtime some other way?


            People

              Assignee: csanchez Carlos Sanchez
              Reporter: pascallap Pascal Laporte
              Votes: 6
              Watchers: 18