• Type: Improvement
    • Priority: Major
    • Resolution: Unresolved
    • Component: kubernetes-plugin
    • Java version: JDK 11
      Jenkins version: 2.375.2
      Remoting version: 4.13

      How can I improve checkout performance when a dynamic Jenkins agent runs inside a Kubernetes cluster with a poor network connection?

      The git repo is very small: a few text files and a single commit. Checkout takes only a few seconds on a permanent agent, so the checkout time itself has become the issue.

       

      The 'Declarative: Checkout SCM' step is slow when the Jenkins job runs on a dynamic Jenkins agent created in a local Kubernetes cluster with a poor network connection.

       

      Jenkins job configuration is attached.

       

      Component versions:

      Jenkins version: 2.375.2

      Plugins:

      git: 5.0.0

      git-client: 4.0.0

      kubernetes: 3743.v1fa_4c724c3b_7

      ssh-agent version: 295.v9ca_a_1c7cc3a_a_

      ssh-slave version: 2.854.v7fd446b_337c9

       

      Environment:

      Jenkinsfile: https://github.com/zhan9san/test-checkout/blob/main/Jenkinsfile

      Jenkins is running on an EC2 server.

      Kubernetes cluster-1 is an EKS cluster in AWS.

      Kubernetes cluster-2 is a local Kubernetes cluster in the office.

      The network between the Jenkins server and cluster-1 is good, while the network between the Jenkins server and cluster-2 is poor.

      The same pod template, with only one container (`jnlp`), is configured in the Jenkins cloud configuration for both cluster-1 and cluster-2.

      One worker node in cluster-2 is configured as a permanent Jenkins agent.

      They are configured with different labels.

       

      How long does 'Declarative: Checkout SCM' take in each test?

       

      Test 1: Run the Jenkins job in cluster-1

      It takes less than 10 seconds.

      Test 2: Run the Jenkins job in cluster-2 without a cached jarCache

      It takes about 6 min 10 s.

      Test 3: Run the Jenkins job in cluster-2 with a cached jarCache

      It takes about 4 min 50 s.

      Test 4: Run the Jenkins job on the permanent Jenkins agent

      It takes about 3 min the first time (it may be retrieving jar files to the agent).

      It takes about 9 s the second time.

       

      How is the jarCache cached in the cluster?

      There is an NFS share storing the cached jar files, and a PVC is mounted in the jnlp container. `-jar-cache /home/jenkins/jarCache` is passed to the `jenkins-agent` shell script. I used `inotifywait -m -r /home/jenkins/jarCache` to monitor file changes; no `tmp` jar files are created, so I believe no retrieval occurs.
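
      For reference, here is a minimal sketch of how such a pod template could be declared inline in a pipeline. The PVC name `jar-cache-nfs` and the agent image tag are illustrative placeholders, not my exact cloud configuration:

      podTemplate(
          containers: [
              containerTemplate(
                  name: 'jnlp',
                  image: 'jenkins/inbound-agent:latest-jdk11',
                  // Extra args are forwarded to Remoting; -jar-cache points the
                  // jar cache at the NFS-backed mount
                  args: '-jar-cache /home/jenkins/jarCache ${computer.jnlpmac} ${computer.name}'
              )
          ],
          volumes: [
              // PVC backed by the NFS share that stores the cached jar files
              persistentVolumeClaim(claimName: 'jar-cache-nfs', mountPath: '/home/jenkins/jarCache', readOnly: false)
          ]
      ) {
          node(POD_LABEL) {
              checkout scm
          }
      }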

       

      Here is my conjecture.

      The time is spent calculating the md5 checksum of each jar file and updating the last-modified time of the existing jar files. Would it be possible to skip updating the last-modified time of the jar files to reduce the time?

      If I understand correctly, once the jar files are cached, there is no need to retrieve them again. But `remoting` still updates the last-modified time via `Files.setLastModifiedTime` in `FileSystemJarCache.java`.
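
      To check whether those metadata updates are themselves expensive over NFS, a small standalone probe could time the same call against the mounted cache. This is only a sketch, assuming the paths above; run it with `groovy` inside the jnlp container:

      import java.nio.file.*
      import java.nio.file.attribute.FileTime
      import java.util.concurrent.TimeUnit

      def cache = Paths.get('/home/jenkins/jarCache')
      def now = FileTime.fromMillis(System.currentTimeMillis())

      def stream = Files.walk(cache)
      try {
          stream.filter { it.toString().endsWith('.jar') }
                .forEach { jar ->
                    long t0 = System.nanoTime()
                    // Time the same call the conjecture points at in FileSystemJarCache
                    Files.setLastModifiedTime(jar, now)
                    println "${jar}: ${TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0)} ms"
                }
      } finally {
          stream.close()
      }

      If individual calls take tens of milliseconds here, the NFS attribute updates alone could account for a large share of the delay.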

       

      Here is what I tested.

      I ran `inotifywait -m -r /home/jenkins/jarCache` inside the jnlp container and noticed many more file operations while the pipeline console log was blocked at '[Pipeline] checkout':

      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/24/ OPEN EB8D09A06BF01720885977D998797A.jar
      /home/jenkins/jarCache/24/ ATTRIB EB8D09A06BF01720885977D998797A.jar
      /home/jenkins/jarCache/24/ CLOSE_NOWRITE,CLOSE EB8D09A06BF01720885977D998797A.jar
      /home/jenkins/jarCache/24/ ACCESS EB8D09A06BF01720885977D998797A.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ATTRIB 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ CLOSE_NOWRITE,CLOSE 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ ACCESS 674E7818836879CAD5D217139F4228.jar
      /home/jenkins/jarCache/B2/ OPEN 674E7818836879CAD5D217139F4228.jar

       

      It is odd, though, that the job still runs much faster on the permanent Jenkins agent (one worker node in cluster-2) than on a dynamic agent in cluster-2.

       

      Besides, I tried running the checkout explicitly in the Jenkinsfile, as below, on a dynamic Jenkins agent in the local cluster.

      `checkout scmGit(branches: [[name: '*/main']], extensions: [], userRemoteConfigs: [[url: 'https://github.com/zhan9san/test-checkout.git']])`

      It takes about 1 min 29 s (13:39:49-13:41:18):

      Running on tcpdump-kg8sf in /home/jenkins/agent/workspace/Playground/jack-test/test-agent-label
      [Pipeline] {
      [Pipeline] timestamps
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Init)
      [Pipeline] checkout
      13:39:49  No credentials specified
      13:39:49  Cloning the remote Git repository
      13:40:44  Cloning repository https://github.com/zhan9san/test-checkout.git
      13:40:45   > git init /home/jenkins/agent/workspace/Playground/jack-test/test-agent-label # timeout=10
      13:40:46  Fetching upstream changes from https://github.com/zhan9san/test-checkout.git
      13:40:46   > git --version # timeout=10
      13:40:46   > git --version # 'git version 2.25.1'
      13:40:46   > git fetch --tags --force --progress -- https://github.com/zhan9san/test-checkout.git +refs/heads/*:refs/remotes/origin/* # timeout=10
      13:40:47   > git config remote.origin.url https://github.com/zhan9san/test-checkout.git # timeout=10
      13:40:47   > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
      13:41:05  Avoid second fetch
      13:41:05  Checking out Revision 5d4b69dd8f4e9987a0cfe4ff233a63b266b122c0 (refs/remotes/origin/main)
      13:41:05   > git rev-parse refs/remotes/origin/main^{commit} # timeout=10
      13:41:06   > git config core.sparsecheckout # timeout=10
      13:41:06   > git checkout -f 5d4b69dd8f4e9987a0cfe4ff233a63b266b122c0 # timeout=10
      13:41:17  Commit message: "Create Jenkinsfile"
      13:41:18   > git rev-list --no-walk 5d4b69dd8f4e9987a0cfe4ff233a63b266b122c0 # timeout=10
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] stage
      [Pipeline] { (Hello)
      [Pipeline] script 
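
      For completeness, a variant worth trying is a shallow, tag-less clone via the git plugin's `cloneOption` extension (a sketch; with a single-commit repo the git transfer is already tiny, so if this does not help, the time is presumably going into Remoting round-trips rather than into git itself):

      checkout scmGit(
          branches: [[name: '*/main']],
          // cloneOption limits history depth and skips tag fetching
          extensions: [cloneOption(shallow: true, depth: 1, noTags: true)],
          userRemoteConfigs: [[url: 'https://github.com/zhan9san/test-checkout.git']]
      )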

       

          [JENKINS-71217] Optimize checkout performance issue

          Jack added a comment - Similar to https://stackoverflow.com/questions/75359712/jenkins-angent-on-k8s-is-slow-when-checkout-codes

          Jack added a comment - May be similar to https://issues.jenkins.io/browse/JENKINS-50429

          Jesse Glick added a comment -

          The JAR cache will normally start out empty when using a Kubernetes agent, so every node block will have to download ~30Mb of instructions. It would be possible to bake some commonly required JARs (enough to run sh, checkout with GitSCM, etc.) into the image as built by jenkinsci/docker-inbound-agent, but the problem is maintenance: every update to Jenkins core or various plugins would invalidate parts of the cache. Sharing the cache using a persistent volume would also be possible but adds some administrative issues and would be a security risk in general (depending on how your controller is configured).


          Jack added a comment - edited

          jglick 

           

          Thanks for your attention. Do you know where the time is consumed?

           

          I have tried using the baked jarCache folder in the jnlp container, and it saves about 80 seconds (Test 2 vs Test 3), but it still needs about 4 min 50 s, versus only 9 s on a permanent agent (Test 3 vs Test 4).

           

          Test 2: Run the Jenkins job in cluster-2 without a cached jarCache

          It takes about 6 min 10 s.

          Test 3: Run the Jenkins job in cluster-2 with a cached jarCache

          It takes about 4 min 50 s.

          Test 4: Run the Jenkins job on the permanent Jenkins agent

          It takes about 3 min the first time (it may be retrieving jar files to the agent).

          It takes about 9 s the second time.

           


          Jesse Glick added a comment -

          I do not know. Those times sound pathological. In my employer's cluster, with no special setup in this regard, from the time the podTemplate step starts to the time a sh step inside the resulting node is producing output is typically <20s (assuming that the agent image is in the K8s node's cache already).

          If you run the support-core plugin you can get some basic statistics about Remoting which might be useful, but if the agent is just sitting around for minutes on end you should have plenty of time to `kubectl exec $pod -- jstack $pid` and see what is going on.


            Assignee: Unassigned
            Reporter: Jack (jack_zhang)