• Type: Bug
• Resolution: Fixed
• Priority: Critical

      Execution of parallel blocks scales poorly once the number of branches N exceeds 100. With ~50 nodes (each with 4 executors, for a total of ~200 slots), the following pipeline job takes an extraordinarily long time to execute:

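      // Build a map of branch name -> closure for the parallel step.
      def stepsForParallel = [:]
      for (int i = 0; i < Integer.valueOf(params.SUB_JOBS); i++) {
        def s = "subjob_${i}"
        // Each branch takes an executor on an agent labeled "darwin" and prints one line.
        stepsForParallel[s] = {
          node("darwin") {
            echo "hello"
          }
        }
      }
      // Run all SUB_JOBS branches concurrently.
      parallel stepsForParallel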

      SUB_JOBS   Time (sec)
      ---------------------
       100         10
       200         40
       300         96
       400        214
       500        392
       600        660
       700        960
       800       1500
       900       2220
      1000       gave up...
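
For a rough sense of the growth rate, the timings above can be fitted on a log-log scale. The following is a minimal sketch in plain Groovy (not Pipeline script), assuming run time grows roughly as a power of SUB_JOBS. The scalingExponent helper is hypothetical; only the table values come from the measurements.

```groovy
// Timings copied from the table above: SUB_JOBS -> seconds.
// The 1000-branch run was abandoned, so it is left out.
def timings = [
  100: 10, 200: 40, 300: 96, 400: 214, 500: 392,
  600: 660, 700: 960, 800: 1500, 900: 2220
]

// Hypothetical helper: least-squares slope of log(time) against log(SUB_JOBS).
// A slope of k means time grows roughly like SUB_JOBS^k.
double scalingExponent(Map<Integer, Integer> data) {
  List<Double> xs = data.keySet().collect { Math.log(it as double) }
  List<Double> ys = data.values().collect { Math.log(it as double) }
  double meanX = xs.sum() / xs.size()
  double meanY = ys.sum() / ys.size()
  double num = 0.0d
  double den = 0.0d
  for (int i = 0; i < xs.size(); i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY)
    den += (xs[i] - meanX) * (xs[i] - meanX)
  }
  return num / den
}

def exponent = scalingExponent(timings)
println "Estimated scaling exponent: ${exponent}"
```

On these nine points the fitted exponent lands between 2 and 3, i.e. run time grows faster than quadratically with the number of branches, consistent with the jump from 10 s at 100 branches to 2220 s at 900.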

      At no point does the underlying system become taxed: CPU utilization stays very low, and this is a very beefy system (28 cores, 128GB RAM, SSDs).

      CPU and Thread CPU Time Sampling (via VisualVM) are attached for reference.

          [JENKINS-45553] Parallel pipeline execution scales poorly

          Tom Skrainar created issue -

          Jesse Glick added a comment -

          Attach a textual thread dump for analysis please; the screenshots do not tell me anything.

          Jesse Glick made changes -
          Labels Original: Pipeline performance New: performance

          Jesse Glick added a comment -

          And what is “Pipeline 2.5” supposed to mean? There are a couple dozen plugins. You need to specify versions.

          Tom Skrainar made changes -
          Attachment New: JENKINS-45553_20170725.tgz [ 39026 ]
          Tom Skrainar made changes -
          Attachment New: JENKINS-45553_20170725.tgz [ 39027 ]

          Tom Skrainar added a comment - - edited

          Hi Jesse.  Tarball attached containing:

          nodes/master/pipeline-thread-dump.txt
          nodes/master/pipeline-timings.txt
          nodes/master/thread-dump.txt
          plugins/active.txt

          'submit-tester' is the name of the job (its Pipeline code is pasted in the original submission).

          The dump was generated while the build was in flight, about 3 minutes into the run, or roughly halfway through with SUB_JOBS==500. The build was triggered immediately after a fresh restart of Jenkins with all other jobs disabled (i.e. nothing else running in the system).

          JENKINS-45553_20170725.tgz

          Tom Skrainar made changes -
          Attachment Original: JENKINS-45553_20170725.tgz [ 39026 ]

          Jesse Glick added a comment -

          nodes/master/pipeline-timings.txt shows most of the time being spent in run, but nodes/master/thread-dump.txt happens to have captured something in saveProgram, which is not informative.


          Jesse Glick added a comment -

          Reproducible at n=1000, with one mock-slave agent with 200 executors. Seems to be a grab-bag of issues. Well-known copyLogs overhead (JENKINS-38381 off the top of my head); related LogActionImpl.isRunning; some step or another still using Guice; core Queue management overhead; HMACConfidentialKey.createMac being too slow for repeated use from ConsoleNote.encodeTo; etc.
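
To illustrate the kind of per-call cost the HMAC item in that list refers to, here is a generic javax.crypto sketch in plain Groovy. It is not the Jenkins implementation: the key, the messages and the HmacSHA256 algorithm choice are all assumptions made for illustration. It only shows that creating and initializing a Mac for every message yields the same result as initializing one once and reusing it, so the setup work is repeated without changing the output.

```groovy
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical key and messages, for illustration only.
def key = new SecretKeySpec('not-a-real-secret'.getBytes('UTF-8'), 'HmacSHA256')
def messages = (1..1000).collect { "note-${it}".toString().getBytes('UTF-8') }

// Variant A: look up and initialize a new Mac for every message.
def freshEachTime = messages.collect { msg ->
  def mac = Mac.getInstance('HmacSHA256')
  mac.init(key)
  mac.doFinal(msg).encodeHex().toString()
}

// Variant B: initialize once and reuse. doFinal resets the Mac to its
// post-init state, so the same instance can sign the next message.
// Note that a Mac instance is not thread-safe.
def shared = Mac.getInstance('HmacSHA256')
shared.init(key)
def reused = messages.collect { msg ->
  shared.doFinal(msg).encodeHex().toString()
}

println "Identical MACs: ${freshEachTime == reused}"
```

Whether reuse is appropriate in a given code path depends on threading, since a shared Mac would need to be confined to one thread or otherwise guarded.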


            jglick Jesse Glick
            tskrainar Tom Skrainar
            Votes: 4
            Watchers: 13
