Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-42937

Massive slowdown of pipeline execution when making massive use of parallel

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved (View Workflow)
    • Blocker
    • Resolution: Duplicate
    • pipeline
    • None
    • Jenkins 2.50 on Windows (master)

    Description

      I use pipeline parallel feature to schedule all regression tests we use in parallel, before the weekend we decided to change the job, to use the full set of tests (roughly 650 test suites). To ensure a good test system utilization we build up a dedicated node with four executors and we create a parallel branch per test suite. This was quite well proven with smaller sets of tests, now we observe an tremendous slowdown of pipeline execution, to a point were a simple informative echo takes several minutes to be processed according to the job log.

      Looking at related systems utilization, it turns out that the master runs cpu bound on one core of four (Java all the time 25% or more). Using Jenkins monitor plugin we found all the time is consumed by WorkflowRun.copyLogs in LinearBlockHoppingScanner.java:120-123, see stack trace provided.

      org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.next(LinearBlockHoppingScanner.java:120)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:212)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:94)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:255)
      org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135)
      org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274)
      org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.isRunning(LogActionImpl.java:153)
      org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:128)
      org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:441)
      org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:125)
      org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:313)
      java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
      java.util.concurrent.FutureTask.runAndReset(Unknown Source)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
      java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      java.lang.Thread.run(Unknown Source)

      Looking at source code it seems to loop over a list in a next method, calling on each iteration .contains(...) on another collection. Depending on underlying collections this could introduce a huge runtime complexity.

      Attachments

        Issue Links

          Activity

            manschwetus Florian Manschwetus created issue -
            manschwetus Florian Manschwetus made changes -
            Field Original Value New Value
            Description I use pipeline parallel feature to schedule all regression tests we use in parallel, before the weekend we decided to change the job, to use the full set of tests (roughly 650 test suites). To ensure a good test system utilization we build up a dedicated node with four executors and we create a parallel branch per test suite. This was quite well proven with smaller sets of tests, now we observe an tremendous slowdown of pipeline execution, to a point were a simple informative echo takes several minutes to be processed according to the job log.

            Looking at related systems utilization, it turns out that the master runs cpu bound on one core of four (Java all the time 25% or more). Using Jenkins monitor plugin we found all the time is consumed by {color:#0000bb}WorkflowRun.copyLogs{color} in [LinearBlockHoppingScanner.java:120-123|http://jenkins.cs-software.local:8080/monitoring?part=source&class=org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner#120], see stack trace provided.
            {noformat}
            org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.next(LinearBlockHoppingScanner.java:120)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:212)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:94)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:255)
            org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274)
            org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.isRunning(LogActionImpl.java:153)
            org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:128)
            org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:441)
            org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:125)
            org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:313)
            java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            java.util.concurrent.FutureTask.runAndReset(Unknown Source)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
            java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            java.lang.Thread.run(Unknown Source){noformat}
            Looking at source code it seems to loop over a list in a next method, calling on each iteration .contains(...) on another collection. Depending on underlying collections this could introduce a huge runtime complexity.
            I use pipeline parallel feature to schedule all regression tests we use in parallel, before the weekend we decided to change the job, to use the full set of tests (roughly 650 test suites). To ensure a good test system utilization we build up a dedicated node with four executors and we create a parallel branch per test suite. This was quite well proven with smaller sets of tests, now we observe an tremendous slowdown of pipeline execution, to a point were a simple informative echo takes several minutes to be processed according to the job log.

            Looking at related systems utilization, it turns out that the master runs cpu bound on one core of four (Java all the time 25% or more). Using Jenkins monitor plugin we found all the time is consumed by {color:#0000bb}WorkflowRun.copyLogs{color} in {color:#0000bb}LinearBlockHoppingScanner.java:120-123{color}, see stack trace provided.
            {noformat}
            org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.next(LinearBlockHoppingScanner.java:120)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:212)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.next(AbstractFlowScanner.java:94)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:255)
            org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135)
            org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274)
            org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.isRunning(LogActionImpl.java:153)
            org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:128)
            org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:441)
            org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:125)
            org.jenkinsci.plugins.workflow.job.WorkflowRun$3.run(WorkflowRun.java:313)
            java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            java.util.concurrent.FutureTask.runAndReset(Unknown Source)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
            java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            java.lang.Thread.run(Unknown Source){noformat}
            Looking at source code it seems to loop over a list in a next method, calling on each iteration .contains(...) on another collection. Depending on underlying collections this could introduce a huge runtime complexity.
            svanoort Sam Van Oort made changes -
            Assignee Sam Van Oort [ svanoort ]
            svanoort Sam Van Oort made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            svanoort Sam Van Oort made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            jglick Jesse Glick made changes -
            Link This issue duplicates JENKINS-40934 [ JENKINS-40934 ]
            jglick Jesse Glick made changes -
            Resolution Duplicate [ 3 ]
            Status In Review [ 10005 ] Resolved [ 5 ]

            People

              svanoort Sam Van Oort
              manschwetus Florian Manschwetus
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: