Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Jenkins 2.69
Pipeline 2.5
Ubuntu 16.04
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
JVM args: -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -server -XX:+AlwaysPreTouch -Djenkins.install.runSetupWizard=false -Dgroovy.use.classvalue=true -Xmx8192m -Xms8192m
Description
Execution of parallel blocks scales poorly for values of N > 100. With ~50 nodes (each with 4 executors, for a total of ~200 slots), the following pipeline job takes extraordinarily long to execute:
    def stepsForParallel = [:]
    for (int i = 0; i < Integer.valueOf(params.SUB_JOBS); i++) {
        def s = "subjob_${i}"
        stepsForParallel[s] = {
            node("darwin") {
                echo "hello"
            }
        }
    }
    parallel stepsForParallel
    SUB_JOBS    Time (sec)
    ----------------------
    100         10
    200         40
    300         96
    400         214
    500         392
    600         660
    700         960
    800         1500
    900         2220
    1000        gave up...
At no point does the underlying system become taxed (CPU utilization is very low, as this is a very beefy system: 28 cores, 128GB RAM, SSDs).
CPU and Thread CPU Time Sampling (via VisualVM) are attached for reference.
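To put a rough number on how the timings in the table grow, one option is a least-squares fit of log(time) against log(SUB_JOBS); the slope is then the exponent k in time ≈ c · N^k. The sketch below is only an illustration: the class and method names are made up for this example, and it uses nothing but the nine completed measurements from the table (the 1000 row is excluded because that run was abandoned). A power law is an assumption here, so a good fit does not rule out exponential growth.

```java
import java.util.Locale;

// Hypothetical helper for this example only; not part of Jenkins or any plugin.
public class ScalingExponent {

    // Least-squares slope of ln(seconds) against ln(branches),
    // i.e. the exponent k of an assumed power law seconds ~ c * branches^k.
    public static double logLogSlope(double[] branches, double[] seconds) {
        int n = branches.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += Math.log(branches[i]);
            meanY += Math.log(seconds[i]);
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = Math.log(branches[i]) - meanX;
            double dy = Math.log(seconds[i]) - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Measurements copied from the table in the description.
        double[] subJobs = {100, 200, 300, 400, 500, 600, 700, 800, 900};
        double[] seconds = {10, 40, 96, 214, 392, 660, 960, 1500, 2220};
        System.out.printf(Locale.ROOT, "estimated exponent k = %.2f%n",
                logLogSlope(subJobs, seconds));
    }
}
```

For these figures the fitted exponent falls between 2 and 3, meaning the run time grows faster than quadratically in the number of parallel branches over the measured range.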
Attachments
Issue Links
- depends on
  - JENKINS-38381 [JEP-210] Optimize log handling in Pipeline and Durable Task (Resolved)
  - JENKINS-36547 Queue.Task.getFullDisplayName is a poor choice of key for LoadBalancer.CONSISTENT_HASH (Resolved)
- is duplicated by
  - JENKINS-45876 Jenkins becomes extremely slow while running a lot of tests parallel (Resolved)
- relates to
  - JENKINS-34542 Hang in ExecutorStepExecution (Resolved)
  - JENKINS-42556 PlaceholderTask.runForDisplay vulnerable to AccessDeniedException (Resolved)
  - JENKINS-38223 FlowNode.isRunning is not very useful (Closed)
  - JENKINS-26132 Executor should show the current stage the flow run is in (Resolved)
  - JENKINS-40934 LogActionImpl listener inefficient; poor performance queuing large parallel workloads (Resolved)
Hello svanoort, as you mentioned above, I tested the new versions and there is definitely an improvement. I updated shortly after you wrote that comment and I'm still using those versions. We rely heavily on this feature, since our whole test infrastructure depends on deploying data to nodes for many branches, so our Jenkins runs pretty much 24/7 (with up to 1-2k executors in queue).
Nevertheless, the scaling cannot be considered stable. Many of our tests need ~2 min to run but wait ~10-15 min (worst case) to be processed by Jenkins. As mentioned in https://issues.jenkins-ci.org/browse/JENKINS-45876, there seems to be a quadratic or exponential correlation. That means that even with a big improvement, it reaches its limits once that threshold is crossed.
In my opinion there is still room for further improvement, so that large Jenkins environments also become more efficient.