Type: Bug
Priority: Critical
Resolution: Fixed
Environment:
Jenkins 2.69
Pipeline 2.5
Ubuntu 16.04
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
JVM args: -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication -XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -server -XX:+AlwaysPreTouch -Djenkins.install.runSetupWizard=false -Dgroovy.use.classvalue=true -Xmx8192m -Xms8192m
Execution of parallel blocks scales poorly for values of N > 100. With ~50 nodes (each with 4 executors, for a total of ~200 slots), the following pipeline job takes extraordinarily long to execute:
def stepsForParallel = [:]
for (int i = 0; i < Integer.valueOf(params.SUB_JOBS); i++) {
    def s = "subjob_${i}"
    stepsForParallel[s] = {
        node("darwin") {
            echo "hello"
        }
    }
}
parallel stepsForParallel
SUB_JOBS   Time (sec)
---------------------
100        10
200        40
300        96
400        214
500        392
600        660
700        960
800        1500
900        2220
1000       gave up...
At no point does the underlying system become taxed (CPU utilization is very low, as this is a very beefy system – 28 cores, 128 GB RAM, SSDs).
CPU and Thread CPU Time Sampling (via VisualVM) are attached for reference.
depends on:
JENKINS-38381 [JEP-210] Optimize log handling in Pipeline and Durable Task (Resolved)
JENKINS-36547 Queue.Task.getFullDisplayName is a poor choice of key for LoadBalancer.CONSISTENT_HASH (Resolved)

is duplicated by:
JENKINS-45876 Jenkins becomes extremely slow while running a lot of tests parallel (Resolved)

relates to:
JENKINS-34542 Hang in ExecutorStepExecution (Resolved)
JENKINS-42556 PlaceholderTask.runForDisplay vulnerable to AccessDeniedException (Resolved)
JENKINS-38223 FlowNode.isRunning is not very useful (Closed)
JENKINS-26132 Executor should show the current stage the flow run is in (Resolved)
JENKINS-40934 LogActionImpl listener inefficient; poor performance queuing large parallel workloads (Resolved)
[JENKINS-45553] Parallel pipeline execution scales poorly
Also, Queue.maintain should not call String taskDisplayName = p.task.getFullDisplayName(); unless logging is enabled!
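For illustration, a minimal sketch of the kind of guard meant here (the method name and logger category are hypothetical, not the actual core patch):

import java.util.logging.Level
import java.util.logging.Logger

// Compute the expensive display name only when the logger will actually
// record it; for Pipeline placeholder tasks, getFullDisplayName() walks
// the flow graph, which is what makes the unconditional call a hotspot.
void logQueueItem(Logger logger, def task) {
    if (logger.isLoggable(Level.FINEST)) {
        logger.log(Level.FINEST, 'Queue maintenance considering {0}',
                task.getFullDisplayName())
    }
}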
Just to note, this is a real PITA: our master currently runs full-time at 100% CPU on 4 cores and consumes close to 4.5 GB RAM, as the scans keep the threads busy, blocking background GC.
It runs 4 builds with 4 active parallel branches each, so 16 active branches in total.
Code changed in jenkins
User: Oleg Nenashev
Path:
core/src/main/java/hudson/model/Queue.java
http://jenkins-ci.org/commit/jenkins/378199ff846b2b5b19b56a40b43289127f74f9cc
Log:
Merge pull request #2947 from jglick/Queue-opt
JENKINS-45553 Avoid calling Task.getFullDisplayName unless and until we need to
Compare: https://github.com/jenkinsci/jenkins/compare/4df5895b7eb2...378199ff846b
Code changed in jenkins
User: Jesse Glick
Path:
core/src/main/java/jenkins/security/HMACConfidentialKey.java
http://jenkins-ci.org/commit/jenkins/564ca0b6d7f50c3b1164f3f1de8dad5c8c89a8cb
Log:
JENKINS-45553 - Cache the Mac so we do not need to constantly recreate it (#2948)
JENKINS-45553 - Cache the Mac so we do not need to constantly recreate it
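As a rough illustration of the caching idea (a sketch only under common-practice assumptions, not the actual HMACConfidentialKey code): Mac.getInstance() plus init() is comparatively expensive, and Mac instances are not thread-safe, so one common approach is to cache an initialized instance per thread.

import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

class CachedHmac {
    private final SecretKeySpec key
    // One initialized Mac per thread, created lazily and then reused,
    // instead of a fresh Mac.getInstance()/init() on every call.
    private final ThreadLocal<Mac> threadMac = new ThreadLocal<Mac>() {
        @Override
        protected Mac initialValue() {
            Mac m = Mac.getInstance('HmacSHA256')
            m.init(key)
            return m
        }
    }

    CachedHmac(byte[] secret) {
        key = new SecretKeySpec(secret, 'HmacSHA256')
    }

    byte[] compute(byte[] message) {
        // doFinal() resets the Mac to its initialized state for reuse.
        return threadMac.get().doFinal(message)
    }
}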
It seems there are also connections to stages. Something at the end of a stage seems to "reset" this exponential duration; i.e. if you split SUB_JOBS=800 into four stages of 200 each, the duration would be far less.
See also JENKINS-45876
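For illustration, splitting the reproduction script from the description into stages along those lines might look like this (the batch size and stage names are illustrative):

int total = Integer.valueOf(params.SUB_JOBS)   // e.g. 800
int perStage = 200
for (int offset = 0; offset < total; offset += perStage) {
    stage("batch ${offset.intdiv(perStage)}") {
        def steps = [:]
        for (int i = offset; i < Math.min(offset + perStage, total); i++) {
            def s = "subjob_${i}"
            steps[s] = { node('darwin') { echo 'hello' } }
        }
        // Each stage runs only its own batch of branches before moving on.
        parallel steps
    }
}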
I can probably support this observation: we reuse a parallel-worker construct that spawns sub-parallels for each test stage, and until now I had assumed that leaving the enclosing parallel resets the behavior.
If it is interesting/relevant, I could try to elaborate a bit more on our scheduling construct.
Also, it looks like rolling back Supporting APIs to 2.13 reverts the false "improvement" from JENKINS-40934.
https://github.com/jenkinsci/workflow-support-plugin/pull/38 pretty much fixed the problem for me, so no need to waste time testing the current code.
I get these messages in the log; do they matter?
Aug 01, 2017 1:32:47 PM SEVERE io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
nestedBranches size: 520 not equal to parallelBranchEndNodes: 521
Aug 01, 2017 1:32:47 PM SEVERE io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
nestedBranches size: 1 not equal to parallelBranchEndNodes: 2
manschwetus, were they throwing that error before applying this change? If not, that would actually make me nervous, jglick – the APIs that the Blue Ocean class depends on are pretty tightly coupled to the FlowNode APIs, and one should never see more parallel branch starts detected than ends.
AFAIK yes, as I haven't installed anything besides Blue Ocean for roughly a week. I just tried to downgrade Pipeline Supporting APIs, but this seems not to have completed/worked as I expected; maybe I have to restart Jenkins for the downgrade to take effect.
jglick, which component and version will the fix be in? There seems to be an issue with 2.72, and you yourself said not to waste time testing the first changes.
Certain fixes are indeed in core, but the most critical is in workflow-api + workflow-support; versions TBD.
Still have a hotspot related to JENKINS-26132
"AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#731]" #1011 daemon prio=5 os_prio=0 tid=0x00007f4dd0010000 nid=0x1a36 runnable [0x00007f4d6bafa000] java.lang.Thread.State: RUNNABLE at java.util.AbstractCollection.addAll(AbstractCollection.java:343) at java.util.LinkedHashSet.<init>(LinkedHashSet.java:169) at org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.setup(AbstractFlowScanner.java:132) at org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.setup(LinearScanner.java:187) at org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.setup(LinearBlockHoppingScanner.java:68) at org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.setup(AbstractFlowScanner.java:172) at org.jenkinsci.plugins.workflow.graphanalysis.FlowScanningUtils.fetchEnclosingBlocks(FlowScanningUtils.java:122) at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.computeEnclosingLabel(ExecutorStepExecution.java:473) at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getEnclosingLabel(ExecutorStepExecution.java:465) at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getDisplayName(ExecutorStepExecution.java:422) at org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask.getFullDisplayName(ExecutorStepExecution.java:438) at hudson.model.LoadBalancer$1.assignGreedily(LoadBalancer.java:115) at hudson.model.LoadBalancer$1.map(LoadBalancer.java:105) at hudson.model.LoadBalancer$2.map(LoadBalancer.java:157) at hudson.model.Queue.maintain(Queue.java:1571) at …
which would best be solved by implementing JENKINS-36547.
The code slated to be deleted in JENKINS-38381 also remains a hotspot, as previously noted:
"Running CpsFlowExecution[Owner[…]]" … java.lang.Thread.State: RUNNABLE at java.util.HashMap.putVal(HashMap.java:628) at java.util.HashMap.put(HashMap.java:611) at java.util.HashSet.add(HashSet.java:219) at java.util.AbstractCollection.addAll(AbstractCollection.java:344) at java.util.LinkedHashSet.<init>(LinkedHashSet.java:169) at org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.setup(AbstractFlowScanner.java:132) at org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.setup(LinearScanner.java:187) at org.jenkinsci.plugins.workflow.graphanalysis.LinearBlockHoppingScanner.setup(LinearBlockHoppingScanner.java:68) at org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:251) at org.jenkinsci.plugins.workflow.graphanalysis.LinearScanner.findFirstMatch(LinearScanner.java:135) at org.jenkinsci.plugins.workflow.graphanalysis.AbstractFlowScanner.findFirstMatch(AbstractFlowScanner.java:274) at org.jenkinsci.plugins.workflow.graph.FlowNode.isActive(FlowNode.java:136) at org.jenkinsci.plugins.workflow.support.actions.LogActionImpl.getLogText(LogActionImpl.java:123) at org.jenkinsci.plugins.workflow.job.WorkflowRun.copyLogs(WorkflowRun.java:485) at org.jenkinsci.plugins.workflow.job.WorkflowRun.access$600(WorkflowRun.java:134) at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:959) - locked <0x00000006ca550158> (a java.util.concurrent.atomic.AtomicBoolean) at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1221) at …
jglick RE the first hotspot, what about the suggestion I made last year to reduce the cost of this call? https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/2#r59458780
I'd anticipated nearly a year ago that this would potentially be a performance problem, and think now might be a reasonable time to implement the suggestions there.
RE the second hotspot – what about reusing a single instance of a LinearScanner for logging within a single flow, maybe caching in a transient field? This reduces some of the setup work, and may play nicer with the CPU cache since the same object is getting hit often from a single thread.
I'd designed the scanners to be reusable for that sort of use case, so would be nice to see if it pays off like expected.
what about the suggestion I made last year to reduce the cost of this call
getEnclosingLabel already does some caching. But queue maintenance calls are a different matter.
This reduces some of the setup work
Not sure how—you would still be calling setup with each head. Anyway feel free to play with this kind of micro-optimization later; the patches as they stand make a clear improvement.
Is there anything blocking progress in this matter?
This is a regression: prior to the latest Supporting APIs change, our pipeline finished in an acceptable time frame (~1 day); now it takes ~6 days!
I already tried to roll back Supporting APIs, but the dependencies are too tight; it would require rolling back everything around Pipeline one by one.
Same question. What is blocking the progress? This fix is crucial for us atm.
Can a statement be made about when the fix will be released (at least approximately)? The last release was at the end of March.
florian_meser The fix that was done earlier had a significant bug in it, and the attempt to work around that caused another performance regression by undoing a fix we did earlier to prevent abusive performance when you have many nodes followed by a lot of parallel branches.
I believe jglick has arrived at a hybrid solution that will satisfy the requirements in all cases but requires some additional work (not sure on ETA, maybe he can speak to it – it might end up falling to someone else). However, this solution should generate far more significant performance improvements for everyone.
As this seems to be taking much longer than expected, is it possible to respin Supporting APIs 2.14 without JENKINS-40934, as this change has made things seriously worse?
I'm currently checking the option of going back to Supporting APIs 2.13, but this means rolling back Pipeline Groovy to 2.29 and also everything that depends on Pipeline Groovy >2.29; just figuring out which versions to roll back to is a serious issue in itself.
Found another solution: as the change making things worse is suspected to be https://github.com/jenkinsci/workflow-support-plugin/commit/4b48f17fc8525a9d2cfb63c763d7ec5070192712, I cloned Supporting APIs 2.14, reverted that merge, built the result as 2.14.1-sth, and installed it. I can now verify that reverting this change resolves the regression. Of course really fixing this is the best thing to do, but if that needs more time, this change should probably be reverted in a patch release until the real fix is ready.
It looks generally good now, manschwetus – you will probably be able to run the SNAPSHOT build off Jesse's PR branch and see a big improvement, until we've finished dotting the last few i's and crossing t's to cut the ultimate release.
Code changed in jenkins
User: Jesse Glick
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/support/actions/LogActionImpl.java
http://jenkins-ci.org/commit/workflow-support-plugin/d78f301c4630ad3d9cbd408494ed1ad36e97b132
Log:
JENKINS-38223 Using FlowNode.isActive to eliminate the main overhead in JENKINS-45553.
Code changed in jenkins
User: Sam Van Oort
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/support/actions/LogActionImpl.java
src/main/java/org/jenkinsci/plugins/workflow/support/visualization/table/FlowGraphTable.java
src/test/java/org/jenkinsci/plugins/workflow/support/actions/LogActionImplTest.java
http://jenkins-ci.org/commit/workflow-support-plugin/236f06ca40fc019ca4da2cadb4d0804971faa9db
Log:
Merge pull request #38 from jglick/FlowNode.isActive-JENKINS-38223
JENKINS-38223 Using FlowNode.isActive to improve JENKINS-45553
Compare: https://github.com/jenkinsci/workflow-support-plugin/compare/01f9538af15c...236f06ca40fc
manschwetus I'd recommend consuming the SNAPSHOT build of this, since it should fully resolve your issues
I think all planned fixes have been merged, though some remain unreleased.
All core fixes went into 2.72, so they will naturally be part of the 2.73 LTS line.
I'm a little confused. A "clear improvement" was mentioned before. Which of the changes will reveal this "clear improvement", and are they released yet?
The core changes that have already been released since 2.72? I've tried the new LTS 2.73.1 and there was no improvement, so I guess there are changes in plugins that are not released yet.
It would be nice to somehow get an overview of the status: which components are affected, what's released so far, and what still needs to be released.
Code changed in jenkins
User: Sam Van Oort
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecution.java
src/main/java/org/jenkinsci/plugins/workflow/graph/BlockStartNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/FlowNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/GraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScanningUtils.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/NodeStepNamePredicate.java
src/test/java/org/jenkinsci/plugins/workflow/graph/FlowNodeTest.java
src/test/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScannerTest.java
http://jenkins-ci.org/commit/workflow-api-plugin/4ba6b42b651d9a5853a32681e9582f9649ad8fa2
Log:
Merge pull request #50 from svanoort/jenkins-27395-block-structure-lookup
JENKINS-37573 / JENKINS-45553 Provide a fast view of block structures in the flow graph
Compare: https://github.com/jenkinsci/workflow-api-plugin/compare/63e8ad0c2715...4ba6b42b651d
florian_meser You can wait for the release coming over this weekend, or build and install the master branch of the plugins https://github.com/jenkinsci/workflow-api-plugin/commits/master and https://github.com/jenkinsci/workflow-support-plugin – we had an additional enhancement that was lumped in along with it (plus some further optimizations) and needed to be integrated. It's ready now, though, and should be live shortly.
Code changed in jenkins
User: Sam Van Oort
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecution.java
src/main/java/org/jenkinsci/plugins/workflow/graph/BlockStartNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/FlowNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/GraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScanningUtils.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/NodeStepNamePredicate.java
src/test/java/org/jenkinsci/plugins/workflow/graph/FlowNodeTest.java
src/test/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScannerTest.java
http://jenkins-ci.org/commit/workflow-api-plugin/70a05dc4d4a330b797ce370fd587fccfe74355d9
Log:
Revert "Revert "JENKINS-37573 / JENKINS-45553 Provide a fast view of block structures in the flow graph""
This reverts commit 88ffdfc69c43bd4dde21a6578b5ac466999b4fd4.
Benchmarks show it at ~10x as fast, by the way – more likely quite a bit more (we were hitting some limits on the test system).
Resolved by core changes plus the changes released in workflow-api 2.22 and workflow-support 2.15.
manschwetus / florian_meser We appreciate your patience – releases are now cut, and this should reflect a fairly comprehensive improvement for your case plus several related ones. It requires Pipeline API Plugin v2.22 and Pipeline Supporting APIs v2.15 to be installed to get the full benefits.
Please give them a try and let us know how they work out for you – based on our testing you should see a tremendous performance improvement from these changes!
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 1 not equal to parallelBranchEndNodes: 0
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 16 not equal to parallelBranchEndNodes: 15
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 20 not equal to parallelBranchEndNodes: 19
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 23 not equal to parallelBranchEndNodes: 22
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 1 not equal to parallelBranchEndNodes: 0
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 16 not equal to parallelBranchEndNodes: 15
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 20 not equal to parallelBranchEndNodes: 19
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 23 not equal to parallelBranchEndNodes: 22
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 1 not equal to parallelBranchEndNodes: 0
Sep 27, 2017 8:09:24 PM io.jenkins.blueocean.rest.impl.pipeline.PipelineNodeGraphVisitor parallelStart
SEVERE: nestedBranches size: 16 not equal to parallelBranchEndNodes: 15
Jenkins is spamming the logs with this. Jenkins 2.73.1. Pipeline API Plugin v2.22 and Pipeline Supporting APIs version 2.15. How do I fix this?
puneeth_n You'll need to open a new bug and provide the pipeline that triggers this error. It could be an issue in one of two places and there's nothing in that log to permit me to reproduce this.
Code changed in jenkins
User: Sam Van Oort
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecution.java
src/main/java/org/jenkinsci/plugins/workflow/graph/BlockStartNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/FlowNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/GraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScanningUtils.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/NodeStepNamePredicate.java
src/test/java/org/jenkinsci/plugins/workflow/graph/FlowNodeTest.java
src/test/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScannerTest.java
http://jenkins-ci.org/commit/workflow-cps-plugin/88ffdfc69c43bd4dde21a6578b5ac466999b4fd4
Log:
Revert "JENKINS-37573 / JENKINS-45553 Provide a fast view of block structures in the flow graph"
Code changed in jenkins
User: Sam Van Oort
Path:
pom.xml
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecution.java
src/main/java/org/jenkinsci/plugins/workflow/graph/BlockStartNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/FlowNode.java
src/main/java/org/jenkinsci/plugins/workflow/graph/GraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graph/StandardGraphLookupView.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScanningUtils.java
src/main/java/org/jenkinsci/plugins/workflow/graphanalysis/NodeStepNamePredicate.java
src/test/java/org/jenkinsci/plugins/workflow/graph/FlowNodeTest.java
src/test/java/org/jenkinsci/plugins/workflow/graphanalysis/FlowScannerTest.java
http://jenkins-ci.org/commit/workflow-cps-plugin/c0daeb5ce9ba55e6f51cb6c8db903cc5fbba324b
Log:
Merge pull request #52 from jenkinsci/revert-50-jenkins-27395-block-structure-lookup
Revert "JENKINS-37573 / JENKINS-45553 Provide a fast view of block structures in the flow graph"
Hello svanoort, as you mentioned above, I just tested the new versions and there definitely is an improvement. I updated shortly after you wrote that comment and I'm still using those versions. We rely heavily on this feature, since our whole test infrastructure depends on deploying data on nodes for many branches, so we essentially have a 24/7 running Jenkins (with up to 1-2k executors in the queue).
Nevertheless, the scaling cannot be considered stable. We have many tests that need ~2 min but wait ~10-15 min (worst case) to be processed by Jenkins. As mentioned in https://issues.jenkins-ci.org/browse/JENKINS-45876, there seems to be some kind of quadratic or exponential correlation. That means even with the big improvement, it hits its limits once this edge is crossed.
In my opinion there is still room for further improvements to ensure that large Jenkins environments also become more effective.
florian_meser I agree completely that there is some room for further optimization of massively-parallel pipeline execution – the best place to follow the current work and investigations is now https://issues.jenkins-ci.org/browse/JENKINS-47724. That ticket also includes some concrete advice that may help with your scenario.
If you'd like to add some quantitative scaling observations to help identify where the bottleneck is, that might be of some assistance – I also expect the work currently in beta release from JENKINS-47170 will help a bit (reduces the per-flownode overheads associated with pipelines quite significantly – that's a small component of parallel execution).
Very likely you'll see a big improvement from the next phase of that work, https://issues.jenkins-ci.org/browse/JENKINS-38381, which was the culprit here for a lot of the nonlinear behaviors – that's slated to be my next strategic push on performance, along with some tactical fixes that may help with your scenario.
One other comment: the bottleneck appears to be only with massive parallels in a single pipeline – if you break your job into smaller ones with fewer parallel branches in each, the per-branch overheads will matter less (see the sketch below).
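For instance, a hedged sketch of that split (here 'subjob-runner' is a hypothetical downstream pipeline that runs its own parallel block over SUB_JOBS branches):

def launchers = [:]
for (int chunk = 0; chunk < 4; chunk++) {
    def c = chunk   // capture a per-iteration copy for the closure
    launchers["chunk_${c}"] = {
        // Each branch delegates a slice of the work to a smaller pipeline,
        // so no single build carries hundreds of parallel branches.
        build job: 'subjob-runner',
              parameters: [string(name: 'SUB_JOBS', value: '200')]
    }
}
parallel launchers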
Pipeline is also never going to achieve fully linear scale-out with large numbers of executors, because only some parts of the execution can take full advantage of parallel execution – primarily the shell/batch/powershell steps that should be doing the bulk of work. Our work is primarily focused on reducing the other overheads so it can spend more time executing those steps.
Amdahl's Law in spades, basically.
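(For reference, the standard form of the law: with a parallelizable fraction p of the work and n executors, the best possible speedup is 1 / ((1 - p) + p / n); even at p = 0.95, n = 200 executors cap out at roughly 18x.)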
svanoort I'm currently trying to implement some time measurement to get quantitative scaling observations, though I don't have much time to spend on that at the moment. As soon as I have something, I'll let you know.
I don't know if this is off-topic, but it seems another neck-breaker just came in. Hence the question: are there any observations regarding the Meltdown/Spectre Windows 7 updates, which, again, seem to dramatically reduce the performance of our so-called "massive parallels in a single pipeline"?
I'm observing a dramatic loss of performance, although no changes were made to our Jenkins pipeline that would explain these symptoms. KB4056894 definitely contained Meltdown/Spectre patches. I'm quite curious whether I'm the only one having this kind of trouble.
florian_meser I'm not sure what the performance impact of the Meltdown/Spectre updates is on Windows - not really set up for scaling tests on Windows, but it might be related to changes in IO performance.
Please try out the advice I just added in the latest comment on https://issues.jenkins-ci.org/browse/JENKINS-47724 – this should help considerably. The last few months have been heavily focused on performance improvements to Pipeline and it should show in a big way.
Reproducible at n=1000, with one mock-slave agent with 200 executors. Seems to be a grab-bag of issues: the well-known copyLogs overhead (JENKINS-38381, off the top of my head); the related LogActionImpl.isRunning; some step or another still using Guice; core Queue management overhead; HMACConfidentialKey.createMac being too slow for repeated use from ConsoleNote.encodeTo; etc.