Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
None
-
Jenkins 2.263
pipeline-maven-plugin 3.9.3
PostgreSQL 9.5.23
Ubuntu 16.04.07 LTS
-
pipeline-maven-3.11.0
Description
We're facing a severe performance regression after upgrading the pipeline-maven-plugin from 3.8.3 to 3.9.3.
In 3.8.3 the downstreamPipelineTriggerRunListener needed ~52,031 ms (~52 seconds) to complete.
In 3.9.3 the downstreamPipelineTriggerRunListener needs ~24,694,245 ms (~7 hours).
The task manager shows 100% CPU usage for a postgres process while the listener runs.
The only changes related to SQL statements between 3.8.3 and 3.9.3 were introduced by this PR: https://github.com/jenkinsci/pipeline-maven-plugin/pull/226 (JENKINS-59500)
The second observation I made is that the message "Skip triggering ... because it has a dependency on a pipeline that will be triggered by this build" is now printed 46 times for the same job in 3.9.3, compared with 15 times in 3.8.3.
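To put the two durations side by side, the arithmetic can be written out in a few lines of Python. This is only a sketch: it assumes the reported values use thousands separators (52,031 ms and 24,694,245 ms), and the helper name `ms_to_hours` is made up for illustration.

```python
# Durations reported in the description, assuming thousands separators.
DURATION_3_8_3_MS = 52_031
DURATION_3_9_3_MS = 24_694_245

def ms_to_hours(ms: int) -> float:
    """Convert milliseconds to hours."""
    return ms / 1000 / 3600

# How many times longer the listener runs in 3.9.3 than in 3.8.3.
slowdown = DURATION_3_9_3_MS / DURATION_3_8_3_MS

print(f"3.8.3: {DURATION_3_8_3_MS / 1000:.1f} s")
print(f"3.9.3: {ms_to_hours(DURATION_3_9_3_MS):.2f} h")
print(f"slowdown factor: {slowdown:.0f}x")
```

Under that assumption the listener takes just under 7 hours in 3.9.3, several hundred times longer than in 3.8.3.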
Attachments
- logs.zip (902 kB)
- pipeline-maven.hpi (469 kB)
Activity
huber, whittlec: how many jobs do you have?
Can you share some statistics from the Tools configuration screen, in particular the table sizes?
falcon, did you see any performance impact from the change you made in https://github.com/jenkinsci/pipeline-maven-plugin/pull/226 (JENKINS-59500)?
(I didn't see such perf impact but I am not using the triggering across jobs thus my usecase is not relevant here)
My environment :
Jenkins more or less latest, NOT LTS
Bitbucket Cloud, with 5 scanning jobs, for a total of nearly 400 repositories and more than 1000 branches
I did not see any performance regression with my changes. The only times my builds seemed stuck, it turned out when I dug in that the cause was API rate limiting from Bitbucket.
whittlec, huber: can you provide more information about your environment? Is it possible to obtain a copy of your database to test queries on it?
here are the statistics from the plugins configuration page:
PipelineMavenPluginPostgreSqlDao - PostgreSQL 9.5.24
JDBC URL: jdbc:postgresql:jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 9182 rows
Table JENKINS_JOB: 326 rows
Table JENKINS_BUILD: 2776 rows
Table MAVEN_DEPENDENCY: 1288668 rows
Table GENERATED_MAVEN_ARTIFACT: 215469 rows
Table MAVEN_PARENT_PROJECT: 74455 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 597 rows
Performances:
find: totalDurationInMs=55967, count=1114
write: totalDurationInMs=105341, count=29330
I've gzipped the database and it's around 6.6 MB.
Since I don't think that the database contains any sensitive information, I would be happy to share it with you.
huber sorry for the delay
Here are my stats after a workday :
PipelineMavenPluginPostgreSqlDao - PostgreSQL 11.9 (Debian 11.9-0+deb10u1)
JDBC URL: jdbc:postgresql://localhost/jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 120078 rows
Table JENKINS_JOB: 655 rows
Table JENKINS_BUILD: 5976 rows
Table MAVEN_DEPENDENCY: 142105 rows
Table GENERATED_MAVEN_ARTIFACT: 88989 rows
Table MAVEN_PARENT_PROJECT: 16061 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 492 rows
Performances:
find: totalDurationInMs=92813, count=536
write: totalDurationInMs=49663, count=15657
I have received your mail and your SQL dump, I will do my best to look at it this week
Hello huber
I restored your database, ran queries on it, and found a few things. Could you please test this hpi for me:
It corresponds to this PR :
Hi falcon,
I've installed the hpi you provided, but sadly the build is still terribly slow.
After about an hour without much progress I reverted to 3.8.3.
The postgresql process was again at 100% CPU usage.
Here are the statistics for the (uncompleted) run:
PipelineMavenPluginPostgreSqlDao - PostgreSQL 9.5.24
JDBC URL: jdbc:postgresql:jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 9182 rows
Table JENKINS_JOB: 319 rows
Table JENKINS_BUILD: 2826 rows
Table MAVEN_DEPENDENCY: 1296586 rows
Table GENERATED_MAVEN_ARTIFACT: 219888 rows
Table MAVEN_PARENT_PROJECT: 75849 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 651 rows
Performances:
find: totalDurationInMs=2725564, count=33
write: totalDurationInMs=60201, count=13923
In that time the build completed 8 of the 211 checks (211 being the number of checks performed with 3.8.3).
Note: There have been other builds in parallel that likely affected the statistics
Update
Just for clarification.
I observed that after reverting to 3.8.3 the CPU usage of the postgresql process is also at 99% but the triggers are scanned way faster.
The first run after the revert took 970972 ms
I've only reverted the hpi, not the changes on the database introduced by the patch.
PipelineMavenPluginPostgreSqlDao - PostgreSQL 9.5.24
JDBC URL: jdbc:postgresql:jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 9182 rows
Table JENKINS_JOB: 321 rows
Table JENKINS_BUILD: 2838 rows
Table MAVEN_DEPENDENCY: 1304505 rows
Table GENERATED_MAVEN_ARTIFACT: 221027 rows
Table MAVEN_PARENT_PROJECT: 76278 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 655 rows
Performances:
find: totalDurationInMs=1547241, count=1604
write: totalDurationInMs=59974, count=14709
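The "Performances" counters are easier to compare as an average per query. The sketch below does that for the `find` figures posted so far in this issue; the labels are informal descriptions of mine, and the helper name `average_ms` is made up for illustration.

```python
# (label, totalDurationInMs, count) for the "find" counter, copied from the
# statistics posted in this issue. The labels are informal descriptions.
FIND_STATS = [
    ("first statistics posted", 55_967, 1_114),
    ("test hpi, aborted run", 2_725_564, 33),
    ("after revert to 3.8.3", 1_547_241, 1_604),
]

def average_ms(total_ms: int, count: int) -> float:
    """Average duration of one query in milliseconds (0.0 if count is 0)."""
    return total_ms / count if count else 0.0

for label, total_ms, count in FIND_STATS:
    print(f"{label:<24} {average_ms(total_ms, count):>10.1f} ms per find")
```

Since other builds ran in parallel and the counters are cumulative, these averages are only a rough indicator, but they show the gap between roughly 50 ms and more than 80 seconds per `find`.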
OK, could you please change your PostgreSQL configuration: set the `log_min_duration_statement` parameter to 5000 (5 s) and retest the 3.9.x plugin.
Then post your postgresql.log so we can see the long-running queries.
Can you verify if the DB migration script was properly applied on your DB huber ?
DROP INDEX IDX_MAVEN_ARTIFACT;
CREATE INDEX IDX_MAVEN_ARTIFACT on MAVEN_ARTIFACT (GROUP_ID, ARTIFACT_ID, VERSION);
CREATE INDEX IDX_GENERATED_ARTIFACT ON GENERATED_MAVEN_ARTIFACT(artifact_id);
CREATE INDEX IDX_GENERATED_BUILD ON GENERATED_MAVEN_ARTIFACT(build_id);
CREATE INDEX IDX_DEPENDENCY_ARTIFACT ON MAVEN_DEPENDENCY (artifact_id);
CREATE INDEX IDX_DEPENDENCY_BUILD ON MAVEN_DEPENDENCY (build_id);
CREATE INDEX IDX_PARENT_ARTIFACT ON MAVEN_PARENT_PROJECT (artifact_id);
CREATE INDEX IDX_PARENT_BUILD ON MAVEN_PARENT_PROJECT (build_id);
INSERT INTO VERSION(VERSION) VALUES (3);
Hey aheritier,
you are absolutely right, the indexes haven't been created.
jenkins=# SELECT tablename, indexname, indexdef FROM pg_indexes WHERE schemaname='public' ORDER BY tablename, indexname;
 tablename                    | indexname                         | indexdef
------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------------------------
 generated_maven_artifact     | generated_maven_artifact_pkey     | CREATE UNIQUE INDEX generated_maven_artifact_pkey ON public.generated_maven_artifact USING btree (id)
 jenkins_build                | idx_jenkins_build                 | CREATE UNIQUE INDEX idx_jenkins_build ON public.jenkins_build USING btree (job_id, number)
 jenkins_build                | jenkins_build_pkey                | CREATE UNIQUE INDEX jenkins_build_pkey ON public.jenkins_build USING btree (id)
 jenkins_build_result         | jenkins_build_result_pkey         | CREATE UNIQUE INDEX jenkins_build_result_pkey ON public.jenkins_build_result USING btree (id)
 jenkins_build_upstream_cause | jenkins_build_upstream_cause_pkey | CREATE UNIQUE INDEX jenkins_build_upstream_cause_pkey ON public.jenkins_build_upstream_cause USING btree (id)
 jenkins_job                  | idx_jenkins_job                   | CREATE UNIQUE INDEX idx_jenkins_job ON public.jenkins_job USING btree (jenkins_master_id, full_name)
 jenkins_job                  | jenkins_job_pkey                  | CREATE UNIQUE INDEX jenkins_job_pkey ON public.jenkins_job USING btree (id)
 jenkins_master               | idx_legacy_instance_id            | CREATE UNIQUE INDEX idx_legacy_instance_id ON public.jenkins_master USING btree (legacy_instance_id)
 jenkins_master               | jenkins_master_pkey               | CREATE UNIQUE INDEX jenkins_master_pkey ON public.jenkins_master USING btree (id)
 maven_artifact               | idx_maven_artifact                | CREATE INDEX idx_maven_artifact ON public.maven_artifact USING btree (group_id, artifact_id, version, type)
 maven_artifact               | maven_artifact_pkey               | CREATE UNIQUE INDEX maven_artifact_pkey ON public.maven_artifact USING btree (id)
 maven_dependency             | maven_dependency_pkey             | CREATE UNIQUE INDEX maven_dependency_pkey ON public.maven_dependency USING btree (id)
 maven_parent_project         | maven_parent_project_pkey         | CREATE UNIQUE INDEX maven_parent_project_pkey ON public.maven_parent_project USING btree (id)
(13 rows)
After executing the statements from your comments the indexes have been created:
jenkins=# SELECT tablename, indexname, indexdef FROM pg_indexes WHERE schemaname='public' ORDER BY tablename, indexname;
 tablename                    | indexname                         | indexdef
------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------------------------
 generated_maven_artifact     | generated_maven_artifact_pkey     | CREATE UNIQUE INDEX generated_maven_artifact_pkey ON public.generated_maven_artifact USING btree (id)
 generated_maven_artifact     | idx_generated_artifact            | CREATE INDEX idx_generated_artifact ON public.generated_maven_artifact USING btree (artifact_id)
 generated_maven_artifact     | idx_generated_build               | CREATE INDEX idx_generated_build ON public.generated_maven_artifact USING btree (build_id)
 jenkins_build                | idx_jenkins_build                 | CREATE UNIQUE INDEX idx_jenkins_build ON public.jenkins_build USING btree (job_id, number)
 jenkins_build                | jenkins_build_pkey                | CREATE UNIQUE INDEX jenkins_build_pkey ON public.jenkins_build USING btree (id)
 jenkins_build_result         | jenkins_build_result_pkey         | CREATE UNIQUE INDEX jenkins_build_result_pkey ON public.jenkins_build_result USING btree (id)
 jenkins_build_upstream_cause | jenkins_build_upstream_cause_pkey | CREATE UNIQUE INDEX jenkins_build_upstream_cause_pkey ON public.jenkins_build_upstream_cause USING btree (id)
 jenkins_job                  | idx_jenkins_job                   | CREATE UNIQUE INDEX idx_jenkins_job ON public.jenkins_job USING btree (jenkins_master_id, full_name)
 jenkins_job                  | jenkins_job_pkey                  | CREATE UNIQUE INDEX jenkins_job_pkey ON public.jenkins_job USING btree (id)
 jenkins_master               | idx_legacy_instance_id            | CREATE UNIQUE INDEX idx_legacy_instance_id ON public.jenkins_master USING btree (legacy_instance_id)
 jenkins_master               | jenkins_master_pkey               | CREATE UNIQUE INDEX jenkins_master_pkey ON public.jenkins_master USING btree (id)
 maven_artifact               | idx_maven_artifact                | CREATE INDEX idx_maven_artifact ON public.maven_artifact USING btree (group_id, artifact_id, version)
 maven_artifact               | maven_artifact_pkey               | CREATE UNIQUE INDEX maven_artifact_pkey ON public.maven_artifact USING btree (id)
 maven_dependency             | idx_dependency_artifact           | CREATE INDEX idx_dependency_artifact ON public.maven_dependency USING btree (artifact_id)
 maven_dependency             | idx_dependency_build              | CREATE INDEX idx_dependency_build ON public.maven_dependency USING btree (build_id)
 maven_dependency             | maven_dependency_pkey             | CREATE UNIQUE INDEX maven_dependency_pkey ON public.maven_dependency USING btree (id)
 maven_parent_project         | idx_parent_artifact               | CREATE INDEX idx_parent_artifact ON public.maven_parent_project USING btree (artifact_id)
 maven_parent_project         | idx_parent_build                  | CREATE INDEX idx_parent_build ON public.maven_parent_project USING btree (build_id)
 maven_parent_project         | maven_parent_project_pkey         | CREATE UNIQUE INDEX maven_parent_project_pkey ON public.maven_parent_project USING btree (id)
(19 rows)
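For anyone else who wants to double-check a listing like the ones above, the index names can be compared against those the migration statements are expected to create. The sketch below does this in plain Python on names copied from this comment; the function name `missing_indexes` is hypothetical, and in practice the names would come from the `pg_indexes` query shown above.

```python
# Index names created by the migration statements quoted earlier in this
# issue (lower-cased, as PostgreSQL reports unquoted identifiers).
EXPECTED = {
    "idx_maven_artifact",
    "idx_generated_artifact",
    "idx_generated_build",
    "idx_dependency_artifact",
    "idx_dependency_build",
    "idx_parent_artifact",
    "idx_parent_build",
}

# Non-primary-key index names from the first listing (before the fix).
BEFORE = {
    "idx_jenkins_build",
    "idx_jenkins_job",
    "idx_legacy_instance_id",
    "idx_maven_artifact",
}

def missing_indexes(present, expected=EXPECTED):
    """Return the expected index names that are absent, sorted by name."""
    return sorted(set(expected) - set(present))

print(missing_indexes(BEFORE))
```

A name-only check like this does not catch `idx_maven_artifact`, which existed before the fix but on different columns (`group_id, artifact_id, version, type`), so the `indexdef` column is still worth reading.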
I've installed the hpi again, set log_min_duration_statement = 5000 in the PostgreSQL configuration, and restarted the database and Jenkins.
Here are the results after triggering another build and letting it run for about 20 minutes (it is still running):
PipelineMavenPluginPostgreSqlDao - PostgreSQL 9.5.24
JDBC URL: jdbc:postgresql:jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 9182 rows
Table JENKINS_JOB: 325 rows
Table JENKINS_BUILD: 2850 rows
Table MAVEN_DEPENDENCY: 1308739 rows
Table GENERATED_MAVEN_ARTIFACT: 222489 rows
Table MAVEN_PARENT_PROJECT: 76793 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 666 rows
Performances:
find: totalDurationInMs=1358906, count=161
write: totalDurationInMs=23417, count=6139
The following SQL statement has been logged several times (with different parameters):
2021-02-02 12:16:46 CET [438-57] jenkins@jenkins LOG: duration: 5049.369 ms execute S_12: select distinct upstream_job.full_name, upstream_build.number
    from JENKINS_JOB as upstream_job
    inner join JENKINS_BUILD as upstream_build on (upstream_job.id = upstream_build.job_id and upstream_job.last_successful_build_number = upstream_build.number)
    inner join GENERATED_MAVEN_ARTIFACT on (upstream_build.id = GENERATED_MAVEN_ARTIFACT.build_id and GENERATED_MAVEN_ARTIFACT.skip_downstream_triggers = false)
    inner join MAVEN_ARTIFACT as generated_artefact on GENERATED_MAVEN_ARTIFACT.artifact_id = generated_artefact.id
    inner join MAVEN_ARTIFACT as dependency_artefact on generated_artefact.group_id = dependency_artefact.group_id and generated_artefact.artifact_id = dependency_artefact.artifact_id and generated_artefact.version = dependency_artefact.version and ( generated_artefact.type = dependency_artefact.type or generated_artefact.classifier is null or dependency_artefact.classifier is null or generated_artefact.classifier = dependency_artefact.classifier )
    inner join MAVEN_DEPENDENCY on (MAVEN_DEPENDENCY.artifact_id = dependency_artefact.id and MAVEN_DEPENDENCY.ignore_upstream_triggers = false)
    inner join JENKINS_BUILD as downstream_build on MAVEN_DEPENDENCY.build_id = downstream_build.id
    inner join JENKINS_JOB as downstream_job on downstream_build.job_id = downstream_job.id
    where downstream_job.full_name = $1 and downstream_job.jenkins_master_id = $2 and downstream_build.number = $3 and upstream_job.jenkins_master_id = $4
2021-02-02 12:16:46 CET [438-58] jenkins@jenkins DETAIL: parameters: $1 = 'dragon/integration/develop%2Fcurrent', $2 = '1', $3 = '1142', $4 = '1'
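With `log_min_duration_statement` enabled, the slow statements can be pulled out of the PostgreSQL log with a small script. This is a sketch that assumes log entries shaped like the one above (`... LOG: duration: <ms> ms execute <name>: <sql>`); the function name `parse_slow_statement` is made up, and the sample statement is abbreviated.

```python
import re

# Matches the part of a log line that follows the prefix, e.g.
# "LOG: duration: 5049.369 ms execute S_12: select ..."
SLOW_STATEMENT = re.compile(
    r"LOG:\s+duration:\s+(?P<ms>\d+(?:\.\d+)?)\s+ms\s+"
    r"execute\s+(?P<name>[^:]+):\s+(?P<sql>.*)"
)

def parse_slow_statement(line: str):
    """Return (duration_ms, statement_name, sql), or None if the line does not match."""
    match = SLOW_STATEMENT.search(line)
    if match is None:
        return None
    return float(match["ms"]), match["name"], match["sql"].strip()

# Abbreviated copy of the entry logged above (the SQL is cut short here).
sample = (
    "2021-02-02 12:16:46 CET [438-57] jenkins@jenkins LOG: duration: 5049.369 ms "
    "execute S_12: select distinct upstream_job.full_name, upstream_build.number"
)
print(parse_slow_statement(sample))
```

Sorting the parsed tuples by duration is then enough to see which statements dominate; `DETAIL: parameters:` lines do not match the pattern and are skipped.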
The build seems to be much faster compared to 3.9.3 but is still much slower than in 3.8.3.
I'll let the build finish to see how long it takes and give you an update when it's done.
OK, this was indeed the problematic query. Now that I also have real parameters, I will keep investigating.
On the other hand, I do not understand why the indexes haven't been created automatically.
The build finished after 32,959,083 ms (> 9 h).
Here are the final statistics for the day:
PipelineMavenPluginPostgreSqlDao - PostgreSQL 9.5.24
JDBC URL: jdbc:postgresql:jenkins
Table JENKINS_MASTER: 1 rows
Table MAVEN_ARTIFACT: 9464 rows
Table JENKINS_JOB: 339 rows
Table JENKINS_BUILD: 2955 rows
Table MAVEN_DEPENDENCY: 1342403 rows
Table GENERATED_MAVEN_ARTIFACT: 228959 rows
Table MAVEN_PARENT_PROJECT: 79062 rows
Table JENKINS_BUILD_UPSTREAM_CAUSE: 702 rows
Performances:
find: totalDurationInMs=228355839, count=30553
write: totalDurationInMs=562607, count=128944
I've attached the logs of the downstreamPipelineTriggerRunListener as well as the logs of the database for further analysis.
As you may see in the logs there are many jobs that get scanned several times (more often than in 3.8.3).
> On the other hand, I do not understand why the indexes haven't been created automatically.
I don't understand either. I will have to check. huber, could you confirm that you restarted your instance after upgrading? (I imagine so, but I'm not sure yet what happened.)
Hi aheritier,
yes, I restarted the instance every time I changed the plugin version.
Thanks for the confirmation, huber. We will have to verify what happens; it doesn't look correct to me, as I was expecting the update to be done automatically.
If I had to guess, it may be caused by the SQL update in https://github.com/jenkinsci/pipeline-maven-plugin/blob/master/jenkins-plugin/src/main/resources/sql/postgresql/01_migration.sql that incorrectly sets the version to 2 instead of 1.
INSERT INTO VERSION(VERSION) VALUES (2);
I don't know much about the internals of those SQL scripts, but it seems like the next file should have been named 03_migration.sql.
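To illustrate the guess, here is a toy model of a version-gated migration runner. It is not the plugin's code: the gating rule (apply script N only when the stored schema version is below N), the function name `run_migrations`, and the assumption that the index script is file 02 are all made up for illustration. The only details taken from this issue are that script 01 records version 2 and that the index statements record version 3.

```python
# Toy model of a version-gated migration runner. The gating rule is an
# assumption for illustration, not the plugin's actual implementation.
def run_migrations(scripts, schema_version=0):
    """Apply each (number, recorded_version) script whose number exceeds the
    current schema version; return the applied numbers and the final version."""
    applied = []
    for number, recorded_version in sorted(scripts):
        if number > schema_version:
            applied.append(number)
            # Each script ends with INSERT INTO VERSION(VERSION) VALUES (...)
            schema_version = recorded_version
    return applied, schema_version

# Script 01 records version 2; the index script (assumed to be file 02) records 3.
as_shipped = [(1, 2), (2, 3)]
# Naming the index script 03 keeps file numbers and recorded versions in step.
renamed = [(1, 2), (3, 3)]

print(run_migrations(as_shipped))
print(run_migrations(renamed))
```

In this model the second script is never applied with the shipped numbering, because the stored version already equals its file number, while the renamed variant applies both.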
Indeed, very nice catch, thanks. I pushed a new commit to fix the upgrade
Regarding performance, the query which takes 5017.142 ms on your side takes 50 ms on the database I restored from your dump, and 25 ms with the indexes added. I thought it was the Postgres version (I use 11), so I launched a 9.5 database on Docker and got ... 32 ms.
I have come to the conclusion that your high timings are due to too much load on your database, which means either there is a bug in the code of this plugin or your hardware is undersized.
I read your logs (many thanks for your help and time !) and found something strange with version 3.9 :
[withMaven] downstreamPipelineTriggerRunListener - Triggering downstream pipeline ... due to dependency on com.dakosy.dragon:dragon-client:jar:2021.1-SNAPSHOT(2021.1-20210202.105658-64), com.dakosy.dragon:dragon-client:jar:2021.1-SNAPSHOT(2021.1-SNAPSHOT), com.dakosy.dragon:dragon-client-cdi-se-extensions:jar:2021.1-SNAPSHOT(2021.1-20210202.105649-64), com.dakosy.dragon:dragon-client-cdi-se-extensions:jar:2021.1-SNAPSHOT(2021.1-SNAPSHOT), ...
All artefacts are duplicated: once with the SNAPSHOT version as in the POM, and once with the timestamped version as in your artifact manager (Nexus, Artifactory, ...).
I do not have such timestamped versions in my database.
Can you try the 3.9 plugin with a fresh new and empty database, relaunching all your builds at least once so that they get registered (produced artifacts) ?
I've already recreated the database earlier with 3.9.x and it got slower as the database started to grow.
Nonetheless I've recreated the database again; this is what I have done:
- Installed the hpi provided in this Jira issue
- Stopped Jenkins
- Recreated the database using the dropdb and createdb commands
- Started Jenkins
- Created the indexes manually as the hpi does not contain your fix yet
I'm now triggering some builds and will let the plugin run for a week.
Last time it took around 4 days to trigger the issue.
I'll keep you updated as the database grows in size.
> Created the indexes manually as the hpi does not contain your fix yet
Right ... sorry, I just uploaded a new HPI to this issue
Many thanks again for your tests. I would be interested in a new dump of your database (started from scratch, with only the 3.9 plugin) once the issue shows up again.
Could you also provide the spy log (See https://github.com/jenkinsci/pipeline-maven-plugin/blob/master/FAQ.adoc#how-do-i-capture-the-log-file-generated-by-the-jenkins-maven-event-spy) for one build triggering others, one with the 3.8 plugin and another one with the 3.9 ?
Nine days have passed since I dropped the database and created a new one with the 3.10-SNAPSHOT version of the plugin.
The trigger is now running for around 7,555,251 ms (~2 hours) again.
I've sent you the SQL dump by e-mail.
I tried to save the spy logs by adding writeFile file: '.archive-jenkins-maven-event-spy-logs', text: '' to my pipeline; unfortunately, this caused the pipeline to hang.
Some background to my setup:
1 Master Instance (0 build executors)
6 Slave Instances (1 build executor for each slave)
Pipeline:
Three separate maven steps for compiling, testing and deploying
I enabled the spy log for the deploy stage, but as soon as the file should have been transferred to the master the job hung (I also tried to enable it in the other stages, but the result was the same).
I've attached the thread dump of the master process to this issue.
This is the output after the job has been canceled:
[INFO] [jenkins-event-spy] Generated /data/jenkins/workspace/on_integration_develop_current_2@tmp/withMaven0cedcb4c/maven-spy-20210218-111559-605226699638201085421.log
[Pipeline] writeFile
[Pipeline] }
ERROR: [withMaven] WARNING Exception archiving Maven build logs /var/data/jenkins/workspace/on_integration_develop_current_2@tmp/withMaven0cedcb4c/maven-spy-20210218-111559-605226699638201085421.log, skip file.
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at hudson.remoting.Request.call(Request.java:177)
    at hudson.remoting.Channel.call(Channel.java:1000)
    at hudson.FilePath.act(FilePath.java:1158)
    at hudson.FilePath.act(FilePath.java:1147)
    at hudson.FilePath.copyTo(FilePath.java:2478)
    at hudson.FilePath.copyTo(FilePath.java:2433)
    at org.jenkinsci.plugins.pipeline.maven.publishers.JenkinsMavenEventSpyLogsPublisher.process(JenkinsMavenEventSpyLogsPublisher.java:38)
    at org.jenkinsci.plugins.pipeline.maven.MavenSpyLogProcessor.processMavenSpyLogs(MavenSpyLogProcessor.java:128)
    at org.jenkinsci.plugins.pipeline.maven.WithMavenStepExecution2$WithMavenStepExecutionCallBack.finished(WithMavenStepExecution2.java:1097)
    at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution$TailCall.lambda$onSuccess$0(GeneralNonBlockingStepExecution.java:140)
    at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution.lambda$run$0(GeneralNonBlockingStepExecution.java:77)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[Pipeline] // withMaven
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
I've copied the log manually and attached it to the issue (logs-and-dumps_18-02-2018.zip).
Please tell me if you really need the logs for 3.8 and if you need the logs for all three stages or just the last one.
If you need them, I'll try to figure out how I can save them manually before the workspace is wiped out after the build.
Update
Sorry, I used the wrong date in the name of the zip file. It should have been 2021 instead of 2018...
Hello, any chance that we can get a fix for this issue? We tried tuning the database on our side by creating indexes but unfortunately without success. It has severe impact on our builds. Some of them do take more than two hours to calculate all the jobs they need to trigger because they're checking the same downstream dependencies over and over again if multiple artifacts do have the same downstream dependents, and the queries are incredibly slow.
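The repeated checks described here can be pictured with a small sketch. It does not reflect how the plugin works internally; the artifact and job names are invented, and the only point is that collecting each downstream job once (in a set) avoids re-checking jobs shared by several artifacts.

```python
# Hypothetical mapping: artifact produced by a build -> downstream jobs that
# depend on it. All names are invented for illustration.
DOWNSTREAM = {
    "client.jar": ["job-a", "job-b", "job-c"],
    "client-extensions.jar": ["job-a", "job-b"],
    "client-api.jar": ["job-b", "job-c"],
}

def checks_without_dedup(downstream):
    """One check per (artifact, job) pair: shared jobs are checked repeatedly."""
    return sum(len(jobs) for jobs in downstream.values())

def jobs_to_check_once(downstream):
    """Collect each downstream job only once, whatever artifact leads to it."""
    unique_jobs = set()
    for jobs in downstream.values():
        unique_jobs.update(jobs)
    return sorted(unique_jobs)

print(checks_without_dedup(DOWNSTREAM))  # 7 (artifact, job) pairs
print(jobs_to_check_once(DOWNSTREAM))    # 3 distinct jobs
```

With slow queries behind every check, the difference between the two counts is what turns a trigger scan from minutes into hours.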
Hello everyone here,
Sorry for the -long- delay ...
I found something, but it seems too stupid to be only that, so I need feedback, please!
You can find the hpi here:
https://ci.jenkins.io/job/Plugins/job/pipeline-maven-plugin/job/PR-323/
The pull request has been rebased on top of master, so you will get the edge version of the plugin, with all the latest fixes.
I'm seeing this same behaviour. I have jobs that have finished their actual work (on other agents), but are now sat on master printing this message. CPU use is up at 100% on the master node. The jobs have been running for about 18 hours with no sign of stopping. The jobs cannot be cancelled, or at least the cancel request does nothing when made.
This issue also prevents the graceful update of the system with plugins or the next LTS release as when restarting via the web UI Jenkins will prefer to wait for all jobs to complete before restarting. It means I need to RDP onto my master node and shut Jenkins down using 'net stop'.