Jenkins / JENKINS-64036

downstreamPipelineTriggerRunListener: Severe performance regression in 3.9.x


    Details

    • Type: Bug
    • Status: In Review
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: pipeline-maven-plugin
    • Labels:
      None
    • Environment:
      Jenkins 2.263
      pipeline-maven-plugin 3.9.3
      PostgreSQL 9.5.23
      Ubuntu 16.04.07 LTS

      Description

      We're facing a severe performance regression after upgrading the pipeline-maven-plugin from 3.8.3 to 3.9.3.
      In 3.8.3 the downstreamPipelineTriggerRunListener needed ~52,031 ms (~52 s) to complete.
      In 3.9.3 the downstreamPipelineTriggerRunListener needs ~24,694,245 ms (~7 hours).

      The task manager shows 100% CPU usage for a Postgres process while the listener runs.
      The only changes related to SQL statements between 3.8.3 and 3.9.3 were introduced by this PR: https://github.com/jenkinsci/pipeline-maven-plugin/pull/226 (JENKINS-59500)

      The second observation I made is that the message "Skip triggering ... because it has a dependency on a pipeline that will be triggered by this build" is now printed 46 times for the same job in 3.9.3, compared to 15 times in 3.8.3.

        Attachments

        1. pipeline-maven.hpi
          469 kB
        2. logs-and-dumps_18-02-2018.zip
          191 kB
        3. logs.zip
          902 kB


            Activity

            aheritier Arnaud Héritier added a comment -

            Nice catch Kevin Huber, cc benoit guerin

            falcon benoit guerin added a comment -

            Indeed, very nice catch, thanks. I pushed a new commit to fix the upgrade.

            Regarding performance, the query that takes 5017.142 ms on your side takes 50 ms on the database I restored from your dump, and 25 ms with the indexes added. I thought it was the Postgres version (I use 11), so I launched a 9.5 database in Docker and got ... 32 ms.

            I come to the conclusion that your high timings are due to too much load on your database, which means either there is a bug in the code of this plugin or your hardware is undersized.

            I read your logs (many thanks for your help and time!) and found something strange with version 3.9:

            [withMaven] downstreamPipelineTriggerRunListener - Triggering downstream pipeline ... due to dependency on
            com.dakosy.dragon:dragon-client:jar:2021.1-SNAPSHOT(2021.1-20210202.105658-64),
            com.dakosy.dragon:dragon-client:jar:2021.1-SNAPSHOT(2021.1-SNAPSHOT),
            com.dakosy.dragon:dragon-client-cdi-se-extensions:jar:2021.1-SNAPSHOT(2021.1-20210202.105649-64),
            com.dakosy.dragon:dragon-client-cdi-se-extensions:jar:2021.1-SNAPSHOT(2021.1-SNAPSHOT),
            ...

            All artifacts are duplicated: once with the SNAPSHOT version as in the POM, and once with the timestamped version as in your artifact manager (Nexus, Artifactory, ...).

            I do not have such timestamped versions in my database.

            Can you try the 3.9 plugin with a fresh, empty database, relaunching all your builds at least once so that their produced artifacts get registered?

            huber Kevin Huber added a comment -

            I've already recreated the database earlier with 3.9.x, and it got slower as the database started to grow.
            Nonetheless, I've recreated the database again; this is what I have done:

            • Installed the HPI provided in this Jira issue
            • Stopped Jenkins
            • Recreated the database using the dropdb and createdb commands
            • Started Jenkins
            • Created the indexes manually, as the HPI does not contain your fix yet

            I'm now triggering some builds and will let the plugin run for a week.
            Last time it took around 4 days to trigger the issue.
            I'll keep you updated as the database grows in size.

            falcon benoit guerin added a comment -

            > Created the indexes manually as the hpi does not contain your fix yet

            Right ... sorry, I just uploaded a new HPI to this issue.

            Many thanks again for your tests. I am interested in a new dump of your database (started from scratch, with only the 3.9 plugin) once the issue triggers.

            Could you also provide the spy log (See https://github.com/jenkinsci/pipeline-maven-plugin/blob/master/FAQ.adoc#how-do-i-capture-the-log-file-generated-by-the-jenkins-maven-event-spy) for one build triggering others, one with the 3.8 plugin and another one with the 3.9 ?

            huber Kevin Huber added a comment - - edited

            9 days have passed since I dropped the database and created a new one with the 3.10-SNAPSHOT version of the plugin.
            The trigger now runs for around 7,555,251 ms (~2 hours) again.
            I've sent you the SQL dump by e-mail.

            I've tried to save the spy logs by adding writeFile file: '.archive-jenkins-maven-event-spy-logs', text: '' to my pipeline; unfortunately, this caused the pipeline to hang.
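            For context, this is roughly where that marker file sits in my pipeline; a minimal sketch, assuming a scripted pipeline (the stage name and the mvn goal are illustrative; writeFile and withMaven are the actual steps used):

```groovy
node {
    stage('Deploy') {
        // Marker file checked by the pipeline-maven-plugin: its presence in
        // the workspace tells withMaven to archive the maven-spy-*.log files
        // (see the FAQ link above). Adding this line is what triggered the hang.
        writeFile file: '.archive-jenkins-maven-event-spy-logs', text: ''
        withMaven() {
            sh 'mvn deploy'
        }
    }
}
```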

            Some background to my setup:
            1 Master Instance (0 build executors)
            6 Slave Instances (1 build executor for each slave)

            Pipeline:
            Three separate maven steps for compiling, testing and deploying

            I enabled the spy log for the deploy stage, but as soon as the file was supposed to be transferred to the master, the job hung (I also tried to enable it in the other stages, but the result was the same).
            I've attached the thread dump of the master process to this issue.
            This is the output after the job has been canceled:

            [INFO] [jenkins-event-spy] Generated /data/jenkins/workspace/on_integration_develop_current_2@tmp/withMaven0cedcb4c/maven-spy-20210218-111559-605226699638201085421.log
            [Pipeline] writeFile
            [Pipeline] }
            ERROR: [withMaven] WARNING Exception archiving Maven build logs /var/data/jenkins/workspace/on_integration_develop_current_2@tmp/withMaven0cedcb4c/maven-spy-20210218-111559-605226699638201085421.log, skip file. 
            java.lang.InterruptedException
            	at java.lang.Object.wait(Native Method)
            	at hudson.remoting.Request.call(Request.java:177)
            	at hudson.remoting.Channel.call(Channel.java:1000)
            	at hudson.FilePath.act(FilePath.java:1158)
            	at hudson.FilePath.act(FilePath.java:1147)
            	at hudson.FilePath.copyTo(FilePath.java:2478)
            	at hudson.FilePath.copyTo(FilePath.java:2433)
            	at org.jenkinsci.plugins.pipeline.maven.publishers.JenkinsMavenEventSpyLogsPublisher.process(JenkinsMavenEventSpyLogsPublisher.java:38)
            	at org.jenkinsci.plugins.pipeline.maven.MavenSpyLogProcessor.processMavenSpyLogs(MavenSpyLogProcessor.java:128)
            	at org.jenkinsci.plugins.pipeline.maven.WithMavenStepExecution2$WithMavenStepExecutionCallBack.finished(WithMavenStepExecution2.java:1097)
            	at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution$TailCall.lambda$onSuccess$0(GeneralNonBlockingStepExecution.java:140)
            	at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution.lambda$run$0(GeneralNonBlockingStepExecution.java:77)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            	at java.lang.Thread.run(Thread.java:748)
            [Pipeline] // withMaven
            [Pipeline] }
            [Pipeline] // stage
            [Pipeline] }
            

            I've copied the log manually and attached it to the issue (logs-and-dumps_18-02-2018.zip).
            Please tell me if you really need the logs for 3.8, and whether you need the logs for all three stages or just the last one.
            If you need them, I'll try to figure out how I can save them manually before the workspace is wiped after the build.
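            One idea I might try for saving them manually, sketched under the assumption that the spy logs are still present in the @tmp directory next to the workspace while the withMaven block is running (the paths and the allowEmptyArchive flag are illustrative):

```groovy
withMaven() {
    sh 'mvn deploy'
    // Copy the spy logs out of the @tmp directory into the workspace before
    // the step finishes, so they survive the workspace wipe after the build.
    sh 'cp -v "${WORKSPACE}@tmp"/withMaven*/maven-spy-*.log . || true'
}
// Archive the copied logs as regular build artifacts.
archiveArtifacts artifacts: 'maven-spy-*.log', allowEmptyArchive: true
```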

            Update

            Sorry, I used the wrong date in the name of the zip file. It should have been 2021 instead of 2018.


              People

              Assignee:
              Unassigned
              Reporter:
              Kevin Huber
              Votes:
              3
              Watchers:
              7
