Jenkins / JENKINS-39661

Investigate performance of Pipeline /activity and /runs REST endpoint

    Details

    • Epic Link: JENKINS-37957
    • Sprint: arctic, tasman

      Description

      The /activity and /runs endpoints have been reported to take up to 28 seconds on some Jenkins masters.

      Tracing through the code, it looks like io.jenkins.blueocean.service.embedded.rest.RunContainerImpl#get loads all the runs for the Job using hudson.model.Job#getBuilds() and only then applies the pagination.

      However, there is a method hudson.model.Job#getBuilds(hudson.model.Fingerprint.RangeSet) that will allow us to fetch a range of data which would be suitable for pagination.

            Activity

            jamesdumay James Dumay created issue -
            jamesdumay James Dumay made changes -
            Field Original Value New Value
            Epic Link JENKINS-37957 [ 174099 ]
            jamesdumay James Dumay made changes -
            Rank Ranked higher
            jamesdumay James Dumay added a comment - - edited

            I've linked the Customer Data here that is on the internal CloudBees tracker - non-employees will not be able to see it. I am only referencing it here so we can keep the development of Blue Ocean out in the open and still respond to these requests.

            jamesdumay James Dumay made changes -
            Remote Link This issue links to "Customer data (Web Link)" [ 15039 ]
            jamesdumay James Dumay added a comment -

            Tom FENNELLY if you feel comfortable picking this kinda thing up then you can.

            tfennelly Tom FENNELLY made changes -
            Assignee Tom FENNELLY [ tfennelly ]
            tfennelly Tom FENNELLY added a comment -

            James Dumay Assigned to me now. Thanks.

            tfennelly Tom FENNELLY made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jamesdumay James Dumay added a comment -

            Tom FENNELLY I think Vivek Pandey had some ideas about this one - maybe sync up?

            michaelneale Michael Neale added a comment -

            Tom FENNELLY James Dumay should this be assigned to Vivek if he is working on making the API paginate correctly, or is Tom tackling this?

            michaelneale Michael Neale made changes -
            Sprint arctic [ 131 ]
            michaelneale Michael Neale made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            michaelneale Michael Neale made changes -
            Assignee Tom FENNELLY [ tfennelly ] Vivek Pandey [ vivek ]
            michaelneale Michael Neale added a comment -

            James Dumay giving this to vivek - needs a bit of investigation. I put it in this sprint but up to you how important you think it is.

            tfennelly Tom FENNELLY added a comment - - edited

            Yeah, I think Vivek Pandey is going to need to look at this. It seems to me that the pagination needs to change: at the moment it expects the full result set to be created by the function bound to the endpoint (e.g. AbstractPipelineImpl.getActivities()), and the filtering (start, limit etc.) then happens further up the call stack in Stapler (I think). The fact that AbstractPipelineImpl.getActivities() needs to read all runs on a Job for the pagination to work back in Stapler is where this is choking, imo: you want the first 30 runs but read all 5000 runs into a collection, and only then does the filtering happen. Filtering earlier in the process would work better, I think.

            getBuilds(hudson.model.Fingerprint.RangeSet) would not really help here as far as I can see.

            If changing the pagination is a big problem, then one possible quick solution might be to not use the Stapler pagination and, instead, to do the pagination "manually" when getting activity (where there can be large result sets).

            tfennelly Tom FENNELLY added a comment -

            As a quick experiment, I changed AbstractPipelineImpl.getActivities() by removing the @Navigable annotation and doing the pagination manually using a LazyBuildMixIn and RunMap iterator. In my case (a simple pipeline Job with > 5000 runs on it), it made the first page loading on a restarted Jenkins go from ~ 30 seconds down to ~ 1.3 seconds (still not fast, but a lot better than 30 s), and the reloading of that page thereafter from ~ 800 ms down to ~ 90 ms.

            Why is it faster ... because we are only loading the runs that we are going to show in the UI i.e. 25 of them instead of 5000+ of them.

            Sure, you might want to generalise this in some way, but at least we now know we have some option to make this better.

            See https://github.com/tfennelly/blueocean-plugin/tree/JENKINS-39661
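The difference between the two approaches discussed above can be sketched in plain Java. This is a minimal illustration, not the Blue Ocean code: RunStore, eagerPage and lazyPage are hypothetical stand-ins (no Jenkins or Stapler classes are used), and each call to next() only simulates the cost of loading one run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative only: plain Java stand-ins, not the Blue Ocean or Jenkins API. */
class PaginationSketch {

    /** Hypothetical run store that counts how many runs it has had to load. */
    static class RunStore implements Iterable<Integer> {
        private final int totalRuns;
        int loaded = 0;

        RunStore(int totalRuns) {
            this.totalRuns = totalRuns;
        }

        /** Iterates newest-first; each next() simulates loading one run. */
        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int current = totalRuns;

                @Override
                public boolean hasNext() {
                    return current > 0;
                }

                @Override
                public Integer next() {
                    if (current <= 0) {
                        throw new NoSuchElementException();
                    }
                    loaded++;
                    return current--;
                }
            };
        }
    }

    /** Eager: materialise every run, then slice out the requested page. */
    static List<Integer> eagerPage(RunStore store, int start, int limit) {
        List<Integer> all = new ArrayList<>();
        for (int run : store) {
            all.add(run);
        }
        int from = Math.min(start, all.size());
        int to = Math.min(start + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    /** Lazy: skip `start` runs, collect up to `limit` runs, then stop iterating. */
    static List<Integer> lazyPage(RunStore store, int start, int limit) {
        List<Integer> page = new ArrayList<>();
        Iterator<Integer> it = store.iterator();
        for (int skipped = 0; skipped < start && it.hasNext(); skipped++) {
            it.next();
        }
        while (page.size() < limit && it.hasNext()) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        RunStore eager = new RunStore(5000);
        RunStore lazy = new RunStore(5000);
        System.out.println("eager page size: " + eagerPage(eager, 0, 25).size()
                + ", runs loaded: " + eager.loaded);
        System.out.println("lazy page size: " + lazyPage(lazy, 0, 25).size()
                + ", runs loaded: " + lazy.loaded);
    }
}
```

With 5000 simulated runs and a first page of 25, the eager version touches every run while the lazy one stops after 25. This lazy version still walks past the first `start` entries, so deep pages stay more expensive than early ones.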

            tfennelly Tom FENNELLY added a comment -

            While looking at JENKINS-39625 and after talking to Bobby Sandell, I made a visit to https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fpipeline-model-definition-plugin/activity/. Loading the top level activity page for that job took about 4 minutes, with all/most of the time spent loading blue/rest/organizations/jenkins/pipelines/Plugins/pipeline-model-definition-plugin/activities/?start=0&limit=26, the size of which was 1.2 MB.

            michaelneale Michael Neale added a comment -

            Tom FENNELLY interesting - on your last comment, would that response be helped by using correct pagination? (that response size seems odd - what is bloating it?)

            tfennelly Tom FENNELLY added a comment -

            Michael Neale If you just get one record (https://ci.jenkins.io/blue/rest/organizations/jenkins/pipelines/Plugins/pipeline-model-definition-plugin/activities/?start=0&limit=1) you'll see that it's 23,000+ lines of JSON and it looks like most of that is artifacts.

            vivek Vivek Pandey added a comment -

            Tom FENNELLY Great, thanks. I will look at generalizing it. BTW, what's the best way to create 5000+ jobs? Groovy script console?

            jamesdumay James Dumay added a comment -

            Thanks both for taking a look

            tfennelly Tom FENNELLY added a comment -

            Vivek Pandey Hey Vivek .... I'm guessing you mean 5000+ runs of a job. In that case, yes, I used the script console to do that e.g.

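            // Look up the job by name (here a job called 'pipeline_1').
            def job = Jenkins.instance.getItem('pipeline_1');

            // Schedule 1000 runs, waiting for each one to start before queuing the next.
            for (i = 0; i < 1000; i++) {
                job.scheduleBuild2(0).waitForStart();
            }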

            If you mean Jobs then I have used the following script: https://gist.github.com/tfennelly/c53953b98850304deb405591063a6e85

            And then run that like ....

            ./create-jobs.sh --host=localhost:8080/jenkins --login=tfennelly:XXXXX --job=pipeline --count=500
            
            michaelneale Michael Neale added a comment -

            I think that 5000 runs is what you would want to do to repro this, not jobs (that is a separate issue)

            michaelneale Michael Neale added a comment -

            FYI I think in at least the case of a lot of artifacts, blue ocean is broken here: https://issues.jenkins-ci.org/browse/JENKINS-39737 - am following it up on that ticket.

            michaelneale Michael Neale made changes -
            Link This issue relates to JENKINS-39737 [ JENKINS-39737 ]
            tfennelly Tom FENNELLY made changes -
            Link This issue is blocking JENKINS-39625 [ JENKINS-39625 ]
            vivek Vivek Pandey added a comment -

            Tom FENNELLY Yeah runs not jobs, thanks!

            jamesdumay James Dumay made changes -
            Link This issue is blocking JENKINS-39625 [ JENKINS-39625 ]
            michaelneale Michael Neale added a comment - - edited

            ok Vivek Pandey we have opened other tickets for the artifact stuff, which will probably improve things a bit (in your testing you might want to comment out all artifacts and see if it is still slow).

            Based on my calculations looking at the JSON from https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fpipeline-model-definition-plugin/detail/PR-54/1/artifacts/ (i.e. the https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fpipeline-model-definition-plugin project), it takes around 10 seconds to load the flat artifact listing for one run (there are 1700-odd artifacts). Multiply that by a page's worth of activity (26 items), and you get about 4 minutes of latency.

            So this could be the cause of the slowness, at least in the case of Andrew Bayer's pipeline.

            Still need to run a pressure test anyway, as suggested by Tom's script.
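The estimate above is simple multiplication, sketched here for clarity. It assumes the per-run artifact listing cost (about 10 seconds) is paid once for each of the 26 runs on the page, one after another, with no caching or parallelism. Both figures come from the comment; the class and method names are made up for illustration.

```java
/** Back-of-the-envelope check; inputs are the figures quoted in the comment above. */
class ArtifactLatencyEstimate {

    /** Total seconds if every run on the page pays the artifact-listing cost in turn. */
    static int totalSeconds(int secondsPerRun, int runsPerPage) {
        return secondsPerRun * runsPerPage;
    }

    public static void main(String[] args) {
        int total = totalSeconds(10, 26);
        // Break the total down into whole minutes and remaining seconds.
        System.out.printf("%d s = %d min %d s%n", total, total / 60, total % 60);
    }
}
```

That gives 260 seconds, a little over 4 minutes, which is in line with the page load time reported earlier for that job.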

            vivek Vivek Pandey made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            vivek Vivek Pandey added a comment -

            Tom FENNELLY I have created 6000 runs of a simple pipeline; fresh load time (in a new browser) is about 600 ms. Maybe my pipeline is too simple to reproduce the tens of seconds of initial load time? It's all on my laptop.

            Here is my pipeline:

            stage 'build'
            node{
              echo "Building..."
            }
            stage 'deploy'
            node{
              echo "Deploying"
            }
            

            Can you share your pipeline? Maybe that will help me diagnose. I am proceeding with a fix regardless, but I need to reproduce the problem in order to validate it.

            Thanks buddy!

            tfennelly Tom FENNELLY added a comment -

            Vivek Pandey Hey Vivek ... afraid I wiped my test env, but it was a simple enough pipeline that just echoed to the log about 100 or 200 times (I forget exactly, but wasn't a lot).

            So something like...

            for (i = 0; i < 200; i++) {
                echo "Vivek is an awesome guy, except that he loves Donald Trump"
            }
            
            vivek Vivek Pandey added a comment -

            Tom FENNELLY Ha ha! That test is some evil fantasy of yours

            Ok, so the test is fine. For some reason I don't see that kind of slow performance; relatively speaking, I am getting roughly a 2.7x speedup with my change (670 ms down to ~250 ms). I am going to put up a PR.
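The before and after timings quoted above (670 ms and roughly 250 ms) can be turned into a speedup factor and a percentage reduction as follows. This is only a worked calculation on those two numbers; the class and method names are hypothetical.

```java
/** Worked arithmetic on the two timings quoted in the comment above. */
class SpeedupCheck {

    /** How many times faster the new timing is than the old one. */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Percentage of the original time that was saved. */
    static double percentReduction(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        System.out.printf("%.2fx faster, %.1f%% less time%n",
                speedup(670, 250), percentReduction(670, 250));
    }
}
```

On these figures the change is a bit under 3 times faster, which corresponds to a little over 60% less time per request.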

            vivek Vivek Pandey made changes -
            Status In Progress [ 3 ] In Review [ 10005 ]
            jamesdumay James Dumay made changes -
            Sprint arctic [ 131 ] arctic, tasman [ 131, 136 ]
            vivek Vivek Pandey made changes -
            Resolution Fixed [ 1 ]
            Status In Review [ 10005 ] Resolved [ 5 ]
            jamesdumay James Dumay made changes -
            Remote Link This issue links to "CloudBees Internal UX-583 (Web Link)" [ 18214 ]

              People

              Assignee:
              vivek Vivek Pandey
              Reporter:
              jamesdumay James Dumay