Jenkins / JENKINS-38592

pipelines data for json is getting a bit too crazy (for dashboard)


    Details

    • Sprint: pacific, atlantic

      Description

      In many cases the pipelines JSON for the dashboard (search results) is blowing out to 14kb for 26 "rows" of data. This isn't a huge JSON transfer, but when you look closer, the amount that plugins add via actions appears to be huge, and may be the cause of some problems.

      As part of looking at a support ticket, Thorsten Scherler found that for 26 pipelines there are only about 20 lines of attributes per pipeline that the dashboard actually uses, i.e. 26 × 20 ≈ 520 lines.
      However, there are 10k lines in the response, which is a bit out of control.
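To find out where the roughly 10k lines come from, one option is to pretty-print each top-level key of each pipeline object and count the lines it occupies. A minimal sketch, assuming the listing has already been parsed into a list of dicts; the function name `lines_per_key` is hypothetical and nothing here is part of Blue Ocean:

```python
import json
from collections import Counter
from typing import Any, Dict, List


def lines_per_key(pipelines: List[Dict[str, Any]]) -> Counter:
    """Count the pretty-printed JSON lines each top-level key contributes."""
    totals: Counter = Counter()
    for pipeline in pipelines:
        for key, value in pipeline.items():
            # Render the value with 2-space indentation, as a JSON viewer would,
            # and count how many lines it occupies.
            totals[key] += len(json.dumps(value, indent=2).splitlines())
    return totals
```

Calling `lines_per_key(listing).most_common()` on a real response would show whether the attributes the dashboard needs or the plugin-contributed metadata dominate.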

      It seems that every plugin installed adds actions for every little thing, which the dashboard doesn't need.

      For example

              "_class": "io.jenkins.blueocean.service.embedded.rest.ActionProxiesImpl",
              "_links": {
                "self": {
                  "_class": "io.jenkins.blueocean.rest.hal.Link",
                  "href": "/blue/rest/organizations/jenkins/... (hidden)"
                }
              },
              "_class": "com.cloudbees.plugins.credentials    .ViewCredentialsAction",
              "stores": {},
              "urlName": "credentials"
            }
          ],
      

      is added to every pipeline (along with hundreds of others). How many depends entirely on the plugins installed.
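Since the volume depends on which plugins are installed, a tally of action `_class` values across all pipelines would show which plugins contribute most. A sketch, assuming each pipeline object carries its actions in an `_actions` array (the key name used later in this thread); `action_classes` is a hypothetical helper:

```python
from collections import Counter
from typing import Any, Dict, List


def action_classes(pipelines: List[Dict[str, Any]]) -> Counter:
    """Tally the _class of every entry in each pipeline's _actions list."""
    counts: Counter = Counter()
    for pipeline in pipelines:
        # Pipelines without an _actions key simply contribute nothing.
        for action in pipeline.get("_actions", []):
            counts[action.get("_class", "<no _class>")] += 1
    return counts
```

An action class whose count equals the number of pipelines (like `ViewCredentialsAction` above) is one that a plugin attaches to every single pipeline.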

      Also

               {
                  "_class": "hudson.model.TextParameterDefinition",
                  "defaultParameterValue": {
                    "_class": "hudson.model.StringParameterValue",
                    "name": "jiraComment",
                    "value": ""
                  },
                  "description": "Additional info for theissue's comment",
                  "name": "jiraComment",
                  "type": "TextParameterDefinition"
                },
      

      will (probably) never be of use to the dashboard.

      Some questions:

      Is there a better query to use to make the response smaller/more compact? (Even just eyeballing the response would be easier if the metadata could optionally be left out.) I expect the calculation of all this is quite expensive.

      Could this be a source of the slowdown when people have many (or certain) plugins installed?


            Activity

            Michael Neale added a comment (edited)

            FYI here it is for 3 pipelines and a minimal install: https://gist.github.com/michaelneale/9127c8e14d16baf05910807bd8ac8cc9

            This isn't as problematic as other "real world" examples. It is still 90% noise for the pipeline screen, but the action list seems fairly finite. The problem is that every single plugin adds to it, and all of it is returned in the listing. It also doesn't seem buggy (no wayward whitespace in _class, or nulls).

            (putting in current sprint as there is work on to look at search response times)

            Michael Neale added a comment

            Thorsten Scherler, can you confirm whether the metadata is used at all on the dashboard now (if you know)? Perhaps the solution is to leave it out of the listing (but I don't know whether some of it is used). While future plugins may use it, that is less likely on the dashboard listing.

            Thorsten Scherler added a comment

            Quick comment on the whitespace in the attributes: I did the formatting with a regexp, so there may be extra whitespace that does not appear in the real-world response.

            AFAIK we use bits of the metadata. Some code uses the links and some uses capabilities, but we do that strictly on the root class/response object. The one exception is logs, where we have to scan the actions of a step to determine whether we support logs.
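The log check is the one place where actions have to be scanned. A sketch of that check, assuming the step's actions sit in an `_actions` array; the issue does not say which action classes signal log support, so the caller passes them in, and `step_supports_logs` and `example.LogAction` are hypothetical:

```python
from typing import Any, Dict, Iterable


def step_supports_logs(step: Dict[str, Any], log_action_classes: Iterable[str]) -> bool:
    """Return True if any action attached to the step has a log-capable _class."""
    wanted = set(log_action_classes)
    # Only the step's own actions are inspected; everything else can ignore _actions.
    return any(action.get("_class") in wanted for action in step.get("_actions", []))
```

If this is the only consumer, the step endpoints would be the only ones that strictly need `_actions` in the response.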

            Michael Neale added a comment

            One point Vivek and I discussed: does the latest run of each pipeline need to return everything, or could it be a summary of what we know is needed on the screen? When a pipeline detail is loaded we do a full fetch anyway, right, Cliff Meyers, Thorsten Scherler?

            Vivek also thinks making use of the tree API could be a better bet (i.e. only request what we know is needed).
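Jenkins' classic remote API accepts a `tree` query parameter of the form `field,nested[a,b]`; whether the Blue Ocean REST endpoints honour exactly the same syntax is an assumption here. A sketch that renders a nested field spec in that syntax and, for comparison, projects a full response down to the same fields client-side. The field names in any spec are illustrative, not the dashboard's actual list, and `tree_param`/`project` are hypothetical helpers:

```python
from typing import Any, Dict, Optional


def tree_param(spec: Dict[str, Optional[dict]]) -> str:
    """Render a nested field spec in tree syntax, e.g. a,b[c,d]."""
    parts = []
    for key, sub in spec.items():
        # Leaf fields are listed by name; nested objects get a bracketed sub-list.
        parts.append(key if sub is None else f"{key}[{tree_param(sub)}]")
    return ",".join(parts)


def project(value: Any, spec: Dict[str, Optional[dict]]) -> Any:
    """Keep only the fields named in spec, recursing into lists and objects."""
    if isinstance(value, list):
        return [project(item, spec) for item in value]
    if not isinstance(value, dict):
        return value
    return {
        key: value[key] if sub is None else project(value[key], sub)
        for key, sub in spec.items()
        if key in value
    }
```

With a spec such as `{"_class": None, "name": None, "_links": {"self": {"href": None}}, "latestRun": {"result": None}}` the rendered value is `_class,name,_links[self[href]],latestRun[result]`, which would be URL-encoded and appended as `?tree=...`. Asking the server for this (rather than projecting client-side) is what would save the calculation cost.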

            Cliff Meyers added a comment

            Michael Neale, as far as I can see, we don't use latestRun anywhere in the main Pipeline list. latestRun has a little bloat, but it's really the "_actions" property on each pipeline and each latestRun that seems to contribute most of the JSON bloat (see below). My 2 cents:

            • "_class" property is absolutely critical, since the UI can now automatically fetch all of the associated capabilities for all unique "_class" values returned in a response. Personalization uses this everywhere, and we use it for the main pipelines fetch as well as the activity screen.
            • "_links" property is absolutely critical as there are numerous actions (run, start, stop, replay, remove from queue, others) that require the "self" href. AFAIK we aren't using any of the other links. It's sort of debatable whether we should use the "runs" link to get runs for a pipeline, or whether it's fine just to append "/runs/" to the self href like we do now in a few places.
            • "_actions", which I think provides most of the JSON bloat, is only used in one place: some of the code that Thorsten Scherler worked on in Run Details (I am not familiar with the specifics).
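The trimming described in these points can be illustrated client-side: drop every `_actions` key at any depth while leaving `_class` and `_links` alone, then collect the distinct `_class` values the UI would resolve capabilities for. This is only a sketch of the effect on the payload, not how the server would implement it; both function names are hypothetical:

```python
from typing import Any, Optional, Set


def strip_actions(value: Any) -> Any:
    """Return a copy of value with every "_actions" key removed, at any depth."""
    if isinstance(value, list):
        return [strip_actions(item) for item in value]
    if isinstance(value, dict):
        # _class and _links are kept: the UI relies on both.
        return {k: strip_actions(v) for k, v in value.items() if k != "_actions"}
    return value


def unique_classes(value: Any, found: Optional[Set[str]] = None) -> Set[str]:
    """Collect every distinct _class string found anywhere in value."""
    if found is None:
        found = set()
    if isinstance(value, list):
        for item in value:
            unique_classes(item, found)
    elif isinstance(value, dict):
        cls = value.get("_class")
        if isinstance(cls, str):
            found.add(cls)
        for child in value.values():
            unique_classes(child, found)
    return found
```

Comparing `unique_classes(response)` with `unique_classes(strip_actions(response))` shows how many `_class` values exist only because of plugin actions.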

            Seems like trimming out "_actions" except where we need it would have the least impact on existing code and make the biggest difference in the JSON. However, before we go optimizing, have we measured this to see that it's really a problem? I have to imagine that if gzip is enabled, the overall size of the JSON response becomes very small, because so much of the data in "_actions" is duplicated. It's untidy to look at for sure, but any basic editor/viewer will let you collapse the "_links" and "_actions" nodes, at which point the JSON is entirely readable to me.
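Whether gzip makes the size a non-issue can be checked rather than guessed. A sketch that reports raw and gzip-compressed sizes for a payload, plus a synthetic listing in which every pipeline carries the same action list. The synthetic data only mimics the duplication pattern; the class names and counts are made up, so real numbers still require running `json_sizes` on an actual response:

```python
import gzip
import json
from typing import Any, List, Tuple


def json_sizes(payload: Any) -> Tuple[int, int]:
    """Return (raw bytes, gzip-compressed bytes) of the compact JSON encoding."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def synthetic_listing(pipelines: int, actions_per_pipeline: int) -> List[dict]:
    """Build a fake listing where every pipeline carries the same action list."""
    actions = [
        {
            "_class": f"org.example.plugin.Action{i}",
            "urlName": f"action{i}",
            "_links": {"self": {"href": f"/blue/rest/example/action{i}/"}},
        }
        for i in range(actions_per_pipeline)
    ]
    return [
        {"_class": "org.example.Pipeline", "name": f"pipeline-{n}", "_actions": actions}
        for n in range(pipelines)
    ]
```

Because the repeated action blocks fall well inside gzip's 32 KB back-reference window, the compressed size grows far more slowly than the raw size. This supports the point that transfer size is not the main cost; the server-side work of computing the data is unaffected by compression.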

            Michael Neale added a comment

            Cliff Meyers, yes, I don't think the size, compressed, is a problem per se. It's more the calculation of things like latestRun, and perhaps some of the other things, that may trip things up. When diagnosing, it is a pain to look at the JSON, as you can't see the wood for the trees. Good suggestions.

            Thorsten Scherler added a comment

            Michael Neale, yeah, size does not matter ... that's what .... Jokes aside, Cliff Meyers, the problem is in support when Blue Ocean (BO) fails. To review what BO is doing we need to review the server response, and as said above there is way too much wood in the response. Compare, e.g., a "clean" response: https://developer.github.com/v3/repos/forks/#create-a-fork

            https://www.thoughtworks.com/es/insights/blog/rest-api-design-resource-modeling is a nice article on how REST resources should be designed.

            I'm actually starting to think that we should split one resource into 3 different requests and fetch them separately:

            ```
            .../pipelines
            .../pipelines/links
            .../pipelines/actions
            ...
            ```
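A sketch of how one pipeline object might be partitioned across those three requests. The sub-resource names follow the paths above; keeping `_class` in the core part (because the UI needs it for capabilities) is an assumption, and `split_resource` is hypothetical:

```python
from typing import Any, Dict


def split_resource(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Partition one pipeline object into core, links and actions parts."""
    # Core keeps every plain attribute, including _class.
    core = {k: v for k, v in resource.items() if k not in ("_links", "_actions")}
    return {
        "": core,                                  # .../pipelines
        "links": resource.get("_links", {}),       # .../pipelines/links
        "actions": resource.get("_actions", []),   # .../pipelines/actions
    }
```

The dashboard would then fetch only the core part. One caveat: the run, stop, replay and similar buttons need `_links.self`, so a real split would probably keep at least that link in the core response to avoid an extra round trip per row.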

            Michael Neale added a comment

            Tackling this as part of the effort to use the tree parameter.


              People

              Assignee:
              vivek Vivek Pandey
              Reporter:
              michaelneale Michael Neale
              Votes:
              0
              Watchers:
              3
