[JENKINS-41205] Stage graph unsuitable for large and/or complex pipelines

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: blueocean-plugin
    • Labels: None
    • Environment: Jenkins 2.40, Blue Ocean 1.0.0-b17

      Improvement on roadmap

      This improvement is on the Blue Ocean project roadmap. Check the roadmap page for updates.

      The Blue Ocean stage graph is great for small, simple pipelines; however, it breaks down with many parallel builds. See the attached screenshot for an example.

      Because 'stage' can no longer be nested within 'parallel', all of our steps must belong under a single 'Test' stage. We have 19 parallel jobs, which is not an uncommon number for iOS/Android development, where many combinations of app, device and OS version need to be tested. We'd actually like to split some of the jobs into smaller chunks to take advantage of idle build agents, but this would greatly exacerbate the problem.

      Grouping jobs under multiple stages would improve the UI experience, but it would also drastically increase the runtime of our integration runs, because stages are executed serially.

      I envision two possible solutions:

      1. Stages have a 'parallel' option that allows them to run at the same time as other parallel stages.
      2. A step is introduced that is used purely as an annotation for the purposes of rendering a more appropriate graph. Ideally the step would be deeply nestable allowing for complex graph hierarchies.

      Thanks for all the hard work on Blue Ocean, it's really shaping up nicely and I eagerly await each new release.


          Clement Gautier added a comment -

          Any insight on this issue? I'm in the same situation with even more parallel steps (over 400 and growing). Basically, we have so many browser tests that we need to split them a lot. Previously we used a distributed system of our own, built on a queue and consumers that upload the results back to the Jenkins master server, but it kinda sucks and we really want to move back to basic master/slave management of our Jenkins.

          How can I help? From my point of view, a quick fix would be enough: simply tell 'parallel' not to create a visual step for each of its branches. Do you have a Slack channel or a similar tool where we could talk directly? I'm not used to Java, so I might need some help with the contribution process.

          Andrew Conti added a comment -

          I'm in a similar situation. I've got 25 separate build combinations, some of which are virtualized and have one or more packaging nodes as well. Each artifact produced then gets a test node, and there are over 40 artifacts to test. All told I've got about 100 nodes, so I'm guessing that must be the limit. Just my 2¢: the limit is too low.


          Phil Clay added a comment -

          +1 on the limit of 100 being too low. I have some pipelines that are slightly larger than 100 nodes, and this is a real pain. My pipelines only have a few (non-nested) stages, but have lots of parallel steps per stage. I'm fine with a slightly longer UI graph if that means I get to see all the nodes.

          Is there any chance that a short-term solution could be put in place to make the limit configurable, with no changes to the current UI graph display? I have a feeling that refactoring the UI using some of the solutions proposed here will take quite a while, especially since the roadmap indicates it is "Not planned".

          I looked into it a bit.  

          The current UI makes one call to:

          http://jenkins/blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/
          

          Internally, the nodes are retrieved via the PipelineNodeContainerImpl, which is a BluePipelineNodeContainer, which is a Container, which is a Pageable.

          Therefore, the response that is constructed is a PagedResponse, which has a default limit of 100.  Since no start/limit is passed in the request, the default start=0 and limit=100 are used.  So even though the container has all of the nodes in it, only the first 100 are returned in the HTTP response.
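To make the truncation concrete, here is a minimal Python model of the paging behavior described above. It is not Blue Ocean's Java code: `paged_slice`, `DEFAULT_START` and `DEFAULT_LIMIT` are hypothetical names, and the only details taken from the comment are the defaults start=0 and limit=100 applied when the request specifies neither.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical stand-ins for the defaults described above (start=0, limit=100).
DEFAULT_START = 0
DEFAULT_LIMIT = 100


def paged_slice(items: Sequence[T], start: Optional[int] = None,
                limit: Optional[int] = None) -> List[T]:
    """Return the slice of `items` that a single paged request would receive.

    The container holds every node; the response only carries
    items[start:start + limit], falling back to the defaults when the
    request does not specify start or limit.
    """
    start = DEFAULT_START if start is None else start
    limit = DEFAULT_LIMIT if limit is None else limit
    return list(items[start:start + limit])


nodes = [f"node-{i}" for i in range(250)]  # e.g. a run with 250 nodes
first_page = paged_slice(nodes)            # no start/limit in the request
print(len(first_page), first_page[-1])     # 100 node-99
```

The remaining 150 nodes are still in the container; they are simply never requested.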

          The response also includes a Link header for the next page:

          Link: </blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/?start=100&limit=100>; rel="next"
          

          However, the UI does not follow the link to retrieve the next page.
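Following the link first requires pulling the rel="next" target out of the Link header. Below is a simplified Python sketch; the function name `next_link` is hypothetical. It handles the `<url>; rel="next"` shape shown above and comma-separated lists of links, but it is not a complete RFC 8288 parser (for example, it does not handle commas or semicolons inside quoted parameter values).

```python
import re
from typing import Optional

# Matches one link-value: <target> followed by zero or more ";param" parts.
# Simplification: parameters may not contain ';' or ',' (even when quoted).
_LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the target of the rel="next" link, or None if there is none."""
    if not link_header:
        return None
    for match in _LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may be quoted or bare, and may list several relation types.
            is_rel = name.strip().lower() == "rel"
            if is_rel and "next" in value.strip().strip('"').lower().split():
                return target
    return None


header = ('</blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_'
          '/nodes/?start=100&limit=100>; rel="next"')
print(next_link(header))
# /blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/?start=100&limit=100
```

The returned target is a server-relative path, so a client still has to resolve it against the Jenkins base URL before requesting it.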

          While debugging, I was able to set a breakpoint on the server side, and hack around the arbitrary 100 limit to make the server return more than 100.  The UI worked just fine with the number of nodes in my pipelines.

          There are a few hacky ways to make all the nodes show on the current UI:

          1. Increase the default limit constant
          2. Provide a way to configure the default limit (and don't rely on a constant), therefore allowing users to modify the limit as they desire
          3. Have the call to .../nodes pass a higher limit by default... e.g. .../nodes/?limit=500
          4. Have the UI follow links to next pages (could be done by default, OR done if the user clicks on the "Unable to display more" node).
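Option 3 only changes the request the UI sends. As a hedged Python illustration: the helper `nodes_url`, its parameters and the sample values are hypothetical; the path shape and the limit=500 example come from the comment above; and the sketch assumes a top-level pipeline in the default "jenkins" organization.

```python
from urllib.parse import quote, urlencode


def nodes_url(base: str, job: str, run_id: str,
              start: int = 0, limit: int = 500) -> str:
    """Build a .../nodes/ URL with an explicit start and limit.

    Assumes a top-level pipeline in the default "jenkins" organization;
    job and run_id are percent-encoded as single path segments.
    """
    path = ("/blue/rest/organizations/jenkins/pipelines/"
            f"{quote(job, safe='')}/runs/{quote(run_id, safe='')}/nodes/")
    query = urlencode({"start": start, "limit": limit})
    return f"{base.rstrip('/')}{path}?{query}"


print(nodes_url("http://jenkins", "my-pipeline", "42"))
# http://jenkins/blue/rest/organizations/jenkins/pipelines/my-pipeline/runs/42/nodes/?start=0&limit=500
```

This only raises the ceiling: a run with more nodes than `limit` is still truncated, which is why following the next-page links (option 4) is the more general fix.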

          In any case, a short-term solution would be much appreciated.

          Ben Langfeld added a comment -

          My team has the necessary skills to contribute a fix to the UI to fetch all available pages. If we were to submit such a PR, is there a reasonable chance it would be merged for inclusion in the next BlueOcean release, or is there more roadmap politics to navigate than that?

          For anyone who is curious, the deficient object is the PipelinePager, which doesn't actually page in data past the initial page: https://github.com/jenkinsci/blueocean-plugin/blob/master/blueocean-dashboard/src/main/js/components/karaoke/services/pagers/PipelinePager.js


          Garett Arrowood added a comment -

          This bug is greatly hindering my work. Could it be moved up the roadmap, or could someone respond to the gentleman who offered to fix it in the above comment?

          Keith Zantow added a comment -

          benlangfeld, if you submit a PR that fixes the issue, it would absolutely be considered for inclusion; we'd just have a look and make sure tests pass, etc. Submissions are always welcome!


          Cliff Meyers added a comment (edited) -

          Seconded. benlangfeld, I had looked at this problem in the past. Another option is to do successive fetches until all nodes/stages are loaded. If you look at the REST responses, you'll see there is pagination data written into a "Link" response header, IIRC. That's a way to determine whether there is additional data to be fetched: you could write some logic to grab, say, n=100 at a time and just perform successive fetches until the Link header indicates there is no more data.
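The successive-fetch approach can be sketched as a loop. This is a hedged Python illustration, not the code in the eventual PR: `fetch_all_nodes`, the `Fetcher` signature, `max_pages` and the fake server are all hypothetical, and the regex only recognizes the simple `<target>; rel="next"` form shown earlier in this thread.

```python
import re
from typing import Callable, List, Optional, Tuple
from urllib.parse import parse_qs, urljoin, urlparse

# A fetcher takes an absolute URL and returns (decoded JSON page, Link header or None).
Fetcher = Callable[[str], Tuple[List[dict], Optional[str]]]

# Simplified: finds a <target>; rel="next" (or rel=next) pair in the header.
_NEXT = re.compile(r'<([^>]*)>\s*;\s*rel="?next\b"?')


def fetch_all_nodes(first_url: str, fetch: Fetcher, max_pages: int = 100) -> List[dict]:
    """Fetch page after page until the Link header no longer advertises rel="next"."""
    nodes: List[dict] = []
    url: Optional[str] = first_url
    pages = 0
    while url is not None:
        if pages == max_pages:  # guard against a server that never stops paging
            raise RuntimeError(f"gave up after {max_pages} pages")
        page, link = fetch(url)
        nodes.extend(page)
        pages += 1
        match = _NEXT.search(link or "")
        # The Link target is a server-relative path, so resolve it against the current URL.
        url = urljoin(url, match.group(1)) if match else None
    return nodes


# Usage with a fake in-memory server holding 250 nodes and a default limit of 100.
ALL_NODES = [{"id": str(i)} for i in range(250)]


def fake_fetch(url: str) -> Tuple[List[dict], Optional[str]]:
    query = parse_qs(urlparse(url).query)
    start = int(query.get("start", ["0"])[0])
    limit = int(query.get("limit", ["100"])[0])
    end = start + limit
    link = (f'</blue/rest/fake/nodes/?start={end}&limit={limit}>; rel="next"'
            if end < len(ALL_NODES) else None)
    return ALL_NODES[start:end], link


print(len(fetch_all_nodes("http://jenkins/blue/rest/fake/nodes/", fake_fetch)))  # 250
```

The `max_pages` guard is a choice made for this sketch, so that a misbehaving server cannot keep the client looping forever.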

          We may want to be careful about doing a massive fetch up front (say n=500), as for complex pipelines this might have a server-wide performance impact. I recall discussing this with vivek a while back; can you refresh my memory on whether it might be preferable to do a single large fetch (say n=500) or several smaller fetches (n=100) until all data is loaded? Intuitively, fewer large fetches seem more efficient from the client's perspective, but I seem to recall a concern about loading a large number of nodes concurrently in the context of a single request?
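The trade-off can be made concrete with a little arithmetic: for N nodes and page size `limit`, the client makes ceil(N / limit) requests, and no single response carries more than min(N, limit) nodes. The Python helper `fetch_plan` below is hypothetical and says nothing about the server-side cost of each request, which is exactly the open question here; the example uses the 400-branch pipeline mentioned earlier in the thread.

```python
import math
from typing import Tuple


def fetch_plan(total_nodes: int, limit: int) -> Tuple[int, int]:
    """Return (requests needed, most nodes in any single response)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if total_nodes <= 0:
        return 1, 0  # one request is still made; it returns an empty page
    return math.ceil(total_nodes / limit), min(total_nodes, limit)


for limit in (100, 500):
    print(limit, fetch_plan(400, limit))
# 100 (4, 100)
# 500 (1, 400)
```

The total number of nodes transferred is the same either way; the choice is between more round trips and a larger peak load per request.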


          Ben Langfeld added a comment -

          Following the Link header is precisely what I had in mind, cliffmeyers. We'll prep a patch. Thanks, everyone.


          Michael Neale added a comment -

          benlangfeld just make sure you are nice and up to date with master as some recent changes were merged for how it follows along (may not affect you, just FYI). As for fetching the pages - absolutely why not, if you have a PR that would be wonderful. Go for it!


          Ben Langfeld added a comment -

          A patch to resolve this is proposed at https://github.com/jenkinsci/blueocean-plugin/pull/1517. I would appreciate a review, particularly from cliffmeyers.


            Assignee: Ben Langfeld (benlangfeld)
            Reporter: Ian Leitch (ileitch)
            Votes: 19
            Watchers: 36
