
Stage graph unsuitable for large and/or complex pipelines

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: blueocean-plugin
    • Labels: None
    • Environment: Jenkins 2.40, Blue Ocean 1.0.0-b17

      Improvement on roadmap

      This improvement is on the Blue Ocean project roadmap. Check the roadmap page for updates.

      The Blue Ocean stage graph is great for small, simple pipelines; however, it breaks down with many parallel builds. See attached screenshot for an example.

      Because 'stage' can no longer be nested within 'parallel', all of our steps must belong under a single 'Test' stage. We have 19 parallel jobs, which is not an uncommon number for iOS/Android development where many combinations of app, device and OS version need to be tested. We'd actually like to split some of the jobs into smaller chunks to take advantage of idle build agents, but this would greatly exacerbate the problem.

      Grouping jobs under multiple stages would improve the UI experience, but would also drastically increase the runtime of our integration runs, as stages are executed serially.
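A rough sketch of the arithmetic behind that trade-off follows. It assumes ideal conditions (enough idle agents for every parallel job, no scheduling overhead), and the job durations and the grouping are made-up illustrative numbers, not measurements from the pipeline described here.

```typescript
// Hypothetical job durations in minutes; illustrative values only.
const jobs: number[] = [12, 9, 15, 7, 11, 14];

// One stage holding every job in a single parallel block:
// the wall-clock time is that of the slowest job.
function singleParallelStage(durations: number[]): number {
  return durations.reduce((slowest, d) => Math.max(slowest, d), 0);
}

// Jobs split into groups, one stage per group. Stages run serially,
// so the total is the sum of each group's slowest job.
function serialStages(groups: number[][]): number {
  let total = 0;
  for (const group of groups) {
    total += singleParallelStage(group);
  }
  return total;
}

const flat = singleParallelStage(jobs);                     // 15
const grouped = serialStages([[12, 9], [15, 7], [11, 14]]); // 12 + 15 + 14 = 41
```

With these example numbers, grouping the same six jobs into three serial stages nearly triples the wall-clock time, even though no extra work is done.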

      I envision two possible solutions:

      1. Stages have a 'parallel' option that allows them to run at the same time as other parallel stages.
      2. A step is introduced that is used purely as an annotation for the purposes of rendering a more appropriate graph. Ideally the step would be deeply nestable allowing for complex graph hierarchies.

      Thanks for all the hard work on Blue Ocean, it's really shaping up nicely and I eagerly await each new release.

          [JENKINS-41205] Stage graph unsuitable for large and/or complex pipelines

          James Dumay added a comment -

          Thanks ileitch! For that second point, would you be interested in something like JENKINS-38442?


          Ian Leitch added a comment -

          jamesdumay It's not clear from JENKINS-38442 how the graph would be rendered. Nested stages would solve the issue for me, provided they do alter the graph; ideally, though, having both nested stages and parallel stages would provide the most flexibility.


          James Dumay added a comment -

          ileitch we are not even clear how things should be rendered, because users' ideas of how that should look are so diverse.

          I think there are some things we can do here to make the existing graph more acceptable in your case. We will also be looking at how to solve the nesting problem this year but I can't give a good ETA on it yet.


          Ian Leitch added a comment -

          The sketch you provided: https://issues.jenkins-ci.org/secure/attachment/34100/blueocean.sketch%202016-09-28%2015-03-57.png
          would not be ideal as it only alters the steps table, not the graph. If we applied this approach to our pipeline, the graph would only display a single "Test" stage, with all of the meaningful stages being shown in the list below. This would make the graph linear, and somewhat redundant.


          James Dumay added a comment -

          ileitch yeah I think that design is totally out for the reason you provided and because two steps can appear on the screen it makes the running behaviour really complex.


          James Dumay added a comment -

          ileitch actually would you be up for a Google Hangout in the next few weeks? We can discuss this in greater detail. Would love to see some of the Pipelines you have built and how they show up in Blue Ocean.


          Ian Leitch added a comment - jamesdumay Please see my proposal here: https://issues.jenkins-ci.org/browse/JENKINS-38442?focusedCommentId=284175&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-284175

          Ian Leitch added a comment -

          jamesdumay Sorry just noticed the Google Hangout question. Yeah I'm happy to do so, though be warned that I'm very new to Jenkins (I was tasked with switching away from TeamCity). We only have a single Pipeline at the moment.


          James Dumay added a comment -

          Oh no that's perfect that you are new to things! Can you email me on your work email? I'm jdumay@cloudbees.com


          Alastair D'Silva added a comment - - edited

          A multiaxis way of displaying steps would be nice too, consider the following:

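          // Axes of the build matrix
          def hostPlatforms = ['ppc64le', 'x86_64']
          def targetPlatforms = ['x86_64', 'power32', 'ppc64le', 'ppc64be', 'arm']
          def hostOSs = ['ubuntu-16.04']
          
          // Map of branch name -> closure, as expected by the parallel step
          def builds = [:]
          
          for (hostPlatform in hostPlatforms) {
              for (targetPlatform in targetPlatforms) {
                  // Skip the one combination we do not build
                  if (hostPlatform == 'ppc64le' && targetPlatform == 'x86_64') {
                      continue
                  }
          
                  for (hostOS in hostOSs) {
                      def name = 'build-' + hostPlatform + '-' + targetPlatform
                      // Copy the loop variables into locals so that each closure
                      // captures its own values rather than the shared loop variables
                      def myHostPlatform = hostPlatform
                      def myHostOS = hostOS
                      def myTargetPlatform = targetPlatform
          
                      builds[name] = { buildToolchain(myHostPlatform, myHostOS, myTargetPlatform) }
                  }
              }
          }
          
          // Run every combination as a parallel branch of a single stage
          stage('build') {
              parallel builds
          }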
          

          This currently ends up with a long list scrolling down the page, but would be better represented by a display similar to the Matrix Project plugin.
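As a rough illustration of the matrix idea, the sketch below arranges the same host/target combinations as a grid instead of a flat list. It is written in TypeScript purely for illustration; `buildMatrix` and `isExcluded` are hypothetical names and are not part of Blue Ocean or the Matrix Project plugin.

```typescript
const hostPlatforms = ['ppc64le', 'x86_64'];
const targetPlatforms = ['x86_64', 'power32', 'ppc64le', 'ppc64be', 'arm'];

// Same exclusion as in the pipeline above: no ppc64le-hosted x86_64 build.
function isExcluded(host: string, target: string): boolean {
  return host === 'ppc64le' && target === 'x86_64';
}

// One row per host platform, one column per target platform.
// Each cell holds the branch name, or null where the combination is skipped.
function buildMatrix(hosts: string[], targets: string[]): (string | null)[][] {
  return hosts.map((host) =>
    targets.map((target) =>
      isExcluded(host, target) ? null : 'build-' + host + '-' + target
    )
  );
}

const matrix = buildMatrix(hostPlatforms, targetPlatforms);
// matrix[0][0] is null (excluded); matrix[1][0] is 'build-x86_64-x86_64'.
```

A 2 x 5 grid with one empty cell carries the same nine branches as the scrolling list, but both axes stay visible at once.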


          Sam Gruber added a comment - - edited

          This pipeline of ours is unworkable in the current view (1.0.0rc3) because we cannot click through to any information on our later stages (we have approximately 250 stages total, and this view only shows the first 90). Even if the full graph is unrenderable, it would be nice to have some way of loading these stages. Just a dropdown list would be preferable to the current state.


          James Dumay added a comment -

          scgruber great description of your use case here. I have a few questions:

          • What actions are you performing with the parallels?
          • Why 250 stages? What meaning do they have?


          Sam Gruber added a comment -

          jamesdumay this pipeline is an integration test against our different hardware platforms. We have a sequence of tests (asdf3, asdf4, ...) which run in parallel over our hardware platforms (zxcv1, zxcv2, zxcv3, ...). So we have a stage for each combination of test and hardware, and that's how we get to 250.


          James Dumay added a comment -

          scgruber very interesting! Thanks for the additional context. When we plan this feature, could we contact you for a Google Hangout to discuss the new design?


          Sam Gruber added a comment -

          jdumay sure, as long as it can be scheduled during US Pacific (UTC-7) business hours.


          Clement Gautier added a comment -

          Any insight on this issue? I'm in the same situation with even more parallel steps (over 400 and growing). Basically, we have so many browser tests that we need to split them a lot. Previously we used a distributed system of our own, with a queue and consumers that upload the results back to the Jenkins master server, but it kind of sucks and we really want to move back to basic master/slave management of our Jenkins.

          How can I help? From my point of view, a way to tell parallel not to create a visual step for each branch would be enough as a quick fix. Do you have a Slack or some other tool to talk directly? I'm not used to Java, so I might need some help with the contribution process.

          Andrew Conti added a comment -

          I'm in a similar situation. I've got 25 separate build combinations, some of which are virtualized and have one or more packaging nodes as well. Each artifact produced then gets a test node, and there are over 40 artifacts to test. All told I've got about 100 nodes, so I'm guessing that must be the limit. Just my 2¢ that the limit is too low.


          Phil Clay added a comment -

          +1 on the limit of 100 being too low.  I have some pipelines that are slightly larger than 100, and this is a real pain.  My pipelines only have a few (non-nested) stages, but have lots of parallel steps per stage.  I'm fine with dealing with a slightly long UI graph if that means I get to see all the nodes.

           

          Is there any chance that a short-term solution could be put in place to allow the limit to be configurable, with no changes to the current UI graph display?  I have a feeling that refactoring the UI using some of the solutions proposed here will take quite a while, especially since the roadmap indicates it is "Not planned."

           

          I looked into it a bit.  

          The current UI makes one call to:

          http://jenkins/blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/
          

          Internally, the nodes are retrieved via the PipelineNodeContainerImpl, which is a BluePipelineNodeContainer, which is a Container, which is a Pageable.

          Therefore, the response that is constructed is a PagedResponse, which has a default limit of 100.  Since no start/limit is passed in the request, the default start=0 and limit=100 are used.  So even though the container has all of the nodes in it, only the first 100 are returned in the HTTP response.
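The truncation can be pictured with a small sketch. This is not the Blue Ocean implementation (which is Java); `pageOf`, `DEFAULT_START` and `DEFAULT_LIMIT` are hypothetical names, and only the default values 0 and 100 come from the behaviour described above.

```typescript
const DEFAULT_START = 0;
const DEFAULT_LIMIT = 100;

interface Page<T> {
  items: T[];
  nextStart: number | null; // null when there is no further page
}

// Returns one page of `all`, falling back to the defaults when the
// caller passes no start/limit (as the UI request does).
function pageOf<T>(
  all: T[],
  start: number = DEFAULT_START,
  limit: number = DEFAULT_LIMIT
): Page<T> {
  const items = all.slice(start, start + limit);
  const end = start + items.length;
  return { items: items, nextStart: end < all.length ? end : null };
}

// A run with 250 nodes, requested without start/limit.
const nodes: number[] = [];
for (let i = 0; i < 250; i++) nodes.push(i);

const first = pageOf(nodes);
// first.items.length is 100 and first.nextStart is 100:
// the other 150 nodes exist in the container but are never sent.
```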

          The response also includes a Link header for the next page:

          Link: </blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/?start=100&limit=100>; rel="next"
          

          However, the UI does not follow the link to retrieve the next page.
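For illustration, extracting the next-page target from such a header could look roughly like the sketch below. `nextLink` is a hypothetical helper, not existing Blue Ocean code, and it only handles the simple comma-separated form shown above rather than the full Web Linking (RFC 8288) grammar.

```typescript
// Returns the target of the rel="next" link, or null if there is none.
// Not a full RFC 8288 parser: a comma inside a URL, for example,
// would confuse the split below.
function nextLink(linkHeader: string | null): string | null {
  if (!linkHeader) return null;
  const parts = linkHeader.split(',');
  for (const part of parts) {
    // Capture the <target> and the parameters that follow it.
    const match = /<([^>]*)>\s*;(.*)/.exec(part);
    if (match && /\brel\s*=\s*"?next"?(\s|;|$)/.test(match[2])) {
      return match[1];
    }
  }
  return null;
}

const header =
  '</blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/?start=100&limit=100>; rel="next"';
const next = nextLink(header);
// next is the path and query for the second page of nodes.
```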

           

          While debugging, I was able to set a breakpoint on the server side, and hack around the arbitrary 100 limit to make the server return more than 100.  The UI worked just fine with the number of nodes in my pipelines.

           

          There are a few hacky ways to make all the nodes show on the current UI:

          1. Increase the default limit constant
          2. Provide a way to configure the default limit (and don't rely on a constant), therefore allowing users to modify the limit as they desire
          3. Have the call to .../nodes pass a higher limit by default... e.g. .../nodes/?limit=500
          4. Have the UI follow links to next pages (could be done by default, OR done if the user clicks on the "Unable to display more" node).
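Option 3 is the smallest change on the client side. A minimal sketch, assuming a hypothetical `withLimit` helper (the `limit` query parameter itself is the one visible in the Link header above):

```typescript
// Appends a `limit` query parameter to a nodes URL.
// Assumes the URL does not already carry a `limit` parameter.
function withLimit(nodesUrl: string, limit: number): string {
  const separator = nodesUrl.indexOf('?') === -1 ? '?' : '&';
  return nodesUrl + separator + 'limit=' + limit;
}

const nodesUrl = withLimit(
  '/blue/rest/organizations/jenkins/pipelines/_job_/runs/_runId_/nodes/',
  500
);
// nodesUrl ends with "/nodes/?limit=500"
```

Any fixed value only moves the cutoff, though: a run with more than 500 nodes would still be truncated, which is the argument for option 4.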

           

          In any case, a short term solution would be much appreciated.

           

           


          Ben Langfeld added a comment -

          My team has the necessary skills to contribute a fix to the UI to fetch all available pages. If we were to submit such a PR, is there a reasonable chance it would be merged for inclusion in the next Blue Ocean release, or is there more roadmap politics to navigate than that?

          For anyone who is curious, the deficient object is the PipelinePager, which doesn't actually page in data past the initial page: https://github.com/jenkinsci/blueocean-plugin/blob/master/blueocean-dashboard/src/main/js/components/karaoke/services/pagers/PipelinePager.js


          Garett Arrowood added a comment -

          This bug is greatly hindering my work. Could it be moved up the roadmap, or could someone respond to the gentleman who offered to fix it in the above comment?

          Keith Zantow added a comment -

          benlangfeld if you submit a PR that fixes the issue it would absolutely be considered for inclusion; we'd just have a look and make sure tests pass, etc. Submissions are always welcome!


          Cliff Meyers added a comment - - edited

          Seconded. benlangfeld I had looked at this problem in the past. Another option is to do successive fetches until all nodes / stages are loaded. If you look at the REST responses, you'll see there is pagination data written into a "Link" response header IIRC. That's a way to determine whether there is additional data to be fetched, and you could write some logic to grab say n=100 and just perform successive fetches until the Link header indicates there is no more data.
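The successive-fetch idea could be sketched as follows. This is an illustration only, not the code of the actual patch: `fetchAllNodes`, `FetchPage`, `NodePage` and the `maxPages` guard are hypothetical, and the transport is injected so that the loop can be exercised against an in-memory stand-in instead of a real Jenkins server.

```typescript
// One fetched page. `next` is the rel="next" target taken from the Link
// response header, or null when the header carries no next link.
interface NodePage<T> {
  items: T[];
  next: string | null;
}

// Hypothetical transport: fetches one page and resolves to its contents.
type FetchPage<T> = (url: string) => Promise<NodePage<T>>;

// Fetches pages one after another until no rel="next" link remains.
// `maxPages` guards against a server that never stops returning links.
async function fetchAllNodes<T>(
  firstUrl: string,
  fetchPage: FetchPage<T>,
  maxPages: number = 50
): Promise<T[]> {
  const all: T[] = [];
  let url: string | null = firstUrl;
  let pages = 0;
  while (url !== null && pages < maxPages) {
    const page: NodePage<T> = await fetchPage(url);
    for (const item of page.items) all.push(item);
    url = page.next;
    pages++;
  }
  return all;
}

// In-memory stand-in for the REST endpoint: 250 nodes, 100 per page.
const fakeNodes: string[] = [];
for (let i = 0; i < 250; i++) fakeNodes.push('node-' + i);

const fakeFetch: FetchPage<string> = (url) => {
  const m = /start=(\d+)/.exec(url);
  const start = m ? parseInt(m[1], 10) : 0;
  const end = Math.min(start + 100, fakeNodes.length);
  return Promise.resolve({
    items: fakeNodes.slice(start, end),
    next: end < fakeNodes.length ? '/nodes/?start=' + end + '&limit=100' : null,
  });
};
```

Calling `fetchAllNodes('/nodes/', fakeFetch)` against this stand-in issues three requests and resolves to all 250 nodes.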

          We may want to be careful about doing a massive fetch up front (say n=500) as for complex pipelines this might have a perf impact server wide. I recall discussing this with vivek a while back, can you refresh my memory on whether it might be preferable to do a single large fetch (say n=500) or several smaller fetches (n=100) until all data is loaded? Intuitively fewer large fetches seems more efficient from client's perspective, but I seem to recall a concern with loading a large number of nodes concurrently in the context of a single request?
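The client-side half of that trade-off is simple arithmetic, sketched below with the node counts reported earlier in this thread. `requestsNeeded` is a hypothetical helper, and it says nothing about the server-side cost of loading many nodes within one request, which is the open question here.

```typescript
// Number of requests needed to load `totalNodes` at `limit` nodes per
// request. At least one request is always made.
function requestsNeeded(totalNodes: number, limit: number): number {
  return Math.max(1, Math.ceil(totalNodes / limit));
}

const smallPages250 = requestsNeeded(250, 100); // 3
const largePage250 = requestsNeeded(250, 500);  // 1
const smallPages400 = requestsNeeded(400, 100); // 4
const largePage400 = requestsNeeded(400, 500);  // 1
```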


          Ben Langfeld added a comment -

          Following Link is precisely what I had in mind cliffmeyers. We'll prep a patch. Thanks everyone.


          Michael Neale added a comment -

          benlangfeld just make sure you are nice and up to date with master as some recent changes were merged for how it follows along (may not affect you, just FYI). As for fetching the pages - absolutely why not, if you have a PR that would be wonderful. Go for it!


          Ben Langfeld added a comment -

          A patch to resolve this is proposed at https://github.com/jenkinsci/blueocean-plugin/pull/1517. I would appreciate a review, particularly from cliffmeyers.


            benlangfeld Ben Langfeld
            ileitch Ian Leitch
            Votes: 19
            Watchers: 36