
API to return per node averages, and ETA based on history of execution

    • 1.0-m7

      In scope

          [JENKINS-35821] API to return per node averages, and ETA based on history of execution

          Sam Van Oort added a comment -

          vpandey The parallel blocks handling seems reasonable (though I think it needs to define handling for other combinations of states in a similar fashion).

          > I don't think we are handling these at the moment. So NotExecuted is something that's going to be executed eventually?

          No, NotExecuted is set on all steps skipped when you resume from a checkpoint - those steps are never executed (since the checkpoint resume takes care of their results). It's a separate state (stage view has handling for this), and it needs to be handled if we are going to support the CJP value-add in cloudbees workflow plugin.

          > This behavior sounds correct. Why are we marking that step as a failure? IMO, the user is explicitly declaring a try/catch, which tells the pipeline machinery it should not flag it as ErrorAction.

          That was my impression too - if the error is caught, the step should either not get an ErrorAction, or it should have some marker to indicate that the error was caught. Of course, there is more than one school of thought on this matter, unfortunately.
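          As a minimal sketch of the "marker" option discussed above, a step's error record could carry a flag saying whether user code caught it. The StepError class and its fields are hypothetical illustrations, not the actual pipeline plugin API:

```java
// Hypothetical illustration only - not the real ErrorAction API.
public class StepError {
    final Throwable error;
    final boolean caughtByUser; // true if a user-level try/catch handled it

    StepError(Throwable error, boolean caughtByUser) {
        this.error = error;
        this.caughtByUser = caughtByUser;
    }

    /** A step only counts as a failure if its error escaped user handling. */
    boolean countsAsFailure() {
        return !caughtByUser;
    }
}
```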

          James Dumay added a comment -

          svanoort hey mate whats the current status of the work on your end?

          Sam Van Oort added a comment -

          jdumay It's coming along - the core that I was finishing previously is done, and I've got work well underway on the block scanning visitor. Vivek and I will probably want to touch base next week about the API structure and consumption patterns.

          James Dumay added a comment -

          svanoort fantastic. Could you set up a hangout with him next week?

          Sam Van Oort added a comment -

          jdumay Already done!

          Michael Neale added a comment -

          vivek is this really redundant now that the real work is to move to bismuth api? if so, feel free to close it.

          Vivek Pandey added a comment -

          michaelneale The Bismuth API gives a basic structure to parse executed flow nodes with a SAX-like parser; it also gives structure for timing info. Let's keep this ticket open, as we still need to implement these functionalities in the Blue Ocean API.

          James Dumay added a comment -

          svanoort is there an API to do this yet?

          James Dumay added a comment -

          As discussed:

          • There is a low level API in bismuth to retrieve this data.
          • We will use the low level API in Blue Ocean to drive out the requirements of a future higher level API
          • Think of it as a PoC that lives within Blue Ocean that we could either throw away or reuse.
          • vivek to meet with svanoort to co-ordinate how it should be done.

          Sam Van Oort added a comment - edited

          Per a meeting with @vivek on Friday, there are a couple of parts to this, and by breaking it up into smaller pieces we can make it easier. We have a followup meeting planned in a couple of weeks to sync again and hammer out more of the fine details.

          Pieces:

          1. Collect a bit more structural information in BO during flow analysis (an additional Map or two). This lets us build a tree-like, DOM-style overlay of flow node data (depth-limited for BO though, so simplified).
          2. Create a generic API accepting DOM-like interface, which generates mappings of similar ("homologous") parts of the flow execution across WorkflowRuns (i.e. stage 'bob' in Run #1 maps to stage 'bob' in Run #2).
          • This combines a mapping component (first stage with same name matches) with a filtering component (ex: if stage didn't complete with SUCCESSFUL, it can't be used for prediction).
          • This will support pluggable strategies to accomplish this (heuristics). This lets us do the simplest version then rip it out entirely if needed (and give Blue Ocean a basic form without much nesting, but analytics can do a fancier one if desired). Also important because mappings are very fiddly (see: tons of Stage View Bugs logged for this aspect).
          • Mappings will be fuzzy/best-case, and can fail entirely if two runs are too different.
          • For more complex mappings we'll do it recursively to simplify this – map stages against each other, then map steps within each stage.
          3. Take the homologous (similar) pieces of flows and generate predictions for run time by aggregating them. We can use the status/timing APIs. This will combine run time, pause time, and maybe status. It should be VERY simple by design. Probably just do an average or median of times (subtracting pause), maybe report an error bar if we have enough data.
          4. Optional: a basic API where you can request run analysis by giving a WorkflowRun as input – this can internally use caching or an analysis thread pool (to avoid overloading the system).

          Basically what we'd do is digest a flow, then digest previous runs, try to map similar parts, then see if we have enough data for predictions, find the similar bits, then do estimates based on those. We're aiming for the simplest workable solution initially.
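          The mapping-plus-filtering idea described above ("first stage with same name matches", skip runs that didn't succeed) can be sketched roughly like this. StageRecord, its fields, and mapHomologous are illustrative assumptions, not the actual bismuth or Blue Ocean API:

```java
import java.util.*;

// Hypothetical sketch of the stage-mapping heuristic plus the status filter.
public class StageMapper {
    static final class StageRecord {
        final String name;
        final String status;       // e.g. "SUCCESS", "FAILURE"
        final long durationMillis; // wall time minus pause time
        StageRecord(String name, String status, long durationMillis) {
            this.name = name;
            this.status = status;
            this.durationMillis = durationMillis;
        }
    }

    /** Map each stage of the current run to the first same-named stage of a
     *  prior run, skipping prior stages that did not complete successfully. */
    static Map<String, StageRecord> mapHomologous(List<StageRecord> current,
                                                  List<StageRecord> prior) {
        Map<String, StageRecord> mapping = new LinkedHashMap<>();
        for (StageRecord cur : current) {
            for (StageRecord old : prior) {
                if (old.name.equals(cur.name) && "SUCCESS".equals(old.status)) {
                    mapping.put(cur.name, old); // first match wins
                    break;
                }
            }
        }
        return mapping; // fuzzy/best-case: stages with no match are simply absent
    }
}
```

          Keeping the mapping behind a small interface like this is what makes the pluggable-strategy idea cheap: a fancier heuristic just swaps the method body.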

          Note: to map flow chunks, each flow chunk must be identified by:

          1. Label if present (for stages, parallel branches).
          2. Index within enclosing container (example, 1st FlowNode in the stage, 1st FlowNode in a parallel, 1st stage in a run).
          3. EITHER containers have a list of contained chunks (i.e. for BO a WorkflowRun will have a list of stages and parallel blocks), OR each chunk lists its parent container. This is to let us (for example) see if we have 2 parallel blocks with the same branch names vs. one parallel block.
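          One possible shape for that identity, as a sketch (ChunkKey and its field names are hypothetical): label, index within the parent, and a link to the parent container, so that two parallel blocks with identical branch names stay distinguishable:

```java
import java.util.Objects;

// Hypothetical identity key for a flow chunk, following the three points above.
public final class ChunkKey {
    final String label;      // stage/branch name, or null for unlabeled chunks
    final int indexInParent; // 1st FlowNode in the stage, 1st stage in the run, ...
    final ChunkKey parent;   // null for the run itself (the root container)

    ChunkKey(String label, int indexInParent, ChunkKey parent) {
        this.label = label;
        this.indexInParent = indexInParent;
        this.parent = parent;
    }

    // Two chunks are homologous candidates only if their full ancestry matches,
    // which distinguishes two parallel blocks with the same branch names.
    @Override public boolean equals(Object o) {
        if (!(o instanceof ChunkKey)) return false;
        ChunkKey k = (ChunkKey) o;
        return indexInParent == k.indexInParent
                && Objects.equals(label, k.label)
                && Objects.equals(parent, k.parent);
    }

    @Override public int hashCode() {
        return Objects.hash(label, indexInParent, parent);
    }
}
```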

          *Why all this?*

          1. We're seeing increasingly dynamic pipeline structures – conditional execution of parallel branches, whole stages, or more.
          2. We can't directly line up nodes by ID, because some stages may have different numbers of steps (example: retry blocks).
          3. Generating mappings has all the hard bits, so we isolate it for testability and simplicity.
          4. We can delegate to the pipeline graph analysis StatusAndTiming APIs for all the timing info.
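          The prediction/aggregation step described above, kept deliberately simple, might look like this sketch. EtaEstimator is a hypothetical name; pause time is assumed already subtracted, and real timing data would come from the StatusAndTiming APIs:

```java
import java.util.Arrays;

// Sketch of the deliberately simple aggregation: median of the run times of
// homologous chunks from prior runs. Hypothetical, not a real plugin API.
public class EtaEstimator {
    /** Median of prior durations; returns -1 if there is no usable history
     *  (i.e. the mapping failed entirely). */
    static long estimateMillis(long[] priorDurations) {
        if (priorDurations.length == 0) return -1;
        long[] sorted = priorDurations.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }
}
```

          A median is a reasonable first cut because one anomalous run (a stuck agent, a retried stage) doesn't drag the estimate the way a plain average would.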
