Per meeting with @vivek on Friday, there are a few parts to this, and breaking it up into smaller pieces makes it easier. We have a follow-up meeting planned in a couple of weeks to sync again and hammer out more of the fine details.
- Collect a bit more structural information in Blue Ocean (BO) during flow analysis (an additional Map or two). This lets us lay a tree-like (DOM-like) view over the flow node data (depth-limited for BO though, so simplified).
- Create a generic API that accepts a DOM-like interface and generates mappings of similar ("homologous") parts of the flow execution across WorkflowRuns (e.g. stage 'bob' in Run #1 maps to stage 'bob' in Run #2).
- This combines a mapping component (the first stage with the same name matches) with a filtering component (e.g. if a stage didn't complete with SUCCESSFUL, it can't be used for prediction).
- This will support pluggable strategies (heuristics) for the mapping. That lets us do the simplest version first, then rip it out entirely if needed (Blue Ocean gets a basic form without much nesting, while analytics can do a fancier one if desired). It also matters because mappings are very fiddly (see the many Stage View bugs logged for this aspect).
- Mappings will be fuzzy/best-effort, and can fail entirely if two runs are too different.
- For more complex mappings we'll work recursively to keep this simple: map stages against each other, then map steps within each stage.
- Take the homologous (similar) pieces of flows and generate run-time predictions by aggregating them. We can use the status/timing APIs. This will combine run time, pause time, and maybe status. It should be VERY simple by design. Probably just take an average or median of the times (subtracting pause), and maybe report an error bar if there are enough samples.
- Optional: a basic API where you can request run analysis by giving a WorkflowRun as input – this can internally use caching or an analysis thread pool (to avoid overloading the system).
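To make the mapping-plus-filtering idea from the list above concrete, here is a minimal sketch of what a pluggable strategy could look like. Everything in it is hypothetical: `Stage`, `MappingStrategy` and `FirstNameMatch` are illustrative names, not existing Jenkins or Blue Ocean types, and consuming a previous stage once it has been matched is an assumption we have not agreed on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in Jenkins or Blue Ocean. */
final class ChunkMappingSketch {

    /** Simplified stand-in for one stage of one run. */
    static final class Stage {
        final String name;
        final boolean successful;

        Stage(String name, boolean successful) {
            this.name = name;
            this.successful = successful;
        }
    }

    /** Pluggable heuristic: pairs stages of the current run with stages of a previous run. */
    interface MappingStrategy {
        /** Returns current-stage to previous-stage pairs; an empty map means the mapping failed. */
        Map<Stage, Stage> map(List<Stage> current, List<Stage> previous);
    }

    /** Simplest strategy: the first previous stage with the same name matches. */
    static final class FirstNameMatch implements MappingStrategy {
        @Override
        public Map<Stage, Stage> map(List<Stage> current, List<Stage> previous) {
            // Filtering component: only successful previous stages are usable for prediction.
            List<Stage> candidates = new ArrayList<>();
            for (Stage p : previous) {
                if (p.successful) {
                    candidates.add(p);
                }
            }
            // Mapping component: first remaining candidate with the same name wins.
            Map<Stage, Stage> result = new LinkedHashMap<>();
            for (Stage c : current) {
                Iterator<Stage> it = candidates.iterator();
                while (it.hasNext()) {
                    Stage p = it.next();
                    if (p.name.equals(c.name)) {
                        result.put(c, p);
                        it.remove(); // assumption: each previous stage is matched at most once
                        break;
                    }
                }
            }
            return result;
        }
    }
}
```

A fancier strategy (for example the recursive stage-then-step mapping) would implement the same interface, so callers would not need to change when the heuristic is swapped.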
Basically, we'd digest a flow, digest previous runs, try to map similar parts, check whether we have enough data for predictions, then do estimates based on the mapped parts. We're aiming for the simplest workable solution initially.
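The estimate step at the end of that flow could be as small as the sketch below: a median of run time minus pause time over the mapped pieces, with no answer at all when there are too few samples. `Timing`, `estimateMillis` and the `minSamples` threshold are hypothetical names for illustration; in practice the durations would come from the status/timing APIs rather than a hand-built class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch only; not an existing Jenkins API. */
final class DurationEstimateSketch {

    /** Stand-in for the timing of one homologous chunk in one previous run. */
    static final class Timing {
        final long totalMillis;
        final long pauseMillis;

        Timing(long totalMillis, long pauseMillis) {
            this.totalMillis = totalMillis;
            this.pauseMillis = pauseMillis;
        }
    }

    /**
     * Median of (total - pause) over the history, or empty if there are
     * fewer than minSamples data points (at least one is always required).
     */
    static OptionalLong estimateMillis(List<Timing> history, int minSamples) {
        if (history.size() < Math.max(1, minSamples)) {
            return OptionalLong.empty();
        }
        List<Long> active = new ArrayList<>();
        for (Timing t : history) {
            active.add(t.totalMillis - t.pauseMillis); // subtract time spent paused
        }
        Collections.sort(active);
        int mid = active.size() / 2;
        long median = (active.size() % 2 == 1)
                ? active.get(mid)
                : (active.get(mid - 1) + active.get(mid)) / 2;
        return OptionalLong.of(median);
    }
}
```

Returning an empty result instead of a guess keeps the "do we have enough data" check in one place; an error bar could be added later without changing the callers much.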
Note: to map flow chunks, each flow chunk must be identified by:
- Label if present (for stages, parallel branches).
- Index within the enclosing container (e.g. 1st FlowNode in the stage, 1st FlowNode in a parallel, 1st stage in a run).
- Containment: EITHER containers hold a list of their contained chunks (e.g. for BO a WorkflowRun will have a list of stages and parallel blocks), OR each chunk lists its parent container. This lets us see (for example) whether we have 2 parallel blocks with the same branch names vs. one parallel block.
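One way to picture the identification rules above is a small key object per chunk. This sketch takes the "each chunk lists its parent container" option; `ChunkKey` and its fields are hypothetical names, not an existing type.

```java
import java.util.Objects;

/** Hypothetical sketch only; not an existing Jenkins or Blue Ocean type. */
final class ChunkKeySketch {

    /** Identity of a flow chunk: optional label, index in its container, and the container's key. */
    static final class ChunkKey {
        final String label;    // stage or branch name; null when the chunk has no label
        final int index;       // position within the enclosing container
        final ChunkKey parent; // enclosing container; null for top-level chunks

        ChunkKey(String label, int index, ChunkKey parent) {
            this.label = label;
            this.index = index;
            this.parent = parent;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ChunkKey)) {
                return false;
            }
            ChunkKey other = (ChunkKey) o;
            return index == other.index
                    && Objects.equals(label, other.label)
                    && Objects.equals(parent, other.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(label, index, parent);
        }
    }
}
```

Because the parent is part of the key, a branch named 'unit' in the first parallel block and a branch named 'unit' in a second parallel block get different keys, which is the distinction the last bullet asks for. A fuzzy strategy would probably compare only some of these fields rather than rely on strict equality.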
*Why all this?*
- We're seeing increasingly dynamic pipeline structures – conditional execution of parallel branches, whole stages, or more.
- We can't directly line up nodes by ID, because the same stage may have a different number of steps from run to run (e.g. retry blocks).
- Generating mappings has all the hard bits, so we isolate it for testability and simplicity.
- We can delegate to the pipeline graph analysis StatusAndTiming APIs for all the timing info.