As a pipeline user, I WANNA GO FAST.
Pipelines are highly I/O-bound, and configuring a FlowNode (the record of a step's execution) with its initial actions currently triggers multiple writes. As-is, every action we attach triggers a persistence cycle, and we usually attach several.
We should defer persisting the FlowNode until the initial set-up is done and the step begins executing meaningful work.
- Provide FlowNodeStorage with the ability to track FlowNodes that haven't been written to disk yet
- Provide FlowNodeStorage with APIs to force a node to be flushed to disk
- Provide a FlowNodeStorage API to flush everything to disk (for restarts, etc.)
- Provide a way to enable auto-persistence of a FlowNode once we have attached initial actions
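A rough sketch of how the pieces above could fit together. Everything here is hypothetical: `DeferredNodeStorage`, `Node`, `storeNode(node, deferWrite)`, `flushNode`, `flush`, and the write counter are illustrative stand-ins, not the actual FlowNodeStorage API, and the "disk write" is simulated by incrementing a counter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of deferred persistence; NOT the real FlowNodeStorage API. */
class DeferredNodeStorage {

    /** Minimal stand-in for a FlowNode: an id plus the actions attached to it. */
    static final class Node {
        final String id;
        final List<String> actions = new ArrayList<>();
        Node(String id) { this.id = id; }
    }

    // Nodes that have been registered but not yet written to disk.
    private final Map<String, Node> deferred = new LinkedHashMap<>();
    // Counts simulated disk writes so the saving is observable.
    private int writeCount = 0;

    /** Registers a node. With deferWrite=true nothing is written until a flush. */
    synchronized void storeNode(Node node, boolean deferWrite) {
        if (deferWrite) {
            deferred.put(node.id, node);
        } else {
            write(node);
        }
    }

    /** Attaches an action; persists immediately only if the node is not deferred. */
    synchronized void addAction(Node node, String action) {
        node.actions.add(action);
        if (!deferred.containsKey(node.id)) {
            write(node);
        }
    }

    /** Writes one deferred node once and ends its deferral (auto-persist from here on). */
    synchronized void flushNode(Node node) {
        if (deferred.remove(node.id) != null) {
            write(node);
        }
    }

    /** Writes every deferred node, e.g. before a restart. */
    synchronized void flush() {
        for (Node node : new ArrayList<>(deferred.values())) {
            write(node);
        }
        deferred.clear();
    }

    synchronized boolean isDeferred(String id) { return deferred.containsKey(id); }

    synchronized int getWriteCount() { return writeCount; }

    // Stand-in for the real serialization to disk.
    private void write(Node node) { writeCount++; }

    public static void main(String[] args) {
        DeferredNodeStorage storage = new DeferredNodeStorage();
        Node node = new Node("3");
        storage.storeNode(node, true);            // set-up begins, no write
        storage.addAction(node, "LabelAction");   // still no write
        storage.addAction(node, "TimingAction");  // still no write
        storage.flushNode(node);                  // single write once set-up is done
        System.out.println("writes: " + storage.getWriteCount());
    }
}
```

In this sketch, attaching the initial actions costs one write at `flushNode` instead of one write per action, and any action attached afterwards is persisted immediately again.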
This will reduce I/O, reduce the CPU time spent on persistence, and generate less garbage for the collector to reclaim. An early prototype showed a 50% increase in build throughput (a 33% reduction in runtime) in one general case under a properly constructed benchmark.