<aside> 🚧
This document is a Work In Progress
</aside>
Most of the logic in the existing ingestion engine tracks and syncs collections. The processing pipeline will take over most of this work, and we want to make the change in a backwards-compatible way to reduce change management overhead, so we should create a new version of the engine.
The ingestion engine has the following responsibilities:
Results Forest
push-receipts
Results Forest
On startup, the ingestion engine loads the LatestPersistedSealedResultHeight and initializes the forest using the result for this block as the base. It then loads the ExecutionResults and ExecutionReceipts for all descendant blocks into the forest. Depending on the state of the node, there may be a small or large number of sealed results, as well as a set of unsealed results.
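The startup sequence above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual implementation: ExecutionResult is reduced to three fields, ExecutionReceipts are omitted, and ResultsForest, MemStore, LoadForest and ErrForestFull are hypothetical names introduced only for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ExecutionResult is a simplified, hypothetical stand-in for the real type.
type ExecutionResult struct {
	ID       string
	ParentID string // ID of the result this one builds on
	Height   uint64 // height of the executed block
}

// ErrForestFull signals that the forest has reached its maximum size.
var ErrForestFull = errors.New("results forest is full")

// ResultsForest is a toy forest: a bounded set of results rooted at a base.
type ResultsForest struct {
	maxSize int
	results map[string]ExecutionResult
}

// NewResultsForest creates a forest whose root is the given base result.
func NewResultsForest(base ExecutionResult, maxSize int) *ResultsForest {
	return &ResultsForest{
		maxSize: maxSize,
		results: map[string]ExecutionResult{base.ID: base},
	}
}

// Add inserts a result whose parent is already in the forest.
func (f *ResultsForest) Add(r ExecutionResult) error {
	if _, ok := f.results[r.ID]; ok {
		return nil // already present, nothing to do
	}
	if _, ok := f.results[r.ParentID]; !ok {
		return fmt.Errorf("result %s: unknown parent %s", r.ID, r.ParentID)
	}
	if len(f.results) >= f.maxSize {
		return ErrForestFull
	}
	f.results[r.ID] = r
	return nil
}

// Size returns the number of results currently held, including the base.
func (f *ResultsForest) Size() int { return len(f.results) }

// MemStore is a hypothetical in-memory view of the persisted results.
type MemStore struct {
	LatestPersistedSealedResultHeight uint64
	HighestHeight                     uint64
	ByHeight                          map[uint64][]ExecutionResult
}

// LoadForest builds the forest from storage. It returns the forest and the
// next height that still needs loading (HighestHeight+1 if everything fit).
func LoadForest(s *MemStore, maxSize int) (*ResultsForest, uint64, error) {
	baseHeight := s.LatestPersistedSealedResultHeight
	base := s.ByHeight[baseHeight]
	if len(base) != 1 {
		return nil, 0, fmt.Errorf("expected one sealed result at height %d, found %d", baseHeight, len(base))
	}
	forest := NewResultsForest(base[0], maxSize)
	for h := baseHeight + 1; h <= s.HighestHeight; h++ {
		for _, r := range s.ByHeight[h] {
			if err := forest.Add(r); err != nil {
				if errors.Is(err, ErrForestFull) {
					return forest, h, nil // caller resumes at h once space frees up
				}
				return nil, 0, err
			}
		}
	}
	return forest, s.HighestHeight + 1, nil
}
```

Returning the next height to load lets the caller resume from disk later; because Add ignores results that are already present, re-reading a partially loaded height is harmless in this sketch.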
We can generalize this to 3 cases:
Case 1 is the happy path, and we should load all results into the forest and immediately start ingesting results from the network.
For cases 2 and 3, the loader will quickly fill the forest to its max size and will then need to wait for pipelines to complete or be abandoned before adding more. We should not start ingesting results from the network, since these would compete with the finalized data for space.
In case 2, the forest may not fill until after the initial persisted blocks are all loaded; that is, it may fill with real-time network data. The ingestion engine would then need to fall back to loading results from disk.
In case 3, the loader could stay running until all data is loaded, switching to a real-time mode afterwards.
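One way to picture the loader behaviour described for these cases is a small two-mode state machine: it backfills from disk while persisted results remain, refuses network results during that time, and switches to real-time mode once the disk is exhausted. The sketch below is an assumption-laden illustration: Loader, Mode, Fill, PipelineDone and AcceptNetworkResult are hypothetical names, and block heights stand in for actual results.

```go
package main

// Mode describes where the loader currently sources results from.
type Mode int

const (
	ModeBackfill Mode = iota // loading persisted results from disk
	ModeRealTime             // ingesting results from the network
)

// Loader is a hypothetical sketch; heights stand in for results.
type Loader struct {
	mode       Mode
	nextHeight uint64 // next persisted height to load
	lastHeight uint64 // highest persisted height on disk
	capacity   int    // maximum number of results the forest may hold
	inForest   int    // results currently occupying the forest
}

// NewLoader creates a loader for persisted heights first..last (inclusive).
func NewLoader(first, last uint64, capacity int) *Loader {
	l := &Loader{nextHeight: first, lastHeight: last, capacity: capacity}
	l.updateMode()
	return l
}

// updateMode switches to real-time once every persisted height is loaded.
func (l *Loader) updateMode() {
	if l.nextHeight > l.lastHeight {
		l.mode = ModeRealTime
	}
}

// Fill loads persisted heights until the forest is full or the disk is
// exhausted, and returns how many heights were loaded.
func (l *Loader) Fill() int {
	loaded := 0
	for l.mode == ModeBackfill && l.inForest < l.capacity {
		l.inForest++
		l.nextHeight++
		loaded++
		l.updateMode()
	}
	return loaded
}

// PipelineDone frees one slot after a pipeline completes or is abandoned.
func (l *Loader) PipelineDone() {
	if l.inForest > 0 {
		l.inForest--
	}
}

// AcceptNetworkResult admits a network result only in real-time mode and
// only when the forest has space; it reports whether the result was admitted.
func (l *Loader) AcceptNetworkResult() bool {
	if l.mode != ModeRealTime || l.inForest >= l.capacity {
		return false
	}
	l.inForest++
	return true
}

// Mode returns the loader's current mode.
func (l *Loader) Mode() Mode { return l.mode }
```

In this sketch the switch is one-way. The fallback to disk described for case 2 would need an additional transition from ModeRealTime back to ModeBackfill, which is left out here.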