Issue summary
A CAR file is an unordered stream of (CID, Ipld) pairs representing a (possibly incomplete) DAG. We often have to query data in a CAR file, and we currently do it by loading each key-value pair into a database. This is relatively slow, though, and it temporarily doubles the required storage space.
It should be possible to memory-map a CAR file, scan through each key-value pair, build a mapping from key to position in the mapped file, and query it directly without using a database. Keeping the entire index in memory is feasible since each key (a CID) is only about 40 bytes, and there are roughly 55 million pairs in a mainnet snapshot file.
Doing the same with a zstd compressed CAR file is more work but still doable.
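The scan-and-index idea for an uncompressed file can be sketched as follows. This is a minimal illustration, not Forest's implementation: it assumes the CARv1 layout (a varint-prefixed header followed by varint-prefixed `CID | data` frames), and the names `read_varint`, `cid_len`, and `build_index` are hypothetical. The input is a plain byte slice; in practice it could come from a memory map (for example via the `memmap2` crate, which is an assumption here and not something this issue prescribes).

```rust
use std::collections::HashMap;

/// Decode an unsigned LEB128 varint; returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or over-long varint
}

/// Length in bytes of the CID at the start of `buf` (CIDv0 or CIDv1).
fn cid_len(buf: &[u8]) -> Option<usize> {
    // CIDv0 is a bare sha2-256 multihash: 0x12 0x20 plus a 32-byte digest.
    if buf.starts_with(&[0x12, 0x20]) {
        return Some(34);
    }
    // CIDv1: version, codec, hash code, digest length (all varints), digest.
    let (_version, a) = read_varint(buf)?;
    let (_codec, b) = read_varint(buf.get(a..)?)?;
    let (_hash_code, c) = read_varint(buf.get(a + b..)?)?;
    let (digest_len, d) = read_varint(buf.get(a + b + c..)?)?;
    (a + b + c + d).checked_add(digest_len as usize)
}

/// Scan a CARv1 byte slice and map each CID (as raw bytes borrowed from the
/// slice) to the (offset, length) of its block data. Returns None on
/// malformed or truncated input.
fn build_index(car: &[u8]) -> Option<HashMap<&[u8], (usize, usize)>> {
    // Skip the header; its contents (roots, version) are not needed here.
    let (header_len, n) = read_varint(car)?;
    let mut pos = n.checked_add(header_len as usize)?;
    if pos > car.len() {
        return None;
    }
    let mut index = HashMap::new();
    while pos < car.len() {
        let (frame_len, n) = read_varint(&car[pos..])?;
        let start = pos + n;
        let end = start.checked_add(frame_len as usize)?;
        let frame = car.get(start..end)?;
        let k = cid_len(frame)?;
        if k > frame.len() {
            return None;
        }
        index.insert(&frame[..k], (start + k, frame.len() - k));
        pos = end;
    }
    Some(index)
}
```

A lookup is then `index.get(cid_bytes)`, followed by slicing `car[offset..offset + len]` to read the block without copying it into a database. The keys in this sketch borrow from the mapped bytes, so the index must not outlive the mapping.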
Use cases:
- `forest-cli snapshot validate`. Could validate the integrity without requiring any extra storage space.
- `forest --import-snapshot ...`. Could connect to a network without a lengthy DB import.
- In the future, queries on historical data could be completed much quicker if CAR files were directly used.
Other information and links