Database overlay backed by CAR file #3074

@lemmih

Issue summary

A CAR file is an unordered stream of (CID, Ipld) pairs representing a (possibly incomplete) DAG. We often have to query data in a CAR file, and we currently do it by loading each key-value pair into a database. This is relatively slow, though, and it temporarily doubles the required storage space.

It should be possible to memory map a CAR file, scan through each key-value pair, build a mapping from key to position in the mapped file, and query it directly without using a database. Keeping the entire index in memory is feasible since each key is only 4 bytes, and there are roughly 55 million pairs in a mainnet snapshot file.
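The scan described above can be sketched in a few lines. This is a minimal illustration, not Forest's implementation: it assumes an uncompressed CARv1 layout (a varint-prefixed header followed by varint-prefixed frames, each holding a CID and then the block data), and the names `read_varint`, `cid_end`, `build_index`, and `open_car` are hypothetical. For simplicity the index is keyed by the full CID bytes; a more compact key would need its own collision handling.

```python
import mmap
from typing import Dict, Tuple


def read_varint(buf, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_end(buf, pos: int) -> int:
    """Return the offset just past the CID that starts at pos."""
    if buf[pos] == 0x12 and buf[pos + 1] == 0x20:
        # CIDv0: a bare sha2-256 multihash (2-byte prefix + 32-byte digest).
        return pos + 34
    # CIDv1: version, codec, then a multihash (hash code, digest length, digest).
    _version, pos = read_varint(buf, pos)
    _codec, pos = read_varint(buf, pos)
    _hash_code, pos = read_varint(buf, pos)
    digest_len, pos = read_varint(buf, pos)
    return pos + digest_len


def build_index(buf) -> Dict[bytes, Tuple[int, int]]:
    """Map raw CID bytes to the (offset, length) of the block data in buf."""
    header_len, pos = read_varint(buf, 0)
    pos += header_len  # skip the header; the roots are not needed for lookups
    index: Dict[bytes, Tuple[int, int]] = {}
    while pos < len(buf):
        frame_len, pos = read_varint(buf, pos)
        frame_end = pos + frame_len
        data_start = cid_end(buf, pos)
        index[bytes(buf[pos:data_start])] = (data_start, frame_end - data_start)
        pos = frame_end
    return index


def open_car(path: str):
    """Memory-map a CAR file read-only and index it in a single pass."""
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped, build_index(mapped)
```

A lookup is then a dictionary hit followed by a slice of the mapping, e.g. `offset, length = index[cid_bytes]` and `mapped[offset:offset + length]`, so no block data is copied into a database. The sketch does no validation of truncated or malformed frames.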

Doing the same with a zstd compressed CAR file is more work but still doable.

Use cases:

  • forest-cli snapshot validate: could validate the integrity of a snapshot without requiring any extra storage space.
  • forest --import-snapshot ...: could connect to a network without a lengthy DB import.
  • In the future, queries on historical data could complete much more quickly if CAR files were used directly.

Labels

Ready: Issue is ready for work and anyone can freely assign it to themselves