
Add garbage collection to Forest #2292

@LesnyRumcajs

Issue summary

On a long-running Forest instance, the Forest DB grows by roughly 30 GB per day. This forces every node operator to implement their own shrinking mechanism, the simplest being:

  • export a snapshot and turn off the node (or turn off the node and download a snapshot from a trusted source, which may be faster),
  • import the new snapshot,
  • repeat when available disk space gets low.

We can do something better on our own (though following roughly the same logic). The rough idea is to mark the entries that a snapshot export would include and then delete everything else from the database. This should theoretically bring us back to the just-after-import DB size.
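A minimal sketch of that idea as a mark-and-sweep pass, assuming the "exportable" entries are the ones to keep and everything else is deleted. The `Key`, `Block`, `mark`, and `sweep` names are hypothetical, and an in-memory `HashMap` stands in for the real DB backend; this is not Forest's actual implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a content identifier; a real node would use CIDs.
type Key = u64;

/// Hypothetical stand-in for a stored block: only the keys it links to.
struct Block {
    links: Vec<Key>,
}

/// Mark phase: collect every key reachable from `roots`,
/// i.e. roughly what a snapshot export would include.
fn mark(db: &HashMap<Key, Block>, roots: &[Key]) -> HashSet<Key> {
    let mut reachable = HashSet::new();
    let mut stack: Vec<Key> = roots.to_vec();
    while let Some(key) = stack.pop() {
        // `insert` returns false if the key was already visited.
        if !reachable.insert(key) {
            continue;
        }
        // Follow the links of blocks that are actually present in the DB.
        if let Some(block) = db.get(&key) {
            stack.extend(block.links.iter().copied());
        }
    }
    reachable
}

/// Sweep phase: delete every entry that was not marked.
/// Returns the number of entries removed.
fn sweep(db: &mut HashMap<Key, Block>, reachable: &HashSet<Key>) -> usize {
    let before = db.len();
    db.retain(|key, _| reachable.contains(key));
    before - db.len()
}
```

Calling `mark` with the current chain heads as roots and then `sweep` with the result leaves only the reachable entries. A real implementation would also have to handle writes that arrive between the two phases, which this sketch ignores.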

Task summary

  • Implement garbage collection for 1 DB backend of choice
  • Test exhaustively
  • Check for corner cases (e.g. SIGKILL during GC pause because why not) - if overly complicated, we might create a separate issue.
  • Implement for other DB backends

Acceptance Criteria

  • GC pause is minimal,
  • disk space overhead is minimal,
  • works on calibnet,
  • works on mainnet on a reasonable machine (e.g., the default Digital Ocean VPS we are using with 320GB SSD disk),
  • works on all supported DB backends.

Other information and links

Not exactly the way we decided to move forward at the moment, but worth mentioning: #1708

Metadata

Labels

  • Node
  • Priority: 2 - High (Very important and should be addressed ASAP)
  • Ready (Issue is ready for work and anyone can freely assign it to themselves)
