feat: database garbage collection #2638
Conversation
Ran benchmarks with the entire snapshot in the "old" partition and it doesn't affect performance in any way I can measure.
lemmih left a comment
Alright!
I'll add issues for the following improvements:
- Load data directly into the old space rather than the new space.
- Show progress during garbage collection (and chain exporting).
- Possibly get rid of the Store trait altogether.
- Yield during GC to allow Forest to shut down properly (see the sketch below).
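A minimal sketch of the cooperative-yield idea from the last item, assuming a tokio-based copy loop; the function and signal names here are illustrative, not the actual Forest API:

use tokio::sync::watch;
use tokio::task;

// Hypothetical copy loop: between batches of copied blocks, yield back to the
// tokio scheduler and check a shutdown signal, so Forest can exit cleanly
// while a GC run is in progress.
async fn copy_reachable(blocks: Vec<Vec<u8>>, shutdown: watch::Receiver<bool>) {
    for (i, _block) in blocks.iter().enumerate() {
        // ... copy `_block` into the new space here ...
        if i % 1024 == 0 {
            task::yield_now().await; // let other tasks (incl. shutdown handling) run
            if *shutdown.borrow() {
                return; // abandon this GC run; it can restart on the next trigger
            }
        }
    }
}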
LesnyRumcajs left a comment
It'd be fantastic to make a blogpost about it :)
let db_garbage_collector = {
    let db = db.clone();
    let chain_store = chain_store.clone();
    let get_tipset = move || chain_store.heaviest_tipset().as_ref().clone();
    Arc::new(DbGarbageCollector::new(db, get_tipset))
};

#[allow(clippy::redundant_async_block)]
services.spawn({
    let db_garbage_collector = db_garbage_collector.clone();
    async move { db_garbage_collector.collect_loop_passive().await }
});
#[allow(clippy::redundant_async_block)]
services.spawn({
    let db_garbage_collector = db_garbage_collector.clone();
    async move { db_garbage_collector.collect_loop_event().await }
});
It'd be nice to move this to a separate method. To my understanding, these calls generally need each other.
collect_loop_passive and collect_loop_event work independently; either or both can be disabled.
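For context, a rough sketch of how two independent loops might look; the method names match the diff above, but the bodies and the should_collect/run_gc helpers are illustrative only:

use std::time::Duration;
use tokio::sync::mpsc;

// The passive loop polls a size-based condition on a timer, while the event
// loop waits for explicit triggers (e.g. from an RPC call). They only share
// the hypothetical run_gc entry point, so either one can be disabled
// independently of the other.
async fn collect_loop_passive() {
    let mut interval = tokio::time::interval(Duration::from_secs(10 * 60));
    loop {
        interval.tick().await;
        if should_collect() {
            run_gc().await;
        }
    }
}

async fn collect_loop_event(mut triggers: mpsc::Receiver<()>) {
    while triggers.recv().await.is_some() {
        run_gc().await;
    }
}

fn should_collect() -> bool {
    false // e.g. total DB size > 2x of the last reachable-data estimate
}

async fn run_gc() { /* walk the reachable graph and copy it into the new space */ }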
use super::*;

#[tokio::test]
#[tokio::test(flavor = "multi_thread")]
Snapshot import now spawns a task for writing to the DB, so a single thread becomes insufficient.
Why is the single threaded runtime insufficient? Surely it can run multiple tasks in parallel.
The test deadlocked, so I made the change.
I think we need to debug why that happened. We shouldn't be doing anything blocking.
Just reverted the multi_threading change and the deadlock is gone
350fd3f
The last time I encountered deadlock with tokio was because there were multiple runtimes. It was a bug. :)
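For reference, the pattern that typically deadlocks a current-thread runtime is a task making a blocking call while waiting on another task. This toy test is illustrative, not the one in the diff:

// On a current-thread runtime (#[tokio::test]) the blocking recv() occupies the
// sole worker thread, so the spawned task that would send never gets to run;
// with #[tokio::test(flavor = "multi_thread")] the same test passes.
#[tokio::test]
async fn blocking_recv_deadlocks_on_current_thread() {
    let (tx, rx) = std::sync::mpsc::channel();
    tokio::spawn(async move {
        tx.send(42u8).unwrap();
    });
    let _ = rx.recv().unwrap(); // std blocking call on the runtime thread
}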
//! used to speed up the database write operation
//!
//! ## Scheduling
//! 1. GC is triggered automatically when total DB size is greater than 2x of
It'd be nice to add an actual mainnet example.
Do you mean an example output in the log?
Just example numbers with mainnet.
Example numbers added
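To make the "2x" scheduling rule above concrete, here is a minimal sketch of the trigger condition; the struct, field, and method names are assumptions, not the actual implementation:

// Hypothetical trigger check: run GC once the on-disk DB has grown past twice
// the size of the data estimated to be reachable.
struct GcState {
    /// Estimated size of the reachable graph in bytes, refreshed after each GC run.
    last_reachable_bytes: u64,
}

impl GcState {
    fn should_collect(&self, total_db_bytes: u64) -> bool {
        total_db_bytes > 2 * self.last_reachable_bytes
    }
}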
.ok_or_else(|| anyhow::anyhow!("Cid {cid} not found in blockstore"))?;

let pair = (cid, block.clone());
// Key size is 32 bytes in paritydb
So, does this work at all with RocksDb? If not, how does Forest behave when RocksDb is used?
All functionality works with RocksDB; however, I did not test the crash-resilience story for RocksDB, so it might not be able to recover when Forest is shut down improperly. I believe we had crash-resilience issues with RocksDB before, so I think the potential regression is not introduced by this change.
So you tested that the GC works with rocksdb? How long does it take? I'm asking because the key size value is hardcoded here.
The time cost is similar. Key size is used for estimating the reachable data size as a trigger condition. However, since RocksDB is compressed while ParityDB is (mostly) not, the GC trigger interval for RocksDB tends to be longer.
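A rough sketch of how the hardcoded key size could feed into that size estimate; the constant and function names are hypothetical:

// The 32-byte figure comes from the ParityDB comment in the diff; for a
// compressed backend such as RocksDB the on-disk size will be smaller than
// this estimate, hence the longer trigger interval mentioned above.
const EST_KEY_SIZE: u64 = 32;

/// Estimate the on-disk footprint of the reachable records from their value sizes.
fn estimate_reachable_bytes<'a>(values: impl Iterator<Item = &'a [u8]>) -> u64 {
    values.map(|v| v.len() as u64 + EST_KEY_SIZE).sum()
}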
//! This module contains a concurrent, semi-space garbage collector. The garbage
//! collector is guaranteed to be non-blocking and can be expected to run with a
//! fixed memory overhead and require disk space proportional to the size of the
//! reachable graph. For example, if the size of the reachable graph is 100GiB,
//! expect this garbage collector to use 3x100GiB = 300GiB of storage.
Before merging this, we should update the specs of our DO droplets.
Agreed, 320GB is not sufficient if Docker and Rust toolchains need to be installed.
But we could mount a volume of ~500GB on the fly and symlink the data folder.
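For readers skimming the thread, a toy sketch of what "semi-space" means in the module docs quoted above, under assumed types and names rather than the actual Forest code:

use std::collections::{HashMap, HashSet, VecDeque};

// Walk the graph reachable from `roots`, copy every reachable record into a
// fresh "new" space, then drop the old space wholesale. Peak disk usage is the
// full old space plus the fresh copy of the reachable data, which is how the
// ~3x-reachable-graph budget in the docs arises when GC triggers at ~2x.
type Cid = u64; // stand-in for a real CID type

fn collect(
    old_space: &HashMap<Cid, (Vec<u8>, Vec<Cid>)>, // value plus links to other records
    roots: &[Cid],
) -> HashMap<Cid, (Vec<u8>, Vec<Cid>)> {
    let mut new_space = HashMap::new();
    let mut seen: HashSet<Cid> = roots.iter().copied().collect();
    let mut queue: VecDeque<Cid> = roots.iter().copied().collect();
    while let Some(cid) = queue.pop_front() {
        if let Some((value, links)) = old_space.get(&cid) {
            new_space.insert(cid, (value.clone(), links.clone()));
            for link in links {
                if seen.insert(*link) {
                    queue.push_back(*link);
                }
            }
        }
    }
    new_space // the old space can now be deleted in one go
}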
    }
}
self.put_many_keyed(buffer)?;
info!(
Should this instead be a debug? Also, I'm not sure calling timers by default makes sense in a performance-related method.
This method is called either during snapshot import or during database garbage collection, which take 1-2h on a laptop or 3-5h on a DO droplet. The info log should never flood the screen, but it's quite useful for collecting metrics. I would prefer to keep it at info level so that it's available in the log by default. What do you think?
If it's helpful, perhaps let's have a proper Prometheus metric? Like, average speed or something like this.
In this case, I think a log of individual runs is more helpful than aggregated metrics for understanding the details and nature of the database and for troubleshooting. How about keeping it as is, given it only emits 2-3 lines every 1.5-2 days?
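A sketch of the buffered write plus timing log being discussed, assuming the log crate's info! macro and a hypothetical BatchWrite trait standing in for the real store interface:

use std::time::Instant;
use log::info;

// Minimal trait stand-in for the store interface assumed below.
trait BatchWrite {
    fn put_many_keyed(&self, pairs: Vec<(Vec<u8>, Vec<u8>)>) -> anyhow::Result<()>;
}

// Accumulated key/value pairs are written in one batch, and a single info-level
// line records the batch size and elapsed time. At GC/import frequency this
// stays at a few lines per run.
fn flush_batch<DB: BatchWrite>(db: &DB, buffer: Vec<(Vec<u8>, Vec<u8>)>) -> anyhow::Result<()> {
    let start = Instant::now();
    let records = buffer.len();
    db.put_many_keyed(buffer)?;
    info!("flushed {records} records in {:?}", start.elapsed());
    Ok(())
}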
Co-authored-by: Hubert <hubert@chainsafe.io>
Summary of changes
Changes introduced in this pull request:
- ChainStore
- forest-cli (Blocked by #2635)
Reference issue to close (if applicable)
Closes
#2292
#1708
Other information and links
Change checklist