
Commit 1dcb77a

2.8
1 parent b39bd8d commit 1dcb77a

File tree

7 files changed: +139 −10 lines changed


astro.config.mjs

Lines changed: 10 additions & 9 deletions

```diff
@@ -1,33 +1,34 @@
+import mdx from "@astrojs/mdx";
 import sitemap from "@astrojs/sitemap";
 import solidJs from "@astrojs/solid-js";
 import { defineConfig } from "astro/config";
 import unocss from "unocss/astro";
-import config from "./src/config";
 
-import mdx from "@astrojs/mdx";
+import config from "./src/config";
 
 // https://astro.build/config
 export default defineConfig({
   integrations: [
     unocss({
-      injectReset: true
+      injectReset: true,
     }),
     sitemap(),
     solidJs(),
-    mdx()
+    mdx(),
   ],
   site: config.site.url,
   base: config.site.baseUrl,
   prefetch: {
     prefetchAll: true,
-    defaultStrategy: "hover"
+    defaultStrategy: "hover",
   },
   markdown: {
     shikiConfig: {
-      themes: config.post.code.theme
-    }
+      themes: config.post.code.theme,
+    },
   },
   redirects: {
+    "/post/announcing-fjall-2": "/post/fjall-2",
     "/post/announcing-fjall-22": "/post/fjall-2-2",
-  }
-});
+  },
+});
```

4 binary files changed (images): 30 KB, 19.9 KB, 34 KB, 27.1 KB

src/content/blog/2024-09-27_fjall-2.mdx

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 ---
 title: Announcing Fjall 2.0
-slug: announcing-fjall-2
+slug: fjall-2
 description: Available in all Cargo registries near you
 tags:
 - major
```

Lines changed: 128 additions & 0 deletions (new file)

---
title: Fjall 2.8
slug: fjall-2-8
description: "Better cache API & fast bulk loading"
tags:
- release
- block cache
- performance
published_at: 2025-03-30T19:06:53.186Z
last_modified_at: 2025-03-30T19:06:53.186Z
image: /media/thumbs/kawaii.png
---

Fjall is an embeddable, LSM-based, forbid-unsafe Rust key-value storage engine.
Its goal is to be a reliable & predictable, yet performant, general-purpose KV storage engine.

---

## Bulk loading API

For bulk loading, we want fast insertion of an existing data set.
Because the data set already exists, we can insert it in ascending key order, which makes bulk loading faster.
The write path of a naive LSM-tree is already notoriously short:
any written object is appended to the write-ahead journal, followed by an insert into the in-memory memtable (e.g. a skip list).
Most writes do not even trigger disk I/O when we do not explicitly sync to disk, which is desirable to make bulk loads fast.

However, this write path is still unnecessarily slow for bulk loads.
Logging to the journal and periodically flushing memtables gives a write amplification of 2\* (for ascending data, we never need to compact): every byte is written once to the journal and once more when the memtable is flushed into a segment.

If we can guarantee that:

1. the data is inserted in ascending key order
2. our tree starts out empty

we can skip the journal, flushing, and compaction machinery altogether, while also needing less temporary memory.

A new API gives us direct access to the LSM-tree's internal segment writing mechanism.
It takes a sorted iterator, creates a list of disk segments (a.k.a. `SSTables`) from it, and registers them atomically in the tree.

This API is very useful for:

- schema migrations from tree A to tree B (see the sketch further below)
- migrating from a different DB
- restoring data from a backup

```rs
// The target tree must start out empty
let new_tree = /* ... */;

// Keys must be produced in ascending order
let stream = (0..1_000_000).map(|x| /* create KV-tuples */);

new_tree.ingest(stream)?;

assert_eq!(new_tree.len(), 1_000_000, "bulk load was not correct");
```

> \* Actually, the write amplification in this case is currently 3, but [that will change to 2 in the future](https://github.com/fjall-rs/lsm-tree/issues/121).
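
As a concrete illustration of the schema-migration use case, a migration can be expressed as a streaming transform over a full scan of the old tree. This is a hypothetical sketch: the exact shape of `old_tree.iter()` and the `migrate_value` helper are assumptions for illustration, not Fjall's verbatim API.

```rs
// Hypothetical sketch: stream tree A into tree B, rewriting values on the fly.
// A full scan of an LSM-tree yields keys in ascending order, which is
// exactly the ordering guarantee `ingest` needs, so no sorting step is required.
let stream = old_tree
    .iter()
    .map(|kv| kv.expect("read error")) // assumed: iteration yields Result<(key, value)>
    .map(|(key, value)| (key, migrate_value(&value))); // hypothetical per-item transform

new_tree.ingest(stream)?;
```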

### Benchmark

This benchmark writes 100 million monotonically ascending `u128` integer keys with 100-byte values.
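
A key stream of that shape could be generated like this (a sketch of the workload, not the actual benchmark harness):

```rs
// 100 million ascending u128 keys with 100-byte values.
// Big-endian encoding makes byte order match numeric order,
// so the keys really are monotonically ascending.
let stream = (0..100_000_000u128).map(|x| (x.to_be_bytes(), [0u8; 100]));
```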

<div style="margin-top: 10px; width: 100%; display: flex; justify-content: center">
  <img style="border-radius: 16px; max-height: 500px" src="/media/posts/fjall-28/ingest_cpu.png" />
</div>
<div style="margin-top: 10px; width: 100%; display: flex; justify-content: center">
  <img style="border-radius: 16px; max-height: 500px" src="/media/posts/fjall-28/ingest_write_amp.png" />
</div>
<div style="margin-top: 10px; width: 100%; display: flex; justify-content: center">
  <img style="border-radius: 16px; max-height: 500px" src="/media/posts/fjall-28/ingest_write_buffer.png" />
</div>

Note:

- `redb` used a single write transaction to load all the data, which is the fastest way to bulk load into it
- `sled` does not have a bulk loading mechanism, so comparing it would be unfair

## Unified cache API

Setting the cache capacity has always been a bit of an awkward API.
Originally, it was intended to eventually allow different types of caches, but that never really materialized.
Not only is the API verbose, it also makes it impossible to tune the cache capacity well when key-value separation is used.

With key-value separation, we have two different types of storage: the index tree and the value log.
The index tree is typically much smaller than the value log because it only stores pointers into the value log;
only small values are stored directly in the index tree.
Depending on the average value size, an index-to-value-log size ratio of 1:30,000 is easily possible (in case all values are ~1 MB).
So for 100 GB of blobs, the index tree would be just ~3 MB, which makes it pretty trivial to fully cache the index tree.

However, unless we know the size ratio and data set size _perfectly_, it is impossible to set the block cache size properly such that the index tree can be fully cached.

Consider the example below: we want to spend 1 GB of memory on caching, so we allow 900 MB for blob caching, while reserving 100 MB for the index tree's blocks.
However, if we stored the 100 GB mentioned above, the index tree would be much smaller than its configured cache size, leaving essentially 97 MB of wasted cache that could have been used for blobs (or bloom filters) instead.

```rs
// Before (< 2.8); now deprecated
use std::sync::Arc;
use fjall::{BlobCache, BlockCache, Config};

let keyspace = Config::new(&folder)
    .block_cache(Arc::new(BlockCache::with_capacity_bytes(/* 100 MB */ 100_000_000)))
    .blob_cache(Arc::new(BlobCache::with_capacity_bytes(/* 900 MB */ 900_000_000)))
    .open()?;

// After (2.8+)
use fjall::Config;

let keyspace = Config::new(&folder)
    .cache_size(/* 1 GB */ 1_000_000_000)
    .open()?;
```

Internally, the cache is now one unified cache that stores all types of items (blocks and blobs).
That way, we only have to set its capacity and let the caching algorithm decide which data to evict.
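
Conceptually (a simplified sketch, *not* Fjall's actual internals), a unified cache stores a tagged item type under a single capacity budget, so blocks and blobs compete for space and the eviction policy determines the split dynamically:

```rs
// Simplified sketch of a unified cache entry; names are illustrative.
enum CachedItem {
    Block(Vec<u8>), // an index tree disk block
    Blob(Vec<u8>),  // a value-log blob
}

impl CachedItem {
    // A size-aware cache charges each entry by its byte weight,
    // regardless of which kind of item it is
    fn weight(&self) -> usize {
        match self {
            Self::Block(bytes) | Self::Blob(bytes) => bytes.len(),
        }
    }
}
```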

## Replaced `std::slice::partition_point`

The implementation of `std::slice::partition_point` was changed around rustc **1.82**, which caused performance regressions in binary searches.

Using a less smart, cookie-cutter implementation seems to perform better for `lsm-tree`, restoring some read performance in cached, random-key scenarios:

<div style="margin-top: 10px; width: 100%; display: flex; justify-content: center">
  <img style="border-radius: 16px; max-height: 500px" src="/media/posts/fjall-28/ycsb_c_binary_search.png" />
</div>
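
For reference, the "cookie-cutter" approach is a classic binary search over a predicate; a minimal sketch (not `lsm-tree`'s literal code) looks like this:

```rs
/// Returns the index of the first element for which `pred` is false
/// (the partition point), via a plain branch-per-iteration binary search.
fn partition_point<T>(slice: &[T], pred: impl Fn(&T) -> bool) -> usize {
    let mut lo = 0;
    let mut hi = slice.len();

    while lo < hi {
        let mid = lo + (hi - lo) / 2;

        if pred(&slice[mid]) {
            lo = mid + 1; // the partition point is right of mid
        } else {
            hi = mid; // mid may be the partition point
        }
    }

    lo
}

assert_eq!(partition_point(&[1, 2, 3, 10, 10], |&x| x < 5), 3);
```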

## 2.7

2.7 was mostly a maintenance release with minor, uninteresting features.

0 commit comments
