feat: add sqlite-vec vector backend with Strategy pattern#300
feat: add sqlite-vec vector backend with Strategy pattern#300Tanmay-008 wants to merge 2 commits into
Conversation
- Introduce VectorBackend interface with MemoryVectorIndex (default) and SqliteVectorIndex (opt-in via VECTOR_BACKEND=sqlite-vec) - Two-table schema (vec_meta + vec_data) with ACID transactions - Auto-migration from legacy KV store to sqlite-vec on startup - Dynamic imports with graceful fallback for native deps - optionalDependencies + postinstall.js for native dep detection - All methods now async to support both sync and async backends - Benchmark: 100k vectors baseline (MemoryVectorIndex ~40ms search) 838/838 tests passing, zero regressions.
📝 WalkthroughWalkthroughThis PR refactors vector indexing from a synchronous in-memory implementation to an async, pluggable VectorBackend architecture, adds Memory and Sqlite backends, introduces optional native-dependency checks and a postinstall hook, updates persistence/startup migration, and adapts benchmarks and tests to the async API. ChangesVector Backend Refactoring and Multi-Backend Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/index.ts`:
- Around line 367-376: The cleanup currently only runs when migrated > 0 so
legacyVecData can persist if restore deduplicates existing rows; update the
logic around legacyVecData and vectorIndex.restoreFrom so that after await
vectorIndex.restoreFrom(legacyVecData) you always clear the legacy KV payload
(call kv.set(KV.bm25Index, "vectors", "[]")) whenever legacyVecData is a
non-empty string (i.e. the same condition that triggers the restore), and
optionally log a message when migrated === 0 to indicate the legacy data was
removed without adding vectors.
In `@src/state/vector-index-memory.ts`:
- Around line 79-105: The restoreFrom method currently merges parsed rows into
the existing this.vectors map causing removed keys to persist; change
restoreFrom (the async function taking serializedJson) to fully replace state by
clearing or assigning a new Map for this.vectors before iterating the parsed
array, then populate it exactly from the snapshot entries (same validation using
base64ToFloat32 and sessionId checks) so only snapshot keys remain after
restore.
In `@src/state/vector-index-sqlite.ts`:
- Around line 24-54: ensureInitialized() is race-prone when add/search/remove
call it concurrently; create a shared initialization promise (e.g.
this.initPromise) used by ensureInitialized so the first caller performs the
init and others await the same promise. Modify ensureInitialized to: if
(this.initialized) return; if (this.initPromise) await this.initPromise;
otherwise set this.initPromise = (async () => { perform loadNativeDeps(),
Database/ sqliteVec setup, CREATE TABLE/VIRTUAL TABLE execs, set
this.initialized = true })(); await the promise and in a finally block clear
this.initPromise on failure; update callers (add/search/remove) to await
ensureInitialized as before.
- Around line 155-157: The Float32Array construction reads from the underlying
ArrayBuffer without honoring the Buffer's byteOffset/length; update the code
that builds embedding from entry.embedding so you first assign the decoded
buffer (e.g., const buf = Buffer.from(entry.embedding, "base64")) and then
create the typed view using the buffer, buf.byteOffset, and buf.byteLength
divided by Float32Array.BYTES_PER_ELEMENT so the Float32Array only covers the
Buffer's exact bytes (replace the current new
Float32Array(Buffer.from(...).buffer) usage).
- Around line 63-70: Replace the current INSERT OR REPLACE pattern in the
transaction inside the VecIndex/VectorIndexSqlite implementation so you don't
delete and reinsert rows (which changes rowid and leaves orphaned vec_data).
Instead, run an INSERT INTO vec_meta (obs_id, session_id) VALUES (?, ?) ON
CONFLICT(obs_id) DO UPDATE SET session_id = excluded.session_id RETURNING rowid
to obtain the existing-or-new rowid, then use that rowid to INSERT INTO vec_data
(rowid, embedding) VALUES (?, ?) ON CONFLICT(rowid) DO UPDATE SET embedding =
excluded.embedding; keep this logic inside the same transaction used in the
original method to ensure atomicity and continue passing the embedding as a
Buffer (Buffer.from(embedding.buffer)).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5681ea1d-ce84-4522-a41b-2f36506ee46c
📒 Files selected for processing (16)
benchmark/longmemeval-bench.tsbenchmark/quality-eval.tsbenchmark/real-embeddings-eval.tsbenchmark/scale-eval.tsbenchmark/vector-backend-bench.tspackage.jsonpostinstall.jssrc/index.tssrc/state/hybrid-search.tssrc/state/index-persistence.tssrc/state/vector-index-memory.tssrc/state/vector-index-sqlite.tssrc/state/vector-index.tssrc/utils/native-deps.d.tstest/index-persistence.test.tstest/vector-index.test.ts
|
@Tanmay-008 is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/state/vector-index-sqlite.ts`:
- Around line 164-198: The per-row try/catch in insertTx() (the loop that reads
rows, decodes embedding, and writes to vec_meta/vec_data) currently swallows all
errors; change it to catch (err) and increment a local failure counter and push
error details (obsId and err.message) into a small list instead of silently
continuing, and after the loop return or throw a summary if failures > 0 (e.g.,
include failure count and sample errors) so callers can react; keep all existing
DB operations (SELECT/DELETE/UPDATE/INSERT on vec_meta/vec_data) and only alter
the catch behavior and the function's return/throw to surface aggregated
failures.
- Around line 145-149: The clear() method executes two separate DELETEs which
can leave tables inconsistent if the second fails; wrap the two deletes in a
single SQLite transaction in clear() (after ensureInitialized()) using
BEGIN/COMMIT and ROLLBACK on error or the DB client's transaction API, executing
both DELETE FROM vec_data and DELETE FROM vec_meta inside that transaction and
ensuring rollback and error propagation if any statement fails.
- Around line 27-57: The init routine assigns this.initPromise to an async IIFE
but never resets it if the IIFE throws, causing future calls to await a
permanently rejected promise; wrap the IIFE body with a try/catch (or append
.catch) that on any error sets this.initPromise = null, closes/cleans up this.db
if it was opened (this.db), and then rethrows the error so callers see it;
ensure the unique symbols touched are this.initPromise, loadNativeDeps(),
Database, sqliteVec.load(this.db), and this.db.exec(...) so you reset state on
any failure path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2704fb2b-7f49-4518-952a-e91df9fdee5c
📒 Files selected for processing (3)
src/index.tssrc/state/vector-index-memory.tssrc/state/vector-index-sqlite.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/index.ts
- src/state/vector-index-memory.ts
| this.initPromise = (async () => { | ||
| const deps = await loadNativeDeps(); | ||
| if (!deps) { | ||
| this.initPromise = null; | ||
| throw new Error( | ||
| "[agentmemory] better-sqlite3/sqlite-vec not installed. Run: npm install better-sqlite3 sqlite-vec", | ||
| ); | ||
| } | ||
|
|
||
| const { Database, sqliteVec } = deps; | ||
| this.db = new Database(this.dbPath); | ||
| sqliteVec.load(this.db); | ||
|
|
||
| this.db.exec(` | ||
| CREATE TABLE IF NOT EXISTS vec_meta ( | ||
| rowid INTEGER PRIMARY KEY AUTOINCREMENT, | ||
| obs_id TEXT UNIQUE NOT NULL, | ||
| session_id TEXT NOT NULL | ||
| ) | ||
| `); | ||
|
|
||
| this.db.exec(` | ||
| CREATE VIRTUAL TABLE IF NOT EXISTS vec_data USING vec0( | ||
| rowid INTEGER PRIMARY KEY, | ||
| embedding float[${this.dimensions}] | ||
| ) | ||
| `); | ||
| })(); | ||
|
|
||
| return this.initPromise; | ||
| } |
There was a problem hiding this comment.
Reset initPromise on initialization failure paths.
If initialization fails after this.initPromise is assigned (e.g., DB open/schema error), later calls keep awaiting the same rejected promise and the backend cannot recover.
Suggested fix
private ensureInitialized(): Promise<void> {
if (this.initPromise) return this.initPromise;
this.initPromise = (async () => {
const deps = await loadNativeDeps();
if (!deps) {
- this.initPromise = null;
throw new Error(
"[agentmemory] better-sqlite3/sqlite-vec not installed. Run: npm install better-sqlite3 sqlite-vec",
);
}
const { Database, sqliteVec } = deps;
this.db = new Database(this.dbPath);
sqliteVec.load(this.db);
this.db.exec(`
CREATE TABLE IF NOT EXISTS vec_meta (
rowid INTEGER PRIMARY KEY AUTOINCREMENT,
obs_id TEXT UNIQUE NOT NULL,
session_id TEXT NOT NULL
)
`);
this.db.exec(`
CREATE VIRTUAL TABLE IF NOT EXISTS vec_data USING vec0(
rowid INTEGER PRIMARY KEY,
embedding float[${this.dimensions}]
)
`);
})();
- return this.initPromise;
+ return this.initPromise.catch((error) => {
+ this.initPromise = null;
+ this.db = null;
+ throw error;
+ });
}🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 48-53: Dynamic command passed to child_process.exec/execSync. Use child_process.execFile or spawn with an argument array instead.
(coderabbit.command-injection.exec-js)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/state/vector-index-sqlite.ts` around lines 27 - 57, The init routine
assigns this.initPromise to an async IIFE but never resets it if the IIFE
throws, causing future calls to await a permanently rejected promise; wrap the
IIFE body with a try/catch (or append .catch) that on any error sets
this.initPromise = null, closes/cleans up this.db if it was opened (this.db),
and then rethrows the error so callers see it; ensure the unique symbols touched
are this.initPromise, loadNativeDeps(), Database, sqliteVec.load(this.db), and
this.db.exec(...) so you reset state on any failure path.
| async clear(): Promise<void> { | ||
| await this.ensureInitialized(); | ||
| this.db.exec("DELETE FROM vec_data"); | ||
| this.db.exec("DELETE FROM vec_meta"); | ||
| } |
There was a problem hiding this comment.
Make clear() atomic with a transaction.
clear() currently does two independent deletes. If the second statement fails, metadata/vector tables can diverge.
Suggested fix
async clear(): Promise<void> {
await this.ensureInitialized();
- this.db.exec("DELETE FROM vec_data");
- this.db.exec("DELETE FROM vec_meta");
+ const clearTx = this.db.transaction(() => {
+ this.db.exec("DELETE FROM vec_data");
+ this.db.exec("DELETE FROM vec_meta");
+ });
+ clearTx();
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/state/vector-index-sqlite.ts` around lines 145 - 149, The clear() method
executes two separate DELETEs which can leave tables inconsistent if the second
fails; wrap the two deletes in a single SQLite transaction in clear() (after
ensureInitialized()) using BEGIN/COMMIT and ROLLBACK on error or the DB client's
transaction API, executing both DELETE FROM vec_data and DELETE FROM vec_meta
inside that transaction and ensuring rollback and error propagation if any
statement fails.
| try { | ||
| if (!Array.isArray(row) || row.length < 2) continue; | ||
| const [obsId, entry] = row; | ||
| if ( | ||
| typeof obsId !== "string" || | ||
| typeof entry?.embedding !== "string" || | ||
| typeof entry?.sessionId !== "string" | ||
| ) | ||
| continue; | ||
| const buf = Buffer.from(entry.embedding, "base64"); | ||
| const embedding = new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4); | ||
|
|
||
| const existing = this.db | ||
| .prepare("SELECT rowid FROM vec_meta WHERE obs_id = ?") | ||
| .get(obsId) as { rowid: number } | undefined; | ||
|
|
||
| if (existing) { | ||
| this.db.prepare("DELETE FROM vec_data WHERE rowid = ?").run(existing.rowid); | ||
| this.db | ||
| .prepare("UPDATE vec_meta SET session_id = ? WHERE rowid = ?") | ||
| .run(entry.sessionId, existing.rowid); | ||
| this.db | ||
| .prepare("INSERT INTO vec_data (rowid, embedding) VALUES (?, ?)") | ||
| .run(existing.rowid, Buffer.from(embedding.buffer, embedding.byteOffset, embedding.byteLength)); | ||
| } else { | ||
| const info = this.db | ||
| .prepare("INSERT INTO vec_meta (obs_id, session_id) VALUES (?, ?)") | ||
| .run(obsId, entry.sessionId); | ||
| this.db | ||
| .prepare("INSERT INTO vec_data (rowid, embedding) VALUES (?, ?)") | ||
| .run(info.lastInsertRowid, Buffer.from(embedding.buffer, embedding.byteOffset, embedding.byteLength)); | ||
| } | ||
| } catch { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Don’t silently swallow restore failures.
Per-row catch { continue; } hides malformed payloads and DB write errors, making migrations look successful while dropping vectors.
Surface a failure count (or throw with summary) after insertTx() so callers can react.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/state/vector-index-sqlite.ts` around lines 164 - 198, The per-row
try/catch in insertTx() (the loop that reads rows, decodes embedding, and writes
to vec_meta/vec_data) currently swallows all errors; change it to catch (err)
and increment a local failure counter and push error details (obsId and
err.message) into a small list instead of silently continuing, and after the
loop return or throw a summary if failures > 0 (e.g., include failure count and
sample errors) so callers can react; keep all existing DB operations
(SELECT/DELETE/UPDATE/INSERT on vec_meta/vec_data) and only alter the catch
behavior and the function's return/throw to surface aggregated failures.
838/838 tests passing, zero regressions.
Summary by CodeRabbit
New Features
Improvements
Chores