memory: SQLite + FTS5 keyword index over workspace/memory/ (Phase A) #74
Phase A of the hybrid memory system. Mirrors openclaw's model: durable
notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md)
that the model edits with the existing fs_write tool. SQLite is a
derived index (never the source of truth), lazily rebuilt on each
memory_search call by walking mtimes.
Components
----------
1. PasClaw.Memory.Index — IMemoryIndex interface + cross-target
implementation.
- {$IFDEF FPC}: TSQLite3Connection + TSQLQuery from sqldb. Needs
fcl-db at build time; libsqlite3 at runtime.
- {$ELSE}: TFDConnection + TFDQuery from FireDAC. Needs
FireDAC at build time (ships with Delphi); sqlite3.dll at
runtime.
Open returns False and logs a warning when libsqlite3 can't be
loaded; memory_search then degrades to "index unavailable"
without taking down the rest of the agent.
Schema:
memory_files(rowid PK, path UNIQUE, mtime, indexed_at)
memory_fts USING fts5(path UNINDEXED, content,
tokenize='porter unicode61')
Each file is one FTS5 row, sharing rowid with memory_files. The
reindex path deletes both rows by path and reinserts so rowids
stay in sync across edits.
SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md,
compares mtimes, reindexes whatever's newer, and drops rows for
files that vanished from disk.
Search runs MATCH against memory_fts, returns up to K hits with
the FTS5 snippet() excerpt around the matched term and the
bm25() score (smaller = better) — wrapped as TMemoryHitArray.
2. PasClaw.Tools.Memory — registers memory_search(query, k?) tool.
Opens the index at <home>/workspace/memory/.index.db, syncs, runs
the FTS5 query, returns up to 25 hits as
path (bm25=N.NNN)
snippet text…
one per match. Missing memory dir surfaces a "(no memory
directory yet — write to MEMORY.md first)" hint instead of an
error. No memory_add tool by design — model uses fs_write per
openclaw's convention.
3. BuildMemorySection extension. Previously it injected only MEMORY.md.
Now it also injects today's and yesterday's daily notes (if either
exists), wrapped under "### Today (<date>)" / "### Yesterday
(<date>)" subsections so the model can tell durable from dated
material apart. Mirrors openclaw's bootstrap loading. Older
daily notes stay on disk and reach the model only when
memory_search hits them.
4. Rules section update — rule 5 now teaches the model to write
durable preferences to MEMORY.md AND ephemeral session context
to workspace/memory/<today>.md, and to call memory_search before
answering questions about prior conversations.
5. RegisterMemoryTools wired into every tool-registry build site:
PasClaw.Cmd.Agent / Gateway / TUI / Serve, and
PasClaw.Component (both registry constructors). Honours
--no-tools just like the other tools — disabled when the
registry is nil.
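The daily-note injection from item 3 can be sketched in Python as follows. This is not the PasClaw implementation: build_memory_section is a hypothetical name, the YYYY-MM-DD.md file naming is an assumption (the text above only says <date>.md), and placing MEMORY.md before the dated subsections is this sketch's choice. Each "### Today (<date>)" / "### Yesterday (<date>)" subsection is emitted only when that day's note exists.

```python
import datetime
import pathlib
import tempfile


def build_memory_section(memory_md: pathlib.Path, memory_dir: pathlib.Path,
                         today: datetime.date) -> str:
    """Hypothetical sketch: MEMORY.md plus today's and yesterday's daily notes."""
    parts = ["## Memory"]
    if memory_md.is_file():
        parts.append(memory_md.read_text(encoding="utf-8").rstrip())
    yesterday = today - datetime.timedelta(days=1)
    for label, day in (("Today", today), ("Yesterday", yesterday)):
        # Assumed naming scheme: workspace/memory/YYYY-MM-DD.md
        note = memory_dir / f"{day.isoformat()}.md"
        if note.is_file():  # a missing daily note is skipped, not an error
            parts.append(f"### {label} ({day.isoformat()})")
            parts.append(note.read_text(encoding="utf-8").rstrip())
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = pathlib.Path(tmp, "workspace", "memory")
    memory_dir.mkdir(parents=True)
    (memory_dir / "MEMORY.md").write_text("User prefers tabs.\n", encoding="utf-8")
    (memory_dir / "2024-05-02.md").write_text("Debugging the FTS5 index.\n", encoding="utf-8")
    # No 2024-05-01.md exists, so only the Today subsection is emitted.
    section = build_memory_section(memory_dir / "MEMORY.md", memory_dir,
                                   datetime.date(2024, 5, 2))

print(section)
```

Older daily notes are deliberately not read here; in the design described above they only reach the model through memory_search.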
Build
-----
Makefile adds FCLDB_DIR / SQLITE_DIR (defaulting to Debian fp-units-db
paths) and a $(if LAZUTILS_DIR,…) include so Tools.FS.Masks
continues to resolve in environments missing fcl-db's masks unit.
PasClaw.dproj adds DCCReference entries for the two new units.
Verification
------------
- make builds clean (FPC 3.2.2).
- Standalone smoke test (open / SyncDir / Search / reindex on edit /
drop on delete) — 7 assertions, all pass.
- pasclaw --help still lists every existing command unchanged.
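The smoke test described above (open / SyncDir / Search / reindex on edit / drop on delete) can be approximated with the Python sketch below. It assumes the SQLite library behind Python's sqlite3 module was built with FTS5, and it stands in for the sqldb/FireDAC units rather than reproducing them; open_index, drop, reindex, sync_dir, search and format_hits are hypothetical names. The schema and the snippet()/bm25() query follow the ones quoted in this PR, except that memory_files declares its key as id INTEGER PRIMARY KEY, which SQLite treats as an alias for rowid.

```python
import os
import pathlib
import sqlite3
import tempfile
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_files(
    id INTEGER PRIMARY KEY,  -- alias for rowid, shared with memory_fts
    path TEXT UNIQUE, mtime REAL, indexed_at REAL);
CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(
    path UNINDEXED, content, tokenize='porter unicode61');
"""


def open_index(db_path):
    """Return a connection, or None if SQLite/FTS5 cannot be used."""
    try:
        con = sqlite3.connect(db_path)
        con.executescript(SCHEMA)
        return con
    except sqlite3.Error:
        return None  # caller reports "index unavailable" instead of failing


def drop(con, rowid):
    # Remove the paired rows from both tables.
    con.execute("DELETE FROM memory_fts WHERE rowid = ?", (rowid,))
    con.execute("DELETE FROM memory_files WHERE id = ?", (rowid,))


def reindex(con, path, mtime, content):
    row = con.execute("SELECT id FROM memory_files WHERE path = ?", (path,)).fetchone()
    if row:
        drop(con, row[0])
    cur = con.execute("INSERT INTO memory_files(path, mtime, indexed_at) VALUES (?, ?, ?)",
                      (path, mtime, time.time()))
    # Reuse the new memory_files id as the FTS5 rowid so the two stay in sync.
    con.execute("INSERT INTO memory_fts(rowid, path, content) VALUES (?, ?, ?)",
                (cur.lastrowid, path, content))


def sync_dir(con, memory_dir):
    on_disk = {str(p): p.stat().st_mtime for p in pathlib.Path(memory_dir).glob("*.md")}
    indexed = dict(con.execute("SELECT path, mtime FROM memory_files"))
    for path, mtime in on_disk.items():
        if path not in indexed or mtime > indexed[path]:  # new or newer on disk
            reindex(con, path, mtime, pathlib.Path(path).read_text(encoding="utf-8"))
    for path in indexed.keys() - on_disk.keys():  # vanished from disk
        (rowid,) = con.execute("SELECT id FROM memory_files WHERE path = ?", (path,)).fetchone()
        drop(con, rowid)
    con.commit()


def search(con, fts_query, k=5):
    k = max(1, min(k, 25))  # same 1..25 range as the memory_search schema
    return con.execute(
        "SELECT path, snippet(memory_fts, 1, '«', '»', '…', 24), bm25(memory_fts) "
        "FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY bm25(memory_fts) LIMIT ?", (fts_query, k)).fetchall()


def format_hits(hits):
    # One "path (bm25=N.NNN)" line plus an indented snippet per hit.
    return "\n".join(f"{path} (bm25={score:.3f})\n  {snip}" for path, snip, score in hits)


with tempfile.TemporaryDirectory() as tmp:
    note = pathlib.Path(tmp, "MEMORY.md")
    note.write_text("We use SQLite FTS5 for keyword search.\n", encoding="utf-8")
    con = open_index(os.path.join(tmp, ".index.db"))
    if con is None:
        raise SystemExit("index unavailable: this SQLite build lacks FTS5")
    sync_dir(con, tmp)
    first = search(con, "fts5")
    # Rewrite with a new term; bump mtime explicitly since fast writes can share a timestamp.
    note.write_text("Switched the tokenizer to porter unicode61.\n", encoding="utf-8")
    bumped = note.stat().st_mtime + 10
    os.utime(note, (bumped, bumped))
    sync_dir(con, tmp)
    after_edit, stale = search(con, "tokenizer"), search(con, "fts5")
    note.unlink()
    sync_dir(con, tmp)
    after_delete = search(con, "tokenizer")
    con.close()

print(format_hits(after_edit))
```

Deleting by rowid and reinserting keeps memory_files.id and memory_fts.rowid paired, which is the invariant the reindex path relies on; the stale search after the edit returning nothing shows the old content really left the FTS5 table.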
Deferred to Phase B
-------------------
- vec0 / embeddings / hybrid RRF retrieval (sqlite-vec loadable
extension + ILLMProvider.Embed).
- openclaw's "memory flush" pre-compaction prompt (PasClaw has no
compaction yet).
- "Dreaming" consolidation pass.
- Eviction / decay / importance scoring.
https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
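For context on the deferred "hybrid RRF retrieval" item: Reciprocal Rank Fusion usually combines several ranked lists (here, hypothetically, an FTS5 BM25 ranking and an embedding-similarity ranking) by giving each document the sum of 1/(k + rank) over the lists it appears in. The sketch below assumes 1-based ranks and the commonly used constant k = 60; neither choice comes from this PR, and rrf_fuse is a hypothetical name, not a planned PasClaw API.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several best-first rankings with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


fts_ranking = ["MEMORY.md", "2024-05-01.md", "2024-04-30.md"]  # e.g. bm25 order
vec_ranking = ["2024-05-01.md", "2024-04-28.md", "MEMORY.md"]  # e.g. cosine order
fused = rrf_fuse([fts_ranking, vec_ranking])
print(fused)  # ['2024-05-01.md', 'MEMORY.md', '2024-04-28.md', '2024-04-30.md']
```

A document that ranks well in both lists can outrank one that tops only one of them, which is why fusion is attractive for mixing keyword and vector hits without calibrating their raw scores against each other.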
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78e29a57fc
'SELECT path, snippet(memory_fts, 1, ''«'', ''»'', ''…'', 24), bm25(memory_fts) ' +
'FROM memory_fts WHERE memory_fts MATCH :q ' +
'ORDER BY bm25(memory_fts) LIMIT :k';
Q.Params.ParamByName('q').AsString := Query;
Escape natural-language queries before MATCH
When memory_search is called with ordinary prior-conversation questions or code/path tokens containing FTS punctuation (for example what did we discuss?, C++, or foo/bar), SQLite FTS5 raises a syntax error; this method catches that and returns an empty result, so the tool reports no matches even when indexed notes contain the terms. Since the prompt tells the model to use this tool for natural-language memory lookups, sanitize/tokenize the query or fall back to a quoted/ANDed term query before passing it to MATCH (the same raw binding is duplicated in the FireDAC branch).
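The failure mode described in this review can be reproduced with Python's sqlite3 module, assuming an FTS5-enabled SQLite build. The table mirrors the memory_fts definition from the PR; raw_match is a hypothetical helper that binds the query unchanged, as the reviewed Search code does, but reports the error instead of swallowing it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE memory_fts USING fts5("
            "path UNINDEXED, content, tokenize='porter unicode61')")
con.execute("INSERT INTO memory_fts(path, content) VALUES (?, ?)",
            ("MEMORY.md", "We discussed C++ builds and the foo/bar layout."))


def raw_match(query: str) -> str:
    """Bind the query verbatim, like the reviewed Search code, and report errors."""
    try:
        rows = con.execute("SELECT path FROM memory_fts WHERE memory_fts MATCH ?",
                           (query,)).fetchall()
        return f"{len(rows)} hit(s)"
    except sqlite3.OperationalError as exc:  # FTS5 reports a query syntax error here
        return f"error: {exc}"


results = {q: raw_match(q) for q in ["discussed", "what did we discuss?", "C++", "foo/bar"]}
for q, outcome in results.items():
    print(f"{q!r}: {outcome}")
```

Only the plain word matches; the other three inputs never reach the index at all, which is exactly what a caller that maps exceptions to "no matches" hides.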
…tches

PR #74's Codex review (P2) flagged that memory_search passes its query verbatim into the FTS5 MATCH operator. FTS5 treats unquoted ASCII punctuation as syntax, so common inputs failed with "fts5: syntax error", which the surrounding try/except caught and silently turned into "no matches":

  what did we discuss?   - terminal ? is a syntax error
  C++                    - + is an operator
  foo/bar                - / is illegal
  skill_<name>           - < terminates the column qualifier
  "phrase search         - unbalanced quote
  ( AND OR NOT NEAR      - reserved keywords with no operand

From the model's perspective this was indistinguishable from "no matches in the indexed notes", so legitimate hits stayed invisible.

Fix: a new SanitizeFtsQuery helper at the top of the implementation section. It splits the raw input on every non-token byte and reassembles the surviving tokens as OR-ed quoted phrases. Bytes >= $80 stay in their tokens so UTF-8 multi-byte sequences survive intact (the porter unicode61 tokenizer folds them at index time). Empty or all-separator input yields '' so the caller's early-exit path runs without ever binding the query.

OR over AND: natural-language queries include filler words ("what", "did", "we") that won't appear in terse notes. AND would require every token to match and would silently return zero hits as soon as any filler word was missing. OR returns matches as soon as ANY token matches, and FTS5's BM25 ranking generally scores documents that match more tokens higher, so the top-K results are still the most relevant ones. Confirmed: a search for "what did we discuss authentication" on a note that reads "We discussed authentication tokens" returns that note as hit 1.

Both Search overloads (FPC sqldb and Delphi FireDAC) now run the input through SanitizeFtsQuery before binding the :q parameter.

The brace-style block comment in the helper's docstring was switched to (* *): the previous { ... } variant tripped FPC's comment parser when an example like '"phrase search"' contained a literal } from a nested `{ ... }` aside.
Verified with a standalone regression smoke test covering 9 cases: the five FTS5-hostile-but-meaningful queries from the review (all must find their expected hit) plus four edge cases (?!?, '', '"unbalanced', '( AND OR NOT NEAR') that must return without raising. All pass.
https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
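A Python approximation of the SanitizeFtsQuery behaviour described in this commit follows. The Pascal helper's exact set of token bytes isn't spelled out, so this sketch assumes ASCII letters, digits and underscore are token characters and keeps every non-ASCII character (the character-level equivalent of keeping bytes >= $80 in UTF-8). sanitize_fts_query and hits are hypothetical names. The demo also compares the OR-ed query with an AND-ed one on the "We discussed authentication tokens" note, then runs the hostile inputs listed in the commit through the sanitizer.

```python
import re
import sqlite3

# Assumed token characters: ASCII [0-9A-Za-z_] plus anything outside ASCII.
TOKEN_RE = re.compile(r"[0-9A-Za-z_\u0080-\U0010FFFF]+")


def sanitize_fts_query(raw: str) -> str:
    """Split on non-token characters and OR the surviving tokens as quoted phrases."""
    tokens = TOKEN_RE.findall(raw)
    # Quoting turns AND/OR/NOT/NEAR into plain terms; tokens never contain '"'.
    return " OR ".join(f'"{tok}"' for tok in tokens)  # '' when nothing survives


con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE memory_fts USING fts5("
            "path UNINDEXED, content, tokenize='porter unicode61')")
con.execute("INSERT INTO memory_fts(path, content) VALUES (?, ?)",
            ("MEMORY.md", "We discussed authentication tokens"))


def hits(fts_query: str) -> int:
    if not fts_query:  # mirror the caller's early exit for empty input
        return 0
    return con.execute("SELECT count(*) FROM memory_fts WHERE memory_fts MATCH ?",
                       (fts_query,)).fetchone()[0]


question = "what did we discuss authentication?"
or_query = sanitize_fts_query(question)
and_query = " AND ".join(f'"{tok}"' for tok in TOKEN_RE.findall(question))
print(or_query)
print(hits(or_query), hits(and_query))

# Hostile inputs from the commit message: none of these should raise any more.
hostile = ["what did we discuss?", "C++", "foo/bar", "skill_<name>",
           '"phrase search', "?!?", "", '"unbalanced', "( AND OR NOT NEAR"]
outcomes = {q: hits(sanitize_fts_query(q)) for q in hostile}
```

The AND variant finds nothing because "what" and "did" never occur in the note, while the OR variant still matches through "we", "discuss" and "authentication" (the porter stemmer maps "discussed" and "discuss" to the same term).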
Phase A of the hybrid memory system. Mirrors openclaw's model: durable notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md) that the model edits with the existing fs_write tool. SQLite is a derived index (never the source of truth), lazily rebuilt on each memory_search call by walking mtimes. No embeddings, no memory_add tool, no ILLMProvider.Embed: all deferred to Phase B.

What's in the box
-----------------
1. PasClaw.Memory.Index: cross-target SQLite + FTS5 index

IMemoryIndex interface with Open / Close / SyncDir / Search. Two backends gated by IFDEF:

| Target | Classes                                            | Build-time dependency       | Runtime library |
|--------|----------------------------------------------------|-----------------------------|-----------------|
| FPC    | TSQLite3Connection + TSQLQuery (sqldb/sqlite3conn) | fcl-db (apt: fp-units-db)   | libsqlite3.so   |
| Delphi | TFDConnection + TFDQuery (FireDAC SQLite)          | FireDAC (ships with Delphi) | sqlite3.dll     |

If libsqlite3 can't be loaded, Open returns False and memory_search degrades to "index unavailable"; the rest of the agent keeps running.

Schema:

    memory_files(rowid PK, path UNIQUE, mtime, indexed_at)
    memory_fts USING fts5(path UNINDEXED, content,
                          tokenize='porter unicode61')

Each file is one FTS5 row sharing rowid with memory_files. The reindex path deletes both rows by path and reinserts, so rowids stay in sync across edits. SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md, compares mtimes, reindexes whatever's newer, and drops rows for files that disappeared from disk. Search runs MATCH and returns up to K hits with the FTS5 snippet() excerpt and bm25() score (smaller = better).

2. PasClaw.Tools.Memory: memory_search tool

  {
    "name": "memory_search",
    "description": "Search the workspace memory directory (MEMORY.md + workspace/memory/*.md) with SQLite FTS5 BM25 ranking…",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"},
        "k": {"type": "integer", "minimum": 1, "maximum": 25}
      },
      "required": ["query"]
    }
  }

Opens $PASCLAW_HOME/workspace/memory/.index.db, syncs against the filesystem, runs the FTS5 query, and returns up to 25 hits as path (bm25=N.NNN) + snippet. A missing memory dir surfaces a friendly "(no memory directory yet — write to MEMORY.md first)" hint instead of an error.

No memory_add tool by design. The model writes durable memories by editing files with fs_write, the same convention openclaw uses.

3. BuildMemorySection extension

The ## Memory section of the system prompt previously injected only MEMORY.md. It now also injects today's and yesterday's daily notes (if either exists), wrapped under ### Today (<date>) / ### Yesterday (<date>) subsections. This mirrors openclaw's bootstrap loading; older daily notes stay on disk and reach the model only when memory_search returns them.

4. Rules update (rule 5)
Teaches the model to:
- write durable preferences to MEMORY.md,
- write ephemeral session context to workspace/memory/<today>.md,
- call memory_search before answering questions about prior conversations.

5. Tool registration wired into every site
RegisterMemoryTools added to PasClaw.Cmd.Agent / Gateway / TUI / Serve and both TPasClawAgent.EnsureToolsRegistered paths in PasClaw.Component. Honours --no-tools (no registry → no memory_search).

Build
-----
Makefile adds FCLDB_DIR / SQLITE_DIR (defaulting to Debian fp-units-db paths) plus an optional LAZUTILS_DIR so Tools.FS.Masks keeps resolving on environments where it lives in Lazarus rather than fcl-base. PasClaw.dproj gets DCCReference entries for the two new units.

Verification
------------
- make: clean build under FPC 3.2.2.
- Standalone smoke test: … MEMORY.md / FTS5 search hits "FTS5" / reindex picks up new term after rewrite / removed file drops from index. 7 assertions, all pass.
- pasclaw --help still lists every existing command unchanged.

Deferred to Phase B
-------------------
- vec0 + embeddings + hybrid RRF retrieval (sqlite-vec loadable extension + ILLMProvider.Embed).

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
Generated by Claude Code