memory: SQLite + FTS5 keyword index over workspace/memory/ (Phase A) #74

FMXExpress merged 2 commits into main from claude/memory-fts5 on May 29, 2026

@FMXExpress (Owner)

Phase A of the hybrid memory system. Mirrors openclaw's model: durable notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md) that the model edits with the existing fs_write tool. SQLite is a derived index — never the source of truth, lazy-rebuilt on each memory_search call by walking mtimes. No embeddings, no memory_add tool, no ILLMProvider.Embed — all deferred to Phase B.

What's in the box

1. PasClaw.Memory.Index — cross-target SQLite + FTS5 index

IMemoryIndex interface with Open / Close / SyncDir / Search. Two backends gated by IFDEF:

| Target | Driver | Build dep | Runtime dep |
| --- | --- | --- | --- |
| FPC | TSQLite3Connection + TSQLQuery (sqldb / sqlite3conn) | fcl-db (apt: fp-units-db) | libsqlite3.so |
| Delphi | TFDConnection + TFDQuery (FireDAC SQLite) | ships with Delphi | sqlite3.dll |

If libsqlite3 can't be loaded, Open returns False and memory_search degrades to "index unavailable" — the rest of the agent keeps running.

Schema:

CREATE TABLE memory_files (
  rowid INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT UNIQUE NOT NULL,
  mtime INTEGER NOT NULL,
  indexed_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE memory_fts USING fts5(
  path UNINDEXED, content,
  tokenize='porter unicode61'
);

Each file is one FTS5 row sharing rowid with memory_files. The reindex path deletes both rows by path and reinserts, so rowids stay in sync across edits.
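
A minimal sketch of the delete-then-reinsert path, written in Python's sqlite3 module rather than the PR's Pascal so it can be run standalone. The schema is the one above; `open_index` and `reindex_file` are hypothetical names, and wrapping both statements in one transaction is an assumption about how the real code keeps the tables consistent.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS memory_files (
  rowid INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT UNIQUE NOT NULL,
  mtime INTEGER NOT NULL,
  indexed_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(
  path UNINDEXED, content,
  tokenize='porter unicode61'
);
"""


def open_index(db_path=":memory:"):
    """Open (or create) the derived index. Hypothetical helper name."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def reindex_file(conn, path, mtime, content):
    """Delete both rows for `path`, then reinsert them under one shared rowid."""
    with conn:  # single transaction (assumption), so the tables never disagree
        conn.execute("DELETE FROM memory_fts WHERE path = ?", (path,))
        conn.execute("DELETE FROM memory_files WHERE path = ?", (path,))
        cur = conn.execute(
            "INSERT INTO memory_files (path, mtime, indexed_at) VALUES (?, ?, ?)",
            (path, mtime, int(time.time())),
        )
        # Reuse the rowid SQLite just assigned so the FTS row lines up with it.
        conn.execute(
            "INSERT INTO memory_fts (rowid, path, content) VALUES (?, ?, ?)",
            (cur.lastrowid, path, content),
        )
    return cur.lastrowid
```

Calling `reindex_file` twice for the same path leaves exactly one row in each table, both under the newer rowid.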

SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md, compares mtimes, reindexes whatever's newer, and drops rows for files that disappeared from disk.
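
The walk can be sketched roughly as follows, again in Python as a stand-in for the Pascal unit. `sync_dir` is a hypothetical name, file names are stored relative to the memory directory, and mtimes are stored as nanoseconds; all three are assumptions made for the illustration, not details taken from the PR.

```python
import os
import sqlite3
import time


def open_index(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS memory_files (
          rowid INTEGER PRIMARY KEY AUTOINCREMENT,
          path TEXT UNIQUE NOT NULL,
          mtime INTEGER NOT NULL,
          indexed_at INTEGER NOT NULL);
        CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts USING fts5(
          path UNINDEXED, content, tokenize='porter unicode61');
    """)
    return conn


def sync_dir(conn, memory_dir):
    """Bring the index in line with the *.md files in memory_dir.

    Returns (reindexed, dropped) as two lists of file names.
    """
    on_disk = {}
    for name in sorted(os.listdir(memory_dir)):
        full = os.path.join(memory_dir, name)
        if name.endswith(".md") and os.path.isfile(full):
            on_disk[name] = os.stat(full).st_mtime_ns
    indexed = dict(conn.execute("SELECT path, mtime FROM memory_files"))
    reindexed = []
    dropped = sorted(indexed.keys() - on_disk.keys())
    with conn:
        for name, mtime in on_disk.items():
            if name in indexed and indexed[name] >= mtime:
                continue  # index entry is at least as new as the file
            with open(os.path.join(memory_dir, name), encoding="utf-8") as f:
                content = f.read()
            conn.execute("DELETE FROM memory_fts WHERE path = ?", (name,))
            conn.execute("DELETE FROM memory_files WHERE path = ?", (name,))
            cur = conn.execute(
                "INSERT INTO memory_files (path, mtime, indexed_at) "
                "VALUES (?, ?, ?)", (name, mtime, int(time.time())))
            conn.execute(
                "INSERT INTO memory_fts (rowid, path, content) VALUES (?, ?, ?)",
                (cur.lastrowid, name, content))
            reindexed.append(name)
        for name in dropped:  # file vanished from disk
            conn.execute("DELETE FROM memory_fts WHERE path = ?", (name,))
            conn.execute("DELETE FROM memory_files WHERE path = ?", (name,))
    return reindexed, dropped
```

A second call with no filesystem changes touches nothing, which is what makes it cheap enough to run on every memory_search.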

Search runs MATCH and returns up to K hits with the FTS5 snippet() excerpt and bm25() score (smaller = better).
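
For illustration, the query shape can be exercised from Python's sqlite3 module. The SQL mirrors the statement quoted in the review thread for this PR; the `search` wrapper and the sample notes are made up for the demo.

```python
import sqlite3


def search(conn, fts_query, k=5):
    """Return up to k (path, snippet, score) tuples, best match first."""
    rows = conn.execute(
        "SELECT path, snippet(memory_fts, 1, '«', '»', '…', 24), "
        "bm25(memory_fts) "
        "FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY bm25(memory_fts) LIMIT ?",
        (fts_query, k),
    )
    return rows.fetchall()


# Throwaway in-memory index with invented notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE memory_fts USING fts5("
    "path UNINDEXED, content, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO memory_fts (path, content) VALUES (?, ?)",
    [
        ("MEMORY.md", "User prefers Free Pascal. Pascal builds use make."),
        ("2026-05-28.md", "Looked at the Makefile and a Pascal unit."),
        ("2026-05-27.md", "Discussed lunch options."),
        ("2026-05-26.md", "Reviewed the gateway logs."),
        ("2026-05-25.md", "Planned the release checklist."),
    ],
)

hits = search(conn, "pascal", k=2)
for path, snippet, score in hits:
    print(f"{path} (bm25={score:.3f})")
    print(snippet)
```

Because bm25() returns lower values for better matches, a plain ascending ORDER BY puts the note that mentions the term most often first.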

2. PasClaw.Tools.Memory: memory_search tool

{
  "name": "memory_search",
  "description": "Search the workspace memory directory (MEMORY.md + workspace/memory/*.md) with SQLite FTS5 BM25 ranking…",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "k":     {"type": "integer", "minimum": 1, "maximum": 25}
    },
    "required": ["query"]
  }
}

Opens $PASCLAW_HOME/workspace/memory/.index.db, syncs it against the filesystem, runs the FTS5 query, and returns up to 25 hits, each formatted as path (bm25=N.NNN) followed by its snippet. A missing memory directory surfaces a friendly hint, "(no memory directory yet — write to MEMORY.md first)", instead of an error.
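
A rough Python sketch of the output shaping described here. The default of 5 for an omitted k and the "(no matches)" text are assumptions made for the example; only the 1..25 bounds and the path (bm25=N.NNN) + snippet layout come from the description above.

```python
def clamp_k(k, default=5, lo=1, hi=25):
    """Clamp the optional k argument into the schema's 1..25 range.

    The default of 5 is an assumption; the PR does not state one.
    """
    if k is None:
        return default
    return max(lo, min(hi, int(k)))


def format_hits(hits):
    """Render (path, snippet, score) tuples as 'path (bm25=N.NNN)' + snippet."""
    if not hits:
        return "(no matches)"  # placeholder wording, not taken from the PR
    return "\n".join(
        f"{path} (bm25={score:.3f})\n{snippet}"
        for path, snippet, score in hits
    )


print(format_hits([("MEMORY.md", "prefers «pascal» over C", -1.23456)]))
```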

No memory_add tool, by design: the model writes durable memories by editing files with fs_write, the same convention openclaw uses.

3. BuildMemorySection extension

The system prompt's ## Memory section used to inject only MEMORY.md. It now also injects today's and yesterday's daily notes (if either exists), wrapped under ### Today (<date>) / ### Yesterday (<date>) subsections. Mirrors openclaw's bootstrap loading; older daily notes stay on disk and reach the model only when memory_search returns them.
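
The assembly logic might look something like this in Python. `build_memory_section` is a hypothetical name, and the YYYY-MM-DD.md file naming is an assumption: the description only says workspace/memory/<date>.md.

```python
import datetime
import os


def build_memory_section(memory_dir, today):
    """Assemble the '## Memory' prompt section from MEMORY.md plus the
    daily notes for today and yesterday, skipping whatever is missing."""

    def read(name):
        path = os.path.join(memory_dir, name)
        if not os.path.isfile(path):
            return None
        with open(path, encoding="utf-8") as f:
            return f.read().strip()

    parts = ["## Memory"]
    durable = read("MEMORY.md")
    if durable:
        parts.append(durable)

    yesterday = today - datetime.timedelta(days=1)
    for label, day in (("Today", today), ("Yesterday", yesterday)):
        # Assumed naming scheme: ISO date plus .md
        note = read(f"{day.isoformat()}.md")
        if note:
            parts.append(f"### {label} ({day.isoformat()})\n{note}")
    return "\n\n".join(parts)
```

Notes older than yesterday are never read by this function; they stay reachable only through memory_search.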

4. Rules update (rule 5)

Teaches the model to:

  • write durable preferences/project facts to MEMORY.md,
  • write episodic session context to workspace/memory/<today>.md,
  • call memory_search before answering questions about prior conversations.

5. Tool registration wired into every site

RegisterMemoryTools added to PasClaw.Cmd.Agent / Gateway / TUI / Serve and both TPasClawAgent.EnsureToolsRegistered paths in PasClaw.Component. Honours --no-tools (no registry → no memory_search).

Build

Makefile adds FCLDB_DIR / SQLITE_DIR (defaulting to Debian fp-units-db paths) plus an optional LAZUTILS_DIR so Tools.FS.Masks keeps resolving on environments where it lives in Lazarus rather than fcl-base. PasClaw.dproj gets DCCReference entries for the two new units.

Verification

  • make — clean build under FPC 3.2.2.
  • Standalone smoke test: Open and SyncDir succeed; searching "pascal" returns MEMORY.md as the top hit; searching "FTS5" returns a hit; a rewritten file's new term is picked up on reindex; a removed file drops out of the index. 7 assertions, all pass.
  • pasclaw --help still lists every existing command unchanged.

Deferred to Phase B

  • vec0 + embeddings + hybrid RRF retrieval (sqlite-vec loadable extension + ILLMProvider.Embed).
  • openclaw's "memory flush" pre-compaction prompt (PasClaw has no compaction yet).
  • "Dreaming" consolidation pass.
  • Eviction / decay / importance scoring.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon


Generated by Claude Code

Phase A of the hybrid memory system. Mirrors openclaw's model: durable
notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md)
that the model edits with the existing fs_write tool. SQLite is a
derived index — never the source of truth, lazy-rebuilt on each
memory_search call by walking mtimes.

Components
----------
1. PasClaw.Memory.Index — IMemoryIndex interface + cross-target
   implementation.
   - {$IFDEF FPC}: TSQLite3Connection + TSQLQuery from sqldb. Needs
     fcl-db at build time; libsqlite3 at runtime.
   - {$ELSE}:      TFDConnection + TFDQuery from FireDAC. Needs
     FireDAC at build time (ships with Delphi); sqlite3.dll at
     runtime.
   Open returns False and logs a warning when libsqlite3 can't be
   loaded; memory_search then degrades to "index unavailable"
   without taking down the rest of the agent.

   Schema:
     memory_files(rowid PK, path UNIQUE, mtime, indexed_at)
     memory_fts USING fts5(path UNINDEXED, content,
                            tokenize='porter unicode61')
   Each file is one FTS5 row, sharing rowid with memory_files. The
   reindex path deletes both rows by path and reinserts so rowids
   stay in sync across edits.

   SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md,
   compares mtimes, reindexes whatever's newer, and drops rows for
   files that vanished from disk.

   Search runs MATCH against memory_fts, returns up to K hits with
   the FTS5 snippet() excerpt around the matched term and the
   bm25() score (smaller = better) — wrapped as TMemoryHitArray.

2. PasClaw.Tools.Memory — registers memory_search(query, k?) tool.
   Opens the index at <home>/workspace/memory/.index.db, syncs, runs
   the FTS5 query, returns up to 25 hits as
     path  (bm25=N.NNN)
     snippet text…
   one per match. Missing memory dir surfaces a "(no memory
   directory yet — write to MEMORY.md first)" hint instead of an
   error. No memory_add tool by design — model uses fs_write per
   openclaw's convention.

3. BuildMemorySection extension — used to inject only MEMORY.md.
   Now also injects today's and yesterday's daily notes (if either
   exists), wrapped under "### Today (<date>)" / "### Yesterday
   (<date>)" subsections so the model can tell durable and dated
   material apart. Mirrors openclaw's bootstrap loading. Older
   daily notes stay on disk and reach the model only when
   memory_search hits them.

4. Rules section update — rule 5 now teaches the model to write
   durable preferences to MEMORY.md AND ephemeral session context
   to workspace/memory/<today>.md, and to call memory_search before
   answering questions about prior conversations.

5. RegisterMemoryTools wired into every tool-registry build site:
   PasClaw.Cmd.Agent / Gateway / TUI / Serve, and
   PasClaw.Component (both registry constructors). Honours
   --no-tools just like the other tools — disabled when the
   registry is nil.

Build
-----
Makefile adds FCLDB_DIR / SQLITE_DIR (default to Debian fp-units-db
paths) and a $(if LAZUTILS_DIR,…) include so Tools.FS.Masks
continues to resolve in environments missing fcl-base's masks unit.
PasClaw.dproj adds DCCReference entries for the two new units.

Verification
------------
- make builds clean (FPC 3.2.2).
- Standalone smoke test (open / SyncDir / Search / reindex on edit /
  drop on delete) — 7 assertions, all pass.
- pasclaw --help still lists every existing command unchanged.

Deferred to Phase B
-------------------
- vec0 / embeddings / hybrid RRF retrieval (sqlite-vec loadable
  extension + ILLMProvider.Embed).
- openclaw's "memory flush" pre-compaction prompt (PasClaw has no
  compaction yet).
- "Dreaming" consolidation pass.
- Eviction / decay / importance scoring.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78e29a57fc

Comment thread src/pkg/memory/PasClaw.Memory.Index.pas Outdated
'SELECT path, snippet(memory_fts, 1, ''«'', ''»'', ''…'', 24), bm25(memory_fts) ' +
'FROM memory_fts WHERE memory_fts MATCH :q ' +
'ORDER BY bm25(memory_fts) LIMIT :k';
Q.Params.ParamByName('q').AsString := Query;

P2: Escape natural-language queries before MATCH

When memory_search is called with ordinary prior-conversation questions or code/path tokens containing FTS punctuation (for example what did we discuss?, C++, or foo/bar), SQLite FTS5 raises a syntax error; this method catches that and returns an empty result, so the tool reports no matches even when indexed notes contain the terms. Since the prompt tells the model to use this tool for natural-language memory lookups, sanitize/tokenize the query or fall back to a quoted/ANDed term query before passing it to MATCH (the same raw binding is duplicated in the FireDAC branch).

…tches

PR #74's Codex review (P2) flagged that memory_search passes its
query verbatim into the FTS5 MATCH operator. FTS5 treats unquoted
ASCII punctuation as syntax, so common inputs blew up with
"fts5: syntax error" — caught by the surrounding try/except and
silently turned into "no matches":

  what did we discuss?  - terminal ? is a syntax error
  C++                   - + is an operator
  foo/bar               - / is illegal
  skill_<name>          - < terminates the column qualifier
  "phrase search        - unbalanced quote
  ( AND OR NOT NEAR     - reserved keywords with no operand

From the model's perspective the failure was indistinguishable from
"no matches in the indexed notes", so legitimate hits stayed invisible.

Fix: new SanitizeFtsQuery helper at the top of the implementation
section. Splits the raw input on every non-token byte and reassembles
the surviving tokens as OR-ed quoted phrases. Bytes >= $80 stay in
their tokens so UTF-8 multi-byte sequences survive intact (the porter
unicode61 tokenizer folds them at index time). Empty / all-separator
input yields '' so the caller's early-exit path runs without ever
binding the query.
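
A Python approximation of the helper described above, not the Pascal implementation. It works on characters rather than bytes (equivalent for code points >= 0x80), and it assumes "token" means ASCII letters and digits plus anything non-ASCII; whether the real helper also keeps underscores is not stated.

```python
def sanitize_fts_query(raw):
    """Turn arbitrary user text into a safe FTS5 MATCH expression.

    Splits on every non-token character and joins the surviving tokens as
    OR-ed quoted phrases. Returns '' when nothing survives.
    """
    tokens = []
    current = []
    for ch in raw:
        # Assumed token set: ASCII alphanumerics, plus every non-ASCII
        # character so multi-byte UTF-8 words stay in one piece.
        if (ch.isascii() and ch.isalnum()) or ord(ch) >= 0x80:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    # Quoting every token means AND / OR / NOT / NEAR are matched as plain
    # words instead of being parsed as operators.
    return " OR ".join(f'"{t}"' for t in tokens)


for raw in ["what did we discuss?", "C++", "foo/bar", "?!?"]:
    print(repr(raw), "->", repr(sanitize_fts_query(raw)))
```

Since separators include the double quote itself, no token can contain one, so the quoting never needs an escape step.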

OR over AND: natural-language queries include filler words ("what",
"did", "we") that won't appear in terse notes. AND would require
every token to land and silently return zero hits the moment any
filler word missed. OR returns matches as soon as ANY token lands,
and FTS5's BM25 ranks documents that match more tokens higher — so
the top-K results are still the most relevant ones. Confirmed:
search for "what did we discuss authentication" on a note that
reads "We discussed authentication tokens" returns that note as
hit 1.
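
The trade-off can be reproduced with a small Python script against an in-memory FTS5 table. The first note is the example quoted above; the other two notes and the `run` helper are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE memory_fts USING fts5("
    "path UNINDEXED, content, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO memory_fts (path, content) VALUES (?, ?)",
    [
        ("2026-05-28.md", "We discussed authentication tokens"),
        ("2026-05-27.md", "We fixed the Makefile"),
        ("MEMORY.md", "User prefers tabs"),
    ],
)

words = "what did we discuss authentication".split()


def run(joiner):
    """Join the quoted words with the given operator and return hit paths."""
    query = joiner.join(f'"{w}"' for w in words)
    rows = conn.execute(
        "SELECT path FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY bm25(memory_fts)",
        (query,),
    )
    return [row[0] for row in rows]


or_hits = run(" OR ")    # any word may match; BM25 sorts the best note first
and_hits = run(" AND ")  # every word must match, including "what" and "did"
print("OR :", or_hits)
print("AND:", and_hits)
```

The porter stemmer is what lets "discuss" in the query match "discussed" in the note.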

Both Search overloads (FPC sqldb and Delphi FireDAC) now run the
input through SanitizeFtsQuery before binding the :q parameter.

The brace-style block comment in the helper's docstring switched to
(* *) — the previous { ... } variant tripped FPC's comment parser
when an example like '"phrase search"' contained a literal } from a
nested `{ ... }` aside.

Verified with a standalone regression smoke test covering 9 cases:
the five FTS5-hostile-but-meaningful queries from the review (all
must find their expected hit) plus four edge cases (?!?, '',
'"unbalanced', '( AND OR NOT NEAR') that must return without
raising. All pass.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
@FMXExpress merged commit 3cd308d into main on May 29, 2026