memory: SQLite + FTS5 keyword index over workspace/memory/ (Phase A) by FMXExpress · Pull Request #74 · FMXExpress/PasClaw

FMXExpress · 2026-05-29T13:09:38Z

Phase A of the hybrid memory system. Mirrors openclaw's model: durable notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md) that the model edits with the existing fs_write tool. SQLite is a derived index — never the source of truth, lazy-rebuilt on each memory_search call by walking mtimes. No embeddings, no memory_add tool, no ILLMProvider.Embed — all deferred to Phase B.

What's in the box

1. `PasClaw.Memory.Index` — cross-target SQLite + FTS5 index

IMemoryIndex interface with Open / Close / SyncDir / Search. Two backends gated by IFDEF:

| Target | Driver | Build dep | Runtime dep |
|---|---|---|---|
| FPC | `TSQLite3Connection` + `TSQLQuery` (`sqldb` / `sqlite3conn`) | `fcl-db` (apt: `fp-units-db`) | `libsqlite3.so` |
| Delphi | `TFDConnection` + `TFDQuery` (FireDAC SQLite) | ships with Delphi | `sqlite3.dll` |

If libsqlite3 can't be loaded, Open returns False and memory_search degrades to "index unavailable" — the rest of the agent keeps running.

Schema:

CREATE TABLE memory_files (
  rowid INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT UNIQUE NOT NULL,
  mtime INTEGER NOT NULL,
  indexed_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE memory_fts USING fts5(
  path UNINDEXED, content,
  tokenize='porter unicode61'
);

Each file is one FTS5 row sharing rowid with memory_files. The reindex path deletes both rows by path and reinserts, so rowids stay in sync across edits.

SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md, compares mtimes, reindexes whatever's newer, and drops rows for files that disappeared from disk.
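The mtime comparison that SyncDir performs can be sketched as a pure function. This is a minimal illustration in Python rather than the PR's Object Pascal; `plan_sync` and its dictionary inputs are hypothetical stand-ins for the `memory_files` table and a directory walk, not names from the PR.

```python
from typing import Dict, List, Tuple


def plan_sync(indexed: Dict[str, int],
              on_disk: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare mtimes and return (paths to reindex, paths to drop).

    indexed: path -> mtime recorded in the index (hypothetical stand-in
             for the memory_files table).
    on_disk: path -> mtime currently observed on the filesystem.
    """
    # Reindex files that are new or whose mtime is newer than the indexed one.
    to_reindex = sorted(
        path for path, mtime in on_disk.items()
        if path not in indexed or mtime > indexed[path]
    )
    # Drop index rows whose files disappeared from disk.
    to_drop = sorted(path for path in indexed if path not in on_disk)
    return to_reindex, to_drop


indexed = {"MEMORY.md": 100, "2026-05-27.md": 90, "2026-05-28.md": 95}
on_disk = {"MEMORY.md": 120, "2026-05-28.md": 95, "2026-05-29.md": 130}

reindex, drop = plan_sync(indexed, on_disk)
print(reindex)  # ['2026-05-29.md', 'MEMORY.md']
print(drop)     # ['2026-05-27.md']
```

Unchanged files (here `2026-05-28.md`) appear in neither list, which is what keeps a lazy rebuild on every `memory_search` call cheap.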

Search runs MATCH and returns up to K hits with the FTS5 snippet() excerpt and bm25() score (smaller = better).
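The schema, the delete-and-reinsert reindex path, and the ranked search can be tried end to end with Python's built-in `sqlite3` module. This is a sketch, not the PR's Pascal implementation: it assumes the local SQLite build includes FTS5, and the helper names `reindex` and `search` are hypothetical. The SQL mirrors the schema above and the query text quoted in the review diff.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memory_files (
  rowid INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT UNIQUE NOT NULL,
  mtime INTEGER NOT NULL,
  indexed_at INTEGER NOT NULL
);
CREATE VIRTUAL TABLE memory_fts USING fts5(
  path UNINDEXED, content,
  tokenize='porter unicode61'
);
"""


def reindex(db, path, content, mtime, now=0):
    """Delete both rows for `path`, then reinsert them with a shared rowid."""
    db.execute("DELETE FROM memory_fts WHERE path = ?", (path,))
    db.execute("DELETE FROM memory_files WHERE path = ?", (path,))
    cur = db.execute(
        "INSERT INTO memory_files (path, mtime, indexed_at) VALUES (?, ?, ?)",
        (path, mtime, now),
    )
    # Reuse the rowid just assigned to memory_files for the FTS5 row.
    db.execute(
        "INSERT INTO memory_fts (rowid, path, content) VALUES (?, ?, ?)",
        (cur.lastrowid, path, content),
    )


def search(db, match_expr, k=5):
    """Return up to k (path, snippet, bm25) rows, best (smallest) score first."""
    return db.execute(
        "SELECT path, snippet(memory_fts, 1, '«', '»', '…', 24), bm25(memory_fts) "
        "FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY bm25(memory_fts) LIMIT ?",
        (match_expr, k),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
reindex(db, "MEMORY.md", "PasClaw is written in Object Pascal.", mtime=100)
reindex(db, "2026-05-29.md", "Discussed authentication tokens today.", mtime=110)

first_hits = search(db, "pascal")
for path, snippet, score in first_hits:
    print(path, snippet, score)

# Simulate an edit: the old term disappears and the new one becomes searchable.
reindex(db, "MEMORY.md", "PasClaw targets FPC and Delphi.", mtime=200)
```

Because `memory_files` uses AUTOINCREMENT, a reindexed file receives a fresh rowid, and passing that rowid explicitly to the FTS5 insert is what keeps the two tables aligned.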

2. `PasClaw.Tools.Memory` — `memory_search` tool

{
  "name": "memory_search",
  "description": "Search the workspace memory directory (MEMORY.md + workspace/memory/*.md) with SQLite FTS5 BM25 ranking…",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "k":     {"type": "integer", "minimum": 1, "maximum": 25}
    },
    "required": ["query"]
  }
}

Opens $PASCLAW_HOME/workspace/memory/.index.db, syncs against the filesystem, runs the FTS5 query, and returns up to 25 hits as path (bm25=N.NNN) + snippet. A missing memory dir surfaces a friendly hint, "(no memory directory yet — write to MEMORY.md first)", instead of an error.
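The tool's output shaping might look roughly like the following Python sketch (the PR itself is Object Pascal). Several details are assumptions for illustration: the default `k` of 5, clamping out-of-range values rather than rejecting them, the "(no matches)" wording, and the `search_fn` callback standing in for the index. Only the 1..25 bounds, the `path (bm25=N.NNN)` shape, and the missing-directory hint come from the description above.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, str, float]  # (path, snippet, bm25 score)

MAX_K = 25      # from the tool schema's "maximum"
DEFAULT_K = 5   # hypothetical default; the PR does not state one


def clamp_k(k: Optional[int]) -> int:
    """Clamp k into 1..MAX_K (assumption: clamp rather than reject)."""
    return max(1, min(MAX_K, DEFAULT_K if k is None else k))


def format_hits(hits: List[Hit]) -> str:
    """Render hits as 'path (bm25=N.NNN)' followed by an indented snippet."""
    if not hits:
        return "(no matches)"  # hypothetical wording
    return "\n".join(
        f"{path} (bm25={score:.3f})\n  {snippet}" for path, snippet, score in hits
    )


def memory_search(memory_dir: Path, query: str, k: Optional[int],
                  search_fn: Callable[[str, int], List[Hit]]) -> str:
    """Return a hint when the directory is missing, else the formatted hits."""
    if not memory_dir.is_dir():
        return "(no memory directory yet — write to MEMORY.md first)"
    return format_hits(search_fn(query, clamp_k(k)))


def fake_search(query: str, k: int) -> List[Hit]:
    # Stand-in for the real index lookup.
    return [("MEMORY.md", "written in Object «Pascal»", -0.5)][:k]


print(memory_search(Path("."), "pascal", None, fake_search))
print(memory_search(Path("definitely-missing-memory-dir"), "pascal", 3, fake_search))
```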

No memory_add tool by design. Model writes durable memories by editing files with fs_write — same convention openclaw uses.

3. `BuildMemorySection` extension

The system prompt's ## Memory section used to inject only MEMORY.md. It now also injects today's and yesterday's daily notes (if either exists), wrapped under ### Today (<date>) / ### Yesterday (<date>) subsections. Mirrors openclaw's bootstrap loading; older daily notes stay on disk and reach the model only when memory_search returns them.
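As a rough illustration of that assembly, here is a Python sketch (not the PR's Pascal code). It assumes daily notes are named with ISO dates such as `2026-05-29.md` and that sections are joined with blank lines; neither detail is spelled out in the description, and `build_memory_section` and `read_note` are hypothetical names.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def read_note(path: Path) -> Optional[str]:
    """Return the stripped file text, or None if the file is missing or empty."""
    if not path.is_file():
        return None
    text = path.read_text(encoding="utf-8").strip()
    return text or None


def build_memory_section(memory_dir: Path, today: date) -> str:
    """Compose '## Memory' from MEMORY.md plus today's and yesterday's notes."""
    parts = ["## Memory"]
    durable = read_note(memory_dir / "MEMORY.md")
    if durable:
        parts.append(durable)
    for label, day in (("Today", today), ("Yesterday", today - timedelta(days=1))):
        # Assumption: daily notes use ISO dates as file names.
        note = read_note(memory_dir / f"{day.isoformat()}.md")
        if note:
            parts.append(f"### {label} ({day.isoformat()})\n\n{note}")
    return "\n\n".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp)
    (memory_dir / "MEMORY.md").write_text("User prefers FPC 3.2.2.", encoding="utf-8")
    (memory_dir / "2026-05-29.md").write_text("Started the FTS5 index.", encoding="utf-8")
    (memory_dir / "2026-05-20.md").write_text("Older note, not injected.", encoding="utf-8")
    section = build_memory_section(memory_dir, date(2026, 5, 29))

print(section)
```

In this run there is no note for 2026-05-28, so no Yesterday subsection is emitted, and the 2026-05-20 note stays out of the prompt.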

4. Rules update (rule 5)

Teaches the model to:

- write durable preferences/project facts to MEMORY.md,
- write episodic session context to workspace/memory/<today>.md,
- call memory_search before answering questions about prior conversations.

5. Tool registration wired into every site

RegisterMemoryTools added to PasClaw.Cmd.Agent / Gateway / TUI / Serve and both TPasClawAgent.EnsureToolsRegistered paths in PasClaw.Component. Honours --no-tools (no registry → no memory_search).

Build

Makefile adds FCLDB_DIR / SQLITE_DIR (defaulting to Debian fp-units-db paths) plus an optional LAZUTILS_DIR so Tools.FS.Masks keeps resolving on environments where it lives in Lazarus rather than fcl-base. PasClaw.dproj gets DCCReference entries for the two new units.

Verification

- make: clean build under FPC 3.2.2.
- Standalone smoke test: open / SyncDir / Search hits "pascal" → top hit is MEMORY.md / FTS5 search hits "FTS5" / reindex picks up new term after rewrite / removed file drops from index. 7 assertions, all pass.
- pasclaw --help still lists every existing command unchanged.

Deferred to Phase B

- vec0 + embeddings + hybrid RRF retrieval (sqlite-vec loadable extension + ILLMProvider.Embed).
- openclaw's "memory flush" pre-compaction prompt (PasClaw has no compaction yet).
- "Dreaming" consolidation pass.
- Eviction / decay / importance scoring.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon

Generated by Claude Code

Phase A of the hybrid memory system. Mirrors openclaw's model: durable notes live as Markdown files (MEMORY.md, workspace/memory/<date>.md) that the model edits with the existing fs_write tool. SQLite is a derived index, never the source of truth, lazy-rebuilt on each memory_search call by walking mtimes.

Components
----------

1. PasClaw.Memory.Index: IMemoryIndex interface + cross-target implementation.

   - {$IFDEF FPC}: TSQLite3Connection + TSQLQuery from sqldb. Needs fcl-db at build time; libsqlite3 at runtime.
   - {$ELSE}: TFDConnection + TFDQuery from FireDAC. Needs FireDAC at build time (ships with Delphi); sqlite3.dll at runtime.

   Open returns False and logs a warning when libsqlite3 can't be loaded; memory_search then degrades to "index unavailable" without taking down the rest of the agent.

   Schema:

     memory_files(rowid PK, path UNIQUE, mtime, indexed_at)
     memory_fts USING fts5(path UNINDEXED, content, tokenize='porter unicode61')

   Each file is one FTS5 row, sharing rowid with memory_files. The reindex path deletes both rows by path and reinserts so rowids stay in sync across edits.

   SyncDir walks workspace/memory/MEMORY.md + workspace/memory/*.md, compares mtimes, reindexes whatever's newer, and drops rows for files that vanished from disk.

   Search runs MATCH against memory_fts and returns up to K hits with the FTS5 snippet() excerpt around the matched term and the bm25() score (smaller = better), wrapped as TMemoryHitArray.

2. PasClaw.Tools.Memory: registers the memory_search(query, k?) tool. Opens the index at <home>/workspace/memory/.index.db, syncs, runs the FTS5 query, and returns up to 25 hits as

     path (bm25=N.NNN)
     snippet text…

   one per match. Missing memory dir surfaces a "(no memory directory yet — write to MEMORY.md first)" hint instead of an error. No memory_add tool by design: the model uses fs_write per openclaw's convention.

3. BuildMemorySection extension: used to inject only MEMORY.md. Now also injects today's and yesterday's daily notes (if either exists), wrapped under "### Today (<date>)" / "### Yesterday (<date>)" subsections so the model can tell durable and dated material apart. Mirrors openclaw's bootstrap loading. Older daily notes stay on disk and reach the model only when memory_search hits them.

4. Rules section update: rule 5 now teaches the model to write durable preferences to MEMORY.md AND ephemeral session context to workspace/memory/<today>.md, and to call memory_search before answering questions about prior conversations.

5. RegisterMemoryTools wired into every tool-registry build site: PasClaw.Cmd.Agent / Gateway / TUI / Serve, and PasClaw.Component (both registry constructors). Honours --no-tools just like the other tools: disabled when the registry is nil.

Build
-----

Makefile adds FCLDB_DIR / SQLITE_DIR (default to Debian fp-units-db paths) and a $(if LAZUTILS_DIR,…) include so Tools.FS.Masks continues to resolve under environments missing fcl-db's masks unit. PasClaw.dproj adds DCCReference entries for the two new units.

Verification
------------

- make builds clean (FPC 3.2.2).
- Standalone smoke test (open / SyncDir / Search / reindex on edit / drop on delete): 7 assertions, all pass.
- pasclaw --help still lists every existing command unchanged.

Deferred to Phase B
-------------------

- vec0 / embeddings / hybrid RRF retrieval (sqlite-vec loadable extension + ILLMProvider.Embed).
- openclaw's "memory flush" pre-compaction prompt (PasClaw has no compaction yet).
- "Dreaming" consolidation pass.
- Eviction / decay / importance scoring.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78e29a57fc


chatgpt-codex-connector · 2026-05-29T13:11:12Z

+      'SELECT path, snippet(memory_fts, 1, ''«'', ''»'', ''…'', 24), bm25(memory_fts) ' +
+      'FROM memory_fts WHERE memory_fts MATCH :q ' +
+      'ORDER BY bm25(memory_fts) LIMIT :k';
+    Q.Params.ParamByName('q').AsString := Query;


Escape natural-language queries before MATCH

When memory_search is called with ordinary prior-conversation questions or code/path tokens containing FTS punctuation (for example what did we discuss?, C++, or foo/bar), SQLite FTS5 raises a syntax error; this method catches that and returns an empty result, so the tool reports no matches even when indexed notes contain the terms. Since the prompt tells the model to use this tool for natural-language memory lookups, sanitize/tokenize the query or fall back to a quoted/ANDed term query before passing it to MATCH (the same raw binding is duplicated in the FireDAC branch).
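The failure mode described here can be reproduced with a few lines of Python and the standard `sqlite3` module, assuming the local SQLite build includes FTS5. The table mirrors the PR's `memory_fts` definition; `try_match` is a hypothetical helper, and the quoted-phrase fallback shown last is one possible mitigation, not necessarily the one the PR adopted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE memory_fts USING fts5("
    "path UNINDEXED, content, tokenize='porter unicode61')"
)
db.execute(
    "INSERT INTO memory_fts (path, content) VALUES (?, ?)",
    ("MEMORY.md", "We discussed C++ build flags under foo/bar."),
)


def try_match(expr):
    """Return matching paths, or an 'error: ...' string if FTS5 rejects expr."""
    try:
        rows = db.execute(
            "SELECT path FROM memory_fts WHERE memory_fts MATCH ?", (expr,)
        ).fetchall()
        return [row[0] for row in rows]
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


# Raw natural-language or code-like input is parsed as FTS5 query syntax.
for raw in ("what did we discuss?", "C++", "foo/bar"):
    print(repr(raw), "->", try_match(raw))

# Wrapping the text in double quotes turns it into a plain phrase query.
print(try_match('"C++"'))
```

Binding the query as a parameter protects against SQL injection but does nothing about FTS5's own query grammar, which is why the raw strings still fail.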


…tches

PR #74's Codex review (P2) flagged that memory_search passes its query verbatim into the FTS5 MATCH operator. FTS5 treats unquoted ASCII punctuation as syntax, so common inputs blew up with "fts5: syntax error", which the surrounding try/except caught and silently turned into "no matches":

  what did we discuss?   - terminal ? is a syntax error
  C++                    - + is an operator
  foo/bar                - / is illegal
  skill_<name>           - < terminates the column qualifier
  "phrase search         - unbalanced quote
  ( AND OR NOT NEAR      - reserved keywords with no operand

From the model's perspective this was indistinguishable from "no matches in the indexed notes", so legitimate hits stayed invisible.

Fix: new SanitizeFtsQuery helper at the top of the implementation section. It splits the raw input on every non-token byte and reassembles the surviving tokens as OR-ed quoted phrases. Bytes >= $80 stay in their tokens so UTF-8 multi-byte sequences survive intact (the porter unicode61 tokenizer folds them at index time). Empty / all-separator input yields '' so the caller's early-exit path runs without ever binding the query.

OR over AND: natural-language queries include filler words ("what", "did", "we") that won't appear in terse notes. AND would require every token to land and silently return zero hits the moment any filler word missed. OR returns matches as soon as ANY token lands, and FTS5's BM25 ranks documents that match more tokens higher, so the top-K results are still the most relevant ones. Confirmed: a search for "what did we discuss authentication" on a note that reads "We discussed authentication tokens" returns that note as hit 1.

Both Search overloads (FPC sqldb and Delphi FireDAC) now run the input through SanitizeFtsQuery before binding the :q parameter.

The brace-style block comment in the helper's docstring switched to (* *): the previous { ... } variant tripped FPC's comment parser when an example like '"phrase search"' contained a literal } from a nested `{ ... }` aside.

Verified with a standalone regression smoke test covering 9 cases: the five FTS5-hostile-but-meaningful queries from the review (all must find their expected hit) plus four edge cases (?!?, '', '"unbalanced', '( AND OR NOT NEAR') that must return without raising. All pass.

https://claude.ai/code/session_01TBcLtmpj7dqA5tyFbGnQon
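The sanitizing approach described in this commit message can be approximated in a few lines of Python. This is a sketch of the idea only, not a port of the Pascal SanitizeFtsQuery helper: it works on Unicode characters rather than bytes, and the exact token alphabet (ASCII letters and digits plus anything non-ASCII) is an assumption, since the commit message does not list it.

```python
import re

# Assumption: a "token" is a run of ASCII letters/digits or non-ASCII characters.
# Everything else (punctuation, quotes, underscores, whitespace) is a separator.
_TOKEN_RE = re.compile(r"[0-9A-Za-z\u0080-\U0010FFFF]+")


def sanitize_fts_query(raw: str) -> str:
    """Rebuild raw input as OR-ed quoted phrases that FTS5 MATCH can parse.

    Returns '' when no tokens survive, so the caller can skip the query.
    """
    tokens = _TOKEN_RE.findall(raw)
    # Tokens cannot contain a double quote, so no escaping is needed.
    return " OR ".join(f'"{token}"' for token in tokens)


print(sanitize_fts_query("what did we discuss?"))  # "what" OR "did" OR "we" OR "discuss"
print(sanitize_fts_query("C++"))                   # "C"
print(sanitize_fts_query("foo/bar"))               # "foo" OR "bar"
print(sanitize_fts_query("( AND OR NOT NEAR"))     # "AND" OR "OR" OR "NOT" OR "NEAR"
print(repr(sanitize_fts_query("?!?")))             # ''
```

Quoting every token is what neutralises the reserved words: inside double quotes, AND, OR, NOT and NEAR are ordinary search terms rather than operators.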

chatgpt-codex-connector Bot reviewed May 29, 2026


FMXExpress merged commit 3cd308d into main May 29, 2026

FMXExpress merged 2 commits into main from claude/memory-fts5
