From 473c803e367867da721c55259c390a2deb69260c Mon Sep 17 00:00:00 2001
From: Colby McHenry
Date: Tue, 26 May 2026 10:16:37 -0500
Subject: [PATCH] docs: explain auto-syncing (no manual sync needed) in site + README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Originated from issue #438 ("Will newly created files be missing from
query results if sync is not manually run?"). Real users are
second-guessing whether their agent's freshly-created files are getting
indexed. They shouldn't have to test for themselves to find out.

## site/src/content/docs/guides/indexing.md

Expanded the existing 2-sentence "Stay fresh automatically" section
into the full three-layer explanation:

1. File watcher with debounced auto-sync (default 2000ms, tunable via
   CODEGRAPH_WATCH_DEBOUNCE_MS, clamp [100ms, 60s]).
2. Per-file staleness banner (#403) — covers the debounce window.
   Quoted the actual banner format + the verified Claude Code
   follow-up Read behaviour.
3. Connect-time catch-up (#414) — covers gaps when the MCP server
   wasn't running.

Plus: how to verify state via codegraph_status (### Pending sync:),
when manual codegraph sync DOES make sense (watcher disabled / CI
scripting), and a link out to the v0.9.5 release notes.

## README.md

Added a `<details>` collapsible right under the Key Features table —
primed by the existing 'Always Fresh' row in that table. Condensed to
~10 lines covering the same three layers + a code-block flow diagram +
the verify command, with a deep link to the full guide.

GitHub renders `<details>` blocks natively, so the section is collapsed
by default and doesn't make the README scroll-length grow visibly.
Heading kept as 'Stay fresh automatically' (single-word slug) so the
README's deep-link anchor is predictable; the longer tagline lives on
its own line below.

940/942 tests still pass; no code changes.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 README.md                                | 27 ++++++++++
 site/src/content/docs/guides/indexing.md | 66 +++++++++++++++++++++++-
 2 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 3be488af4..3cd1355d8 100644
--- a/README.md
+++ b/README.md
@@ -140,6 +140,33 @@ The gains scale with codebase size: on large repos the agent answers from the in
 | **Mixed iOS / React Native / Expo** | Closes cross-language flows that static parsing misses: Swift ↔ ObjC bridging, React Native legacy bridge + TurboModules + Fabric view components, native → JS event emitters, Expo Modules |
 | **100% Local** | No data leaves your machine. No API keys. No external services. SQLite database only |

+
+How auto-syncing works — and why you don't need to run codegraph sync manually
+
+When your agent (Claude Code, Cursor, Codex, opencode) launches `codegraph serve --mcp`, three layers keep the index in step with your code — and make sure the agent never gets a silent wrong answer in the brief window between an edit and the next sync:
+
+1. **File watcher with debounced auto-sync.** A native FSEvents / inotify / ReadDirectoryChangesW watcher captures every source-file create / modify / delete and triggers a re-index after a debounce window (default `2000ms`, tunable via `CODEGRAPH_WATCH_DEBOUNCE_MS`, clamped to `[100ms, 60s]`). Bursts of edits collapse into a single sync.
+
+2. **Per-file staleness banner.** During the brief debounce window, MCP tool responses that would reference a still-pending file prepend a `⚠️` banner naming it and telling the agent to `Read` it directly. Pending files NOT referenced by the response surface as a small footer instead. Either way, the agent gets an explicit signal — validated with Claude Code, where the agent literally says "Reading the file directly for the live content" before opening it.
+
+3. **Connect-time catch-up.** When the MCP server (re)connects, codegraph runs a fast `(size, mtime)` + content-hash reconciliation against the working tree before answering the first query — so edits made while no MCP server was running (a `git pull` from the terminal, edits from another editor, a previous agent session that exited) get absorbed on the next session's first tool call.
+
+```
+agent writes src/Widget.ts
+  → watcher fires (<100ms)
+  → debounce (default 2s)
+  → sync; Widget.ts is in the index
+  → next agent query sees it
+```
+
+**Verify any time** with `codegraph_status` (via MCP) or `codegraph status` (CLI). If anything is pending, you'll see a `### Pending sync:` section naming the files and their edit age.
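
The debounce behaviour in layer 1 above can be sketched roughly as follows. This is a minimal illustration, not CodeGraph's actual implementation: only the variable name `CODEGRAPH_WATCH_DEBOUNCE_MS`, the 2000ms default, and the `[100ms, 60s]` clamp come from the text. The function names, the fallback on invalid input, and the choice of a trailing-edge window (each new event restarts the timer) are assumptions made for the example.

```typescript
// Values taken from the text above.
const DEFAULT_DEBOUNCE_MS = 2000;
const MIN_DEBOUNCE_MS = 100;
const MAX_DEBOUNCE_MS = 60000;

// Hypothetical helper: turn a raw env value into a usable debounce window.
function resolveDebounceMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_DEBOUNCE_MS;
  const parsed = Number(raw);
  // Assumption: non-numeric input falls back to the default instead of failing.
  if (!isFinite(parsed)) return DEFAULT_DEBOUNCE_MS;
  return Math.min(MAX_DEBOUNCE_MS, Math.max(MIN_DEBOUNCE_MS, Math.round(parsed)));
}

interface FileEvent { path: string; atMs: number; }
interface SyncBatch { firesAtMs: number; paths: string[]; }

// Hypothetical simulation of a trailing-edge debounce: events closer together
// than debounceMs end up in the same batch, and a batch fires debounceMs after
// its last event.
function coalesce(events: FileEvent[], debounceMs: number): SyncBatch[] {
  const sorted = events.slice().sort((a, b) => a.atMs - b.atMs);
  const batches: SyncBatch[] = [];
  let current: string[] | null = null;
  let lastAt = 0;
  for (const ev of sorted) {
    // A quiet gap of at least debounceMs means the previous batch already fired.
    if (current !== null && ev.atMs - lastAt >= debounceMs) {
      batches.push({ firesAtMs: lastAt + debounceMs, paths: current });
      current = null;
    }
    if (current === null) current = [];
    // Repeated edits to the same file are recorded once per batch.
    if (current.indexOf(ev.path) === -1) current.push(ev.path);
    lastAt = ev.atMs;
  }
  if (current !== null) {
    batches.push({ firesAtMs: lastAt + debounceMs, paths: current });
  }
  return batches;
}
```

Under these assumptions, three edits within one second produce a single sync, while an edit that arrives several seconds later starts a new batch. In practice the raw value would come from the environment, for example `resolveDebounceMs(process.env.CODEGRAPH_WATCH_DEBOUNCE_MS)` in Node.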
+
+The handful of cases where manual `codegraph sync` makes sense: the watcher is disabled (sandboxed environments, or `CODEGRAPH_NO_DAEMON=1`), or you're scripting against the index outside an agent session and want a pre-flight sync at the start of your script.
+
+→ Full deep-dive in [Guides → Indexing a Project](https://colbymchenry.github.io/codegraph/guides/indexing/#stay-fresh-automatically).
+
+
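
For tooling that wants to check programmatically whether anything is still pending, the `### Pending sync:` section mentioned above can be pulled out of the status text. The sketch below is illustrative only: it assumes entries of the form `- src/Widget.ts (edited 1200ms ago)`, with the age always in milliseconds, and that any other heading ends the section. The function and type names are hypothetical and not part of CodeGraph.

```typescript
interface PendingFile { path: string; editedMsAgo: number; }

// Hypothetical parser for the "### Pending sync:" block of a status response.
function parsePendingSync(statusText: string): PendingFile[] {
  const pending: PendingFile[] = [];
  let inSection = false;
  for (const line of statusText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "### Pending sync:") {
      inSection = true;
      continue;
    }
    if (!inSection) continue;
    // Assumption: any other heading closes the pending-sync section.
    if (trimmed.indexOf("#") === 0) {
      inSection = false;
      continue;
    }
    // Assumed entry shape: "- <path> (edited <N>ms ago)".
    const match = /^-\s+(.+?)\s+\(edited\s+(\d+)ms ago\)$/.exec(trimmed);
    if (match) {
      pending.push({ path: match[1], editedMsAgo: parseInt(match[2], 10) });
    }
  }
  return pending;
}
```

An empty result corresponds to the "nothing is in flight" case: the section is simply absent from the response.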
+
 ---

 ## Framework-aware Routes
diff --git a/site/src/content/docs/guides/indexing.md b/site/src/content/docs/guides/indexing.md
index 536b430bd..96514bc3c 100644
--- a/site/src/content/docs/guides/indexing.md
+++ b/site/src/content/docs/guides/indexing.md
@@ -24,7 +24,69 @@ codegraph sync # incremental — only changed files

 ## Stay fresh automatically

-When the MCP server is running, CodeGraph watches your project with native OS file events and syncs in the background — debounced, and filtered to source files only. You don't need to run `sync` by hand during an agent session.
+**You don't need to run `codegraph sync` by hand during an agent session.** When your agent (Claude Code, Cursor, Codex, opencode) launches `codegraph serve --mcp`, three layers cooperate to keep the index in step with your code — and to never give the agent a quiet wrong answer in the small window between an edit and the next sync.
+
+### 1. File watcher with debounced auto-sync (always on)
+
+`serve --mcp` spins up a native file watcher (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows) over the project root. Every source-file create / modify / delete is captured. A debounce timer collapses bursts of edits into a single sync.
+
+```
+agent writes src/Widget.ts
+  → watcher fires (event delivery: typically <100ms)
+  → 2000ms debounce
+  → sync runs; Widget.ts's nodes + edges are in the index
+  → next agent query sees it
+```
+
+**Tunable**: `CODEGRAPH_WATCH_DEBOUNCE_MS` overrides the default 2000ms, clamped to `[100ms, 60s]`. Useful when a build step or formatter writes many files in a tight burst — bump it to `5000` or `10000` so the watcher coalesces them into one sync.
+
+### 2. Per-file staleness banner — covers the debounce window
+
+The watcher debounce introduces a small window (typically 2s) where a freshly-edited file is on disk but not yet in the index. CodeGraph closes that window with a per-file staleness banner: if any MCP tool response would reference a file that's currently pending re-index, the response prepends a `⚠️` banner naming the stale file:
+
+```
+⚠️ Some files referenced below were edited since the last index sync —
+their codegraph entries may be stale:
+  - src/Widget.ts (edited 800ms ago, pending sync)
+For accurate content of those specific files, Read them directly.
+The rest of this response is fresh.
+
+## Code Context
+…
+```
+
+Agents read this and follow up with a direct `Read` on the named file — validated end-to-end with Claude Code, where the agent literally says "Reading the file directly for the live content" before opening it. So even during the 2-second debounce window, the agent never gets a silent wrong answer.
+
+Pending files **not** referenced by the response surface as a small footer instead (`(Note: N file(s) elsewhere in this project are pending index sync but were not referenced above: …)`). Either way, the signal is explicit.
+
+### 3. Connect-time catch-up — covers gaps when the MCP server wasn't running
+
+When your editor / agent (re)connects to the MCP server, codegraph runs a fast filesystem-based reconciliation (a `(size, mtime)` stat pre-filter, then a content hash on the rest) before answering the first query. So files changed while no MCP server was running — a `git pull` from the terminal, an edit from another editor, an agent that finished and exited — are caught up automatically on the next session's first tool call.
+
+### Verify what the watcher sees
+
+`codegraph_status` exposes the pending set first-class — useful for an agent asking "is the index caught up?" in one call:
+
+```
+codegraph_status →
+  ## CodeGraph Status
+  …
+  ### Pending sync:
+    - src/Widget.ts (edited 1200ms ago)
+```
+
+If `### Pending sync:` isn't in the response, nothing is in flight.
+
+### When manual `codegraph sync` makes sense
+
+Almost never. The edge cases:
+
+- **The watcher is disabled.** Sandboxes that block local fs watchers, or you've set `CODEGRAPH_NO_DAEMON=1` to opt out of the shared daemon. In those cases `codegraph sync` is the manual fallback.
+- **Pre-flight before a CI run.** If you're scripting against the index outside an agent session, a single `codegraph sync` at the start of the script guarantees the index reflects the current working tree.
+
+Otherwise: just use it. The watcher + banner + connect-sync covers the AI-assisted workflow end-to-end. If you're seeing files genuinely missed after the debounce window has passed, that's a bug — please file an issue with a reproduction.
+
+> See the v0.9.5 release notes for the [staleness banner (#403)](https://github.com/colbymchenry/codegraph/releases/tag/v0.9.5) and the connect-time catch-up (#414); both shipped together.

 ## Check status

@@ -32,7 +94,7 @@ When the MCP server is running, CodeGraph watches your project with native OS fi
 codegraph status
 ```

-Reports node/edge/file counts, the active SQLite backend, and the journal mode.
+Reports node/edge/file counts, the active SQLite backend, and the journal mode. In an agent session, the MCP-side `codegraph_status` additionally surfaces the `### Pending sync:` block described above.

 ## What gets indexed