Cache cross-link index across serve hot reloads#3219
Conversation
Every .md save in serve mode was re-fetching link-index.json over S3 for each cross-link entry, twice per repo, blocking the browser refresh. Cache FetchedCrossLinks across reloads (re-fetched only on configuration changes), and fold the duplicate per-repo GetRegistry calls into one fetch per registry up front. Same wins flow through to docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`:
- Around line 96-105: The TryGetRegistry method currently swallows all
exceptions including cancellation; update it so cancellation is not swallowed by
allowing OperationCanceledException (and TaskCanceledException if used) to
propagate: either add an early check or rethrow when the caught exception is an
OperationCanceledException (e.g., if (ex is OperationCanceledException) throw;),
and only LogWarning/return null for non-cancellation exceptions from
reader.GetRegistry(ctx). Reference TryGetRegistry and
ILinkIndexReader.GetRegistry to locate the change.
In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-78: When reloadConfiguration is true, you only reload
_cachedCrossLinks but leave _crossLinkFetcher and _codexReader wired to the
original startup configuration; recreate/refresh those instances from the
updated _context/config before fetching new cross-links. Specifically: when you
run _context.ReloadConfiguration() (and when reloadConfiguration is true), clear
or reset _cachedCrossLinks and reinstantiate _crossLinkFetcher and _codexReader
from the current context/configuration (the same factories or constructors used
at startup) so that the subsequent _crossLinkFetcher.FetchCrossLinks(ctx) call
uses the new registries and returns fresh cross-links.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e7087069-6fba-4667-a0e8-b1d230530955
📒 Files selected for processing (2)
src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cssrc/tooling/docs-builder/Http/ReloadableGeneratorState.cs
For plain .md edits during serve, skip the DocumentationSet rebuild, ResolveDirectoryTree, and in-memory validation build entirely — the serve path already reads fresh content from disk via ParseFullAsync. Full rebuilds still run for structural changes (config/toc edits, file add/delete). Also watch common asset files (images, yml, toml) and trigger a browser-only refresh when they change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reload Don't swallow OperationCanceledException in TryGetRegistry so cancellation propagates promptly instead of degrading to null. Recreate _crossLinkFetcher and _codexReader when configuration reloads so registry switches in docset.yml take effect immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs (1)
50-64:⚠️ Potential issue | 🟠 MajorPropagate cancellation in the per-repository fetch loop.
The
catch (Exception)block at lines 62–83 swallowsOperationCanceledExceptionfromFetchLinkIndexEntryFromReader(), converting a stop/reload request into placeholder links instead of aborting. Add acatch (OperationCanceledException)before the generalcatchto re-throw, matching the pattern already inTryGetRegistry().Suggested fix
try { if (registry is null || !registry.Repositories.TryGetValue(entry.Repository, out var repoBranches)) throw new Exception($"Repository {entry.Repository} not found in link index"); var linkIndexEntry = GetNextContentSourceLinkIndexEntry(repoBranches, entry.Repository); var linkReference = await FetchLinkIndexEntryFromReader(reader, entry.Repository, linkIndexEntry, ctx); linkReferences.Add(entry.Repository, linkReference); linkIndexEntries.Add(entry.Repository, linkIndexEntry); registryUrlsByRepository[entry.Repository] = reader.RegistryUrl; } +catch (OperationCanceledException) +{ + throw; +} catch (Exception ex) { _logger.LogWarning(ex, "Error fetching link data for repository '{Repository}'. Cross-links to this repository may not resolve correctly.", entry.Repository);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs` around lines 50 - 64, The per-repository loop in DocSetConfigurationCrossLinkFetcher is currently catching all exceptions and thus swallowing OperationCanceledException from FetchLinkIndexEntryFromReader; add a specific catch (OperationCanceledException) before the existing general catch to re-throw the cancellation so the operation can abort (match the pattern used in TryGetRegistry()), leaving the existing _logger.LogWarning fallback for other exceptions unchanged and ensuring the cancellation propagates up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-87: The code currently assigns the value returned by
_crossLinkFetcher.FetchCrossLinks(ctx) directly into _cachedCrossLinks, which
causes placeholder/fallback data from a transient registry failure to be cached
and never retried; change the flow to call var newLinks = await
_crossLinkFetcher.FetchCrossLinks(ctx) and only set _cachedCrossLinks = newLinks
when newLinks represents a real successful fetch (e.g. newLinks.IsFallback ==
false or newLinks != Placeholder), otherwise leave _cachedCrossLinks untouched
(and optionally log a warning); if FetchCrossLinks does not yet expose a
success/fallback indicator, update its return to a wrapper (or throw on
unrecoverable failure) so ReloadableGeneratorState can reliably decide whether
to cache the result, keeping the early-return behavior for non-config reloads
intact.
---
Outside diff comments:
In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`:
- Around line 50-64: The per-repository loop in
DocSetConfigurationCrossLinkFetcher is currently catching all exceptions and
thus swallowing OperationCanceledException from FetchLinkIndexEntryFromReader;
add a specific catch (OperationCanceledException) before the existing general
catch to re-throw the cancellation so the operation can abort (match the pattern
used in TryGetRegistry()), leaving the existing _logger.LogWarning fallback for
other exceptions unchanged and ensuring the cancellation propagates up.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 79573cde-5549-464f-b68b-a151f72f10b8
📒 Files selected for processing (3)
src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cssrc/tooling/docs-builder/Http/ReloadGeneratorService.cssrc/tooling/docs-builder/Http/ReloadableGeneratorState.cs
Add IsComplete flag to FetchedCrossLinks so callers can distinguish a clean fetch from one that fell back to placeholder data. Only cache complete results in ReloadableGeneratorState so transient S3 outages get retried on the next reload instead of being sticky for the session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Cache cross-link index across serve hot reloads (#2845) Every .md save in serve mode was re-fetching link-index.json over S3 for each cross-link entry, twice per repo, blocking the browser refresh. Cache FetchedCrossLinks across reloads (re-fetched only on configuration changes), and fold the duplicate per-repo GetRegistry calls into one fetch per registry up front. Same wins flow through to docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Skip full rebuild and validation on content-only changes For plain .md edits during serve, skip the DocumentationSet rebuild, ResolveDirectoryTree, and in-memory validation build entirely — the serve path already reads fresh content from disk via ParseFullAsync. Full rebuilds still run for structural changes (config/toc edits, file add/delete). Also watch common asset files (images, yml, toml) and trigger a browser-only refresh when they change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Propagate cancellation in TryGetRegistry, recreate fetcher on config reload Don't swallow OperationCanceledException in TryGetRegistry so cancellation propagates promptly instead of degrading to null. Recreate _crossLinkFetcher and _codexReader when configuration reloads so registry switches in docset.yml take effect immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Felipe Cotti <felipe.cotti@elastic.co>
Closes #2845.
Why
In
docs-builder serve, every.mdsave was re-fetchinglink-index.jsonfrom S3 for every cross-link entry indocset.yml, twice per repo (once viaFetchCrossLinksFromReader, once explicitly afterwards). For a docset with ~30 cross-link entries that's ~60 S3 GetObject calls per keystroke save, on the critical path before the browser refresh fires. Reporters described "the page reload doesn't do anything until this process is finished."The fetcher's
configurationreference is captured at construction time, so cross-link entries can't actually change during a serve session. Re-fetching on every.mdedit was always wasted work.What
FetchedCrossLinksinReloadableGeneratorState. First fetch populates the cache; subsequent reloads reuse it unlessreloadConfiguration: true(i.e.docset.yml/toc.yml/_docset.ymlchanged).FetchCrossLinkscall inDocSetConfigurationCrossLinkFetcher, instead of once per repo, twice. The duplicateGetRegistrycall at the bottom of the per-repo loop is gone — the entry lookup now uses the pre-fetched registry.TryGetRegistryto preserve the existing "log warning, fall through to empty placeholder" resilience when S3 is unreachable. The end-state when S3 is down is identical to before (every repo gets an empty placeholder); the log shape changes slightly (one registry-level warning + N "not found" warnings, instead of N fetch warnings).For an N-entry docset, S3 GetObject calls on cross-link fetch drop from 2N → 1 (single-registry) or 2N → 2 (dual-registry). On a
.mdsave in serve mode the count drops to 0 after the first fetch warms the cache.These optimizations also flow through to
docs-builder buildand codex builds, which use the sameDocSetConfigurationCrossLinkFetcher. The instance-level cache is serve-only (a fresh fetcher is constructed per build), but the registry dedup applies everywhere.Benchmark
Setup:
docs-builder build --path docs-content --skip-apiagainst a local clone of elastic/docs-content (58 cross-link entries). Native ARM64 binaries, 10-core M-series Mac, 3 runs each. Per-repolinks-*.jsonwere already warm in the disk cache, isolating thelink-index.jsonregistry-fetch path.main(today)~4× faster on this docset. User CPU is unchanged — the same total work happens — but the cross-link fetch is no longer a serial network bottleneck, so the downstream parse phase parallelizes across cores. The
main-vs-v1.6.0delta is within noise: the speedup is almost entirely attributable to this PR.How this scales on CI runners
The local 4× number reflects ~5-core effective parallelism. The speedup is sensitive to two environment factors:
Core count.
mainblocks ~10–20s of wall time on 58 × 2 sequential S3 calls tolink-index.json, independent of core count. After this PR that collapses to one ~200ms call, and the remaining ~22s of user CPU work fans out to whatever cores are available. Rough projections:mainwallDisk-cache state. Per-repo
links-*.jsonare still fetched sequentially in the foreach loop — only the registry was deduplicated here. On a cold cache (e.g. a fresh runner withoutactions/cachefor~/.local/share/elastic/docs-builder/links/) bothmainand this PR pay an extra ~58 sequential per-repo S3 round-trips, narrowing the relative gap to ~1.5–2× while keeping the absolute wall-clock saving similar.So even on a 2-core hosted runner with a cold cache, this PR should noticeably reduce wall-clock build time. Parallelizing the per-repo links fetch would help cold-cache CI further — out of scope here, clean follow-up.
Not in this PR
A few related improvements were considered and deliberately deferred:
link-index.jsonchanges until you touchdocset.ymlor restart serve. Mitigations exist (manual invalidation hook via HTTP endpoint, sentinel file, or time-based expiry) but each adds surface area for a workflow that's relatively rare. Worth adding if real users complain; not worth adding speculatively.CrossLinkEntriesonly references one of them. A pre-scan could skip the unneeded fetch (saving ~100–300ms in those cases). Modest win and orthogonal to this issue's hot-reload bug — left as a clean follow-up.links-*.jsonfetch. The foreach loop overCrossLinkEntriesis still sequential. Worth parallelizing for cold-cache CI runs, but not strictly required by Local hot reload has slowed down #2845.ReloadAsyncstill constructs a freshDocumentationSetand re-runsResolveDirectoryTreeon every save. Reusing the set across reloads isn't safe: navigation, breadcrumbs, and cross-references all depend on every file's parsed metadata being current, and the set is effectively immutable. A real incremental reload needs page-level HTML caching keyed by content/frontmatter — that's a separate, larger refactor.