Cache cross-link index across serve hot reloads by theletterf · Pull Request #3219 · elastic/docs-builder

theletterf · 2026-04-30T13:58:40Z

Closes #2845.

Why

In docs-builder serve, every .md save re-fetched link-index.json from S3 for every cross-link entry in docset.yml, twice per repo (once via FetchCrossLinksFromReader, once explicitly afterwards). For a docset with ~30 cross-link entries that's ~60 S3 GetObject calls per save, on the critical path before the browser refresh fires. Reporters described the symptom as "the page reload doesn't do anything until this process is finished."

The fetcher's configuration reference is captured at construction time, so cross-link entries can't actually change during a serve session. Re-fetching on every .md edit was always wasted work.

What

Cache FetchedCrossLinks in ReloadableGeneratorState. First fetch populates the cache; subsequent reloads reuse it unless reloadConfiguration: true (i.e. docset.yml / toc.yml / _docset.yml changed).
Pre-fetch each registry once per FetchCrossLinks call in DocSetConfigurationCrossLinkFetcher, instead of once per repo, twice. The duplicate GetRegistry call at the bottom of the per-repo loop is gone — the entry lookup now uses the pre-fetched registry.
Add TryGetRegistry to preserve the existing "log warning, fall through to empty placeholder" resilience when S3 is unreachable. The end-state when S3 is down is identical to before (every repo gets an empty placeholder); the log shape changes slightly (one registry-level warning + N "not found" warnings, instead of N fetch warnings).

For an N-entry docset, S3 GetObject calls on cross-link fetch drop from 2N → 1 (single-registry) or 2N → 2 (dual-registry). On a .md save in serve mode the count drops to 0 after the first fetch warms the cache.
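The two changes above (one fetch per registry, plus a session-level cache) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: FetchRegistryAsync, FetchCrossLinksAsync, ReloadAsync, the "public"/"codex" registry names, and the tuple-based entry list are all hypothetical stand-ins, and the S3 call is simulated with a counter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// All names below are illustrative assumptions, not docs-builder APIs.
var getObjectCalls = 0;
Dictionary<string, bool>? cached = null;

// Simulates one S3 GetObject for a registry's link-index.json.
Task<HashSet<string>> FetchRegistryAsync(string registry)
{
    getObjectCalls++;
    var repos = registry == "public"
        ? new HashSet<string> { "elasticsearch", "kibana" }
        : new HashSet<string> { "internal-docs" };
    return Task.FromResult(repos);
}

// Pre-fetches each distinct registry once, then resolves every entry against it.
async Task<Dictionary<string, bool>> FetchCrossLinksAsync(IReadOnlyList<(string Repo, string Registry)> entries)
{
    var registries = new Dictionary<string, HashSet<string>>();
    foreach (var (_, registry) in entries)
    {
        if (!registries.ContainsKey(registry))
            registries[registry] = await FetchRegistryAsync(registry);
    }

    var resolved = new Dictionary<string, bool>();
    foreach (var (repo, registry) in entries)
        resolved[repo] = registries[registry].Contains(repo);
    return resolved;
}

// Reuses the cached result unless the configuration changed.
async Task<Dictionary<string, bool>> ReloadAsync(IReadOnlyList<(string Repo, string Registry)> entries, bool reloadConfiguration)
{
    if (cached is null || reloadConfiguration)
        cached = await FetchCrossLinksAsync(entries);
    return cached;
}

var docset = new List<(string Repo, string Registry)>
{
    ("elasticsearch", "public"),
    ("kibana", "public"),
    ("internal-docs", "codex"),
};

await ReloadAsync(docset, reloadConfiguration: false); // first fetch: one call per registry
await ReloadAsync(docset, reloadConfiguration: false); // .md save: served from the cache
Console.WriteLine(getObjectCalls); // prints 2
await ReloadAsync(docset, reloadConfiguration: true);  // docset.yml changed: fetch again
Console.WriteLine(getObjectCalls); // prints 4
```

With three entries across two registries, the simulated count stays at 2 across plain saves and only grows when the configuration reloads, which mirrors the 2N → 2 and "0 after warm-up" figures quoted above.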

These optimizations also flow through to docs-builder build and codex builds, which use the same DocSetConfigurationCrossLinkFetcher. The instance-level cache is serve-only (a fresh fetcher is constructed per build), but the registry dedup applies everywhere.

Benchmark

Setup: docs-builder build --path docs-content --skip-api against a local clone of elastic/docs-content (58 cross-link entries). Native ARM64 binaries, 10-core M-series Mac, 3 runs each. Per-repo links-*.json were already warm in the disk cache, isolating the link-index.json registry-fetch path.

Build	Median wall-clock	User CPU	CPU usage
installed v1.6.0	~27.5s	~10s	~70%
`main` (today)	~26.9s	~22s	~120%
this PR	~6.7s	~22s	~462%

~4× faster on this docset. User CPU is unchanged relative to main (the same total work happens), but the cross-link fetch is no longer a serial network bottleneck, so the downstream parse phase parallelizes across cores. The main-vs-v1.6.0 wall-clock delta is within noise, so the speedup is almost entirely attributable to this PR.

How this scales on CI runners

The local 4× number reflects ~5-core effective parallelism. The speedup is sensitive to two environment factors:

Core count. main blocks ~10–20s of wall time on 58 × 2 sequential S3 calls to link-index.json, independent of core count. After this PR that collapses to one ~200ms call, and the remaining ~22s of user CPU work fans out to whatever cores are available. Rough projections:

Cores	`main` wall	this PR wall	Speedup
2 (small CI)	~25s	~12s	~2×
4 (standard CI)	~25s	~7s	~3.5×
8+ (large CI)	~25s	~4s	~6×

Disk-cache state. Per-repo links-*.json are still fetched sequentially in the foreach loop — only the registry was deduplicated here. On a cold cache (e.g. a fresh runner without actions/cache for ~/.local/share/elastic/docs-builder/links/) both main and this PR pay an extra ~58 sequential per-repo S3 round-trips, narrowing the relative gap to ~1.5–2× while keeping the absolute wall-clock saving similar.

So even on a 2-core hosted runner with a cold cache, this PR should noticeably reduce wall-clock build time. Parallelizing the per-repo links fetch would help cold-cache CI further — out of scope here, clean follow-up.

Not in this PR

A few related improvements were considered and deliberately deferred:

Stale upstream cross-links during a serve session. With the cache in place, if another team publishes new docs while you're running serve, you won't pick up their link-index.json changes until you touch docset.yml or restart serve. Mitigations exist (manual invalidation hook via HTTP endpoint, sentinel file, or time-based expiry) but each adds surface area for a workflow that's relatively rare. Worth adding if real users complain; not worth adding speculatively.
Skip-by-build-context registry fetches. In dual-registry mode we still fetch both the public and codex registries even when CrossLinkEntries only references one of them. A pre-scan could skip the unneeded fetch (saving ~100–300ms in those cases). Modest win and orthogonal to this issue's hot-reload bug — left as a clean follow-up.
Parallel per-repo links-*.json fetch. The foreach loop over CrossLinkEntries is still sequential. Worth parallelizing for cold-cache CI runs, but not strictly required by #2845 (Local hot reload has slowed down).
Incremental file-level reload (the original "fix aside directive #4" direction). ReloadAsync still constructs a fresh DocumentationSet and re-runs ResolveDirectoryTree on every save. Reusing the set across reloads isn't safe: navigation, breadcrumbs, and cross-references all depend on every file's parsed metadata being current, and the set is effectively immutable. A real incremental reload needs page-level HTML caching keyed by content/frontmatter — that's a separate, larger refactor.
Asymmetry between public and codex docsets. Codex docsets can cross-link to public repos; public docsets cannot cross-link to codex repos (the dual-registry routing is one-way). This isn't introduced by this PR, but surfaced during analysis. Whether the asymmetry is intentional or worth fixing is a product question for a different PR.

Every .md save in serve mode was re-fetching link-index.json over S3 for each cross-link entry, twice per repo, blocking the browser refresh. Cache FetchedCrossLinks across reloads (re-fetched only on configuration changes), and fold the duplicate per-repo GetRegistry calls into one fetch per registry up front. Same wins flow through to docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`:
- Around line 96-105: The TryGetRegistry method currently swallows all
exceptions including cancellation; update it so cancellation is not swallowed by
allowing OperationCanceledException (and TaskCanceledException if used) to
propagate: either add an early check or rethrow when the caught exception is an
OperationCanceledException (e.g., if (ex is OperationCanceledException) throw;),
and only LogWarning/return null for non-cancellation exceptions from
reader.GetRegistry(ctx). Reference TryGetRegistry and
ILinkIndexReader.GetRegistry to locate the change.

In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-78: When reloadConfiguration is true, you only reload
_cachedCrossLinks but leave _crossLinkFetcher and _codexReader wired to the
original startup configuration; recreate/refresh those instances from the
updated _context/config before fetching new cross-links. Specifically: when you
run _context.ReloadConfiguration() (and when reloadConfiguration is true), clear
or reset _cachedCrossLinks and reinstantiate _crossLinkFetcher and _codexReader
from the current context/configuration (the same factories or constructors used
at startup) so that the subsequent _crossLinkFetcher.FetchCrossLinks(ctx) call
uses the new registries and returns fresh cross-links.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e7087069-6fba-4667-a0e8-b1d230530955

📥 Commits

Reviewing files that changed from the base of the PR and between f834fcd and 8e52b82.

📒 Files selected for processing (2)

src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs
src/tooling/docs-builder/Http/ReloadableGeneratorState.cs

For plain .md edits during serve, skip the DocumentationSet rebuild, ResolveDirectoryTree, and in-memory validation build entirely — the serve path already reads fresh content from disk via ParseFullAsync. Full rebuilds still run for structural changes (config/toc edits, file add/delete). Also watch common asset files (images, yml, toml) and trigger a browser-only refresh when they change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…reload Don't swallow OperationCanceledException in TryGetRegistry so cancellation propagates promptly instead of degrading to null. Recreate _crossLinkFetcher and _codexReader when configuration reloads so registry switches in docset.yml take effect immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
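The cancellation half of this commit can be sketched as below. It is an assumption-laden illustration rather than the actual method body: TryGetRegistryAsync, its delegate parameter, and the console warning are hypothetical stand-ins for TryGetRegistry, reader.GetRegistry(ctx), and the logger.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns null on ordinary failures, rethrows cancellation.
async Task<string?> TryGetRegistryAsync(Func<CancellationToken, Task<string>> getRegistry, CancellationToken ctx)
{
    try
    {
        return await getRegistry(ctx);
    }
    catch (OperationCanceledException)
    {
        // Also covers TaskCanceledException, which derives from OperationCanceledException.
        throw;
    }
    catch (Exception ex)
    {
        // Stand-in for the logged warning; the caller falls back to placeholders.
        Console.WriteLine($"warning: registry fetch failed: {ex.Message}");
        return null;
    }
}

// An unreachable registry degrades to null.
var offline = await TryGetRegistryAsync(
    _ => Task.FromException<string>(new InvalidOperationException("S3 unreachable")),
    CancellationToken.None);

// A cancelled token aborts the call instead of degrading to null.
using var cts = new CancellationTokenSource();
cts.Cancel();
var cancellationPropagated = false;
try
{
    await TryGetRegistryAsync(async token =>
    {
        await Task.Delay(1000, token);
        return "registry";
    }, cts.Token);
}
catch (OperationCanceledException)
{
    cancellationPropagated = true;
}

Console.WriteLine($"{offline is null} {cancellationPropagated}"); // prints "True True"
```

Ordering matters here: the specific catch has to precede the general one, otherwise the general handler would turn a stop or reload request into a null registry.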

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs (1)

50-64: ⚠️ Potential issue | 🟠 Major

Propagate cancellation in the per-repository fetch loop.

The catch (Exception) block at lines 62–83 swallows OperationCanceledException from FetchLinkIndexEntryFromReader(), converting a stop/reload request into placeholder links instead of aborting. Add a catch (OperationCanceledException) before the general catch to re-throw, matching the pattern already in TryGetRegistry().

Suggested fix

 try
 {
 	if (registry is null || !registry.Repositories.TryGetValue(entry.Repository, out var repoBranches))
 		throw new Exception($"Repository {entry.Repository} not found in link index");

 	var linkIndexEntry = GetNextContentSourceLinkIndexEntry(repoBranches, entry.Repository);
 	var linkReference = await FetchLinkIndexEntryFromReader(reader, entry.Repository, linkIndexEntry, ctx);

 	linkReferences.Add(entry.Repository, linkReference);
 	linkIndexEntries.Add(entry.Repository, linkIndexEntry);
 	registryUrlsByRepository[entry.Repository] = reader.RegistryUrl;
 }
+catch (OperationCanceledException)
+{
+	throw;
+}
 catch (Exception ex)
 {
 	_logger.LogWarning(ex, "Error fetching link data for repository '{Repository}'. Cross-links to this repository may not resolve correctly.", entry.Repository);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`
around lines 50 - 64, The per-repository loop in
DocSetConfigurationCrossLinkFetcher is currently catching all exceptions and
thus swallowing OperationCanceledException from FetchLinkIndexEntryFromReader;
add a specific catch (OperationCanceledException) before the existing general
catch to re-throw the cancellation so the operation can abort (match the pattern
used in TryGetRegistry()), leaving the existing _logger.LogWarning fallback for
other exceptions unchanged and ensuring the cancellation propagates up.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-87: The code currently assigns the value returned by
_crossLinkFetcher.FetchCrossLinks(ctx) directly into _cachedCrossLinks, which
causes placeholder/fallback data from a transient registry failure to be cached
and never retried; change the flow to call var newLinks = await
_crossLinkFetcher.FetchCrossLinks(ctx) and only set _cachedCrossLinks = newLinks
when newLinks represents a real successful fetch (e.g. newLinks.IsFallback ==
false or newLinks != Placeholder), otherwise leave _cachedCrossLinks untouched
(and optionally log a warning); if FetchCrossLinks does not yet expose a
success/fallback indicator, update its return to a wrapper (or throw on
unrecoverable failure) so ReloadableGeneratorState can reliably decide whether
to cache the result, keeping the early-return behavior for non-config reloads
intact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 79573cde-5549-464f-b68b-a151f72f10b8

📥 Commits

Reviewing files that changed from the base of the PR and between 8e52b82 and cdac328.

📒 Files selected for processing (3)

src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs
src/tooling/docs-builder/Http/ReloadGeneratorService.cs
src/tooling/docs-builder/Http/ReloadableGeneratorState.cs

Add IsComplete flag to FetchedCrossLinks so callers can distinguish a clean fetch from one that fell back to placeholder data. Only cache complete results in ReloadableGeneratorState so transient S3 outages get retried on the next reload instead of being sticky for the session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
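A rough sketch of the "only cache complete results" rule follows. The tuple shape, FetchCrossLinks, Reload, and the s3Reachable switch are hypothetical simplifications used for illustration; the real FetchedCrossLinks type carries more than a repository list.

```csharp
using System;

// Illustrative assumptions: a tuple stands in for FetchedCrossLinks,
// and s3Reachable simulates a transient S3 outage.
var s3Reachable = false;
var fetches = 0;
(string[] Repositories, bool IsComplete)? cachedLinks = null;

// Returns placeholder data flagged as incomplete while S3 is down.
(string[] Repositories, bool IsComplete) FetchCrossLinks()
{
    fetches++;
    if (!s3Reachable)
        return (Array.Empty<string>(), false);
    return (new[] { "elasticsearch", "kibana" }, true);
}

// Serves from the cache when possible; caches only complete results.
(string[] Repositories, bool IsComplete) Reload()
{
    if (cachedLinks.HasValue)
        return cachedLinks.Value;

    var fresh = FetchCrossLinks();
    if (fresh.IsComplete)
        cachedLinks = fresh;
    return fresh;
}

var duringOutage = Reload();  // placeholder result, not cached
s3Reachable = true;
var afterRecovery = Reload(); // retried on the next reload, now cached
_ = Reload();                 // served from the cache, no new fetch

Console.WriteLine($"{duringOutage.IsComplete} {afterRecovery.IsComplete} {fetches}"); // prints "False True 2"
```

The incomplete result is still returned to the caller so the page can render with placeholders; it is just never stored, so the next reload tries again.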

Fixed

* Cache cross-link index across serve hot reloads (#2845)

Every .md save in serve mode was re-fetching link-index.json over S3 for each cross-link entry, twice per repo, blocking the browser refresh. Cache FetchedCrossLinks across reloads (re-fetched only on configuration changes), and fold the duplicate per-repo GetRegistry calls into one fetch per registry up front. Same wins flow through to docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip full rebuild and validation on content-only changes

For plain .md edits during serve, skip the DocumentationSet rebuild, ResolveDirectoryTree, and in-memory validation build entirely; the serve path already reads fresh content from disk via ParseFullAsync. Full rebuilds still run for structural changes (config/toc edits, file add/delete). Also watch common asset files (images, yml, toml) and trigger a browser-only refresh when they change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Propagate cancellation in TryGetRegistry, recreate fetcher on config reload

Don't swallow OperationCanceledException in TryGetRegistry so cancellation propagates promptly instead of degrading to null. Recreate _crossLinkFetcher and _codexReader when configuration reloads so registry switches in docset.yml take effect immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Felipe Cotti <felipe.cotti@elastic.co>

theletterf requested a review from a team as a code owner April 30, 2026 13:58

theletterf added the bug label Apr 30, 2026

theletterf requested a review from Mpdreamz April 30, 2026 13:58

coderabbitai Bot requested changes Apr 30, 2026

Mpdreamz approved these changes Apr 30, 2026

cotti enabled auto-merge (squash) April 30, 2026 14:24

coderabbitai Bot previously requested changes Apr 30, 2026

cotti merged commit 657d42e into main Apr 30, 2026
24 checks passed

cotti deleted the fix/serve-cross-link-perf-2845 branch April 30, 2026 14:54

elastic deleted a comment from coderabbitai Bot Apr 30, 2026

cotti merged 4 commits into main from fix/serve-cross-link-perf-2845
