Cache cross-link index across serve hot reloads #3219

Merged: cotti merged 4 commits into main from fix/serve-cross-link-perf-2845 on Apr 30, 2026

@theletterf commented Apr 30, 2026 (Member)

Closes #2845.

Why

In docs-builder serve, every .md save re-fetched link-index.json from S3 for every cross-link entry in docset.yml, twice per repo (once via FetchCrossLinksFromReader, once explicitly afterwards). For a docset with ~30 cross-link entries, that is ~60 S3 GetObject calls per save, all on the critical path before the browser refresh fires. Reporters described it as "the page reload doesn't do anything until this process is finished."

The fetcher's configuration reference is captured at construction time, so cross-link entries can't actually change during a serve session. Re-fetching on every .md edit was always wasted work.

What

  • Cache FetchedCrossLinks in ReloadableGeneratorState. First fetch populates the cache; subsequent reloads reuse it unless reloadConfiguration: true (i.e. docset.yml / toc.yml / _docset.yml changed).
  • Pre-fetch each registry once per FetchCrossLinks call in DocSetConfigurationCrossLinkFetcher, instead of once per repo, twice. The duplicate GetRegistry call at the bottom of the per-repo loop is gone — the entry lookup now uses the pre-fetched registry.
  • Add TryGetRegistry to preserve the existing "log warning, fall through to empty placeholder" resilience when S3 is unreachable. The end-state when S3 is down is identical to before (every repo gets an empty placeholder); the log shape changes slightly (one registry-level warning + N "not found" warnings, instead of N fetch warnings).

For an N-entry docset, S3 GetObject calls on cross-link fetch drop from 2N → 1 (single-registry) or 2N → 2 (dual-registry). On a .md save in serve mode the count drops to 0 after the first fetch warms the cache.

These optimizations also flow through to docs-builder build and codex builds, which use the same DocSetConfigurationCrossLinkFetcher. The instance-level cache is serve-only (a fresh fetcher is constructed per build), but the registry dedup applies everywhere.
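The call-count change described above can be sketched as follows. This is a minimal illustration in Python rather than the project's C#, and it ignores failure handling; `CountingReader`, `fetch_twice_per_repo`, `fetch_once_per_registry`, and `ServeState` are hypothetical names standing in for the reader, the old and new fetch shapes, and the serve-session cache.

```python
from dataclasses import dataclass


@dataclass
class CountingReader:
    """Hypothetical stand-in for a link-index reader; counts registry fetches."""
    registry_url: str
    calls: int = 0

    def get_registry(self) -> dict:
        self.calls += 1  # each call models one S3 GetObject for link-index.json
        return {"repositories": {}}


def fetch_twice_per_repo(reader, repos):
    """Old shape: the registry is fetched twice for every cross-link entry (2N calls)."""
    for _ in repos:
        reader.get_registry()  # inside the per-repo fetch
        reader.get_registry()  # explicit lookup afterwards
    return {repo: f"links for {repo}" for repo in repos}


def fetch_once_per_registry(readers, repos):
    """New shape: each registry is fetched once up front and reused for every entry."""
    registries = {reader.registry_url: reader.get_registry() for reader in readers}
    # Per-repo entry lookups would consult `registries` here; elided in this sketch.
    return {repo: f"links for {repo}" for repo in repos}


class ServeState:
    """Session-level cache: refetch only on first use or when configuration changes."""

    def __init__(self, readers, repos):
        self._readers = readers
        self._repos = repos
        self._cached = None

    def reload(self, reload_configuration: bool = False):
        if self._cached is None or reload_configuration:
            self._cached = fetch_once_per_registry(self._readers, self._repos)
        return self._cached
```

With 30 entries, the old shape issues 60 registry fetches per reload, while the cached shape issues one per registry on the first reload and none on later content-only reloads.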

Benchmark

Setup: docs-builder build --path docs-content --skip-api against a local clone of elastic/docs-content (58 cross-link entries). Native ARM64 binaries, 10-core M-series Mac, 3 runs each. Per-repo links-*.json were already warm in the disk cache, isolating the link-index.json registry-fetch path.

| Build | Median wall-clock | User CPU | CPU usage |
|---|---|---|---|
| installed v1.6.0 | ~27.5s | ~10s | ~70% |
| main (today) | ~26.9s | ~22s | ~120% |
| this PR | ~6.7s | ~22s | ~462% |

~4× faster on this docset. User CPU is unchanged relative to main (~22s): the same total work happens, but the cross-link fetch is no longer a serial network bottleneck, so the downstream parse phase parallelizes across cores. The wall-clock difference between main and v1.6.0 is within noise, so the speedup is almost entirely attributable to this PR.

How this scales on CI runners

The local 4× number reflects ~5-core effective parallelism. The speedup is sensitive to two environment factors:

  • Core count. main blocks ~10–20s of wall time on 58 × 2 sequential S3 calls to link-index.json, independent of core count. After this PR that collapses to one ~200ms call, and the remaining ~22s of user CPU work fans out to whatever cores are available. Rough projections:

    | Cores | main wall | this PR wall | Speedup |
    |---|---|---|---|
    | 2 (small CI) | ~25s | ~12s | ~2× |
    | 4 (standard CI) | ~25s | ~7s | ~3.5× |
    | 8+ (large CI) | ~25s | ~4s | ~6× |
  • Disk-cache state. Per-repo links-*.json are still fetched sequentially in the foreach loop — only the registry was deduplicated here. On a cold cache (e.g. a fresh runner without actions/cache for ~/.local/share/elastic/docs-builder/links/) both main and this PR pay an extra ~58 sequential per-repo S3 round-trips, narrowing the relative gap to ~1.5–2× while keeping the absolute wall-clock saving similar.

So even on a 2-core hosted runner with a cold cache, this PR should noticeably reduce wall-clock build time. Parallelizing the per-repo links fetch would help cold-cache CI further — out of scope here, clean follow-up.
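As a sanity check on the projections above, here is a small idealized model. The formula (one serial registry fetch plus perfectly parallel CPU work) is an assumption made for this sketch, not something stated in the PR; it gives a lower bound on warm-cache wall time, so the projected figures should sit at or above it.

```python
# Figures quoted in the PR description.
USER_CPU_SECONDS = 22.0       # "~22s of user CPU work"
REGISTRY_FETCH_SECONDS = 0.2  # "one ~200ms call"

# Rough projections from the table above: cores -> projected wall-clock seconds.
PROJECTED_WALL = {2: 12.0, 4: 7.0, 8: 4.0}


def ideal_wall_seconds(cores: int) -> float:
    """Assumed lower bound: serial registry fetch plus perfectly parallel CPU work."""
    return REGISTRY_FETCH_SECONDS + USER_CPU_SECONDS / cores


for cores, projected in PROJECTED_WALL.items():
    ideal = ideal_wall_seconds(cores)
    print(f"{cores} cores: ideal lower bound {ideal:.2f}s, projected ~{projected:.0f}s")
```

Under this assumption the projected times are somewhat above the ideal bound at every core count, which is what imperfect parallelism and fixed overheads would produce.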

Not in this PR

A few related improvements were considered and deliberately deferred:

  • Stale upstream cross-links during a serve session. With the cache in place, if another team publishes new docs while you're running serve, you won't pick up their link-index.json changes until you touch docset.yml or restart serve. Mitigations exist (manual invalidation hook via HTTP endpoint, sentinel file, or time-based expiry) but each adds surface area for a workflow that's relatively rare. Worth adding if real users complain; not worth adding speculatively.
  • Skip-by-build-context registry fetches. In dual-registry mode we still fetch both the public and codex registries even when CrossLinkEntries only references one of them. A pre-scan could skip the unneeded fetch (saving ~100–300ms in those cases). Modest win and orthogonal to this issue's hot-reload bug — left as a clean follow-up.
  • Parallel per-repo links-*.json fetch. The foreach loop over CrossLinkEntries is still sequential. Worth parallelizing for cold-cache CI runs, but not strictly required by #2845 (Local hot reload has slowed down).
  • Incremental file-level reload (the original "fix aside directive #4" direction). ReloadAsync still constructs a fresh DocumentationSet and re-runs ResolveDirectoryTree on every save. Reusing the set across reloads isn't safe: navigation, breadcrumbs, and cross-references all depend on every file's parsed metadata being current, and the set is effectively immutable. A real incremental reload needs page-level HTML caching keyed by content/frontmatter — that's a separate, larger refactor.
  • Asymmetry between public and codex docsets. Codex docsets can cross-link to public repos; public docsets cannot cross-link to codex repos (the dual-registry routing is one-way). This isn't introduced by this PR, but surfaced during analysis. Whether the asymmetry is intentional or worth fixing is a product question for a different PR.

Every .md save in serve mode was re-fetching link-index.json over S3
for each cross-link entry, twice per repo, blocking the browser
refresh. Cache FetchedCrossLinks across reloads (re-fetched only on
configuration changes), and fold the duplicate per-repo GetRegistry
calls into one fetch per registry up front. Same wins flow through to
docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@theletterf theletterf requested a review from a team as a code owner April 30, 2026 13:58
@theletterf theletterf added the bug label Apr 30, 2026
@theletterf theletterf requested a review from Mpdreamz April 30, 2026 13:58

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`:
- Around line 96-105: The TryGetRegistry method currently swallows all
exceptions including cancellation; update it so cancellation is not swallowed by
allowing OperationCanceledException (and TaskCanceledException if used) to
propagate: either add an early check or rethrow when the caught exception is an
OperationCanceledException (e.g., if (ex is OperationCanceledException) throw;),
and only LogWarning/return null for non-cancellation exceptions from
reader.GetRegistry(ctx). Reference TryGetRegistry and
ILinkIndexReader.GetRegistry to locate the change.

In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-78: When reloadConfiguration is true, you only reload
_cachedCrossLinks but leave _crossLinkFetcher and _codexReader wired to the
original startup configuration; recreate/refresh those instances from the
updated _context/config before fetching new cross-links. Specifically: when you
run _context.ReloadConfiguration() (and when reloadConfiguration is true), clear
or reset _cachedCrossLinks and reinstantiate _crossLinkFetcher and _codexReader
from the current context/configuration (the same factories or constructors used
at startup) so that the subsequent _crossLinkFetcher.FetchCrossLinks(ctx) call
uses the new registries and returns fresh cross-links.
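The first finding above (cancellation being swallowed by a broad catch) can be illustrated with a minimal Python sketch. `OperationCanceled` is a hypothetical stand-in for .NET's `OperationCanceledException`, which derives from `Exception` and is therefore caught by a general `catch (Exception)`; `try_get_registry` and its `get_registry` parameter are illustrative names, not the project's API.

```python
import logging

logger = logging.getLogger("crosslinks")


class OperationCanceled(Exception):
    """Stand-in for .NET's OperationCanceledException (an Exception subclass)."""


def try_get_registry(get_registry):
    """Return the registry, or None on ordinary failures; never swallow cancellation."""
    try:
        return get_registry()
    except OperationCanceled:
        # A stop or reload request must abort the fetch, not degrade to None.
        raise
    except Exception as ex:
        # Ordinary failures (e.g. S3 unreachable) keep the log-and-fall-through path.
        logger.warning("Registry fetch failed; falling back to placeholders: %s", ex)
        return None
```

Note that Python's own `asyncio.CancelledError` derives from `BaseException` and is not caught by `except Exception`, which is why a custom exception class is used here to mirror the .NET behaviour.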

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e7087069-6fba-4667-a0e8-b1d230530955

📥 Commits

Reviewing files that changed from the base of the PR and between f834fcd and 8e52b82.

📒 Files selected for processing (2)
  • src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs
  • src/tooling/docs-builder/Http/ReloadableGeneratorState.cs

Comment thread src/tooling/docs-builder/Http/ReloadableGeneratorState.cs
For plain .md edits during serve, skip the DocumentationSet rebuild,
ResolveDirectoryTree, and in-memory validation build entirely — the
serve path already reads fresh content from disk via ParseFullAsync.
Full rebuilds still run for structural changes (config/toc edits,
file add/delete). Also watch common asset files (images, yml, toml)
and trigger a browser-only refresh when they change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…reload

Don't swallow OperationCanceledException in TryGetRegistry so
cancellation propagates promptly instead of degrading to null.
Recreate _crossLinkFetcher and _codexReader when configuration
reloads so registry switches in docset.yml take effect immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cotti cotti enabled auto-merge (squash) April 30, 2026 14:24
coderabbitai Bot previously requested changes Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs (1)

50-64: ⚠️ Potential issue | 🟠 Major

Propagate cancellation in the per-repository fetch loop.

The catch (Exception) block at lines 62–83 swallows OperationCanceledException from FetchLinkIndexEntryFromReader(), converting a stop/reload request into placeholder links instead of aborting. Add a catch (OperationCanceledException) before the general catch to re-throw, matching the pattern already in TryGetRegistry().

Suggested fix
 try
 {
 	if (registry is null || !registry.Repositories.TryGetValue(entry.Repository, out var repoBranches))
 		throw new Exception($"Repository {entry.Repository} not found in link index");

 	var linkIndexEntry = GetNextContentSourceLinkIndexEntry(repoBranches, entry.Repository);
 	var linkReference = await FetchLinkIndexEntryFromReader(reader, entry.Repository, linkIndexEntry, ctx);

 	linkReferences.Add(entry.Repository, linkReference);
 	linkIndexEntries.Add(entry.Repository, linkIndexEntry);
 	registryUrlsByRepository[entry.Repository] = reader.RegistryUrl;
 }
+catch (OperationCanceledException)
+{
+	throw;
+}
 catch (Exception ex)
 {
 	_logger.LogWarning(ex, "Error fetching link data for repository '{Repository}'. Cross-links to this repository may not resolve correctly.", entry.Repository);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs`
around lines 50 - 64, The per-repository loop in
DocSetConfigurationCrossLinkFetcher is currently catching all exceptions and
thus swallowing OperationCanceledException from FetchLinkIndexEntryFromReader;
add a specific catch (OperationCanceledException) before the existing general
catch to re-throw the cancellation so the operation can abort (match the pattern
used in TryGetRegistry()), leaving the existing _logger.LogWarning fallback for
other exceptions unchanged and ensuring the cancellation propagates up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tooling/docs-builder/Http/ReloadableGeneratorState.cs`:
- Around line 72-87: The code currently assigns the value returned by
_crossLinkFetcher.FetchCrossLinks(ctx) directly into _cachedCrossLinks, which
causes placeholder/fallback data from a transient registry failure to be cached
and never retried; change the flow to call var newLinks = await
_crossLinkFetcher.FetchCrossLinks(ctx) and only set _cachedCrossLinks = newLinks
when newLinks represents a real successful fetch (e.g. newLinks.IsFallback ==
false or newLinks != Placeholder), otherwise leave _cachedCrossLinks untouched
(and optionally log a warning); if FetchCrossLinks does not yet expose a
success/fallback indicator, update its return to a wrapper (or throw on
unrecoverable failure) so ReloadableGeneratorState can reliably decide whether
to cache the result, keeping the early-return behavior for non-config reloads
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 79573cde-5549-464f-b68b-a151f72f10b8

📥 Commits

Reviewing files that changed from the base of the PR and between 8e52b82 and cdac328.

📒 Files selected for processing (3)
  • src/Elastic.Documentation.Links/CrossLinks/DocSetConfigurationCrossLinkFetcher.cs
  • src/tooling/docs-builder/Http/ReloadGeneratorService.cs
  • src/tooling/docs-builder/Http/ReloadableGeneratorState.cs

Comment thread src/tooling/docs-builder/Http/ReloadableGeneratorState.cs Outdated
Add IsComplete flag to FetchedCrossLinks so callers can distinguish
a clean fetch from one that fell back to placeholder data. Only cache
complete results in ReloadableGeneratorState so transient S3 outages
get retried on the next reload instead of being sticky for the session.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
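The "only cache complete results" behaviour described in this commit message can be sketched as follows. This is a hedged Python illustration, not the C# implementation: the `is_complete` field mirrors the `IsComplete` flag named above, while `ReloadableState`, `cross_links`, and the injected `fetch` callable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchedCrossLinks:
    links: dict
    is_complete: bool  # False when the fetch fell back to placeholder data


class ReloadableState:
    """Caches cross-links for the session, but only when the fetch was clean."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable returning FetchedCrossLinks
        self._cached = None

    def cross_links(self, reload_configuration: bool = False) -> FetchedCrossLinks:
        if self._cached is not None and not reload_configuration:
            return self._cached
        fetched = self._fetch()
        # Incomplete results are returned to the caller but never cached,
        # so the next reload retries instead of reusing placeholders.
        self._cached = fetched if fetched.is_complete else None
        return fetched
```

The design choice is that a transient outage costs one extra fetch on the next reload, rather than leaving placeholder links in place for the rest of the session.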
@cotti cotti merged commit 657d42e into main Apr 30, 2026
24 checks passed
@cotti cotti deleted the fix/serve-cross-link-perf-2845 branch April 30, 2026 14:54
@elastic elastic deleted a comment from coderabbitai Bot Apr 30, 2026
Mpdreamz pushed a commit that referenced this pull request May 11, 2026
* Cache cross-link index across serve hot reloads (#2845)

Every .md save in serve mode was re-fetching link-index.json over S3
for each cross-link entry, twice per repo, blocking the browser
refresh. Cache FetchedCrossLinks across reloads (re-fetched only on
configuration changes), and fold the duplicate per-repo GetRegistry
calls into one fetch per registry up front. Same wins flow through to
docs-builder build and codex builds via DocSetConfigurationCrossLinkFetcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip full rebuild and validation on content-only changes

For plain .md edits during serve, skip the DocumentationSet rebuild,
ResolveDirectoryTree, and in-memory validation build entirely — the
serve path already reads fresh content from disk via ParseFullAsync.
Full rebuilds still run for structural changes (config/toc edits,
file add/delete). Also watch common asset files (images, yml, toml)
and trigger a browser-only refresh when they change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Propagate cancellation in TryGetRegistry, recreate fetcher on config reload

Don't swallow OperationCanceledException in TryGetRegistry so
cancellation propagates promptly instead of degrading to null.
Recreate _crossLinkFetcher and _codexReader when configuration
reloads so registry switches in docset.yml take effect immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Felipe Cotti <felipe.cotti@elastic.co>

Development

Successfully merging this pull request may close these issues.

Local hot reload has slowed down
aside directive

3 participants