Skip to content

Replace sphinx-multiversion with per-branch docs builds#5227

Open
AntoineRichard wants to merge 7 commits intomainfrom
antoiner/docs-per-branch-builds-main
Open

Replace sphinx-multiversion with per-branch docs builds#5227
AntoineRichard wants to merge 7 commits intomainfrom
antoiner/docs-per-branch-builds-main

Conversation

@AntoineRichard
Copy link
Copy Markdown
Collaborator

Summary

Same changes as the develop PR (#5226), applied to main's diverged files.

See spec: docs/superpowers/specs/2026-04-10-multi-version-docs-design.md

Test plan

  • Docs CI builds successfully on this PR
  • After merge, verify docs deploy correctly for main

Same changes as the develop PR (#5226), applied to main's diverged
files. Removes sphinx-multiversion, adds per-branch CI builds with
version switcher, and one-time migration workflow.
@github-actions github-actions bot added documentation Improvements or additions to documentation infrastructure labels Apr 10, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR replaces sphinx-multiversion with a per-branch docs-build approach: each branch/tag builds only its own docs, a custom JavaScript version switcher + the pydata-sphinx-theme built-in switcher provide navigation across versions, and a one-time migration workflow (docs-migration.yaml) rebuilds all historical versions.

  • Race condition in versions.json update (docs.yaml, lines 141–196): the deploy job fetches the live versions.json, adds the current branch's entry, and force-pushes. Two concurrent deploys (e.g. a release tag and main merging simultaneously) will each start from the same stale JSON and whichever push wins will silently drop the other branch's entry from the version switcher.
  • DOCS_VERSION_SLUG not set in migration workflow (docs-migration.yaml, lines 76–81): branches with the new conf.py (main, develop, release branches) will build with version_match falling back to the semver string instead of the directory slug, so the built-in theme switcher will not highlight the current version.

Confidence Score: 4/5

Functionally sound for single-branch deploys, but has a real race condition that can silently drop version entries from the switcher when two branches deploy concurrently

Two P1 findings: a fetch-mutate-push race on versions.json and the missing DOCS_VERSION_SLUG in the migration workflow. The race is unlikely in normal operation but is a true correctness bug when it does occur, and the missing env var will cause incorrect version highlighting for every branch built during migration.

.github/workflows/docs.yaml (race condition in versions.json update) and .github/workflows/docs-migration.yaml (missing DOCS_VERSION_SLUG env var)

Important Files Changed

Filename Overview
.github/workflows/docs.yaml New per-branch deploy workflow; has a race condition in the versions.json upsert step and dead code in the NAME assignment block
.github/workflows/docs-migration.yaml One-time migration workflow that rebuilds all historical docs; missing DOCS_VERSION_SLUG env var for recent branches causes wrong version_match in the built-in switcher
docs/_static/version-switcher.js New standalone version switcher injected into old docs that lack the theme built-in; path parsing assumes /project/version/ URL structure, which is correct for the current gh-pages setup
docs/conf.py Removes sphinx_multiversion extension, adds pydata-sphinx-theme built-in version switcher with DOCS_VERSION_SLUG env-var fallback; clean change
docs/requirements.txt Drops sphinx-multiversion, pins sphinx>=7.0,<9 and sphinx-book-theme>=1.1 for version-switcher support
docs/Makefile Adds current-docs target that builds into _build/current, used by docs.yaml CI
docs/_templates/versioning.html Deleted sphinx-multiversion sidebar template, superseded by the theme built-in and standalone JS switcher

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Push (branch/tag)
    participant CI as docs.yaml CI
    participant Pages as gh-pages
    participant Site as Live Site

    GH->>CI: trigger on push
    CI->>CI: build docs (sphinx-build)
    CI->>CI: set DOCS_VERSION_SLUG
    CI->>Pages: download existing versions.json
    Pages-->>CI: versions.json (may be stale)
    CI->>CI: upsert current version entry
    CI->>Pages: deploy site/SLUG/ + versions.json (keep_files: true)
    Pages-->>Site: updated versions.json + versioned docs

    Note over CI,Pages: Race: two concurrent pushes both read stale JSON and overwrite each other

    participant Mig as docs-migration.yaml
    Note over Mig: One-shot: rebuilds all historical versions
    Mig->>Mig: matrix build all refs
    Mig->>Pages: deploy full site (keep_files: false - full reset)
Loading

Reviews (1): Last reviewed commit: "Replace sphinx-multiversion with per-bra..." | Re-trigger Greptile

Comment on lines +141 to +196
# Fetch existing versions.json from gh-pages (may not exist yet)
PAGES_URL="https://isaac-sim.github.io/IsaacLab/versions.json"
if curl -sf "$PAGES_URL" -o existing.json; then
echo "Fetched existing versions.json"
else
echo "[]" > existing.json
echo "No existing versions.json, starting fresh"
fi

# Determine display name
REF="${{ github.ref }}"
if [[ "$REF" == refs/tags/* ]]; then
NAME="${SLUG}"
else
NAME="${SLUG}"
fi

# Upsert this version's entry
python3 -c "
import json, sys

slug = '${SLUG}'
name = '${NAME}'

with open('existing.json') as f:
versions = json.load(f)

# Remove existing entry for this slug
versions = [v for v in versions if v['version'] != slug]

# Add updated entry
entry = {
'name': name,
'version': slug,
'url': '/IsaacLab/' + slug + '/',
}
versions.append(entry)

# Sort: branches first (main, develop), then releases, then tags by version
def sort_key(v):
s = v['version']
if s == 'main':
return (0, s)
elif s == 'develop':
return (1, s)
elif s.startswith('release-'):
return (2, s)
else:
return (3, s)

versions.sort(key=sort_key)

with open('site/versions.json', 'w') as f:
json.dump(versions, f, indent=2)
f.write('\n')
"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Race condition in versions.json upsert

Two concurrent deploys (e.g. a release tag and main merging in quick succession) each fetch the live versions.json, compute their own updated copy independently, and force-push. Whichever push lands second silently overwrites the first, dropping the other branch's entry from the version switcher for all users until that branch deploys again.

The fetch → mutate → push pattern is inherently racy without a lock or atomic update mechanism. Consider using a dedicated gh-pages-update workflow triggered serially (e.g. via workflow_run with concurrency: group: versions-json, cancel-in-progress: false) or generating versions.json from the directory listing of the deployed site inside the push step rather than before it.

Comment on lines +76 to +81
- name: Build docs
working-directory: ./docs
# No -W flag: old versions may have warnings that are acceptable.
run: |
rm -rf "_build"
sphinx-build -b html . "_build"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 DOCS_VERSION_SLUG not passed to migration builds

For branches that already carry the updated conf.py (main, develop, release/2.1.0, release/2.2.0, release/2.3.0), _version_slug will fall back to the semver from the VERSION file (e.g. "3.0.0") instead of the directory slug (e.g. "main"). The pydata-sphinx-theme built-in switcher uses version_match to highlight the current version; a mismatch means no version is highlighted and users lose orientation. The regular docs.yaml sets this variable correctly:

Suggested change
- name: Build docs
working-directory: ./docs
# No -W flag: old versions may have warnings that are acceptable.
run: |
rm -rf "_build"
sphinx-build -b html . "_build"
- name: Build docs
working-directory: ./docs
env:
DOCS_VERSION_SLUG: ${{ steps.slug.outputs.slug }}
# No -W flag: old versions may have warnings that are acceptable.
run: |
rm -rf "_build"
sphinx-build -b html . "_build"

Comment on lines +151 to +156
REF="${{ github.ref }}"
if [[ "$REF" == refs/tags/* ]]; then
NAME="${SLUG}"
else
NAME="${SLUG}"
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead conditional — both branches assign the same value

Both arms of the if/else set NAME="${SLUG}", so the condition is never meaningful. The block can be simplified or removed entirely.

Suggested change
REF="${{ github.ref }}"
if [[ "$REF" == refs/tags/* ]]; then
NAME="${SLUG}"
else
NAME="${SLUG}"
fi
NAME="${SLUG}"

Copy link
Copy Markdown
Contributor

@kellyguo11 kellyguo11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Replace sphinx-multiversion with per-branch docs builds

Good direction — sphinx-multiversion has been unmaintained since 2020 and has been a recurring source of build pain. Per-branch builds are simpler, more reliable, and easier to reason about. This mirrors the same approach as #5226 on develop, applied to main's diverged files.

CI Status

Both CI failures are pre-existing / not caused by this PR:

  1. Docs build failure — 18 autosummary.import_cycle warnings (pre-existing on main) treated as errors by -W. Additionally, the pydata-sphinx-theme version switcher tries to fetch versions.json at build time and logs a warning when it 404s (expected — the file won't exist until the first deploy). The previous PR #5223 added suppress_warnings = ["autosummary.import_cycle"] to conf.py but this PR doesn't include that suppression. Since this PR says it supersedes #5223, it should carry forward the warning suppression.

  2. Link checker failurehttps://graphics.pixar.com/usd/docs/index.html returns 404. Pre-existing, unrelated to this PR.

Issues

Should-fix:

  1. Missing suppress_warnings for autosummary.import_cycle — PR #5223 (which this supersedes) added suppression for these warnings. Without it, the -W flag in make current-docs will keep failing on main. Need to add suppress_warnings = ["autosummary.import_cycle"] to conf.py or the build will never pass.

  2. versions.json 404 warning breaks -W build — When building locally or in CI before the first deploy, the pydata-sphinx-theme version switcher fetches versions.json and emits a warning on 404. With -W (warnings as errors), this is fatal. Consider either:

    • Suppressing this specific warning category in conf.py
    • Adding a placeholder versions.json to the repo
    • Or documenting that the first build will need -W relaxed
  3. Shell injection risk in docs.yaml deploy step — The SLUG variable is interpolated directly into shell via ${SLUG} in the "Assemble deploy directory" and "Update versions.json" steps. While the branch/tag names are constrained by the workflow trigger patterns, the Python script uses '${SLUG}' and '${NAME}' (single-quoted shell substitution inside a Python heredoc), which could break if a tag/branch name contains single quotes. Consider passing these as environment variables that Python reads via os.environ instead.

  4. docs-migration.yaml uses actions/download-artifact@v8 — This action version doesn't exist yet (current latest is v4). The main docs.yaml also uses @v7 for upload and @v8 for download. These should be pinned to actually-released versions (v4 for both).

  5. actions/checkout@v6 doesn't exist — Current latest is v4. Both workflow files reference @v6. This will cause the workflows to fail.

Suggestions:

  1. Duplicate slug determination logic — The "Determine version slug" step appears in both build-docs and deploy-docs jobs in docs.yaml with identical logic. Consider extracting it once and passing via job outputs, or at minimum adding a comment noting the intentional duplication.

  2. Migration workflow actions/upload-artifact@v7 — Same version concern as above; should be @v4.

  3. XHR in version-switcher.jsXMLHttpRequest is legacy. Since this is new code, consider using fetch() with a fallback, or just fetch() since all modern browsers support it. Not blocking, but the code would be cleaner.

  4. HEAD request for page existence check — The version switcher does a HEAD request to check if a page exists in the target version before navigating. This is a nice UX touch, but be aware some CDN/hosting setups don't handle HEAD well for static files. A comment explaining the fallback behavior would help future maintainers.

  5. develdevelop rename in trigger — The workflow trigger changes devel to develop. Make sure all references to the old branch name are cleaned up across the repo (not just this file).

Overall

Solid architectural improvement. The standalone version-switcher.js approach is clever — it allows injecting version switching into old builds that never had it. The migration workflow for one-time historical rebuild is well thought out.

Main blockers are the action version references (v6/v7/v8 don't exist) and the missing warning suppression from #5223. Fix those and this is good to go.

},
],
"icon_links_label": "Quick Links",
"switcher": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR supersedes #5223, which added suppress_warnings = ["autosummary.import_cycle"] to fix the 18 import_cycle warnings that break -W builds on main. That suppression isn't carried forward here, so the docs build will continue to fail.

Also, the pydata-sphinx-theme version switcher emits a warning when versions.json returns 404 (which it will until first deploy), adding another -W failure.

Suggestion: add the suppression from #5223 here too:

suppress_warnings = ["autosummary.import_cycle"]

And consider suppressing the version-switcher fetch warning as well, or providing a placeholder versions.json.

if: "${{ github.repository == env.REPO_NAME && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/devel' || startsWith(github.ref, 'refs/heads/release/')) }}"
if: >-
github.repository == env.REPO_NAME &&
(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both actions/checkout@v6 and actions/upload-artifact@v7 / actions/download-artifact@v8 referenced in this workflow don't exist yet. Current latest versions are @v4 for all of these. The workflow will fail to run as-is.

Same issue in docs-migration.yaml.

env:
SLUG: ${{ steps.slug.outputs.slug }}
run: |
mkdir -p site
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SLUG and NAME variables are interpolated into the Python heredoc via '${SLUG}' / '${NAME}'. If a branch/tag name ever contains a single quote, this breaks the Python string literal. Safer approach:

import os
slug = os.environ['SLUG']
name = os.environ['NAME']

(with SLUG and NAME passed as env vars to the step, which you already have for SLUG).

- Add deploy concurrency group to serialize gh-pages writes (race fix)
- Only set DOCS_VERSION_SLUG for deploy branches, unset for PRs
- SHA-pin peaceiris/actions-gh-pages to v4.0.0
- Set DOCS_VERSION_SLUG in migration workflow build step
- Use os.environ instead of shell interpolation in Python heredoc
- Remove dead NAME conditional block
- Suppress config.invalid_switcher_json and autosummary.import_cycle warnings
- Exclude superpowers/ directory from Sphinx build
- SHA-pin all GitHub Actions (checkout, setup-python, upload/download-artifact)
- Add explicit permissions: (contents: read default, write for deploy)
- Pass github.ref via env: block to avoid shell injection
- Add empty-slug guard in deploy job (fail-fast instead of corrupt root)
- Fetch versions.json from gh-pages branch via gh api (avoid CDN cache)
- Add Python input validation: JSON decode errors, schema checks, empty slug
- Use heredoc for inline Python (avoids shell interpolation issues)
- Add sed injection verification count
- Add min-count assertion in migration workflow
- JS: replace empty catch with console.warn, add onerror handler, validate array
- Use 'or' fallback for DOCS_VERSION_SLUG (empty string != unset)
- Add explicit GH_TOKEN to versions.json fetch step
- Add sed injection verification to migration workflow
- Emit ::warning:: when injection count mismatches in docs.yaml
- Log non-200 HTTP responses in version-switcher.js
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 10, 2026

Test Results Summary

3 687 tests   3 158 ✅  2h 28m 40s ⏱️
  119 suites    528 💤
    1 files        0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit e864af5.

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR replaces the unmaintained sphinx-multiversion with a per-branch/per-tag docs build strategy. Each push to main, develop, release/**, or a version tag builds only its own docs and deploys to a versioned subdirectory on gh-pages. A standalone version-switcher.js is injected into all versions (including historical ones via a one-time migration workflow) to provide cross-version navigation. The approach is sound and the implementation is well-structured.

Design Assessment

Design is sound. Replacing sphinx-multiversion with per-branch builds is the right call — sphinx-multiversion has been unmaintained since 2020, requires fetching all branches to build any version, and causes frequent CI pain. The new approach is simpler, faster (each version builds independently), and the injected JS switcher elegantly handles backward compatibility with old versions that never had a version dropdown. The concurrency serialization on the deploy job prevents race conditions on versions.json. Good engineering.

Findings

🟡 Warning: Hardcoded /IsaacLab/ path prefix throughout — Multiple files

The repo name IsaacLab is hardcoded in several places:

  • docs-migration.yaml:128sed injection: <script src="https://github.com/IsaacLab/version-switcher.js">
  • docs-migration.yaml:160versions.json URL entries: '/IsaacLab/' + name + '/'
  • docs.yaml:154 — same sed injection
  • docs.yaml:196 — same URL pattern
  • version-switcher.js:30 — comment says "Works for both /IsaacLab/main/... and custom domain setups" but the path-parsing logic assumes a /project/version/... URL structure that would break on a custom domain (where there's no project prefix)
  • conf.py:305json_url uses the full GitHub Pages URL

If this repo is ever renamed or the docs move to a custom domain, all of these break silently. Consider extracting the base path as a workflow-level variable (e.g., DOCS_BASE_PATH: /IsaacLab/) and passing it through, at least for the workflow files. The JS comment on line 30 of version-switcher.js should also be corrected — it does not currently work for custom domain setups.

🟡 Warning: Unpinned upper bound on sphinx-book-themedocs/requirements.txt:3

The change from sphinx-book-theme==1.0.1 to sphinx-book-theme>=1.1 with no upper bound means a future major version bump could break docs builds without any change to this repo. sphinx>=7.0,<9 has a sensible upper bound; sphinx-book-theme should follow the same pattern:

sphinx-book-theme>=1.1,<2

🔵 Suggestion: version-switcher.js uses XMLHttpRequest instead of fetchdocs/_static/version-switcher.js

Since this is brand new code, consider using fetch() instead of XMLHttpRequest. All browsers that support ES5 strict mode (which this code uses) also support fetch(). The HEAD request for page-existence checking (lines 66-75) and the GET for versions.json (lines 101-116) would both be cleaner with fetch. Not blocking, but it would reduce the code by ~30%.

🔵 Suggestion: Duplicate slug-determination logic.github/workflows/docs.yaml

The "Determine version slug" step appears twice in docs.yaml — once in build-docs (lines 59-74) and again in deploy-docs (lines 112-126) with identical logic. Since the build job already computes the slug, consider passing it as a job output to avoid the duplication and potential for the two copies to drift:

outputs:
  slug: ${{ steps.slug.outputs.slug }}

🔵 Suggestion: Migration workflow matrix could be auto-discovered.github/workflows/docs-migration.yaml

The migration workflow hardcodes 22 refs in the matrix. If a new release is cut before migration runs, it'll be missed. A minor concern since this is a one-time workflow, but worth a comment in the file noting that the matrix should be updated if new releases are created before the migration is executed.

Test Coverage

This is a documentation/infrastructure change with no runtime code changes.

  • Docs CI (Build Docs): ✅ Passes — the make current-docs target builds successfully with the new configuration.
  • Warning suppressions: Properly added for autosummary.import_cycle and config.invalid_switcher_json, preventing -W failures.
  • Manual test plan: PR includes two test items (CI builds, post-merge deploy verification) which are appropriate for this type of change.
  • No new runtime tests needed — this affects only the docs build pipeline, not the simulation framework.

CI Status

  • Build Docs — passes
  • pre-commit, license-check, labeler, test-general, test-isaaclab-tasks — pass
  • Check for Broken Links — pre-existing failure (Pixar USD docs URL returns 404), unrelated to this PR
  • Tests Summary / combine-results / Build and Test Results — 1 test failure out of 2,952, pre-existing on main, unrelated to this PR
  • ⏭️ Deploy Docs — skipped (expected for PRs, only runs on push to deploy branches)

Verdict

COMMENT

This is a well-designed infrastructure improvement. The only actionable item is pinning sphinx-book-theme's upper bound to prevent future surprise breakage. The hardcoded paths are a maintenance concern but not blocking given the current deployment target is fixed. The remaining suggestions are quality improvements that can be addressed at the author's discretion.

# for building the docs
sphinx-book-theme==1.0.1
sphinx>=7.0,<9
sphinx-book-theme>=1.1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Warning: Unpinned upper bound — a future sphinx-book-theme 2.x release could break docs builds without any change to this repo. Consider:

Suggested change
sphinx-book-theme>=1.1
sphinx-book-theme>=1.1,<2

// The relative page path within the version (e.g. "api/index.html").
var pagePath = pathParts.slice(2).join("/") || "index.html";

var JSON_URL = basePath + "versions.json";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 Suggestion: This comment is misleading — the path-parsing logic below assumes a /<project>/<version>/... URL structure (extracting pathParts[0] as the project and pathParts[1] as the version slug). On a custom domain without a project prefix (e.g., docs.example.com/main/...), pathParts[0] would be the version, not the project, and basePath would be wrong.

Consider updating the comment to:

Suggested change
var JSON_URL = basePath + "versions.json";
// Works for GitHub Pages at /<project>/<version>/... URL structure.

Split the unified docs workflow into release-specific handling.
This workflow covers main/release branch pushes, tag deploys,
and PRs against main/release branches. The develop branch is
handled separately by docs-dev.yml (PR #5226).

Key changes from the old unified workflow:
- Remove develop branch triggers (moved to docs-dev.yml)
- Add branch filter on pull_request to scope to main/release
- Add /stable/ alias creation for tag deploys
- Conditional root index.html only on tag deploys
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants