Replace sphinx-multiversion with per-branch docs builds#5227
Replace sphinx-multiversion with per-branch docs builds#5227AntoineRichard wants to merge 7 commits intomainfrom
Conversation
Same changes as the develop PR (#5226), applied to main's diverged files. Removes sphinx-multiversion, adds per-branch CI builds with version switcher, and one-time migration workflow.
Greptile SummaryThis PR replaces
Confidence Score: 4/5Functionally sound for single-branch deploys, but has a real race condition that can silently drop version entries from the switcher when two branches deploy concurrently Two P1 findings: a fetch-mutate-push race on versions.json and the missing DOCS_VERSION_SLUG in the migration workflow. The race is unlikely in normal operation but is a true correctness bug when it does occur, and the missing env var will cause incorrect version highlighting for every branch built during migration. .github/workflows/docs.yaml (race condition in versions.json update) and .github/workflows/docs-migration.yaml (missing DOCS_VERSION_SLUG env var) Important Files Changed
Sequence DiagramsequenceDiagram
participant GH as GitHub Push (branch/tag)
participant CI as docs.yaml CI
participant Pages as gh-pages
participant Site as Live Site
GH->>CI: trigger on push
CI->>CI: build docs (sphinx-build)
CI->>CI: set DOCS_VERSION_SLUG
CI->>Pages: download existing versions.json
Pages-->>CI: versions.json (may be stale)
CI->>CI: upsert current version entry
CI->>Pages: deploy site/SLUG/ + versions.json (keep_files: true)
Pages-->>Site: updated versions.json + versioned docs
Note over CI,Pages: Race: two concurrent pushes both read stale JSON and overwrite each other
participant Mig as docs-migration.yaml
Note over Mig: One-shot: rebuilds all historical versions
Mig->>Mig: matrix build all refs
Mig->>Pages: deploy full site (keep_files: false - full reset)
Reviews (1): Last reviewed commit: "Replace sphinx-multiversion with per-bra..." | Re-trigger Greptile |
.github/workflows/docs.yaml
Outdated
| # Fetch existing versions.json from gh-pages (may not exist yet) | ||
| PAGES_URL="https://isaac-sim.github.io/IsaacLab/versions.json" | ||
| if curl -sf "$PAGES_URL" -o existing.json; then | ||
| echo "Fetched existing versions.json" | ||
| else | ||
| echo "[]" > existing.json | ||
| echo "No existing versions.json, starting fresh" | ||
| fi | ||
|
|
||
| # Determine display name | ||
| REF="${{ github.ref }}" | ||
| if [[ "$REF" == refs/tags/* ]]; then | ||
| NAME="${SLUG}" | ||
| else | ||
| NAME="${SLUG}" | ||
| fi | ||
|
|
||
| # Upsert this version's entry | ||
| python3 -c " | ||
| import json, sys | ||
|
|
||
| slug = '${SLUG}' | ||
| name = '${NAME}' | ||
|
|
||
| with open('existing.json') as f: | ||
| versions = json.load(f) | ||
|
|
||
| # Remove existing entry for this slug | ||
| versions = [v for v in versions if v['version'] != slug] | ||
|
|
||
| # Add updated entry | ||
| entry = { | ||
| 'name': name, | ||
| 'version': slug, | ||
| 'url': '/IsaacLab/' + slug + '/', | ||
| } | ||
| versions.append(entry) | ||
|
|
||
| # Sort: branches first (main, develop), then releases, then tags by version | ||
| def sort_key(v): | ||
| s = v['version'] | ||
| if s == 'main': | ||
| return (0, s) | ||
| elif s == 'develop': | ||
| return (1, s) | ||
| elif s.startswith('release-'): | ||
| return (2, s) | ||
| else: | ||
| return (3, s) | ||
|
|
||
| versions.sort(key=sort_key) | ||
|
|
||
| with open('site/versions.json', 'w') as f: | ||
| json.dump(versions, f, indent=2) | ||
| f.write('\n') | ||
| " |
There was a problem hiding this comment.
Race condition in
versions.json upsert
Two concurrent deploys (e.g. a release tag and main merging in quick succession) each fetch the live versions.json, compute their own updated copy independently, and force-push. Whichever push lands second silently overwrites the first, dropping the other branch's entry from the version switcher for all users until that branch deploys again.
The fetch → mutate → push pattern is inherently racy without a lock or atomic update mechanism. Consider using a dedicated gh-pages-update workflow triggered serially (e.g. via workflow_run with concurrency: group: versions-json, cancel-in-progress: false) or generating versions.json from the directory listing of the deployed site inside the push step rather than before it.
| - name: Build docs | ||
| working-directory: ./docs | ||
| # No -W flag: old versions may have warnings that are acceptable. | ||
| run: | | ||
| rm -rf "_build" | ||
| sphinx-build -b html . "_build" |
There was a problem hiding this comment.
DOCS_VERSION_SLUG not passed to migration builds
For branches that already carry the updated conf.py (main, develop, release/2.1.0, release/2.2.0, release/2.3.0), _version_slug will fall back to the semver from the VERSION file (e.g. "3.0.0") instead of the directory slug (e.g. "main"). The pydata-sphinx-theme built-in switcher uses version_match to highlight the current version; a mismatch means no version is highlighted and users lose orientation. The regular docs.yaml sets this variable correctly:
| - name: Build docs | |
| working-directory: ./docs | |
| # No -W flag: old versions may have warnings that are acceptable. | |
| run: | | |
| rm -rf "_build" | |
| sphinx-build -b html . "_build" | |
| - name: Build docs | |
| working-directory: ./docs | |
| env: | |
| DOCS_VERSION_SLUG: ${{ steps.slug.outputs.slug }} | |
| # No -W flag: old versions may have warnings that are acceptable. | |
| run: | | |
| rm -rf "_build" | |
| sphinx-build -b html . "_build" |
.github/workflows/docs.yaml
Outdated
| REF="${{ github.ref }}" | ||
| if [[ "$REF" == refs/tags/* ]]; then | ||
| NAME="${SLUG}" | ||
| else | ||
| NAME="${SLUG}" | ||
| fi |
There was a problem hiding this comment.
Dead conditional — both branches assign the same value
Both arms of the if/else set NAME="${SLUG}", so the condition is never meaningful. The block can be simplified or removed entirely.
| REF="${{ github.ref }}" | |
| if [[ "$REF" == refs/tags/* ]]; then | |
| NAME="${SLUG}" | |
| else | |
| NAME="${SLUG}" | |
| fi | |
| NAME="${SLUG}" |
kellyguo11
left a comment
There was a problem hiding this comment.
Review: Replace sphinx-multiversion with per-branch docs builds
Good direction — sphinx-multiversion has been unmaintained since 2020 and has been a recurring source of build pain. Per-branch builds are simpler, more reliable, and easier to reason about. This mirrors the same approach as #5226 on develop, applied to main's diverged files.
CI Status
Both CI failures are pre-existing / not caused by this PR:
-
Docs build failure — 18
autosummary.import_cyclewarnings (pre-existing on main) treated as errors by-W. Additionally, thepydata-sphinx-themeversion switcher tries to fetchversions.jsonat build time and logs a warning when it 404s (expected — the file won't exist until the first deploy). The previous PR #5223 addedsuppress_warnings = ["autosummary.import_cycle"]toconf.pybut this PR doesn't include that suppression. Since this PR says it supersedes #5223, it should carry forward the warning suppression. -
Link checker failure —
https://graphics.pixar.com/usd/docs/index.htmlreturns 404. Pre-existing, unrelated to this PR.
Issues
Should-fix:
-
Missing
suppress_warningsforautosummary.import_cycle— PR #5223 (which this supersedes) added suppression for these warnings. Without it, the-Wflag inmake current-docswill keep failing on main. Need to addsuppress_warnings = ["autosummary.import_cycle"]toconf.pyor the build will never pass. -
versions.json404 warning breaks-Wbuild — When building locally or in CI before the first deploy, the pydata-sphinx-theme version switcher fetchesversions.jsonand emits a warning on 404. With-W(warnings as errors), this is fatal. Consider either:- Suppressing this specific warning category in conf.py
- Adding a placeholder
versions.jsonto the repo - Or documenting that the first build will need
-Wrelaxed
-
Shell injection risk in
docs.yamldeploy step — TheSLUGvariable is interpolated directly into shell via${SLUG}in the "Assemble deploy directory" and "Update versions.json" steps. While the branch/tag names are constrained by the workflow trigger patterns, the Python script uses'${SLUG}'and'${NAME}'(single-quoted shell substitution inside a Python heredoc), which could break if a tag/branch name contains single quotes. Consider passing these as environment variables that Python reads viaos.environinstead. -
docs-migration.yamlusesactions/download-artifact@v8— This action version doesn't exist yet (current latest is v4). The maindocs.yamlalso uses@v7for upload and@v8for download. These should be pinned to actually-released versions (v4 for both). -
actions/checkout@v6doesn't exist — Current latest is v4. Both workflow files reference@v6. This will cause the workflows to fail.
Suggestions:
-
Duplicate slug determination logic — The "Determine version slug" step appears in both
build-docsanddeploy-docsjobs indocs.yamlwith identical logic. Consider extracting it once and passing via job outputs, or at minimum adding a comment noting the intentional duplication. -
Migration workflow
actions/upload-artifact@v7— Same version concern as above; should be@v4. -
XHR in version-switcher.js —
XMLHttpRequestis legacy. Since this is new code, consider usingfetch()with a fallback, or justfetch()since all modern browsers support it. Not blocking, but the code would be cleaner. -
HEAD request for page existence check — The version switcher does a HEAD request to check if a page exists in the target version before navigating. This is a nice UX touch, but be aware some CDN/hosting setups don't handle HEAD well for static files. A comment explaining the fallback behavior would help future maintainers.
-
devel→developrename in trigger — The workflow trigger changesdeveltodevelop. Make sure all references to the old branch name are cleaned up across the repo (not just this file).
Overall
Solid architectural improvement. The standalone version-switcher.js approach is clever — it allows injecting version switching into old builds that never had it. The migration workflow for one-time historical rebuild is well thought out.
Main blockers are the action version references (v6/v7/v8 don't exist) and the missing warning suppression from #5223. Fix those and this is good to go.
| }, | ||
| ], | ||
| "icon_links_label": "Quick Links", | ||
| "switcher": { |
There was a problem hiding this comment.
This PR supersedes #5223, which added suppress_warnings = ["autosummary.import_cycle"] to fix the 18 import_cycle warnings that break -W builds on main. That suppression isn't carried forward here, so the docs build will continue to fail.
Also, the pydata-sphinx-theme version switcher emits a warning when versions.json returns 404 (which it will until first deploy), adding another -W failure.
Suggestion: add the suppression from #5223 here too:
suppress_warnings = ["autosummary.import_cycle"]And consider suppressing the version-switcher fetch warning as well, or providing a placeholder versions.json.
| if: "${{ github.repository == env.REPO_NAME && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/devel' || startsWith(github.ref, 'refs/heads/release/')) }}" | ||
| if: >- | ||
| github.repository == env.REPO_NAME && | ||
| ( |
There was a problem hiding this comment.
Both actions/checkout@v6 and actions/upload-artifact@v7 / actions/download-artifact@v8 referenced in this workflow don't exist yet. Current latest versions are @v4 for all of these. The workflow will fail to run as-is.
Same issue in docs-migration.yaml.
| env: | ||
| SLUG: ${{ steps.slug.outputs.slug }} | ||
| run: | | ||
| mkdir -p site |
There was a problem hiding this comment.
The SLUG and NAME variables are interpolated into the Python heredoc via '${SLUG}' / '${NAME}'. If a branch/tag name ever contains a single quote, this breaks the Python string literal. Safer approach:
import os
slug = os.environ['SLUG']
name = os.environ['NAME'](with SLUG and NAME passed as env vars to the step, which you already have for SLUG).
- Add deploy concurrency group to serialize gh-pages writes (race fix) - Only set DOCS_VERSION_SLUG for deploy branches, unset for PRs - SHA-pin peaceiris/actions-gh-pages to v4.0.0 - Set DOCS_VERSION_SLUG in migration workflow build step - Use os.environ instead of shell interpolation in Python heredoc - Remove dead NAME conditional block - Suppress config.invalid_switcher_json and autosummary.import_cycle warnings - Exclude superpowers/ directory from Sphinx build
- SHA-pin all GitHub Actions (checkout, setup-python, upload/download-artifact) - Add explicit permissions: (contents: read default, write for deploy) - Pass github.ref via env: block to avoid shell injection - Add empty-slug guard in deploy job (fail-fast instead of corrupt root) - Fetch versions.json from gh-pages branch via gh api (avoid CDN cache) - Add Python input validation: JSON decode errors, schema checks, empty slug - Use heredoc for inline Python (avoids shell interpolation issues) - Add sed injection verification count - Add min-count assertion in migration workflow - JS: replace empty catch with console.warn, add onerror handler, validate array
- Use 'or' fallback for DOCS_VERSION_SLUG (empty string != unset) - Add explicit GH_TOKEN to versions.json fetch step - Add sed injection verification to migration workflow - Emit ::warning:: when injection count mismatches in docs.yaml - Log non-200 HTTP responses in version-switcher.js
Test Results Summary3 687 tests 3 158 ✅ 2h 28m 40s ⏱️ For more details on these errors, see this check. Results for commit e864af5. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR replaces the unmaintained sphinx-multiversion with a per-branch/per-tag docs build strategy. Each push to main, develop, release/**, or a version tag builds only its own docs and deploys to a versioned subdirectory on gh-pages. A standalone version-switcher.js is injected into all versions (including historical ones via a one-time migration workflow) to provide cross-version navigation. The approach is sound and the implementation is well-structured.
Design Assessment
Design is sound. Replacing sphinx-multiversion with per-branch builds is the right call — sphinx-multiversion has been unmaintained since 2020, requires fetching all branches to build any version, and causes frequent CI pain. The new approach is simpler, faster (each version builds independently), and the injected JS switcher elegantly handles backward compatibility with old versions that never had a version dropdown. The concurrency serialization on the deploy job prevents race conditions on versions.json. Good engineering.
Findings
🟡 Warning: Hardcoded /IsaacLab/ path prefix throughout — Multiple files
The repo name IsaacLab is hardcoded in several places:
docs-migration.yaml:128—sedinjection:<script src="https://github.com/IsaacLab/version-switcher.js">docs-migration.yaml:160—versions.jsonURL entries:'/IsaacLab/' + name + '/'docs.yaml:154— samesedinjectiondocs.yaml:196— same URL patternversion-switcher.js:30— comment says "Works for both /IsaacLab/main/... and custom domain setups" but the path-parsing logic assumes a/project/version/...URL structure that would break on a custom domain (where there's no project prefix)conf.py:305—json_urluses the full GitHub Pages URL
If this repo is ever renamed or the docs move to a custom domain, all of these break silently. Consider extracting the base path as a workflow-level variable (e.g., DOCS_BASE_PATH: /IsaacLab/) and passing it through, at least for the workflow files. The JS comment on line 30 of version-switcher.js should also be corrected — it does not currently work for custom domain setups.
🟡 Warning: Unpinned upper bound on sphinx-book-theme — docs/requirements.txt:3
The change from sphinx-book-theme==1.0.1 to sphinx-book-theme>=1.1 with no upper bound means a future major version bump could break docs builds without any change to this repo. sphinx>=7.0,<9 has a sensible upper bound; sphinx-book-theme should follow the same pattern:
sphinx-book-theme>=1.1,<2
🔵 Suggestion: version-switcher.js uses XMLHttpRequest instead of fetch — docs/_static/version-switcher.js
Since this is brand new code, consider using fetch() instead of XMLHttpRequest. All browsers that support ES5 strict mode (which this code uses) also support fetch(). The HEAD request for page-existence checking (lines 66-75) and the GET for versions.json (lines 101-116) would both be cleaner with fetch. Not blocking, but it would reduce the code by ~30%.
🔵 Suggestion: Duplicate slug-determination logic — .github/workflows/docs.yaml
The "Determine version slug" step appears twice in docs.yaml — once in build-docs (lines 59-74) and again in deploy-docs (lines 112-126) with identical logic. Since the build job already computes the slug, consider passing it as a job output to avoid the duplication and potential for the two copies to drift:
outputs:
slug: ${{ steps.slug.outputs.slug }}🔵 Suggestion: Migration workflow matrix could be auto-discovered — .github/workflows/docs-migration.yaml
The migration workflow hardcodes 22 refs in the matrix. If a new release is cut before migration runs, it'll be missed. A minor concern since this is a one-time workflow, but worth a comment in the file noting that the matrix should be updated if new releases are created before the migration is executed.
Test Coverage
This is a documentation/infrastructure change with no runtime code changes.
- Docs CI (Build Docs): ✅ Passes — the
make current-docstarget builds successfully with the new configuration. - Warning suppressions: Properly added for
autosummary.import_cycleandconfig.invalid_switcher_json, preventing-Wfailures. - Manual test plan: PR includes two test items (CI builds, post-merge deploy verification) which are appropriate for this type of change.
- No new runtime tests needed — this affects only the docs build pipeline, not the simulation framework.
CI Status
- ✅ Build Docs — passes
- ✅ pre-commit, license-check, labeler, test-general, test-isaaclab-tasks — pass
- ❌ Check for Broken Links — pre-existing failure (Pixar USD docs URL returns 404), unrelated to this PR
- ❌ Tests Summary / combine-results / Build and Test Results — 1 test failure out of 2,952, pre-existing on main, unrelated to this PR
- ⏭️ Deploy Docs — skipped (expected for PRs, only runs on push to deploy branches)
Verdict
COMMENT
This is a well-designed infrastructure improvement. The only actionable item is pinning sphinx-book-theme's upper bound to prevent future surprise breakage. The hardcoded paths are a maintenance concern but not blocking given the current deployment target is fixed. The remaining suggestions are quality improvements that can be addressed at the author's discretion.
| # for building the docs | ||
| sphinx-book-theme==1.0.1 | ||
| sphinx>=7.0,<9 | ||
| sphinx-book-theme>=1.1 |
There was a problem hiding this comment.
🟡 Warning: Unpinned upper bound — a future sphinx-book-theme 2.x release could break docs builds without any change to this repo. Consider:
| sphinx-book-theme>=1.1 | |
| sphinx-book-theme>=1.1,<2 |
| // The relative page path within the version (e.g. "api/index.html"). | ||
| var pagePath = pathParts.slice(2).join("/") || "index.html"; | ||
|
|
||
| var JSON_URL = basePath + "versions.json"; |
There was a problem hiding this comment.
🔵 Suggestion: This comment is misleading — the path-parsing logic below assumes a /<project>/<version>/... URL structure (extracting pathParts[0] as the project and pathParts[1] as the version slug). On a custom domain without a project prefix (e.g., docs.example.com/main/...), pathParts[0] would be the version, not the project, and basePath would be wrong.
Consider updating the comment to:
| var JSON_URL = basePath + "versions.json"; | |
| // Works for GitHub Pages at /<project>/<version>/... URL structure. |
Split the unified docs workflow into release-specific handling. This workflow covers main/release branch pushes, tag deploys, and PRs against main/release branches. The develop branch is handled separately by docs-dev.yml (PR #5226). Key changes from the old unified workflow: - Remove develop branch triggers (moved to docs-dev.yml) - Add branch filter on pull_request to scope to main/release - Add /stable/ alias creation for tag deploys - Conditional root index.html only on tag deploys
Summary
Same changes as the develop PR (#5226), applied to main's diverged files.
sphinx-multiversionfrom docs builddocs-migration.yamlworkflowsphinx>=7.0,<9andsphinx-book-theme>=1.1(supersedes Pin Sphinx <9 on main for sphinx-multiversion compatibility #5223)See spec:
docs/superpowers/specs/2026-04-10-multi-version-docs-design.mdTest plan