diff --git a/.github/instructions/cicd.instructions.md b/.github/instructions/cicd.instructions.md
index bcfb9afaf..e4628cdef 100644
--- a/.github/instructions/cicd.instructions.md
+++ b/.github/instructions/cicd.instructions.md
@@ -6,27 +6,32 @@ description: "CI/CD Pipeline configuration for PyInstaller binary packaging and
 # CI/CD Pipeline Instructions
 
 ## Workflow Architecture (Fork-safe)
-Three workflows split by trigger and secret requirements:
+Four workflows split by trigger and secret requirements:
 
 1. **`ci.yml`** — `pull_request` trigger (all PRs, including forks)
-   - **Linux + Windows** (ubuntu-24.04, windows-latest). Unit tests in parallel on both platforms + single Linux binary build. No secrets needed.
-   - Windows job catches path separator, encoding, and platform-specific issues before merge.
+   - **Linux-only** (ubuntu-24.04). Combined `build-and-test` job: unit tests + binary build in a single runner. No secrets needed.
    - Uploads Linux x86_64 binary artifact for downstream integration testing.
 2. **`ci-integration.yml`** — `workflow_run` trigger (after CI completes, environment-gated)
    - **Linux-only**. Smoke tests, integration tests, release validation. Requires `integration-tests` environment approval.
    - Security: uses `workflow_run` (not `pull_request_target`) — PR code is NEVER checked out.
    - Downloads Linux binary artifact from ci.yml, runs test scripts from default branch (main).
    - Reports results back to PR via commit status API.
+   - Detects CI circular dependency (upstream failure → reports `pending` instead of blocking).
+   - Annotates originating PR URL for traceability.
 3. **`build-release.yml`** — `push` to main, tags, schedule, `workflow_dispatch`
-   - **Linux + Windows** run separate `test → build → integration-tests → release-validation` jobs.
-   - **macOS Intel** uses `build-and-validate-macos-intel` (always runs — Intel runners are plentiful with <1 min queue). Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch.
-   - **macOS ARM** uses `build-and-validate-macos-arm` (tag/schedule/dispatch only — ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release.
+   - **Linux + Windows** run combined `build-and-test` (unit tests + binary build in one job).
+   - **macOS Intel** uses `build-and-validate-macos-intel` (root node, runs own unit tests — no dependency on `build-and-test`). Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch.
+   - **macOS ARM** uses `build-and-validate-macos-arm` (root node, tag/schedule/dispatch only — ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release.
    - Secrets always available. Full 5-platform binary output (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64).
+4. **`ci-runtime.yml`** — nightly schedule, manual dispatch, path-filtered push
+   - **Linux x86_64 only**. Live inference smoke tests (`apm run`) isolated from release pipeline.
+   - Uses `GH_MODELS_PAT` for GitHub Models API access.
+   - Failures do not block releases — annotated as warnings.
 
 ## Platform Testing Strategy
-- **PR time**: Linux + Windows in parallel. Catches logic bugs, dependency issues, path separators, encoding, and Windows-specific problems before merge.
+- **PR time**: Linux-only combined build-and-test in `ci.yml`. Catches logic bugs and dependency issues before merge. Windows + macOS are tested post-merge (platform-specific issues are rare and the full matrix runs on every push to main).
 - **Post-merge**: Full 5-platform matrix (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64) catches remaining platform-specific issues on main.
-- **Rationale**: Linux + Windows PR coverage catches the two fundamentally different platform families (Unix vs Windows). macOS-specific issues are rare and caught post-merge.
+- **Rationale**: ci.yml has always been Linux-only — Windows and macOS are covered by `build-release.yml` on every push to main. This keeps PR feedback fast while still catching platform issues before release.
 
 ## PyInstaller Binary Packaging
 - **CRITICAL**: Uses `--onedir` mode (NOT `--onefile`) for faster CLI startup performance
@@ -44,11 +49,17 @@ Three workflows split by trigger and secret requirements:
 2. **Release Validation**: ISOLATION testing - no source checkout, validates exact shipped binary experience
 3. **Path Resolution**: Use symlinks and PATH manipulation for isolated binary testing
 
+## Inference Testing (Decoupled)
+- Live inference tests (`apm run`) are **isolated** in `ci-runtime.yml` — they do NOT gate releases
+- `APM_RUN_INFERENCE_TESTS=1` env var enables inference in test scripts; absent = skipped
+- `GH_MODELS_PAT` is only used in `ci-runtime.yml` and smoke-test jobs — NOT in integration-tests or release-validation
+- Rationale: 8 inference executions × 2% failure rate = 14.9% false-negative per release; APM core UVPs require zero live inference
+
 ## Release Flow Dependencies
-- **PR workflow**: ci.yml (test → build, Linux-only) then ci-integration.yml via workflow_run (approve → smoke-test → integration-tests → release-validation → report-status, all Linux-only)
-- **Push/Release workflow (Linux + Windows)**: test → build → integration-tests → release-validation → create-release → publish-pypi → update-homebrew
-- **Push/Release workflow (macOS Intel)**: test → build-and-validate-macos-intel (build always + conditional integration/release-validation) → create-release
-- **Push/Release workflow (macOS ARM)**: test → build-and-validate-macos-arm (tag/schedule/dispatch only; all phases run) → create-release
+- **PR workflow**: ci.yml (build-and-test, Linux-only) then ci-integration.yml via workflow_run (approve → smoke-test → integration-tests → release-validation → report-status, all Linux-only)
+- **Push/Release workflow (Linux + Windows)**: build-and-test → integration-tests → release-validation → create-release → publish-pypi → update-homebrew (gh-aw-compat runs in parallel, informational)
+- **Push/Release workflow (macOS Intel)**: build-and-validate-macos-intel (root node: unit tests + build always + conditional integration/release-validation) → create-release
+- **Push/Release workflow (macOS ARM)**: build-and-validate-macos-arm (root node, tag/schedule/dispatch only; all phases run) → create-release
 - **Tag Triggers**: Only `v*.*.*` tags trigger full release pipeline
 - **Artifact Retention**: 30 days for debugging failed releases
 - **Cross-workflow artifacts**: ci-integration.yml downloads artifacts from ci.yml using `run-id` and `github-token`
@@ -63,9 +74,13 @@ Three workflows split by trigger and secret requirements:
 ## Key Environment Variables
 - `PYTHON_VERSION: '3.12'` - Standardized across all jobs
 - `GITHUB_TOKEN` - Fallback token for compatibility (GitHub Actions built-in)
+- `APM_RUN_INFERENCE_TESTS` - When `1`, enables live inference tests in validation scripts
 
 ## Performance Considerations
-- **PR CI is Linux-only**: Eliminates macOS runner queue delays. Full platform coverage runs post-merge.
+- **Combined build-and-test**: Eliminates ~1.5m runner re-provisioning overhead by running unit tests and binary build in the same job.
+- **macOS as root nodes**: macOS consolidated jobs run their own unit tests and start immediately — no dependency on Linux/Windows test completion.
+- **Native uv caching**: `setup-uv` action with `enable-cache: true` replaces manual `actions/cache@v3` blocks.
+- **Targeted setup-node usage**: Node.js is only installed in `ci-runtime.yml`, macOS consolidated jobs, and integration-tests/release-validation phases (for `apm runtime setup copilot` → npm install).
 - **macOS runner consolidation**: Each macOS arch has a single consolidated job (build + integration + release-validation). Intel (`build-and-validate-macos-intel`) runs on every push since Intel runners are plentiful. ARM (`build-and-validate-macos-arm`) is gated to tag/schedule/dispatch only since ARM runners are extremely scarce (2-4h+ queue waits). This avoids serial re-queuing of runners across multiple jobs.
 - **Unit tests skip macOS**: Python unit tests are platform-agnostic; Linux + Windows coverage is sufficient. macOS-specific validation (binary build, integration tests, release validation) still runs via the consolidated job.
 - UPX compression when available (reduces binary size ~50%)
diff --git a/.github/workflows/build-release.yml b/.github/workflows/build-release.yml
index 8fbbff7b3..c5357540a 100644
--- a/.github/workflows/build-release.yml
+++ b/.github/workflows/build-release.yml
@@ -27,13 +27,11 @@ permissions:
   contents: read
 
 jobs:
-  # Unit tests on Linux + Windows only. macOS runners are scarce and Python unit
-  # tests are platform-agnostic — macOS-specific validation (build + integration +
-  # release-validation) is consolidated into a single job per arch below.
-  test:
-    runs-on: ${{ matrix.os }}
-    permissions:
-      contents: read
+  # Unit tests + binary build combined (Linux + Windows).
+  # Merging test+build eliminates ~1.5m runner re-provisioning overhead per platform.
+  # macOS runners are scarce and have their own consolidated jobs below.
+  build-and-test:
+    name: Build & Test
     strategy:
       fail-fast: false
       matrix:
@@ -41,98 +39,48 @@ jobs:
           - os: ubuntu-24.04
             arch: x86_64
             platform: linux
-          - os: ubuntu-24.04-arm
-            arch: arm64
-            platform: linux
-          - os: windows-latest
-            arch: x86_64
-            platform: windows
-    
-    steps:
-    - uses: actions/checkout@v4
-    
-    - name: Set up Node.js
-      uses: actions/setup-node@v4
-      with:
-        node-version: '24'
-    
-    - name: Set up Python
-      uses: actions/setup-python@v5
-      with:
-        python-version: ${{ env.PYTHON_VERSION }}
-    
-    - name: Install uv
-      uses: astral-sh/setup-uv@v6
-    
-    - name: Cache uv environments
-      uses: actions/cache@v3
-      with:
-        path: |
-          ~/.cache/uv
-          ~/.local/share/uv
-          ~\AppData\Local\uv\cache
-        key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
-        restore-keys: |
-          ${{ runner.os }}-uv-
-    
-    - name: Install dependencies
-      run: uv sync --extra dev
-    
-    - name: Test with pytest
-      run: uv run pytest tests/unit tests/test_console.py
-
-    - name: Run smoke tests
-      env:
-        GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
-        GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
-      run: uv run pytest tests/integration/test_runtime_smoke.py -v
-
-  # Build binaries (Linux + Windows). macOS builds are in build-and-validate-macos-intel / -arm.
-  build:
-    name: Build APM Binary
-    needs: [test]
-    strategy:
-      matrix:
-        include:
-          - os: ubuntu-24.04
-            platform: linux
-            arch: x86_64
             binary_name: apm-linux-x86_64
           - os: ubuntu-24.04-arm
-            platform: linux
             arch: arm64
+            platform: linux
             binary_name: apm-linux-arm64
           - os: windows-latest
-            platform: windows
             arch: x86_64
+            platform: windows
             binary_name: apm-windows-x86_64
-    
+
     runs-on: ${{ matrix.os }}
     permissions:
-      contents: read  # Checkout code; upload-artifact uses separate Actions API
-    
+      contents: read
+
     steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-      
+      - uses: actions/checkout@v4
+
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-      
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Install dependencies
+        run: uv sync --extra dev --extra build
+
+      - name: Run tests
+        env:
+          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
+          GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
+        run: uv run pytest tests/unit tests/test_console.py tests/integration/test_runtime_smoke.py -n auto --dist worksteal
+
       - name: Install UPX (Linux)
         if: matrix.platform == 'linux'
         run: |
           sudo apt-get update
           sudo apt-get install -y upx-ucl
-      
-      - name: Install uv
-        uses: astral-sh/setup-uv@v6
-      
-      - name: Install Python dependencies
-        run: |
-          uv sync --extra dev --extra build
-      
+
       - name: Build binary (Unix)
         if: matrix.platform != 'windows'
         run: |
@@ -144,7 +92,7 @@ jobs:
         shell: pwsh
         run: |
           uv run pwsh scripts/windows/build-binary.ps1
-      
+
       - name: Upload binary as workflow artifact
         uses: actions/upload-artifact@v4
         with:
@@ -166,13 +114,12 @@ jobs:
   # Intel runners are plentiful (<1 min queue) so this runs on every push for
   # early macOS build-regression feedback. Integration and release-validation
   # phases are conditional on tag/schedule/dispatch.
+  # Root node — runs its own unit tests instead of waiting for build-and-test.
   build-and-validate-macos-intel:
     name: Build & Validate (macOS x86_64)
-    needs: [test]
     runs-on: macos-15-intel
     permissions:
       contents: read
-      models: read
 
     steps:
       - name: Checkout code
@@ -198,10 +145,15 @@ jobs:
 
       - name: Install uv
         uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
 
       - name: Install dependencies
         run: uv sync --extra dev --extra build
 
+      - name: Run unit tests
+        run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal
+
       # ── PHASE 1: BUILD BINARY ──
       - name: Build binary
         run: |
@@ -227,7 +179,6 @@ jobs:
         if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -255,7 +206,6 @@ jobs:
         if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -269,14 +219,13 @@ jobs:
   # common. This job is gated to tag/schedule/dispatch only so push-to-main never
   # blocks on ARM availability. All phases run unconditionally since the job itself
   # is already release-gated.
+  # Root node — runs its own unit tests instead of waiting for build-and-test.
   build-and-validate-macos-arm:
     name: Build & Validate (macOS arm64)
-    needs: [test]
     if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
     runs-on: macos-latest
     permissions:
       contents: read
-      models: read
 
     steps:
       - name: Checkout code
@@ -297,10 +246,15 @@ jobs:
 
       - name: Install uv
         uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
 
       - name: Install dependencies
         run: uv sync --extra dev --extra build
 
+      - name: Run unit tests
+        run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal
+
       # ── PHASE 1: BUILD BINARY ──
       - name: Build binary
         run: |
@@ -325,7 +279,6 @@ jobs:
       - name: Run integration tests
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -351,7 +304,6 @@ jobs:
       - name: Run release validation tests
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -367,7 +319,7 @@ jobs:
   integration-tests:
     name: Integration Tests
     if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
-    needs: [test, build]
+    needs: [build-and-test]
     strategy:
       matrix:
         include:
@@ -383,43 +335,43 @@ jobs:
             arch: x86_64
             platform: windows
             binary_name: apm-windows-x86_64
-    
+
     runs-on: ${{ matrix.os }}
     permissions:
       contents: read
-      models: read  # Required for GitHub Models API access
-    
+
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
-      
+
       - name: Download APM binary from build artifacts
         uses: actions/download-artifact@v4
         with:
           name: ${{ matrix.binary_name }}
           path: ./
-      
+
       - name: Set up Node.js
         uses: actions/setup-node@v4
         with:
           node-version: '24'
-      
+
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-      
+
       - name: Install uv
         uses: astral-sh/setup-uv@v6
-      
+        with:
+          enable-cache: true
+
       - name: Install test dependencies
         run: uv sync --extra dev
-      
+
       - name: Run integration tests (Unix)
         if: matrix.platform != 'windows'
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}    # Models access
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}    # Primary: APM module access
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}      # Azure DevOps module access
         run: |
@@ -432,7 +384,6 @@ jobs:
         shell: pwsh
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -444,7 +395,7 @@ jobs:
   release-validation:
     name: Release Validation
     if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
-    needs: [test, build, integration-tests]
+    needs: [build-and-test, integration-tests]
     strategy:
       matrix:
         include:
@@ -460,12 +411,11 @@ jobs:
             arch: x86_64
             platform: windows
             binary_name: apm-windows-x86_64
-    
+
     runs-on: ${{ matrix.os }}
     permissions:
       contents: read
-      models: read  # Required for GitHub Models API access
-    
+
     steps:
       - name: Set up Node.js
         uses: actions/setup-node@v4
@@ -476,26 +426,26 @@ jobs:
         uses: actions/setup-python@v5
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-          
+
       - name: Download APM binary from build artifacts
         uses: actions/download-artifact@v4
         with:
           name: ${{ matrix.binary_name }}
           path: ${{ matrix.platform == 'windows' && 'D:\apm-isolated-test' || '/tmp/apm-isolated-test/' }}
-      
+
       - name: Make binary executable and verify isolation (Unix)
         if: matrix.platform != 'windows'
         run: |
           cd /tmp/apm-isolated-test
-          
+
           # Debug: List the downloaded structure
           echo "Downloaded structure:"
           find . -name "apm" -type f
           ls -la ./dist/
-          
+
           # Make the binary executable
           chmod +x ./dist/${{ matrix.binary_name }}/apm
-      
+
       - name: Create APM symlink for testing (Unix)
         if: matrix.platform != 'windows'
         run: |
@@ -508,21 +458,20 @@ jobs:
         shell: pwsh
         run: |
           cd D:\apm-isolated-test
-          
+
           # Debug: List the downloaded structure
           Write-Host "Downloaded structure:"
           Get-ChildItem -Recurse -Filter "apm.exe"
           Get-ChildItem .\dist\
-          
+
           # Add binary directory to PATH
           $binDir = "D:\apm-isolated-test\dist\${{ matrix.binary_name }}"
           echo $binDir | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-      
+
       - name: Run release validation tests (Unix)
         if: matrix.platform != 'windows'
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -536,7 +485,6 @@ jobs:
         shell: pwsh
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -547,7 +495,7 @@ jobs:
 
   create-release:
     name: Create GitHub Release
-    needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation]
+    needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation]
     if: github.ref_type == 'tag'  # All tags create GitHub releases
     runs-on: ubuntu-latest
     permissions:
@@ -555,27 +503,27 @@ jobs:
     outputs:
       is_prerelease: ${{ steps.release_type.outputs.is_prerelease }}
       is_private_repo: ${{ github.event.repository.private }}
-    
+
     steps:
       - name: Download all build artifacts
         uses: actions/download-artifact@v4
         with:
           path: ./dist
-      
+
       - name: Prepare release binaries
         run: |
           # Debug: Show the actual downloaded structure
           echo "Directory listing:"
           ls -la ./dist
           echo ""
-          
+
           # Create tar.gz archives from directory structure for release and Homebrew
           cd dist
           for binary in apm-linux-x86_64 apm-linux-arm64 apm-darwin-x86_64 apm-darwin-arm64; do
             # With artifacts containing both scripts and dist/, the binary is in artifact/dist/binary/
             artifact_dir="${binary}"
             binary_dir="${artifact_dir}/dist/${binary}"
-            
+
             if [ -d "$binary_dir" ] && [ -f "$binary_dir/apm" ]; then
               echo "Processing $binary_dir directory..."
               # Ensure the binary is executable before archiving
@@ -625,12 +573,12 @@ jobs:
             fi
             exit 1
           fi
-      
+
       - name: Determine release type
         id: release_type
         run: |
           TAG_NAME="${{ github.ref_name }}"
-          
+
           # Check if tag matches stable semver pattern (e.g., v1.2.3, v0.2.0)
           # Stable: starts with v, followed by digits.digits.digits, then end of string
           if [[ "$TAG_NAME" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
@@ -641,10 +589,10 @@ jobs:
             echo "is_prerelease=true" >> $GITHUB_OUTPUT
             echo "Detected PEP 440 prerelease: $TAG_NAME"
           else
-            echo "is_prerelease=true" >> $GITHUB_OUTPUT  
+            echo "is_prerelease=true" >> $GITHUB_OUTPUT
             echo "Detected other prerelease: $TAG_NAME"
           fi
-      
+
       - name: Create GitHub Release
         id: release
         uses: softprops/action-gh-release@v2
@@ -663,14 +611,17 @@ jobs:
             ./dist/apm-windows-x86_64.zip
             ./dist/apm-windows-x86_64.zip.sha256
 
-  # GH-AW Compatibility Gate — validates the released binary works in the
+  # GH-AW Compatibility — validates the released binary works in the
   # exact flow GitHub Agentic Workflows uses (isolated install + pack, no token).
-  # Gates publish-pypi and update-homebrew so broken versions don't reach stable distribution.
+  # Informational: runs as continue-on-error because it depends on external services
+  # (GitHub API, npm, microsoft/apm-action, microsoft/apm-sample-package) that can
+  # fail independently of APM code quality (~4% false-negative rate per execution).
   gh-aw-compat:
     name: GH-AW Compatibility
     needs: [create-release]
     if: github.ref_type == 'tag'
     runs-on: ubuntu-24.04
+    continue-on-error: true
     steps:
       - name: Install and pack with apm-action
         uses: microsoft/apm-action@v1
@@ -690,11 +641,16 @@ jobs:
           test -f "${{ steps.pack.outputs.bundle-path }}"
           echo "✅ GH-AW compatibility test passed"
 
-  # Publish to PyPI (only stable releases from public repo)  
+      - name: Warn on failure
+        if: failure()
+        run: |
+          echo "::warning::GH-AW compatibility test failed. This is informational — external dependency may be unavailable. Investigate before next release."
+
+  # Publish to PyPI (only stable releases from public repo)
   publish-pypi:
     name: Publish to PyPI
     runs-on: ubuntu-latest
-    needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, gh-aw-compat]
+    needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release]
     if: github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true'
     environment:
       name: pypi
@@ -702,27 +658,27 @@ jobs:
     permissions:
       contents: read    # Required for actions/checkout
       id-token: write   # IMPORTANT: this permission is mandatory for trusted publishing
-    
+
     steps:
       - name: Checkout code
         uses: actions/checkout@v4
-      
+
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: ${{ env.PYTHON_VERSION }}
-      
+
       - name: Install build dependencies
         run: |
           python -m pip install --upgrade pip
           pip install build twine
-      
+
       - name: Build Python package
         run: python -m build
-      
+
       - name: Check package
         run: twine check dist/*
-      
+
       - name: Publish to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
         with:
@@ -732,20 +688,20 @@ jobs:
   update-homebrew:
     name: Update Homebrew Formula
     runs-on: ubuntu-latest
-    needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, gh-aw-compat, publish-pypi]
+    needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, publish-pypi]
     if: github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true'
     permissions:
       contents: read
-    
+
     steps:
       - name: Extract SHA256 checksums from GitHub release
         id: checksums
         run: |
           # Download the SHA256 checksum files from the GitHub release
           RELEASE_TAG="${{ github.ref_name }}"
-          
+
           echo "Downloading checksums for release $RELEASE_TAG"
-          
+
           # Download checksum files directly from the release
           curl -L -o apm-darwin-arm64.tar.gz.sha256 \
             "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-darwin-arm64.tar.gz.sha256"
@@ -755,23 +711,23 @@ jobs:
             "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-linux-x86_64.tar.gz.sha256"
           curl -L -o apm-linux-arm64.tar.gz.sha256 \
             "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-linux-arm64.tar.gz.sha256"
-          
+
           # Extract SHA256 checksums
           DARWIN_ARM64_SHA=$(cat apm-darwin-arm64.tar.gz.sha256 | cut -d' ' -f1)
           DARWIN_X86_64_SHA=$(cat apm-darwin-x86_64.tar.gz.sha256 | cut -d' ' -f1)
           LINUX_X86_64_SHA=$(cat apm-linux-x86_64.tar.gz.sha256 | cut -d' ' -f1)
           LINUX_ARM64_SHA=$(cat apm-linux-arm64.tar.gz.sha256 | cut -d' ' -f1)
-          
+
           echo "darwin-arm64-sha=$DARWIN_ARM64_SHA" >> $GITHUB_OUTPUT
           echo "darwin-x86_64-sha=$DARWIN_X86_64_SHA" >> $GITHUB_OUTPUT
           echo "linux-x86_64-sha=$LINUX_X86_64_SHA" >> $GITHUB_OUTPUT
           echo "linux-arm64-sha=$LINUX_ARM64_SHA" >> $GITHUB_OUTPUT
-          
+
           echo "Darwin ARM64 SHA: $DARWIN_ARM64_SHA"
           echo "Darwin x86_64 SHA: $DARWIN_X86_64_SHA"
           echo "Linux x86_64 SHA: $LINUX_X86_64_SHA"
           echo "Linux ARM64 SHA: $LINUX_ARM64_SHA"
-      
+
       - name: Trigger Homebrew tap repository update
         uses: peter-evans/repository-dispatch@v3
         with:
@@ -797,7 +753,7 @@ jobs:
   update-scoop:
     name: Update Scoop Bucket
     runs-on: ubuntu-latest
-    needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, gh-aw-compat, publish-pypi]
+    needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, publish-pypi]
     # TODO: Enable once downstream repository and secrets are configured (see #88)
     if: false && github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true'
     permissions:
diff --git a/.github/workflows/ci-integration.yml b/.github/workflows/ci-integration.yml
index bc500472a..085225d35 100644
--- a/.github/workflows/ci-integration.yml
+++ b/.github/workflows/ci-integration.yml
@@ -58,10 +58,15 @@ jobs:
     # Checkout default branch (main) — never PR code
     - uses: actions/checkout@v4
 
-    - name: Set up Node.js
-      uses: actions/setup-node@v4
-      with:
-        node-version: '24'
+    # O8: PR traceability — annotate with originating PR URL
+    - name: Annotate originating PR
+      run: |
+        PR_NUMBER=$(echo '${{ toJSON(github.event.workflow_run.pull_requests) }}' | jq -r '.[0].number // empty')
+        if [ -n "$PR_NUMBER" ]; then
+          echo "::notice::🔗 Originating PR: https://github.com/${{ github.repository }}/pull/${PR_NUMBER}"
+        else
+          echo "::notice::Triggered by push to ${{ github.event.workflow_run.head_branch }}"
+        fi
 
     - name: Set up Python
       uses: actions/setup-python@v5
@@ -70,16 +75,8 @@ jobs:
 
     - name: Install uv
       uses: astral-sh/setup-uv@v6
-
-    - name: Cache uv environments
-      uses: actions/cache@v3
       with:
-        path: |
-          ~/.cache/uv
-          ~/.local/share/uv
-        key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
-        restore-keys: |
-          ${{ runner.os }}-uv-
+        enable-cache: true
 
     - name: Install dependencies
       run: uv sync --extra dev
@@ -99,7 +96,6 @@ jobs:
     permissions:
       contents: read
       actions: read
-      models: read
 
     steps:
       # Checkout default branch (main) for test scripts — never PR code
@@ -126,6 +122,8 @@ jobs:
 
       - name: Install uv
         uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
 
       - name: Install test dependencies
         run: uv sync --extra dev
@@ -133,7 +131,6 @@ jobs:
       - name: Run integration tests
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -150,7 +147,6 @@ jobs:
     permissions:
       contents: read
       actions: read
-      models: read
 
     steps:
       # Checkout default branch for test scripts — never PR code
@@ -203,7 +199,6 @@ jobs:
       - name: Run release validation tests
         env:
           APM_E2E_TESTS: "1"
-          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
           GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
           ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }}
         run: |
@@ -224,16 +219,38 @@ jobs:
         uses: actions/github-script@v7
         with:
           script: |
-            const success = '${{ needs.release-validation.result }}' === 'success'
-              && '${{ needs.integration-tests.result }}' === 'success'
-              && '${{ needs.smoke-test.result }}' === 'success';
+            const ciConclusion = '${{ github.event.workflow_run.conclusion }}';
+            const smokeResult = '${{ needs.smoke-test.result }}';
+            const integResult = '${{ needs.integration-tests.result }}';
+            const relvalResult = '${{ needs.release-validation.result }}';
+
+            // O7: Detect CI circular dependency — when ci.yml fails, all approval
+            // jobs are skipped (their `if: conclusion == 'success'` fails), which
+            // cascades to skip smoke-test → integration-tests → release-validation.
+            // Without this check, we'd report 'failure' and block the CI-fixing PR.
+            const allSkipped = smokeResult === 'skipped' && integResult === 'skipped' && relvalResult === 'skipped';
+
+            let state, description;
+            if (allSkipped && ciConclusion !== 'success') {
+              // Upstream CI failed — all jobs were skipped via approval guards, not due to test failures
+              state = 'pending';
+              description = `CI workflow ${ciConclusion} — integration tests skipped (not blocking)`;
+              core.notice(`CI workflow concluded with '${ciConclusion}' — integration tests were skipped. This is expected when ci.yml itself has an error.`);
+            } else {
+              const success = relvalResult === 'success'
+                && integResult === 'success'
+                && smokeResult === 'success';
+              state = success ? 'success' : 'failure';
+              description = success ? 'All integration tests passed' : 'Integration tests failed';
+            }
+
             await github.rest.repos.createCommitStatus({
               owner: context.repo.owner,
               repo: context.repo.repo,
               sha: context.payload.workflow_run.head_sha,
-              state: success ? 'success' : 'failure',
+              state,
               target_url: `https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`,
               context: 'Integration Tests (PR)',
-              description: success ? 'All integration tests passed' : 'Integration tests failed'
+              description
             });
 
diff --git a/.github/workflows/ci-runtime.yml b/.github/workflows/ci-runtime.yml
new file mode 100644
index 000000000..d416d716d
--- /dev/null
+++ b/.github/workflows/ci-runtime.yml
@@ -0,0 +1,90 @@
+name: Runtime Inference Tests
+
+env:
+  PYTHON_VERSION: '3.12'
+
+on:
+  schedule:
+    # Daily at 05:00 UTC — standalone schedule, no dependency on build-release.yml
+    - cron: '0 5 * * *'
+  workflow_dispatch:
+  push:
+    branches: [main]
+    paths:
+      - 'src/apm_cli/commands/run.py'
+      - 'src/apm_cli/runtime/**'
+      - 'scripts/runtime/**'
+      - 'scripts/test-release-validation.sh'
+      - 'tests/integration/test_runtime_smoke.py'
+
+permissions:
+  contents: read
+
+jobs:
+  # Live inference smoke tests — isolated from the release pipeline to avoid
+  # false-negative failures from external API flakiness (14.9% per-release risk).
+  # Runs nightly, on manual dispatch, and when runtime code changes.
+  live-inference-smoke:
+    name: Live Inference Smoke (Linux x86_64)
+    runs-on: ubuntu-24.04
+    permissions:
+      contents: read
+      models: read
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '24'
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ env.PYTHON_VERSION }}
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      - name: Install dependencies
+        run: uv sync --extra dev --extra build
+
+      - name: Build binary
+        run: |
+          chmod +x scripts/build-binary.sh
+          uv run ./scripts/build-binary.sh
+
+      - name: Setup binary
+        run: |
+          chmod +x ./dist/apm-linux-x86_64/apm
+          ln -s "$(pwd)/dist/apm-linux-x86_64/apm" "$(pwd)/apm"
+          echo "$(pwd)" >> $GITHUB_PATH
+
+      - name: Run smoke tests
+        env:
+          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
+          GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
+        run: uv run pytest tests/integration/test_runtime_smoke.py -v
+
+      - name: Run inference validation
+        env:
+          APM_RUN_INFERENCE_TESTS: "1"
+          APM_E2E_TESTS: "1"
+          GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }}
+          GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }}
+        run: |
+          chmod +x scripts/test-release-validation.sh
+          ./scripts/test-release-validation.sh
+        timeout-minutes: 20
+
+      - name: Annotate result
+        if: always()
+        run: |
+          if [[ "${{ job.status }}" == "success" ]]; then
+            echo "::notice::✅ Runtime inference tests passed"
+          else
+            echo "::warning::⚠️ Runtime inference tests failed — this does not block releases"
+          fi
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 3c3c30289..05b2e2039 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -16,7 +16,9 @@ permissions:
 
 jobs:
   # Linux-only for PR feedback. Full platform matrix (incl. macOS + Windows) runs post-merge in build-release.yml.
-  test:
+  # Combines unit tests + binary build into a single job to eliminate runner re-provisioning overhead.
+  build-and-test:
+    name: Build & Test (Linux)
     runs-on: ubuntu-24.04
     permissions:
       contents: read
@@ -24,11 +26,6 @@ jobs:
     steps:
     - uses: actions/checkout@v4
 
-    - name: Set up Node.js
-      uses: actions/setup-node@v4
-      with:
-        node-version: '24'
-
     - name: Set up Python
       uses: actions/setup-python@v5
       with:
@@ -36,66 +33,37 @@ jobs:
 
     - name: Install uv
       uses: astral-sh/setup-uv@v6
-
-    - name: Cache uv environments
-      uses: actions/cache@v3
       with:
-        path: |
-          ~/.cache/uv
-          ~/.local/share/uv
-        key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }}
-        restore-keys: |
-          ${{ runner.os }}-uv-
+        enable-cache: true
 
     - name: Install dependencies
-      run: uv sync --extra dev
+      run: uv sync --extra dev --extra build
 
-    - name: Test with pytest
-      run: uv run pytest tests/unit tests/test_console.py
+    - name: Run tests
+      run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal
 
-  # Linux-only binary build for PR validation. Full platform builds run post-merge.
-  build:
-    name: Build APM Binary (Linux)
-    needs: [test]
-    runs-on: ubuntu-24.04
-    permissions:
-      contents: read
+    - name: Install UPX
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y upx-ucl
 
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
+    - name: Build binary
+      run: |
+        chmod +x scripts/build-binary.sh
+        uv run ./scripts/build-binary.sh
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ env.PYTHON_VERSION }}
-
-      - name: Install UPX
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y upx-ucl
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v6
-
-      - name: Install Python dependencies
-        run: |
-          uv sync --extra dev --extra build
-
-      - name: Build binary
-        run: |
-          chmod +x scripts/build-binary.sh
-          uv run ./scripts/build-binary.sh
-
-      - name: Upload binary as workflow artifact
-        uses: actions/upload-artifact@v4
-        with:
-          name: apm-linux-x86_64
-          path: |
-            ./dist/apm-linux-x86_64
-            ./dist/apm-linux-x86_64.sha256
-            ./scripts/test-release-validation.sh
-            ./scripts/github-token-helper.sh
-          include-hidden-files: true
-          retention-days: 30
-          if-no-files-found: error
+    - name: Upload binary as workflow artifact
+      uses: actions/upload-artifact@v4
+      with:
+        name: apm-linux-x86_64
+        # Scripts are included to preserve the artifact root at ./ (not ./dist/).
+        # Without a sibling directory, upload-artifact strips the dist/ prefix,
+        # breaking download paths in ci-integration.yml which expects dist/$BINARY_NAME/apm.
+        path: |
+          ./dist/apm-linux-x86_64
+          ./dist/apm-linux-x86_64.sha256
+          ./scripts/test-release-validation.sh
+          ./scripts/github-token-helper.sh
+        include-hidden-files: true
+        retention-days: 30
+        if-no-files-found: error
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7e791617d..72563893e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- `ci-runtime.yml` workflow for nightly + manual runtime inference tests, decoupled from release pipeline (#371)
+- `APM_RUN_INFERENCE_TESTS` env var to gate live inference (`apm run`) in test scripts (#371)
+- PR traceability `::notice` annotation in `ci-integration.yml` smoke-test job (#371)
+
+### Changed
+
+- Merged `test` + `build` into single `build-and-test` job across `ci.yml` and `build-release.yml` — eliminates ~1.5m runner re-provisioning per platform (#371)
+- macOS consolidated jobs are now root nodes (no `needs: [test]`) — run own unit tests for full independence (#371)
+- Removed `setup-node@v4` from unit test and build jobs that don't need Node.js (#371)
+- Enabled native `setup-uv` caching (`enable-cache: true`), removed manual `actions/cache@v3` blocks (#371)
+- Decoupled live inference tests (`apm run`) from release pipeline — reduces 14.9% false-negative rate and `GH_MODELS_PAT` secret exposure (#371)
+- `ci-integration.yml` report-status now detects CI circular dependency (upstream failure) and reports `pending` instead of blocking (#371)
+- `gh-aw-compat` is now informational (`continue-on-error: true`) — non-deterministic external dependencies should not block releases (#371)
 - Copilot encoding instructions: `encoding.instructions.md` (`applyTo: "**"`) bans non-ASCII characters in source and CLI output; updated `copilot-instructions.md` and `cli.instructions.md` to use ASCII bracket notation (`[+]`/`[!]`/`[x]`/`[i]`/`[*]`/`[>]`) instead of emoji STATUS_SYMBOLS (#282)
 
 ## [0.8.4] - 2026-03-22
diff --git a/pyproject.toml b/pyproject.toml
index 1d8d30c60..0e392ffad 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -71,7 +71,7 @@ warn_return_any = true
 warn_unused_configs = true
 
 [tool.pytest.ini_options]
-addopts = "-m 'not benchmark' -n auto"
+addopts = "-m 'not benchmark'"
 markers = [
     "integration: marks tests as integration tests that may require network access",
     "slow: marks tests as slow running tests",
diff --git a/scripts/test-integration.sh b/scripts/test-integration.sh
index a5ce439ec..1ca3a19d8 100755
--- a/scripts/test-integration.sh
+++ b/scripts/test-integration.sh
@@ -117,7 +117,13 @@ detect_environment() {
         log_info "Found existing binary: ./dist/$BINARY_NAME/apm (CI mode)"
     else
         USE_EXISTING_BINARY=false
-        log_info "No existing binary found, will build locally"
+        log_info "No existing binary found at ./dist/$BINARY_NAME/apm, will build locally"
+        # Debug: show what's actually in dist/ to diagnose artifact download issues
+        if [[ -d "./dist" ]]; then
+            log_info "Contents of ./dist/: $(ls -la ./dist/ 2>/dev/null | head -10)"
+        else
+            log_info "No ./dist/ directory exists"
+        fi
     fi
 }
 # Build binary (like CI build job does) - only if needed
@@ -133,7 +139,11 @@ build_binary() {
     log_info "Installing Python dependencies..."
     if command -v uv >/dev/null 2>&1; then
         log_info "Using uv for binary build dependencies..."
-        uv venv
+        if [[ -d ".venv" ]]; then
+            log_info "Virtual environment already exists, reusing it..."
+        else
+            uv venv
+        fi
         source .venv/bin/activate
         uv pip install -e ".[dev]"
         uv pip install pyinstaller
diff --git a/scripts/test-release-validation.sh b/scripts/test-release-validation.sh
index 0358df8bf..d0fbba11d 100755
--- a/scripts/test-release-validation.sh
+++ b/scripts/test-release-validation.sh
@@ -156,7 +156,14 @@ run_with_timeout() {
 
 # HERO SCENARIO 1: 30-Second Zero-Config
 # Test the exact README flow: runtime setup → run virtual package
+# Gated by APM_RUN_INFERENCE_TESTS — live inference tests are decoupled from
+# the release pipeline and run in ci-runtime.yml (nightly/manual/path-filtered).
 test_hero_zero_config() {
+    if [[ "${APM_RUN_INFERENCE_TESTS:-}" != "1" ]]; then
+        log_info "Skipping HERO SCENARIO 1 (inference tests decoupled — set APM_RUN_INFERENCE_TESTS=1 to enable)"
+        return 0
+    fi
+
     log_test "HERO SCENARIO 1: 30-Second Zero-Config (README lines 35-44)"
     
     # Create temporary directory for this test
@@ -288,22 +295,28 @@ test_hero_guardrailing() {
     log_success "Compiled to AGENTS.md (guardrails active)"
     
     # Step 5: apm run design-review (from installed package)
-    echo "Running: $BINARY_PATH run design-review (with 10s timeout)"
-    echo "--- Command Output Start ---"
-    run_with_timeout 10 "$BINARY_PATH run design-review"
-    exit_code=$?
-    echo "--- Command Output End ---"
-    echo "Exit code: $exit_code"
-    
-    if [[ $exit_code -eq 124 ]]; then
-        # Timeout is expected and OK - prompt started executing
-        log_success "design-review prompt executed with compiled guardrails"
-    elif [[ $exit_code -eq 0 ]]; then
-        log_success "design-review completed successfully"
+    # Gated by APM_RUN_INFERENCE_TESTS — live inference is decoupled from
+    # the release pipeline and runs in ci-runtime.yml.
+    if [[ "${APM_RUN_INFERENCE_TESTS:-}" == "1" ]]; then
+        echo "Running: $BINARY_PATH run design-review (with 10s timeout)"
+        echo "--- Command Output Start ---"
+        run_with_timeout 10 "$BINARY_PATH run design-review"
+        exit_code=$?
+        echo "--- Command Output End ---"
+        echo "Exit code: $exit_code"
+
+        if [[ $exit_code -eq 124 ]]; then
+            # Timeout is expected and OK - prompt started executing
+            log_success "design-review prompt executed with compiled guardrails"
+        elif [[ $exit_code -eq 0 ]]; then
+            log_success "design-review completed successfully"
+        else
+            log_error "apm run design-review failed immediately"
+            cd ..
+            return 1
+        fi
     else
-        log_error "apm run design-review failed immediately"
-        cd ..
-        return 1
+        log_info "Skipping apm run design-review (inference tests decoupled — set APM_RUN_INFERENCE_TESTS=1 to enable)"
     fi
     
     cd ..
@@ -432,11 +445,21 @@ echo ""
     echo "Binary found and executable: $BINARY_PATH"
     
     local tests_passed=0
-    local tests_total=6  # Prerequisites, basic commands, gh-aw compat, runtime setup, 2 hero scenarios
+    local tests_total=5  # Prerequisites, basic commands, gh-aw compat, runtime setup, guardrailing (init/install/compile)
     local dependency_tests_run=false
+    local inference_tests_run=false
+    
+    # Hero scenario 1 (zero-config) is entirely inference-based — only counted when enabled
+    if [[ "${APM_RUN_INFERENCE_TESTS:-}" == "1" ]]; then
+        tests_total=$((tests_total + 1))
+        inference_tests_run=true
+        log_info "Inference tests enabled (APM_RUN_INFERENCE_TESTS=1)"
+    else
+        log_info "Inference tests decoupled — skipping apm run tests (set APM_RUN_INFERENCE_TESTS=1 to enable)"
+    fi
     
     # Add dependency tests to total if available and GITHUB token is present
-    if [[ "$DEPENDENCY_TESTS_AVAILABLE" == "true" ]] && [[ -n "${GITHUB_CLI_PAT:-}" || -n "${GITHUB_TOKEN:-}" ]]; then
+    if [[ "$DEPENDENCY_TESTS_AVAILABLE" == "true" ]] && [[ -n "${GITHUB_APM_PAT:-}" || -n "${GITHUB_TOKEN:-}" ]]; then
         tests_total=$((tests_total + 1))
         dependency_tests_run=true
         log_info "Dependency integration tests will be included"
@@ -475,11 +498,15 @@ echo ""
         log_error "Runtime setup test failed"
     fi
     
-    # HERO SCENARIO 1: 30-second zero-config
-    if test_hero_zero_config; then
-        ((tests_passed++))
+    # HERO SCENARIO 1: 30-second zero-config (only when inference tests enabled)
+    if [[ "$inference_tests_run" == "true" ]]; then
+        if test_hero_zero_config; then
+            ((tests_passed++))
+        else
+            log_error "Hero scenario 1 (30-sec zero-config) failed"
+        fi
     else
-        log_error "Hero scenario 1 (30-sec zero-config) failed"
+        test_hero_zero_config  # Runs but auto-skips and returns 0
     fi
     
     # HERO SCENARIO 2: 2-minute guardrailing
@@ -510,24 +537,29 @@ echo ""
         echo ""
         echo "🚀 Binary is ready for production release"
         echo "📦 End-user experience validated successfully" 
-        echo "🎯 Both README hero scenarios work perfectly"
         echo ""
         echo "Validated user journeys:"
-        echo "  1. Prerequisites (GITHUB_TOKEN) ✅"
+        echo "  1. Prerequisites (GITHUB_APM_PAT) ✅"
         echo "  2. Binary accessibility ✅"
         echo "  3. Runtime setup (copilot) ✅"
         echo "  4. GH-AW compatibility (tokenless install + pack) ✅"
         echo ""
-        echo "  HERO SCENARIO 1: 30-Second Zero-Config ✨"
-        echo "    - Run virtual package directly ✅"
-        echo "    - Auto-install on first run ✅"
-        echo "    - Use cached package on second run ✅"
-        echo ""
+        if [[ "$inference_tests_run" == "true" ]]; then
+            echo "  HERO SCENARIO 1: 30-Second Zero-Config ✨"
+            echo "    - Run virtual package directly ✅"
+            echo "    - Auto-install on first run ✅"
+            echo "    - Use cached package on second run ✅"
+            echo ""
+        fi
         echo "  HERO SCENARIO 2: 2-Minute Guardrailing ✨"
         echo "    - Project initialization ✅"
         echo "    - Install APM packages ✅"
         echo "    - Compile to AGENTS.md guardrails ✅"
-        echo "    - Run prompts with guardrails ✅"
+        if [[ "$inference_tests_run" == "true" ]]; then
+            echo "    - Run prompts with guardrails ✅"
+        else
+            echo "    - Run prompts (decoupled to ci-runtime.yml) ⏭️"
+        fi
         if [[ "$dependency_tests_run" == "true" ]]; then
             echo ""
             echo "  BONUS: Real dependency integration ✅"