diff --git a/.github/instructions/cicd.instructions.md b/.github/instructions/cicd.instructions.md index bcfb9afaf..e4628cdef 100644 --- a/.github/instructions/cicd.instructions.md +++ b/.github/instructions/cicd.instructions.md @@ -6,27 +6,32 @@ description: "CI/CD Pipeline configuration for PyInstaller binary packaging and # CI/CD Pipeline Instructions ## Workflow Architecture (Fork-safe) -Three workflows split by trigger and secret requirements: +Four workflows split by trigger and secret requirements: 1. **`ci.yml`** — `pull_request` trigger (all PRs, including forks) - - **Linux + Windows** (ubuntu-24.04, windows-latest). Unit tests in parallel on both platforms + single Linux binary build. No secrets needed. - - Windows job catches path separator, encoding, and platform-specific issues before merge. + - **Linux-only** (ubuntu-24.04). Combined `build-and-test` job: unit tests + binary build in a single runner. No secrets needed. - Uploads Linux x86_64 binary artifact for downstream integration testing. 2. **`ci-integration.yml`** — `workflow_run` trigger (after CI completes, environment-gated) - **Linux-only**. Smoke tests, integration tests, release validation. Requires `integration-tests` environment approval. - Security: uses `workflow_run` (not `pull_request_target`) — PR code is NEVER checked out. - Downloads Linux binary artifact from ci.yml, runs test scripts from default branch (main). - Reports results back to PR via commit status API. + - Detects CI circular dependency (upstream failure → reports `pending` instead of blocking). + - Annotates originating PR URL for traceability. 3. **`build-release.yml`** — `push` to main, tags, schedule, `workflow_dispatch` - - **Linux + Windows** run separate `test → build → integration-tests → release-validation` jobs. - - **macOS Intel** uses `build-and-validate-macos-intel` (always runs — Intel runners are plentiful with <1 min queue). 
Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch. - - **macOS ARM** uses `build-and-validate-macos-arm` (tag/schedule/dispatch only — ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release. + - **Linux + Windows** run combined `build-and-test` (unit tests + binary build in one job). + - **macOS Intel** uses `build-and-validate-macos-intel` (root node, runs own unit tests — no dependency on `build-and-test`). Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch. + - **macOS ARM** uses `build-and-validate-macos-arm` (root node, tag/schedule/dispatch only — ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release. - Secrets always available. Full 5-platform binary output (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64). +4. **`ci-runtime.yml`** — nightly schedule, manual dispatch, path-filtered push + - **Linux x86_64 only**. Live inference smoke tests (`apm run`) isolated from release pipeline. + - Uses `GH_MODELS_PAT` for GitHub Models API access. + - Failures do not block releases — annotated as warnings. ## Platform Testing Strategy -- **PR time**: Linux + Windows in parallel. Catches logic bugs, dependency issues, path separators, encoding, and Windows-specific problems before merge. +- **PR time**: Linux-only combined build-and-test in `ci.yml`. Catches logic bugs and dependency issues before merge. Windows + macOS are tested post-merge (platform-specific issues are rare and the full matrix runs on every push to main). - **Post-merge**: Full 5-platform matrix (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64) catches remaining platform-specific issues on main. 
-- **Rationale**: Linux + Windows PR coverage catches the two fundamentally different platform families (Unix vs Windows). macOS-specific issues are rare and caught post-merge. +- **Rationale**: ci.yml has always been Linux-only — Windows and macOS are covered by `build-release.yml` on every push to main. This keeps PR feedback fast while still catching platform issues before release. ## PyInstaller Binary Packaging - **CRITICAL**: Uses `--onedir` mode (NOT `--onefile`) for faster CLI startup performance @@ -44,11 +49,17 @@ Three workflows split by trigger and secret requirements: 2. **Release Validation**: ISOLATION testing - no source checkout, validates exact shipped binary experience 3. **Path Resolution**: Use symlinks and PATH manipulation for isolated binary testing +## Inference Testing (Decoupled) +- Live inference tests (`apm run`) are **isolated** in `ci-runtime.yml` — they do NOT gate releases +- `APM_RUN_INFERENCE_TESTS=1` env var enables inference in test scripts; absent = skipped +- `GH_MODELS_PAT` is only used in `ci-runtime.yml` and smoke-test jobs — NOT in integration-tests or release-validation +- Rationale: 8 inference executions at a 2% failure rate each give 1 - 0.98^8 ≈ 14.9% false-negative risk per release; APM core UVPs require zero live inference + ## Release Flow Dependencies -- **PR workflow**: ci.yml (test → build, Linux-only) then ci-integration.yml via workflow_run (approve → smoke-test → integration-tests → release-validation → report-status, all Linux-only) -- **Push/Release workflow (Linux + Windows)**: test → build → integration-tests → release-validation → create-release → publish-pypi → update-homebrew -- **Push/Release workflow (macOS Intel)**: test → build-and-validate-macos-intel (build always + conditional integration/release-validation) → create-release -- **Push/Release workflow (macOS ARM)**: test → build-and-validate-macos-arm (tag/schedule/dispatch only; all phases run) → create-release +- **PR workflow**: ci.yml (build-and-test, Linux-only) then
ci-integration.yml via workflow_run (approve → smoke-test → integration-tests → release-validation → report-status, all Linux-only) +- **Push/Release workflow (Linux + Windows)**: build-and-test → integration-tests → release-validation → create-release → publish-pypi → update-homebrew (gh-aw-compat runs in parallel, informational) +- **Push/Release workflow (macOS Intel)**: build-and-validate-macos-intel (root node: unit tests + build always + conditional integration/release-validation) → create-release +- **Push/Release workflow (macOS ARM)**: build-and-validate-macos-arm (root node, tag/schedule/dispatch only; all phases run) → create-release - **Tag Triggers**: Only `v*.*.*` tags trigger full release pipeline - **Artifact Retention**: 30 days for debugging failed releases - **Cross-workflow artifacts**: ci-integration.yml downloads artifacts from ci.yml using `run-id` and `github-token` @@ -63,9 +74,13 @@ Three workflows split by trigger and secret requirements: ## Key Environment Variables - `PYTHON_VERSION: '3.12'` - Standardized across all jobs - `GITHUB_TOKEN` - Fallback token for compatibility (GitHub Actions built-in) +- `APM_RUN_INFERENCE_TESTS` - When `1`, enables live inference tests in validation scripts ## Performance Considerations -- **PR CI is Linux-only**: Eliminates macOS runner queue delays. Full platform coverage runs post-merge. +- **Combined build-and-test**: Eliminates ~1.5m runner re-provisioning overhead by running unit tests and binary build in the same job. +- **macOS as root nodes**: macOS consolidated jobs run their own unit tests and start immediately — no dependency on Linux/Windows test completion. +- **Native uv caching**: `setup-uv` action with `enable-cache: true` replaces manual `actions/cache@v3` blocks. +- **Targeted setup-node usage**: Node.js is only installed in `ci-runtime.yml`, macOS consolidated jobs, and integration-tests/release-validation phases (for `apm runtime setup copilot` → npm install). 
- **macOS runner consolidation**: Each macOS arch has a single consolidated job (build + integration + release-validation). Intel (`build-and-validate-macos-intel`) runs on every push since Intel runners are plentiful. ARM (`build-and-validate-macos-arm`) is gated to tag/schedule/dispatch only since ARM runners are extremely scarce (2-4h+ queue waits). This avoids serial re-queuing of runners across multiple jobs. - **Unit tests skip macOS**: Python unit tests are platform-agnostic; Linux + Windows coverage is sufficient. macOS-specific validation (binary build, integration tests, release validation) still runs via the consolidated job. - UPX compression when available (reduces binary size ~50%) diff --git a/.github/workflows/build-release.yml b/.github/workflows/build-release.yml index 8fbbff7b3..c5357540a 100644 --- a/.github/workflows/build-release.yml +++ b/.github/workflows/build-release.yml @@ -27,13 +27,11 @@ permissions: contents: read jobs: - # Unit tests on Linux + Windows only. macOS runners are scarce and Python unit - # tests are platform-agnostic — macOS-specific validation (build + integration + - # release-validation) is consolidated into a single job per arch below. - test: - runs-on: ${{ matrix.os }} - permissions: - contents: read + # Unit tests + binary build combined (Linux + Windows). + # Merging test+build eliminates ~1.5m runner re-provisioning overhead per platform. + # macOS runners are scarce and have their own consolidated jobs below. 
+ build-and-test: + name: Build & Test strategy: fail-fast: false matrix: @@ -41,98 +39,48 @@ jobs: - os: ubuntu-24.04 arch: x86_64 platform: linux - - os: ubuntu-24.04-arm - arch: arm64 - platform: linux - - os: windows-latest - arch: x86_64 - platform: windows - - steps: - - uses: actions/checkout@v4 - - - name: Set up Node.js - uses: actions/setup-node@v4 - with: - node-version: '24' - - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: ${{ env.PYTHON_VERSION }} - - - name: Install uv - uses: astral-sh/setup-uv@v6 - - - name: Cache uv environments - uses: actions/cache@v3 - with: - path: | - ~/.cache/uv - ~/.local/share/uv - ~\AppData\Local\uv\cache - key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }} - restore-keys: | - ${{ runner.os }}-uv- - - - name: Install dependencies - run: uv sync --extra dev - - - name: Test with pytest - run: uv run pytest tests/unit tests/test_console.py - - - name: Run smoke tests - env: - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} - GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} - run: uv run pytest tests/integration/test_runtime_smoke.py -v - - # Build binaries (Linux + Windows). macOS builds are in build-and-validate-macos-intel / -arm. 
- build: - name: Build APM Binary - needs: [test] - strategy: - matrix: - include: - - os: ubuntu-24.04 - platform: linux - arch: x86_64 binary_name: apm-linux-x86_64 - os: ubuntu-24.04-arm - platform: linux arch: arm64 + platform: linux binary_name: apm-linux-arm64 - os: windows-latest - platform: windows arch: x86_64 + platform: windows binary_name: apm-windows-x86_64 - + runs-on: ${{ matrix.os }} permissions: - contents: read # Checkout code; upload-artifact uses separate Actions API - + contents: read + steps: - - name: Checkout code - uses: actions/checkout@v4 - + - uses: actions/checkout@v4 + - name: Set up Python uses: actions/setup-python@v5 with: python-version: ${{ env.PYTHON_VERSION }} - + + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true + + - name: Install dependencies + run: uv sync --extra dev --extra build + + - name: Run tests + env: + GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} + GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} + run: uv run pytest tests/unit tests/test_console.py tests/integration/test_runtime_smoke.py -n auto --dist worksteal + - name: Install UPX (Linux) if: matrix.platform == 'linux' run: | sudo apt-get update sudo apt-get install -y upx-ucl - - - name: Install uv - uses: astral-sh/setup-uv@v6 - - - name: Install Python dependencies - run: | - uv sync --extra dev --extra build - + - name: Build binary (Unix) if: matrix.platform != 'windows' run: | @@ -144,7 +92,7 @@ jobs: shell: pwsh run: | uv run pwsh scripts/windows/build-binary.ps1 - + - name: Upload binary as workflow artifact uses: actions/upload-artifact@v4 with: @@ -166,13 +114,12 @@ jobs: # Intel runners are plentiful (<1 min queue) so this runs on every push for # early macOS build-regression feedback. Integration and release-validation # phases are conditional on tag/schedule/dispatch. + # Root node — runs its own unit tests instead of waiting for build-and-test. 
build-and-validate-macos-intel: name: Build & Validate (macOS x86_64) - needs: [test] runs-on: macos-15-intel permissions: contents: read - models: read steps: - name: Checkout code @@ -198,10 +145,15 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@v6 + with: + enable-cache: true - name: Install dependencies run: uv sync --extra dev --extra build + - name: Run unit tests + run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal + # ── PHASE 1: BUILD BINARY ── - name: Build binary run: | @@ -227,7 +179,6 @@ jobs: if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -255,7 +206,6 @@ jobs: if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -269,14 +219,13 @@ jobs: # common. This job is gated to tag/schedule/dispatch only so push-to-main never # blocks on ARM availability. All phases run unconditionally since the job itself # is already release-gated. + # Root node — runs its own unit tests instead of waiting for build-and-test. 
build-and-validate-macos-arm: name: Build & Validate (macOS arm64) - needs: [test] if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' runs-on: macos-latest permissions: contents: read - models: read steps: - name: Checkout code @@ -297,10 +246,15 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@v6 + with: + enable-cache: true - name: Install dependencies run: uv sync --extra dev --extra build + - name: Run unit tests + run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal + # ── PHASE 1: BUILD BINARY ── - name: Build binary run: | @@ -325,7 +279,6 @@ jobs: - name: Run integration tests env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -351,7 +304,6 @@ jobs: - name: Run release validation tests env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -367,7 +319,7 @@ jobs: integration-tests: name: Integration Tests if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' - needs: [test, build] + needs: [build-and-test] strategy: matrix: include: @@ -383,43 +335,43 @@ jobs: arch: x86_64 platform: windows binary_name: apm-windows-x86_64 - + runs-on: ${{ matrix.os }} permissions: contents: read - models: read # Required for GitHub Models API access - + steps: - name: Checkout code uses: actions/checkout@v4 - + - name: Download APM binary from build artifacts uses: actions/download-artifact@v4 with: name: ${{ matrix.binary_name }} path: ./ - + - name: Set up Node.js uses: actions/setup-node@v4 with: node-version: '24' - + - name: Set up Python uses: actions/setup-python@v5 with: python-version: ${{ env.PYTHON_VERSION }} - + - name: Install uv uses: astral-sh/setup-uv@v6 - + with: + enable-cache: true + - name: Install test 
dependencies run: uv sync --extra dev - + - name: Run integration tests (Unix) if: matrix.platform != 'windows' env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} # Models access GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} # Primary: APM module access ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} # Azure DevOps module access run: | @@ -432,7 +384,6 @@ jobs: shell: pwsh env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -444,7 +395,7 @@ jobs: release-validation: name: Release Validation if: github.ref_type == 'tag' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' - needs: [test, build, integration-tests] + needs: [build-and-test, integration-tests] strategy: matrix: include: @@ -460,12 +411,11 @@ jobs: arch: x86_64 platform: windows binary_name: apm-windows-x86_64 - + runs-on: ${{ matrix.os }} permissions: contents: read - models: read # Required for GitHub Models API access - + steps: - name: Set up Node.js uses: actions/setup-node@v4 @@ -476,26 +426,26 @@ jobs: uses: actions/setup-python@v5 with: python-version: ${{ env.PYTHON_VERSION }} - + - name: Download APM binary from build artifacts uses: actions/download-artifact@v4 with: name: ${{ matrix.binary_name }} path: ${{ matrix.platform == 'windows' && 'D:\apm-isolated-test' || '/tmp/apm-isolated-test/' }} - + - name: Make binary executable and verify isolation (Unix) if: matrix.platform != 'windows' run: | cd /tmp/apm-isolated-test - + # Debug: List the downloaded structure echo "Downloaded structure:" find . 
-name "apm" -type f ls -la ./dist/ - + # Make the binary executable chmod +x ./dist/${{ matrix.binary_name }}/apm - + - name: Create APM symlink for testing (Unix) if: matrix.platform != 'windows' run: | @@ -508,21 +458,20 @@ jobs: shell: pwsh run: | cd D:\apm-isolated-test - + # Debug: List the downloaded structure Write-Host "Downloaded structure:" Get-ChildItem -Recurse -Filter "apm.exe" Get-ChildItem .\dist\ - + # Add binary directory to PATH $binDir = "D:\apm-isolated-test\dist\${{ matrix.binary_name }}" echo $binDir | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append - + - name: Run release validation tests (Unix) if: matrix.platform != 'windows' env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -536,7 +485,6 @@ jobs: shell: pwsh env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -547,7 +495,7 @@ jobs: create-release: name: Create GitHub Release - needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation] + needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation] if: github.ref_type == 'tag' # All tags create GitHub releases runs-on: ubuntu-latest permissions: @@ -555,27 +503,27 @@ jobs: outputs: is_prerelease: ${{ steps.release_type.outputs.is_prerelease }} is_private_repo: ${{ github.event.repository.private }} - + steps: - name: Download all build artifacts uses: actions/download-artifact@v4 with: path: ./dist - + - name: Prepare release binaries run: | # Debug: Show the actual downloaded structure echo "Directory listing:" ls -la ./dist echo "" - + # Create tar.gz archives from directory structure for release and Homebrew cd dist for binary in apm-linux-x86_64 apm-linux-arm64 apm-darwin-x86_64 
apm-darwin-arm64; do # With artifacts containing both scripts and dist/, the binary is in artifact/dist/binary/ artifact_dir="${binary}" binary_dir="${artifact_dir}/dist/${binary}" - + if [ -d "$binary_dir" ] && [ -f "$binary_dir/apm" ]; then echo "Processing $binary_dir directory..." # Ensure the binary is executable before archiving @@ -625,12 +573,12 @@ jobs: fi exit 1 fi - + - name: Determine release type id: release_type run: | TAG_NAME="${{ github.ref_name }}" - + # Check if tag matches stable semver pattern (e.g., v1.2.3, v0.2.0) # Stable: starts with v, followed by digits.digits.digits, then end of string if [[ "$TAG_NAME" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then @@ -641,10 +589,10 @@ jobs: echo "is_prerelease=true" >> $GITHUB_OUTPUT echo "Detected PEP 440 prerelease: $TAG_NAME" else - echo "is_prerelease=true" >> $GITHUB_OUTPUT + echo "is_prerelease=true" >> $GITHUB_OUTPUT echo "Detected other prerelease: $TAG_NAME" fi - + - name: Create GitHub Release id: release uses: softprops/action-gh-release@v2 @@ -663,14 +611,17 @@ jobs: ./dist/apm-windows-x86_64.zip ./dist/apm-windows-x86_64.zip.sha256 - # GH-AW Compatibility Gate — validates the released binary works in the + # GH-AW Compatibility — validates the released binary works in the # exact flow GitHub Agentic Workflows uses (isolated install + pack, no token). - # Gates publish-pypi and update-homebrew so broken versions don't reach stable distribution. + # Informational: runs as continue-on-error because it depends on external services + # (GitHub API, npm, microsoft/apm-action, microsoft/apm-sample-package) that can + # fail independently of APM code quality (~4% false-negative rate per execution). 
gh-aw-compat: name: GH-AW Compatibility needs: [create-release] if: github.ref_type == 'tag' runs-on: ubuntu-24.04 + continue-on-error: true steps: - name: Install and pack with apm-action uses: microsoft/apm-action@v1 @@ -690,11 +641,16 @@ jobs: test -f "${{ steps.pack.outputs.bundle-path }}" echo "✅ GH-AW compatibility test passed" - # Publish to PyPI (only stable releases from public repo) + - name: Warn on failure + if: failure() + run: | + echo "::warning::GH-AW compatibility test failed. This is informational — external dependency may be unavailable. Investigate before next release." + + # Publish to PyPI (only stable releases from public repo) publish-pypi: name: Publish to PyPI runs-on: ubuntu-latest - needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, gh-aw-compat] + needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release] if: github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true' environment: name: pypi @@ -702,27 +658,27 @@ jobs: permissions: contents: read # Required for actions/checkout id-token: write # IMPORTANT: this permission is mandatory for trusted publishing - + steps: - name: Checkout code uses: actions/checkout@v4 - + - name: Set up Python uses: actions/setup-python@v5 with: python-version: ${{ env.PYTHON_VERSION }} - + - name: Install build dependencies run: | python -m pip install --upgrade pip pip install build twine - + - name: Build Python package run: python -m build - + - name: Check package run: twine check dist/* - + - name: Publish to PyPI uses: pypa/gh-action-pypi-publish@release/v1 with: @@ -732,20 +688,20 @@ jobs: update-homebrew: name: Update Homebrew Formula runs-on: ubuntu-latest - needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, 
integration-tests, release-validation, create-release, gh-aw-compat, publish-pypi] + needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, publish-pypi] if: github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true' permissions: contents: read - + steps: - name: Extract SHA256 checksums from GitHub release id: checksums run: | # Download the SHA256 checksum files from the GitHub release RELEASE_TAG="${{ github.ref_name }}" - + echo "Downloading checksums for release $RELEASE_TAG" - + # Download checksum files directly from the release curl -L -o apm-darwin-arm64.tar.gz.sha256 \ "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-darwin-arm64.tar.gz.sha256" @@ -755,23 +711,23 @@ jobs: "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-linux-x86_64.tar.gz.sha256" curl -L -o apm-linux-arm64.tar.gz.sha256 \ "https://github.com/${{ github.repository }}/releases/download/$RELEASE_TAG/apm-linux-arm64.tar.gz.sha256" - + # Extract SHA256 checksums DARWIN_ARM64_SHA=$(cat apm-darwin-arm64.tar.gz.sha256 | cut -d' ' -f1) DARWIN_X86_64_SHA=$(cat apm-darwin-x86_64.tar.gz.sha256 | cut -d' ' -f1) LINUX_X86_64_SHA=$(cat apm-linux-x86_64.tar.gz.sha256 | cut -d' ' -f1) LINUX_ARM64_SHA=$(cat apm-linux-arm64.tar.gz.sha256 | cut -d' ' -f1) - + echo "darwin-arm64-sha=$DARWIN_ARM64_SHA" >> $GITHUB_OUTPUT echo "darwin-x86_64-sha=$DARWIN_X86_64_SHA" >> $GITHUB_OUTPUT echo "linux-x86_64-sha=$LINUX_X86_64_SHA" >> $GITHUB_OUTPUT echo "linux-arm64-sha=$LINUX_ARM64_SHA" >> $GITHUB_OUTPUT - + echo "Darwin ARM64 SHA: $DARWIN_ARM64_SHA" echo "Darwin x86_64 SHA: $DARWIN_X86_64_SHA" echo "Linux x86_64 SHA: $LINUX_X86_64_SHA" echo "Linux ARM64 SHA: $LINUX_ARM64_SHA" - + - name: Trigger Homebrew tap repository update uses: peter-evans/repository-dispatch@v3 with: @@ -797,7 +753,7 @@ 
jobs: update-scoop: name: Update Scoop Bucket runs-on: ubuntu-latest - needs: [test, build, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, gh-aw-compat, publish-pypi] + needs: [build-and-test, build-and-validate-macos-intel, build-and-validate-macos-arm, integration-tests, release-validation, create-release, publish-pypi] # TODO: Enable once downstream repository and secrets are configured (see #88) if: false && github.ref_type == 'tag' && needs.create-release.outputs.is_private_repo != 'true' && needs.create-release.outputs.is_prerelease != 'true' permissions: diff --git a/.github/workflows/ci-integration.yml b/.github/workflows/ci-integration.yml index bc500472a..085225d35 100644 --- a/.github/workflows/ci-integration.yml +++ b/.github/workflows/ci-integration.yml @@ -58,10 +58,15 @@ jobs: # Checkout default branch (main) — never PR code - uses: actions/checkout@v4 - - name: Set up Node.js - uses: actions/setup-node@v4 - with: - node-version: '24' + # O8: PR traceability — annotate with originating PR URL + - name: Annotate originating PR + run: | + PR_NUMBER=$(echo '${{ toJSON(github.event.workflow_run.pull_requests) }}' | jq -r '.[0].number // empty') + if [ -n "$PR_NUMBER" ]; then + echo "::notice::🔗 Originating PR: https://github.com/${{ github.repository }}/pull/${PR_NUMBER}" + else + echo "::notice::Triggered by push to ${{ github.event.workflow_run.head_branch }}" + fi - name: Set up Python uses: actions/setup-python@v5 @@ -70,16 +75,8 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@v6 - - - name: Cache uv environments - uses: actions/cache@v3 with: - path: | - ~/.cache/uv - ~/.local/share/uv - key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }} - restore-keys: | - ${{ runner.os }}-uv- + enable-cache: true - name: Install dependencies run: uv sync --extra dev @@ -99,7 +96,6 @@ jobs: permissions: contents: read actions: read - models: read steps: # Checkout default branch 
(main) for test scripts — never PR code @@ -126,6 +122,8 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@v6 + with: + enable-cache: true - name: Install test dependencies run: uv sync --extra dev @@ -133,7 +131,6 @@ jobs: - name: Run integration tests env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -150,7 +147,6 @@ jobs: permissions: contents: read actions: read - models: read steps: # Checkout default branch for test scripts — never PR code @@ -203,7 +199,6 @@ jobs: - name: Run release validation tests env: APM_E2E_TESTS: "1" - GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} ADO_APM_PAT: ${{ secrets.ADO_APM_PAT }} run: | @@ -224,16 +219,38 @@ jobs: uses: actions/github-script@v7 with: script: | - const success = '${{ needs.release-validation.result }}' === 'success' - && '${{ needs.integration-tests.result }}' === 'success' - && '${{ needs.smoke-test.result }}' === 'success'; + const ciConclusion = '${{ github.event.workflow_run.conclusion }}'; + const smokeResult = '${{ needs.smoke-test.result }}'; + const integResult = '${{ needs.integration-tests.result }}'; + const relvalResult = '${{ needs.release-validation.result }}'; + + // O7: Detect CI circular dependency — when ci.yml fails, all approval + // jobs are skipped (their `if: conclusion == 'success'` fails), which + // cascades to skip smoke-test → integration-tests → release-validation. + // Without this check, we'd report 'failure' and block the CI-fixing PR. 
+ const allSkipped = smokeResult === 'skipped' && integResult === 'skipped' && relvalResult === 'skipped'; + + let state, description; + if (allSkipped && ciConclusion !== 'success') { + // Upstream CI failed — all jobs were skipped via approval guards, not due to test failures + state = 'pending'; + description = `CI workflow ${ciConclusion} — integration tests skipped (not blocking)`; + core.notice(`CI workflow concluded with '${ciConclusion}' — integration tests were skipped. This is expected when ci.yml itself has an error.`); + } else { + const success = relvalResult === 'success' + && integResult === 'success' + && smokeResult === 'success'; + state = success ? 'success' : 'failure'; + description = success ? 'All integration tests passed' : 'Integration tests failed'; + } + await github.rest.repos.createCommitStatus({ owner: context.repo.owner, repo: context.repo.repo, sha: context.payload.workflow_run.head_sha, - state: success ? 'success' : 'failure', + state, target_url: `https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`, context: 'Integration Tests (PR)', - description: success ? 
'All integration tests passed' : 'Integration tests failed' + description }); diff --git a/.github/workflows/ci-runtime.yml b/.github/workflows/ci-runtime.yml new file mode 100644 index 000000000..d416d716d --- /dev/null +++ b/.github/workflows/ci-runtime.yml @@ -0,0 +1,90 @@ +name: Runtime Inference Tests + +env: + PYTHON_VERSION: '3.12' + +on: + schedule: + # Daily at 05:00 UTC — standalone schedule, no dependency on build-release.yml + - cron: '0 5 * * *' + workflow_dispatch: + push: + branches: [main] + paths: + - 'src/apm_cli/commands/run.py' + - 'src/apm_cli/runtime/**' + - 'scripts/runtime/**' + - 'scripts/test-release-validation.sh' + - 'tests/integration/test_runtime_smoke.py' + +permissions: + contents: read + +jobs: + # Live inference smoke tests — isolated from the release pipeline to avoid + # false-negative failures from external API flakiness (14.9% per-release risk). + # Runs nightly, on manual dispatch, and when runtime code changes. + live-inference-smoke: + name: Live Inference Smoke (Linux x86_64) + runs-on: ubuntu-24.04 + permissions: + contents: read + models: read + + steps: + - uses: actions/checkout@v4 + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '24' + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: ${{ env.PYTHON_VERSION }} + + - name: Install uv + uses: astral-sh/setup-uv@v6 + with: + enable-cache: true + + - name: Install dependencies + run: uv sync --extra dev --extra build + + - name: Build binary + run: | + chmod +x scripts/build-binary.sh + uv run ./scripts/build-binary.sh + + - name: Setup binary + run: | + chmod +x ./dist/apm-linux-x86_64/apm + ln -s "$(pwd)/dist/apm-linux-x86_64/apm" "$(pwd)/apm" + echo "$(pwd)" >> $GITHUB_PATH + + - name: Run smoke tests + env: + GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} + GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} + run: uv run pytest tests/integration/test_runtime_smoke.py -v + + - name: Run inference validation + env: + 
APM_RUN_INFERENCE_TESTS: "1" + APM_E2E_TESTS: "1" + GITHUB_TOKEN: ${{ secrets.GH_MODELS_PAT }} + GITHUB_APM_PAT: ${{ secrets.GH_CLI_PAT }} + run: | + chmod +x scripts/test-release-validation.sh + ./scripts/test-release-validation.sh + timeout-minutes: 20 + + - name: Annotate result + if: always() + run: | + if [[ "${{ job.status }}" == "success" ]]; then + echo "::notice::✅ Runtime inference tests passed" + else + echo "::warning::⚠️ Runtime inference tests failed — this does not block releases" + fi diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3c3c30289..05b2e2039 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -16,7 +16,9 @@ permissions: jobs: # Linux-only for PR feedback. Full platform matrix (incl. macOS + Windows) runs post-merge in build-release.yml. - test: + # Combines unit tests + binary build into a single job to eliminate runner re-provisioning overhead. + build-and-test: + name: Build & Test (Linux) runs-on: ubuntu-24.04 permissions: contents: read @@ -24,11 +26,6 @@ jobs: steps: - uses: actions/checkout@v4 - - name: Set up Node.js - uses: actions/setup-node@v4 - with: - node-version: '24' - - name: Set up Python uses: actions/setup-python@v5 with: @@ -36,66 +33,37 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@v6 - - - name: Cache uv environments - uses: actions/cache@v3 with: - path: | - ~/.cache/uv - ~/.local/share/uv - key: ${{ runner.os }}-uv-${{ hashFiles('**/pyproject.toml') }} - restore-keys: | - ${{ runner.os }}-uv- + enable-cache: true - name: Install dependencies - run: uv sync --extra dev + run: uv sync --extra dev --extra build - - name: Test with pytest - run: uv run pytest tests/unit tests/test_console.py + - name: Run tests + run: uv run pytest tests/unit tests/test_console.py -n auto --dist worksteal - # Linux-only binary build for PR validation. Full platform builds run post-merge. 
- build: - name: Build APM Binary (Linux) - needs: [test] - runs-on: ubuntu-24.04 - permissions: - contents: read + - name: Install UPX + run: | + sudo apt-get update + sudo apt-get install -y upx-ucl - steps: - - name: Checkout code - uses: actions/checkout@v4 + - name: Build binary + run: | + chmod +x scripts/build-binary.sh + uv run ./scripts/build-binary.sh - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: ${{ env.PYTHON_VERSION }} - - - name: Install UPX - run: | - sudo apt-get update - sudo apt-get install -y upx-ucl - - - name: Install uv - uses: astral-sh/setup-uv@v6 - - - name: Install Python dependencies - run: | - uv sync --extra dev --extra build - - - name: Build binary - run: | - chmod +x scripts/build-binary.sh - uv run ./scripts/build-binary.sh - - - name: Upload binary as workflow artifact - uses: actions/upload-artifact@v4 - with: - name: apm-linux-x86_64 - path: | - ./dist/apm-linux-x86_64 - ./dist/apm-linux-x86_64.sha256 - ./scripts/test-release-validation.sh - ./scripts/github-token-helper.sh - include-hidden-files: true - retention-days: 30 - if-no-files-found: error + - name: Upload binary as workflow artifact + uses: actions/upload-artifact@v4 + with: + name: apm-linux-x86_64 + # Scripts are included to preserve the artifact root at ./ (not ./dist/). + # Without a sibling directory, upload-artifact strips the dist/ prefix, + # breaking download paths in ci-integration.yml which expects dist/$BINARY_NAME/apm. 
+ path: | + ./dist/apm-linux-x86_64 + ./dist/apm-linux-x86_64.sha256 + ./scripts/test-release-validation.sh + ./scripts/github-token-helper.sh + include-hidden-files: true + retention-days: 30 + if-no-files-found: error diff --git a/CHANGELOG.md b/CHANGELOG.md index 7e791617d..72563893e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- `ci-runtime.yml` workflow for nightly + manual runtime inference tests, decoupled from release pipeline (#371) +- `APM_RUN_INFERENCE_TESTS` env var to gate live inference (`apm run`) in test scripts (#371) +- PR traceability `::notice` annotation in `ci-integration.yml` smoke-test job (#371) - Copilot encoding instructions: `encoding.instructions.md` (`applyTo: "**"`) bans non-ASCII characters in source and CLI output; updated `copilot-instructions.md` and `cli.instructions.md` to use ASCII bracket notation (`[+]`/`[!]`/`[x]`/`[i]`/`[*]`/`[>]`) instead of emoji STATUS_SYMBOLS (#282) + +### Changed + +- Merged `test` + `build` into single `build-and-test` job across `ci.yml` and `build-release.yml` — eliminates ~1.5m runner re-provisioning per platform (#371) +- macOS consolidated jobs are now root nodes (no `needs: [test]`) — run own unit tests for full independence (#371) +- Removed `setup-node@v4` from unit test and build jobs that don't need Node.js (#371) +- Enabled native `setup-uv` caching (`enable-cache: true`), removed manual `actions/cache@v3` blocks (#371) +- Decoupled live inference tests (`apm run`) from release pipeline — reduces 14.9% false-negative rate and `GH_MODELS_PAT` secret exposure (#371) +- `ci-integration.yml` report-status now detects CI circular dependency (upstream failure) and reports `pending` instead of blocking (#371) +- `gh-aw-compat` is now informational (`continue-on-error: true`) — non-deterministic external dependencies should not block releases (#371) ## [0.8.4] - 2026-03-22 diff --git
a/pyproject.toml b/pyproject.toml index 1d8d30c60..0e392ffad 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -71,7 +71,7 @@ warn_return_any = true warn_unused_configs = true [tool.pytest.ini_options] -addopts = "-m 'not benchmark' -n auto" +addopts = "-m 'not benchmark'" markers = [ "integration: marks tests as integration tests that may require network access", "slow: marks tests as slow running tests", diff --git a/scripts/test-integration.sh b/scripts/test-integration.sh index a5ce439ec..1ca3a19d8 100755 --- a/scripts/test-integration.sh +++ b/scripts/test-integration.sh @@ -117,7 +117,13 @@ detect_environment() { log_info "Found existing binary: ./dist/$BINARY_NAME/apm (CI mode)" else USE_EXISTING_BINARY=false - log_info "No existing binary found, will build locally" + log_info "No existing binary found at ./dist/$BINARY_NAME/apm, will build locally" + # Debug: show what's actually in dist/ to diagnose artifact download issues + if [[ -d "./dist" ]]; then + log_info "Contents of ./dist/: $(ls -la ./dist/ 2>/dev/null | head -10)" + else + log_info "No ./dist/ directory exists" + fi fi } # Build binary (like CI build job does) - only if needed @@ -133,7 +139,11 @@ build_binary() { log_info "Installing Python dependencies..." if command -v uv >/dev/null 2>&1; then log_info "Using uv for binary build dependencies..." - uv venv + if [[ -d ".venv" ]]; then + log_info "Virtual environment already exists, reusing it..." 
+ else + uv venv + fi source .venv/bin/activate uv pip install -e ".[dev]" uv pip install pyinstaller diff --git a/scripts/test-release-validation.sh b/scripts/test-release-validation.sh index 0358df8bf..d0fbba11d 100755 --- a/scripts/test-release-validation.sh +++ b/scripts/test-release-validation.sh @@ -156,7 +156,14 @@ run_with_timeout() { # HERO SCENARIO 1: 30-Second Zero-Config # Test the exact README flow: runtime setup → run virtual package +# Gated by APM_RUN_INFERENCE_TESTS — live inference tests are decoupled from +# the release pipeline and run in ci-runtime.yml (nightly/manual/path-filtered). test_hero_zero_config() { + if [[ "${APM_RUN_INFERENCE_TESTS:-}" != "1" ]]; then + log_info "Skipping HERO SCENARIO 1 (inference tests decoupled — set APM_RUN_INFERENCE_TESTS=1 to enable)" + return 0 + fi + log_test "HERO SCENARIO 1: 30-Second Zero-Config (README lines 35-44)" # Create temporary directory for this test @@ -288,22 +295,28 @@ test_hero_guardrailing() { log_success "Compiled to AGENTS.md (guardrails active)" # Step 5: apm run design-review (from installed package) - echo "Running: $BINARY_PATH run design-review (with 10s timeout)" - echo "--- Command Output Start ---" - run_with_timeout 10 "$BINARY_PATH run design-review" - exit_code=$? - echo "--- Command Output End ---" - echo "Exit code: $exit_code" - - if [[ $exit_code -eq 124 ]]; then - # Timeout is expected and OK - prompt started executing - log_success "design-review prompt executed with compiled guardrails" - elif [[ $exit_code -eq 0 ]]; then - log_success "design-review completed successfully" + # Gated by APM_RUN_INFERENCE_TESTS — live inference is decoupled from + # the release pipeline and runs in ci-runtime.yml. + if [[ "${APM_RUN_INFERENCE_TESTS:-}" == "1" ]]; then + echo "Running: $BINARY_PATH run design-review (with 10s timeout)" + echo "--- Command Output Start ---" + run_with_timeout 10 "$BINARY_PATH run design-review" + exit_code=$? 
+ echo "--- Command Output End ---" + echo "Exit code: $exit_code" + + if [[ $exit_code -eq 124 ]]; then + # Timeout is expected and OK - prompt started executing + log_success "design-review prompt executed with compiled guardrails" + elif [[ $exit_code -eq 0 ]]; then + log_success "design-review completed successfully" + else + log_error "apm run design-review failed immediately" + cd .. + return 1 + fi else - log_error "apm run design-review failed immediately" - cd .. - return 1 + log_info "Skipping apm run design-review (inference tests decoupled — set APM_RUN_INFERENCE_TESTS=1 to enable)" fi cd .. @@ -432,11 +445,21 @@ echo "" echo "Binary found and executable: $BINARY_PATH" local tests_passed=0 - local tests_total=6 # Prerequisites, basic commands, gh-aw compat, runtime setup, 2 hero scenarios + local tests_total=5 # Prerequisites, basic commands, gh-aw compat, runtime setup, guardrailing (init/install/compile) local dependency_tests_run=false + local inference_tests_run=false + + # Hero scenario 1 (zero-config) is entirely inference-based — only counted when enabled + if [[ "${APM_RUN_INFERENCE_TESTS:-}" == "1" ]]; then + tests_total=$((tests_total + 1)) + inference_tests_run=true + log_info "Inference tests enabled (APM_RUN_INFERENCE_TESTS=1)" + else + log_info "Inference tests decoupled — skipping apm run tests (set APM_RUN_INFERENCE_TESTS=1 to enable)" + fi # Add dependency tests to total if available and GITHUB token is present - if [[ "$DEPENDENCY_TESTS_AVAILABLE" == "true" ]] && [[ -n "${GITHUB_CLI_PAT:-}" || -n "${GITHUB_TOKEN:-}" ]]; then + if [[ "$DEPENDENCY_TESTS_AVAILABLE" == "true" ]] && [[ -n "${GITHUB_APM_PAT:-}" || -n "${GITHUB_TOKEN:-}" ]]; then tests_total=$((tests_total + 1)) dependency_tests_run=true log_info "Dependency integration tests will be included" @@ -475,11 +498,15 @@ echo "" log_error "Runtime setup test failed" fi - # HERO SCENARIO 1: 30-second zero-config - if test_hero_zero_config; then - ((tests_passed++)) + # HERO SCENARIO 
1: 30-second zero-config (only when inference tests enabled) + if [[ "$inference_tests_run" == "true" ]]; then + if test_hero_zero_config; then + ((tests_passed++)) + else + log_error "Hero scenario 1 (30-sec zero-config) failed" + fi else - log_error "Hero scenario 1 (30-sec zero-config) failed" + test_hero_zero_config # Runs but auto-skips and returns 0 fi # HERO SCENARIO 2: 2-minute guardrailing @@ -510,24 +537,29 @@ echo "" echo "" echo "🚀 Binary is ready for production release" echo "📦 End-user experience validated successfully" - echo "🎯 Both README hero scenarios work perfectly" echo "" echo "Validated user journeys:" - echo " 1. Prerequisites (GITHUB_TOKEN) ✅" + echo " 1. Prerequisites (GITHUB_APM_PAT) ✅" echo " 2. Binary accessibility ✅" echo " 3. Runtime setup (copilot) ✅" echo " 4. GH-AW compatibility (tokenless install + pack) ✅" echo "" - echo " HERO SCENARIO 1: 30-Second Zero-Config ✨" - echo " - Run virtual package directly ✅" - echo " - Auto-install on first run ✅" - echo " - Use cached package on second run ✅" - echo "" + if [[ "$inference_tests_run" == "true" ]]; then + echo " HERO SCENARIO 1: 30-Second Zero-Config ✨" + echo " - Run virtual package directly ✅" + echo " - Auto-install on first run ✅" + echo " - Use cached package on second run ✅" + echo "" + fi echo " HERO SCENARIO 2: 2-Minute Guardrailing ✨" echo " - Project initialization ✅" echo " - Install APM packages ✅" echo " - Compile to AGENTS.md guardrails ✅" - echo " - Run prompts with guardrails ✅" + if [[ "$inference_tests_run" == "true" ]]; then + echo " - Run prompts with guardrails ✅" + else + echo " - Run prompts (decoupled to ci-runtime.yml) ⏭️" + fi if [[ "$dependency_tests_run" == "true" ]]; then echo "" echo " BONUS: Real dependency integration ✅"