Consolidate runtime-diagnostics pipeline into the runtime pipeline#124593

Draft
steveisok wants to merge 9 commits into dotnet:main from steveisok:consolidate-diag-pipeline

@steveisok
Member

Summary

Move the cDAC and DAC/SOS diagnostic test jobs from the standalone runtime-diagnostics pipeline into runtime.yml, eliminating the redundant CoreCLR rebuild that runtime-diagnostics performed.

Changes

  • eng/pipelines/runtime.yml: Add dotnet/diagnostics repository resource (with configurable branch parameter), a shared framework upload step (DiagnosticsRuntime_*) to the existing CoreCLR_Libraries post-build, and two new diagnostic test jobs (cDAC and DAC) that depend on the existing CoreCLR_Libraries
    windows_x64 build.
  • docs/infra/diagnostics-pipeline-consolidation.md: Design doc covering motivation, alternatives evaluated, and the chosen approach.
  • docs/infra/runtime-pipeline-architecture.md: Comprehensive reference for the runtime.yml pipeline (template chain, platform matrix, variables, artifact flow, job catalog).
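
For illustration, a repository resource with a configurable branch could look roughly like the sketch below. This is not the PR's actual YAML: the parameter name `diagnosticsBranch`, the `diagnostics` alias, and the `public` service connection name are assumptions chosen for the example.

```yaml
# Sketch only: names marked "hypothetical" do not come from the PR.
parameters:
- name: diagnosticsBranch            # hypothetical parameter name
  displayName: dotnet/diagnostics branch to test against
  type: string
  default: main

resources:
  repositories:
  - repository: diagnostics          # hypothetical alias used by checkout steps
    type: github
    name: dotnet/diagnostics
    endpoint: public                 # hypothetical service connection
    ref: refs/heads/${{ parameters.diagnosticsBranch }}
```

With a shape like this, a manual run could target a different dotnet/diagnostics branch without editing the pipeline file.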

Motivation

The standalone runtime-diagnostics pipeline rebuilt CoreCLR from scratch just to produce shared framework artifacts for SOS tests. By reusing the artifacts already built by CoreCLR_Libraries in runtime.yml, we save ~30 minutes of redundant build time per run and keep diagnostic test results visible
alongside the main build.

Notes

  • Both diagnostic jobs use the same EvaluatePaths conditions as their CoreCLR_Libraries dependency.
  • The runtime-diagnostics.yml pipeline can be retired once this is validated.

steveisok and others added 3 commits February 19, 2026 08:11
Move the cDAC and DAC SOS test jobs from the standalone
runtime-diagnostics pipeline into runtime.yml. This eliminates the
redundant CoreCLR rebuild that runtime-diagnostics performed.

The diagnostics jobs now depend on the existing CoreCLR_Libraries
windows_x64 build and download just the shared framework artifact
(DiagnosticsRuntime_*) that a new upload step produces.

Changes:
- Add dotnet/diagnostics repository resource with configurable branch
- Add shared framework upload step to CoreCLR_Libraries postBuildSteps
- Add cDAC and DAC test jobs at the end of the Build stage
- Both diagnostic jobs use the same path-based conditions as their
  dependency (CoreCLR_Libraries)

The runtime-diagnostics.yml pipeline can be retired once this is
validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Documents the motivation, alternatives evaluated, and design for
moving SOS/DAC diagnostic tests from the standalone
runtime-diagnostics pipeline into runtime.yml.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comprehensive reference for the runtime.yml pipeline covering:
- Template chain (4 layers from runtime.yml to 1ES/public)
- Platform matrix system and xplat-setup variables
- EvaluatePaths stage and PR path filtering
- Job templates (global-build-job, runtime-diag-job, etc.)
- Key variables (debugOnPrReleaseOnRolling, isRollingBuild, etc.)
- Artifact flow (upload, download, cross-platform considerations)
- Helix test infrastructure
- Job categories catalog (~55 platform-matrix invocations)
- How-to guide for adding new jobs
- File reference table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates the runtime-diagnostics pipeline into the main runtime.yml pipeline to eliminate redundant CoreCLR builds and improve CI efficiency. The diagnostics test jobs now reuse artifacts from the existing CoreCLR_Libraries build rather than rebuilding CoreCLR independently, saving approximately 30 minutes of build time per run.

Changes:

  • Adds diagnostics repository resource and parameter to runtime.yml for flexible branch targeting
  • Adds shared framework artifact upload (DiagnosticsRuntime_*) to CoreCLR_Libraries jobs
  • Adds two non-blocking diagnostic test jobs (cDAC and DAC) that depend on CoreCLR_Libraries windows_x64
  • Adds comprehensive documentation explaining the pipeline architecture and consolidation rationale

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| eng/pipelines/runtime.yml | Adds diagnostics repo resource, DiagnosticsRuntime artifact upload to CoreCLR_Libraries, and two diagnostic test jobs (cDAC and DAC) that reuse those artifacts |
| docs/infra/runtime-pipeline-architecture.md | New comprehensive reference documenting runtime.yml structure, template chain, platform matrix system, job templates, variables, artifact flow, and job categories |
| docs/infra/diagnostics-pipeline-consolidation.md | Design document explaining the motivation for consolidation, alternatives evaluated, chosen approach, and implementation details |

Move shouldContinueOnError from inside jobParameters to the
platform-matrix.yml parameters level. xplat-setup.yml already
passes shouldContinueOnError separately, so having it inside
jobParameters caused a duplicate definition error at template
expansion time, which cascaded into an invalid StageList error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
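
A rough sketch of the call shape this commit describes, with `shouldContinueOnError` passed beside `jobParameters` rather than inside it. The platform-matrix.yml path, the `buildConfig` value, and the `platforms` list are assumptions for illustration; only the parameter placement is taken from the commit message.

```yaml
# Sketch only: illustrates parameter placement, not the exact PR contents.
- template: /eng/pipelines/common/platform-matrix.yml   # assumed path
  parameters:
    jobTemplate: /eng/pipelines/diagnostics/runtime-diag-job.yml
    buildConfig: release             # hypothetical value
    platforms:
    - windows_x64
    # Matrix-level parameter: xplat-setup.yml forwards it to the job template,
    # so repeating it under jobParameters would define it twice.
    shouldContinueOnError: true
    jobParameters:
      name: cDAC
      useCdac: true
```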
steveisok and others added 2 commits February 24, 2026 08:07
The diagnostics test jobs depend on the CoreCLR_Libraries build for
shared framework artifacts, but the cDAC tools were not being built.
Add tools.cdac to the build args so the cDAC binaries are included
in the testhost output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 24, 2026 13:26
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (3)

eng/pipelines/runtime.yml:299

  • The PowerShell script and artifact upload steps for DiagnosticsRuntime are executed for all platforms in the CoreCLR_Libraries matrix (linux_x64, linux_musl_x64, osx_arm64, and windows_x64), but the diagnostic test jobs only consume the windows_x64 artifact. This wastes CI time and resources uploading unused artifacts from Linux and macOS builds. Consider adding a condition ${{ if eq(parameters.osGroup, 'windows') }}: around lines 287-299 to only execute these steps on Windows, or alternatively, add conditions to the individual steps to check the osGroup at runtime.
              - powershell: |
                  $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
                  Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
                displayName: 'Set Path to Shared Framework Artifacts'
              - template: /eng/pipelines/common/upload-artifact-step.yml
                parameters:
                  rootFolder: $(versionDir)
                  includeRootFolder: false
                  archiveType: $(archiveType)
                  archiveExtension: $(archiveExtension)
                  tarCompression: $(tarCompression)
                  artifactName: DiagnosticsRuntime_$(osGroup)$(osSubgroup)_$(archType)_$(_BuildConfig)
                  displayName: Diagnostics Runtime
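
One way the runtime variant of this suggestion might look is a step-level `condition` on the extraction step. This is a sketch: it assumes `osGroup` and `archType` are available as job variables, which the `$(osGroup)` and `$(archType)` macros in `artifactName` suggest but do not prove.

```yaml
# Sketch only: skip shared-framework extraction on non-Windows legs.
- powershell: |
    $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
    Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
  displayName: 'Set Path to Shared Framework Artifacts'
  # Assumption: osGroup and archType are job variables on every matrix leg.
  condition: and(succeeded(), eq(variables['osGroup'], 'windows'), eq(variables['archType'], 'x64'))
```

The upload template call would need the same gate. Whether upload-artifact-step.yml exposes a `condition` parameter is not shown in this thread, so that part is left out of the sketch.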

eng/pipelines/runtime.yml:2026

  • The condition for running diagnostic tests is missing a check for SetPathVars_tools_cdac.containsChange. The existing runtime-diagnostics.yml pipeline (lines 39-43) runs when cDAC-specific paths change (src/native/managed/cdac/** and src/coreclr/debug/runtimeinfo/**), which are covered by the tools_cdac subset in evaluate-default-paths.yml. Without this check, changes to cDAC code that should trigger diagnostic tests won't trigger them in PRs. Add eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true) to the condition to ensure cDAC changes trigger diagnostic tests.
            condition: >-
              or(
                eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
                eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
                eq(variables['isRollingBuild'], true))

eng/pipelines/runtime.yml:2066

  • The condition for running the DAC diagnostic test is missing a check for SetPathVars_tools_cdac.containsChange. The existing runtime-diagnostics.yml pipeline (lines 39-43) runs when cDAC-specific paths change (src/native/managed/cdac/** and src/coreclr/debug/runtimeinfo/**), which are covered by the tools_cdac subset in evaluate-default-paths.yml. Without this check, changes to cDAC code that should trigger diagnostic tests won't trigger them in PRs. Add eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true) to the condition to ensure cDAC changes trigger diagnostic tests.
            condition: >-
              or(
                eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
                eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
                eq(variables['isRollingBuild'], true))
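
Applied to either job, the suggested fix would presumably extend the existing `or(...)` with the `tools_cdac` output named in the comment. The sketch below only combines the expression quoted above with the one the reviewer proposed.

```yaml
# Sketch only: existing condition plus the tools_cdac check from the review comment.
condition: >-
  or(
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true),
    eq(variables['isRollingBuild'], true))
```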

steveisok and others added 3 commits February 24, 2026 20:30
Change testResultsFiles from '**/*.xml' to '**/SOS.*.xml' in both
cDAC and DAC PublishTestResults steps. The broad glob was picking up
unrelated test results (e.g. DotnetCounters unit tests) from the
diagnostics repo, causing garbled test run titles.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
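
As a sketch of what the narrowed publish step might look like: only the `testResultsFiles` glob comes from this commit. The result format, run title, and `condition` are assumptions added to make the example complete.

```yaml
# Sketch only: publish just the SOS result files.
- task: PublishTestResults@2
  displayName: Publish SOS test results      # hypothetical display name
  inputs:
    testResultsFormat: XUnit                 # assumption: format not stated in the commit
    testResultsFiles: '**/SOS.*.xml'         # was '**/*.xml'
    testRunTitle: SOS tests (cDAC)           # hypothetical title
  condition: succeededOrFailed()             # assumption: publish even when tests fail
```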
The runtime-diag-job template accepted shouldContinueOnError but
never applied it at the job level. Map it to the job's
continueOnError so that diagnostics test failures don't fail the
overall runtime pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
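
A minimal sketch of the mapping this commit describes, assuming a simplified template. The real runtime-diag-job.yml takes more parameters and runs real test steps; the single `script` step here is a placeholder.

```yaml
# Sketch only: simplified stand-in for a job template.
parameters:
- name: name
  type: string
- name: shouldContinueOnError
  type: boolean
  default: false

jobs:
- job: ${{ parameters.name }}
  # Without this line the parameter is accepted but has no effect.
  continueOnError: ${{ parameters.shouldContinueOnError }}
  steps:
  - script: echo "run SOS tests here"        # placeholder step
```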
Move cDAC and DAC test jobs from the Build stage into their own
Diagnostics stage that depends on Build (and EvaluatePaths for PRs).
This makes diagnostics failures show as a separate check on GitHub
PRs, so they don't block the main Build stage result.

The job-level dependsOn and shouldContinueOnError are no longer
needed since the stage dependency handles ordering and stage
isolation handles failure containment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 25, 2026 13:22
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

docs/infra/diagnostics-pipeline-consolidation.md:118

  • This bullet list describes Build-stage jobs that depend on build_windows_x64_{config}_CoreCLR_Libraries and are non-blocking via shouldContinueOnError: true, but the current runtime.yml implementation uses a Diagnostics stage (stage-level dependsOn Build) and doesn’t pass shouldContinueOnError. Please reconcile the doc with the implemented behavior.
Two jobs are added at the end of the Build stage, one for cDAC and one for DAC:

- **Template**: `eng/pipelines/diagnostics/runtime-diag-job.yml` (unchanged)
- **Dependency**: `build_windows_x64_{config}_CoreCLR_Libraries`
- **Non-blocking**: `shouldContinueOnError: true` — failures show as warnings but do not fail the pipeline check
- **Path conditions**: Same as `CoreCLR_Libraries` — only runs when CoreCLR or library paths change

docs/infra/runtime-pipeline-architecture.md:25

  • The pipeline overview/diagram lists only EvaluatePaths and Build stages and places “Diagnostics test jobs” under Build, but runtime.yml now defines a separate Diagnostics stage. Please update the diagram and the ‘Stages’ description accordingly so readers don’t assume diagnostics runs within the Build stage.
  ├─ Stage: EvaluatePaths (PR only)
  │    └─ Determines which subsets changed → gates downstream jobs
  │
  └─ Stage: Build
       ├─ CoreCLR jobs (multiple platforms/configs)
       ├─ Libraries jobs
       ├─ Mono jobs
       ├─ WASM jobs
       ├─ Mobile jobs (Android, iOS)
       ├─ Installer jobs
       ├─ NativeAOT jobs
       ├─ Tool/CrossDac jobs
       └─ Diagnostics test jobs (non-blocking)

eng/pipelines/runtime.yml:291

  • The shared-framework extraction + DiagnosticsRuntime artifact upload is added to the CoreCLR_Libraries postBuildSteps matrix, so it will run for linux_x64/linux_musl_x64/osx_arm64 as well as windows_x64. Since the new diagnostics jobs only ever download the Windows artifact, these extra archive/publish steps add unnecessary time/storage. Consider adding a step-level condition to run these steps only on windows_x64 (or expanding the diagnostics jobs to consume the non-Windows artifacts too).
          - powershell: |
              $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
              Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
            displayName: 'Set Path to Shared Framework Artifacts'
          - template: /eng/pipelines/common/upload-artifact-step.yml

eng/pipelines/runtime.yml:1993

  • The Diagnostics stage depends on the entire Build stage, so these SOS jobs won’t start until all Build jobs complete (not just the windows_x64 CoreCLR_Libraries job that produces the artifact). This delays diagnostics signal and can negate the intended wall-clock savings; if early feedback is desired, consider keeping these jobs in the Build stage with an explicit job-level dependency on build_windows_x64_{config}_CoreCLR_Libraries (and using continueOnError to keep them non-blocking).
- stage: Diagnostics
  dependsOn:
  - Build
  - ${{ if eq(variables['Build.Reason'], 'PullRequest') }}:
    - EvaluatePaths
  jobs:

docs/infra/diagnostics-pipeline-consolidation.md:48

  • This section says the diagnostics tests are added as additional jobs in the Build stage, but runtime.yml introduces a separate Diagnostics stage instead. Please update this doc to match the current pipeline structure (or adjust the pipeline to match the design described here).

Approach: single-pipeline consolidation

Instead of coordinating across pipelines, add the diagnostics test jobs directly to the runtime pipeline as additional jobs in the Build stage. They depend on the existing CoreCLR_Libraries build job and download only the shared framework artifact — no duplicate build required.


Comment on lines +2001 to +2005
name: cDAC
useCdac: true
isOfficialBuild: ${{ variables.isOfficialBuild }}
liveRuntimeDir: $(Build.SourcesDirectory)/artifacts/runtime
timeoutInMinutes: 360

Copilot AI Feb 25, 2026

The runtime-diag-job template defaults shouldContinueOnError to false, but this invocation doesn’t pass shouldContinueOnError: true. As written, a failing SOS run will fail the job/stage and still block the overall pipeline, contradicting the intent of making diagnostics non-blocking.

useCdac: false
isOfficialBuild: ${{ variables.isOfficialBuild }}
liveRuntimeDir: $(Build.SourcesDirectory)/artifacts/runtime
timeoutInMinutes: 360

Copilot AI Feb 25, 2026

Same issue as the cDAC job: shouldContinueOnError isn’t set, so failures will be blocking. Pass shouldContinueOnError: true if these results are intended to be non-blocking.

Suggested change
- timeoutInMinutes: 360
+ timeoutInMinutes: 360
+ shouldContinueOnError: true
