Consolidate runtime-diagnostics pipeline into the runtime pipeline #124593
steveisok wants to merge 9 commits into dotnet:main
Move the cDAC and DAC SOS test jobs from the standalone runtime-diagnostics pipeline into runtime.yml. This eliminates the redundant CoreCLR rebuild that runtime-diagnostics performed. The diagnostics jobs now depend on the existing CoreCLR_Libraries windows_x64 build and download just the shared framework artifact (DiagnosticsRuntime_*) that a new upload step produces.

Changes:
- Add dotnet/diagnostics repository resource with configurable branch
- Add shared framework upload step to CoreCLR_Libraries postBuildSteps
- Add cDAC and DAC test jobs at the end of the Build stage
- Both diagnostic jobs use the same path-based conditions as their dependency (CoreCLR_Libraries)

The runtime-diagnostics.yml pipeline can be retired once this is validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Documents the motivation, alternatives evaluated, and design for moving SOS/DAC diagnostic tests from the standalone runtime-diagnostics pipeline into runtime.yml. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comprehensive reference for the runtime.yml pipeline covering:
- Template chain (4 layers from runtime.yml to 1ES/public)
- Platform matrix system and xplat-setup variables
- EvaluatePaths stage and PR path filtering
- Job templates (global-build-job, runtime-diag-job, etc.)
- Key variables (debugOnPrReleaseOnRolling, isRollingBuild, etc.)
- Artifact flow (upload, download, cross-platform considerations)
- Helix test infrastructure
- Job categories catalog (~55 platform-matrix invocations)
- How-to guide for adding new jobs
- File reference table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Pull request overview
This PR consolidates the runtime-diagnostics pipeline into the main runtime.yml pipeline to eliminate redundant CoreCLR builds and improve CI efficiency. The diagnostics test jobs now reuse artifacts from the existing CoreCLR_Libraries build rather than rebuilding CoreCLR independently, saving approximately 30 minutes of build time per run.
Changes:
- Adds diagnostics repository resource and parameter to runtime.yml for flexible branch targeting
- Adds shared framework artifact upload (DiagnosticsRuntime_*) to CoreCLR_Libraries jobs
- Adds two non-blocking diagnostic test jobs (cDAC and DAC) that depend on CoreCLR_Libraries windows_x64
- Adds comprehensive documentation explaining the pipeline architecture and consolidation rationale
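The repository resource and branch parameter from the first change might look roughly like this in runtime.yml. This is a minimal sketch, not the PR's actual YAML: the parameter name `diagnosticsBranch`, the repository alias `diagnostics`, and the service connection name `public` are assumptions made for illustration.

```yaml
# Sketch only: names marked "assumed" do not come from the PR text.
parameters:
- name: diagnosticsBranch          # assumed parameter name
  displayName: Branch for dotnet/diagnostics repo
  type: string
  default: main

resources:
  repositories:
  - repository: diagnostics        # assumed alias used by later checkout steps
    type: github
    name: dotnet/diagnostics
    endpoint: public               # assumed service connection name
    ref: refs/heads/${{ parameters.diagnosticsBranch }}
```

Exposing the branch as a parameter lets a manual run target a diagnostics branch other than the default without editing the pipeline file.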
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| eng/pipelines/runtime.yml | Adds diagnostics repo resource, DiagnosticsRuntime artifact upload to CoreCLR_Libraries, and two diagnostic test jobs (cDAC and DAC) that reuse those artifacts |
| docs/infra/runtime-pipeline-architecture.md | New comprehensive reference documenting runtime.yml structure, template chain, platform matrix system, job templates, variables, artifact flow, and job categories |
| docs/infra/diagnostics-pipeline-consolidation.md | Design document explaining the motivation for consolidation, alternatives evaluated, chosen approach, and implementation details |
Move shouldContinueOnError from inside jobParameters to the platform-matrix.yml parameters level. xplat-setup.yml already passes shouldContinueOnError separately, so having it inside jobParameters caused a duplicate definition error at template expansion time, which cascaded into an invalid StageList error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
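A sketch of the corrected placement, assuming the jobs are declared through platform-matrix.yml. The path to platform-matrix.yml and the `buildConfig` value are illustrative assumptions rather than the PR's exact YAML.

```yaml
# Sketch only: shouldContinueOnError sits beside jobParameters, not inside it.
- template: /eng/pipelines/common/platform-matrix.yml   # assumed path
  parameters:
    jobTemplate: /eng/pipelines/diagnostics/runtime-diag-job.yml
    buildConfig: release                                 # assumed value
    platforms:
    - windows_x64
    shouldContinueOnError: true    # matrix-level; forwarded once by xplat-setup.yml
    jobParameters:
      name: cDAC
      useCdac: true
      # no shouldContinueOnError here, otherwise the key is defined twice
```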
The diagnostics test jobs depend on the CoreCLR_Libraries build for shared framework artifacts, but the cDAC tools were not being built. Add tools.cdac to the build args so the cDAC binaries are included in the testhost output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
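As a rough illustration of that change, the build arguments for the CoreCLR_Libraries job could gain the extra subset as shown below. Only `tools.cdac` comes from the commit message; the other subsets, the `-c` flag, and the `nameSuffix` value are assumptions about how the job is configured.

```yaml
# Sketch only: everything except tools.cdac is an assumed placeholder.
jobParameters:
  nameSuffix: CoreCLR_Libraries
  buildArgs: -s clr+libs+tools.cdac -c $(_BuildConfig)
```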
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
eng/pipelines/runtime.yml:299
- The PowerShell script and artifact upload steps for DiagnosticsRuntime are executed for all platforms in the CoreCLR_Libraries matrix (linux_x64, linux_musl_x64, osx_arm64, and windows_x64), but the diagnostic test jobs only consume the windows_x64 artifact. This wastes CI time and resources uploading unused artifacts from Linux and macOS builds. Consider adding a condition `${{ if eq(parameters.osGroup, 'windows') }}:` around lines 287-299 to only execute these steps on Windows, or alternatively, add conditions to the individual steps to check the osGroup at runtime.
- powershell: |
    $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
    Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
  displayName: 'Set Path to Shared Framework Artifacts'
- template: /eng/pipelines/common/upload-artifact-step.yml
  parameters:
    rootFolder: $(versionDir)
    includeRootFolder: false
    archiveType: $(archiveType)
    archiveExtension: $(archiveExtension)
    tarCompression: $(tarCompression)
    artifactName: DiagnosticsRuntime_$(osGroup)$(osSubgroup)_$(archType)_$(_BuildConfig)
    displayName: Diagnostics Runtime
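The compile-time guard suggested in this comment could be sketched as follows. It assumes `parameters.osGroup` is in scope at the point where the steps are expanded, which the comment implies but the PR text does not confirm.

```yaml
# Sketch only: wraps the two existing steps in a template-expression guard
# so they are emitted for Windows legs of the matrix and omitted elsewhere.
postBuildSteps:
- ${{ if eq(parameters.osGroup, 'windows') }}:
  - powershell: |
      $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
      Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
    displayName: 'Set Path to Shared Framework Artifacts'
  - template: /eng/pipelines/common/upload-artifact-step.yml
    parameters:
      rootFolder: $(versionDir)
      includeRootFolder: false
      archiveType: $(archiveType)
      archiveExtension: $(archiveExtension)
      tarCompression: $(tarCompression)
      artifactName: DiagnosticsRuntime_$(osGroup)$(osSubgroup)_$(archType)_$(_BuildConfig)
      displayName: Diagnostics Runtime
```

If the parameter is not available at expansion time, a runtime step condition such as `and(succeeded(), eq(variables['osGroup'], 'windows'))` on the PowerShell step is the alternative the comment mentions.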
eng/pipelines/runtime.yml:2026
- The condition for running diagnostic tests is missing a check for `SetPathVars_tools_cdac.containsChange`. The existing runtime-diagnostics.yml pipeline (lines 39-43) runs when cDAC-specific paths change (src/native/managed/cdac/** and src/coreclr/debug/runtimeinfo/**), which are covered by the `tools_cdac` subset in evaluate-default-paths.yml. Without this check, changes to cDAC code that should trigger diagnostic tests won't trigger them in PRs. Add `eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true)` to the condition to ensure cDAC changes trigger diagnostic tests.
condition: >-
  or(
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
    eq(variables['isRollingBuild'], true))
eng/pipelines/runtime.yml:2066
- The condition for running the DAC diagnostic test is missing a check for `SetPathVars_tools_cdac.containsChange`. The existing runtime-diagnostics.yml pipeline (lines 39-43) runs when cDAC-specific paths change (src/native/managed/cdac/** and src/coreclr/debug/runtimeinfo/**), which are covered by the `tools_cdac` subset in evaluate-default-paths.yml. Without this check, changes to cDAC code that should trigger diagnostic tests won't trigger them in PRs. Add `eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true)` to the condition to ensure cDAC changes trigger diagnostic tests.
condition: >-
  or(
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
    eq(variables['isRollingBuild'], true))
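Applying the suggestion from the two comments above, the condition on either job would read roughly as follows. Every identifier is taken from the review text; only the position of the new line within the `or(...)` is a choice made for this sketch.

```yaml
# Sketch of the suggested condition with the tools_cdac check added.
condition: >-
  or(
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_non_mono_and_wasm.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_libraries.containsChange'], true),
    eq(stageDependencies.EvaluatePaths.evaluate_paths.outputs['SetPathVars_tools_cdac.containsChange'], true),
    eq(variables['isRollingBuild'], true))
```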
Change testResultsFiles from '**/*.xml' to '**/SOS.*.xml' in both cDAC and DAC PublishTestResults steps. The broad glob was picking up unrelated test results (e.g. DotnetCounters unit tests) from the diagnostics repo, causing garbled test run titles. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
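For illustration, a publish step with the narrowed glob might look like this. Only the `testResultsFiles` value comes from the commit message; the result format, search folder, and run title are assumptions.

```yaml
# Sketch only: inputs other than testResultsFiles are assumed placeholders.
- task: PublishTestResults@2
  displayName: Publish SOS test results
  inputs:
    testResultsFormat: XUnit                      # assumed format
    testResultsFiles: '**/SOS.*.xml'              # was '**/*.xml'
    searchFolder: $(Build.SourcesDirectory)/artifacts/TestResults   # assumed folder
    testRunTitle: SOS cDAC tests                  # assumed title
  condition: always()
```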
The runtime-diag-job template accepted shouldContinueOnError but never applied it at the job level. Map it to the job's continueOnError so that diagnostics test failures don't fail the overall runtime pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
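The mapping described here can be sketched as a stripped-down job template. This is not the real body of runtime-diag-job.yml; the `name` parameter and the placeholder step are assumptions, and only the `continueOnError` line reflects the commit message.

```yaml
# Sketch only: a minimal job template that forwards the parameter.
parameters:
  name: ''
  shouldContinueOnError: false

jobs:
- job: ${{ parameters.name }}
  # A failing job is reported as "succeeded with issues" instead of failing the run.
  continueOnError: ${{ parameters.shouldContinueOnError }}
  steps:
  - script: echo "SOS test steps go here"
```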
Move cDAC and DAC test jobs from the Build stage into their own Diagnostics stage that depends on Build (and EvaluatePaths for PRs). This makes diagnostics failures show as a separate check on GitHub PRs, so they don't block the main Build stage result. The job-level dependsOn and shouldContinueOnError are no longer needed since the stage dependency handles ordering and stage isolation handles failure containment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (5)
docs/infra/diagnostics-pipeline-consolidation.md:118
- This bullet list describes Build-stage jobs that depend on build_windows_x64_{config}_CoreCLR_Libraries and are non-blocking via shouldContinueOnError: true, but the current runtime.yml implementation uses a Diagnostics stage (stage-level dependsOn Build) and doesn’t pass shouldContinueOnError. Please reconcile the doc with the implemented behavior.
Two jobs are added at the end of the Build stage, one for cDAC and one for DAC:
- **Template**: `eng/pipelines/diagnostics/runtime-diag-job.yml` (unchanged)
- **Dependency**: `build_windows_x64_{config}_CoreCLR_Libraries`
- **Non-blocking**: `shouldContinueOnError: true` — failures show as warnings but do not fail the pipeline check
- **Path conditions**: Same as `CoreCLR_Libraries` — only runs when CoreCLR or library paths change
docs/infra/runtime-pipeline-architecture.md:25
- The pipeline overview/diagram lists only EvaluatePaths and Build stages and places “Diagnostics test jobs” under Build, but runtime.yml now defines a separate Diagnostics stage. Please update the diagram and the ‘Stages’ description accordingly so readers don’t assume diagnostics runs within the Build stage.
├─ Stage: EvaluatePaths (PR only)
│   └─ Determines which subsets changed → gates downstream jobs
│
└─ Stage: Build
    ├─ CoreCLR jobs (multiple platforms/configs)
    ├─ Libraries jobs
    ├─ Mono jobs
    ├─ WASM jobs
    ├─ Mobile jobs (Android, iOS)
    ├─ Installer jobs
    ├─ NativeAOT jobs
    ├─ Tool/CrossDac jobs
    └─ Diagnostics test jobs (non-blocking)
eng/pipelines/runtime.yml:291
- The shared-framework extraction + DiagnosticsRuntime artifact upload is added to the CoreCLR_Libraries postBuildSteps matrix, so it will run for linux_x64/linux_musl_x64/osx_arm64 as well as windows_x64. Since the new diagnostics jobs only ever download the Windows artifact, these extra archive/publish steps add unnecessary time/storage. Consider adding a step-level condition to run these steps only on windows_x64 (or expanding the diagnostics jobs to consume the non-Windows artifacts too).
- powershell: |
    $versionDir = Get-ChildItem -Directory -Path "$(Build.SourcesDirectory)/artifacts/bin/testhost/net*/shared/Microsoft.NETCore.App" | Select-Object -First 1 -ExpandProperty FullName
    Write-Host "##vso[task.setvariable variable=versionDir]$versionDir"
  displayName: 'Set Path to Shared Framework Artifacts'
- template: /eng/pipelines/common/upload-artifact-step.yml
eng/pipelines/runtime.yml:1993
- The Diagnostics stage depends on the entire Build stage, so these SOS jobs won’t start until *all* Build jobs complete (not just the windows_x64 CoreCLR_Libraries job that produces the artifact). This delays diagnostics signal and can negate the intended wall-clock savings; if early feedback is desired, consider keeping these jobs in the Build stage with an explicit job-level dependency on build_windows_x64_{config}_CoreCLR_Libraries (and using continueOnError to keep them non-blocking).
- stage: Diagnostics
  dependsOn:
  - Build
  - ${{ if eq(variables['Build.Reason'], 'PullRequest') }}:
    - EvaluatePaths
  jobs:
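The alternative raised in this comment, a job inside the Build stage that waits only on the one producing job, could be sketched as below. The job name `SOS_cDAC`, the `release` substitution for `{config}`, and the placeholder step are assumptions for illustration.

```yaml
# Sketch only: job-level dependency instead of a stage-level one.
- stage: Build
  jobs:
  - job: SOS_cDAC                                   # assumed job name
    # Starts as soon as this single build job finishes, not the whole stage.
    dependsOn: build_windows_x64_release_CoreCLR_Libraries
    continueOnError: true                           # keeps failures non-blocking
    steps:
    - script: echo "download DiagnosticsRuntime artifact and run SOS tests"
```

The trade-off is the one the thread describes: earlier feedback here, versus a separate GitHub check per stage with the Diagnostics stage layout.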
docs/infra/diagnostics-pipeline-consolidation.md:48
- This section says the diagnostics tests are added as additional jobs in the Build stage, but runtime.yml introduces a separate Diagnostics stage instead. Please update this doc to match the current pipeline structure (or adjust the pipeline to match the design described here).
Approach: single-pipeline consolidation
Instead of coordinating across pipelines, add the diagnostics test jobs directly to the runtime pipeline as additional jobs in the Build stage. They depend on the existing CoreCLR_Libraries build job and download only the shared framework artifact — no duplicate build required.
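The download side of that approach might be sketched with the built-in pipeline artifact task. The PR itself may use a repo-specific download template instead; the task choice and the `_BuildConfig` variable being set in the consuming job are assumptions, while the artifact name pattern and target directory follow values quoted elsewhere in this thread.

```yaml
# Sketch only: pulls the Windows shared-framework artifact from the same run.
steps:
- task: DownloadPipelineArtifact@2
  displayName: Download Diagnostics Runtime
  inputs:
    buildType: current
    artifactName: DiagnosticsRuntime_windows_x64_$(_BuildConfig)
    targetPath: $(Build.SourcesDirectory)/artifacts/runtime
```

Because the upload step archives the folder, a real job would also need an extraction step before the tests can use the files.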
name: cDAC
useCdac: true
isOfficialBuild: ${{ variables.isOfficialBuild }}
liveRuntimeDir: $(Build.SourcesDirectory)/artifacts/runtime
timeoutInMinutes: 360
The runtime-diag-job template defaults shouldContinueOnError to false, but this invocation doesn’t pass shouldContinueOnError: true. As written, a failing SOS run will fail the job/stage and still block the overall pipeline, contradicting the intent of making diagnostics non-blocking.
useCdac: false
isOfficialBuild: ${{ variables.isOfficialBuild }}
liveRuntimeDir: $(Build.SourcesDirectory)/artifacts/runtime
timeoutInMinutes: 360
Same issue as the cDAC job: shouldContinueOnError isn’t set, so failures will be blocking. Pass shouldContinueOnError: true if these results are intended to be non-blocking.
Suggested change:
timeoutInMinutes: 360
shouldContinueOnError: true
Summary
Move the cDAC and DAC/SOS diagnostic test jobs from the standalone runtime-diagnostics pipeline into runtime.yml, eliminating the redundant CoreCLR rebuild that runtime-diagnostics performed.
Changes
windows_x64 build.
Motivation
The standalone runtime-diagnostics pipeline rebuilt CoreCLR from scratch just to produce shared framework artifacts for SOS tests. By reusing the artifacts already built by CoreCLR_Libraries in runtime.yml, we save ~30 minutes of redundant build time per run and keep diagnostic test results visible alongside the main build.
Notes