Enrich EndBuild hang diagnostics with logging service and submission state #13385
Merged
`ScheduleTimeRecord.AccumulatedTime` throws `InternalErrorException` with 'Can't get the accumulated time while the timer is still running' during `Scheduler.WriteDetailedSummary()`. This exception kills the `BuildManager` work queue, so no further build results are processed. `EndBuild()` then hangs indefinitely, freezing VS for hours.

The fix returns the best-effort elapsed time (accumulated time plus the current interval's elapsed time) when the timer is still running, instead of throwing. This is diagnostic summary data: throwing has no correctness benefit but causes a catastrophic hang.

Telemetry confirmed 11 hits in 30 days (StackHash: 2C721D65...), all during solution close with running timers.
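A minimal Python sketch of the fixed behavior follows. It is not MSBuild's C# implementation: the class name mirrors `ScheduleTimeRecord`, but `start_time`, `end_time`, `accumulated_time`, and the injectable `clock` parameter are hypothetical names chosen so that time can be advanced deterministically.

```python
import time


class ScheduleTimeRecord:
    """Hypothetical Python stand-in for MSBuild's C# ScheduleTimeRecord (sketch only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable clock so the behavior is testable
        self._accumulated = 0.0    # seconds from completed start/end intervals
        self._started_at = None    # timestamp of the running interval, else None

    def start_time(self):
        if self._started_at is not None:
            raise RuntimeError("timer is already running")
        self._started_at = self._clock()

    def end_time(self):
        if self._started_at is None:
            raise RuntimeError("timer is not running")
        self._accumulated += self._clock() - self._started_at
        self._started_at = None

    @property
    def accumulated_time(self):
        # The PR replaces a throw in this case with a best-effort value:
        # completed intervals plus the elapsed part of the running interval.
        if self._started_at is not None:
            return self._accumulated + (self._clock() - self._started_at)
        return self._accumulated
```

Reading `accumulated_time` while the timer runs neither raises nor stops the timer, which is the property a diagnostic summary written during solution close needs.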
- Remove placeholder issues/XXXXX URL from XML doc comment
- Add ScheduleTimeRecord_AccumulatedTime_DoesNotThrowWhenTimerIsRunning test
- Add ScheduleTimeRecord_AccumulatedTime_IncludesPreviousAccumulation test
…state

Add new telemetry properties to the EndBuildHang crash event to help diagnose why EndBuild gets stuck waiting for submissions to complete:

- LoggingServiceState: whether the logging pipeline is alive or shutting down
- LoggingEventQueueDepth: number of events backed up in the async queue
- IsShuttingDown: whether BuildManager shutdown has been initiated
- IsCancellationRequested: whether the cancellation token was triggered
- WorkQueueDepth: pending items in the BuildManager work queue
- SubmissionDetails: per-submission state (started, has result, has exception, logging completed)
- RegisteredLoggerTypeNames: which loggers are registered on the node

Also add inner exception diagnostics for all crash telemetry:

- InnerExceptionStackTrace: sanitized stack trace of the inner exception
- InnerExceptionMessage: truncated and path-sanitized inner exception message
- LoggerEventType: the build event type being processed when a logger faulted
- Include inner exception stack in StackHash computation for better bucketing

All string fields are sanitized to remove file paths and truncated to prevent PII leakage.
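The commit does not spell out the sanitization rules, so the following is only a hypothetical Python sketch of the idea: replace path-like substrings with a placeholder, truncate, and fill the inner-exception fields. The 256-character cap, the `<path>` placeholder, the regexes, `InternalLoggerError`, and `add_inner_exception_fields` are all assumptions. Python's `raise ... from` (`__cause__`) stands in for .NET's `InnerException`, and `getattr` stands in for the reflection-based read of `InternalLoggerException.BuildEventArgs`.

```python
import re
import traceback

# Hypothetical cap; the PR says fields are truncated but does not give a length.
MAX_FIELD_LENGTH = 256

# Windows drive paths such as C:\Users\me\repo\a.csproj
_WINDOWS_PATH = re.compile(r"[A-Za-z]:\\[^\s\"'<>|]*")
# Absolute Unix-style paths with at least one directory, e.g. /home/me/a.cs;
# the look-behind skips the slashes inside URLs such as https://host/x.
_UNIX_PATH = re.compile(r"(?<![\w:/])/(?:[^\s/\"']+/)+[^\s\"']*")


def sanitize(value, max_length=MAX_FIELD_LENGTH):
    """Replace path-like substrings with <path>, then truncate."""
    if value is None:
        return None
    value = _WINDOWS_PATH.sub("<path>", value)
    value = _UNIX_PATH.sub("<path>", value)
    return value[:max_length]


class InternalLoggerError(Exception):
    """Hypothetical stand-in for InternalLoggerException; carries the event being logged."""

    def __init__(self, message, build_event_args=None):
        super().__init__(message)
        self.build_event_args = build_event_args


def add_inner_exception_fields(exc, telemetry):
    """Fill InnerException* and LoggerEventType entries in a telemetry dict."""
    inner = exc.__cause__  # this sketch models InnerException with `raise ... from`
    if inner is not None:
        stack = "".join(traceback.format_tb(inner.__traceback__))
        telemetry["InnerExceptionStackTrace"] = sanitize(stack)
        telemetry["InnerExceptionMessage"] = sanitize(str(inner))
    # Reflection-style lookup: other exception types simply lack the attribute.
    event_args = getattr(exc, "build_event_args", None)
    if event_args is not None:
        telemetry["LoggerEventType"] = type(event_args).__name__
    return telemetry
```

Sanitizing before truncating matters: truncating first could cut a path in half and leave a fragment the patterns no longer recognize.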
Pull request overview
Improves MSBuild crash/hang telemetry to better diagnose EndBuild hangs (especially those related to logging/submission completion) by enriching CrashTelemetry and emitting additional EndBuild state.
Changes:
- Add inner-exception diagnostics (message/stack) and logger event type to crash telemetry, and improve stack hashing by incorporating inner stack traces.
- Expand EndBuild hang telemetry with logging service state/queue depth, work queue depth, cancellation/shutdown state, submission details, and registered logger types.
- Refactor EndBuild hang emission to pass a pre-populated `CrashTelemetry` object; add `EventQueueCount` to `ILoggingService` and implementations.
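The stack-hashing change summarized above can be illustrated with a short Python sketch. The PR does not say how `ComputeStackHash` hashes its input, so SHA-256, the separator string, and the 16-character uppercase hex output are assumptions; the point shown is only that two wrapper exceptions with identical outer stacks land in different buckets when their inner stacks differ.

```python
import hashlib


def compute_stack_hash(outer_stack, inner_stack=None, length=16):
    """Hypothetical analogue of ComputeStackHash that also folds in the inner stack."""
    digest = hashlib.sha256()
    digest.update((outer_stack or "").encode("utf-8"))
    if inner_stack:
        # The separator keeps ("ab", "c") and ("a", "bc") from hashing the same input.
        digest.update(b"\n--- inner ---\n")
        digest.update(inner_stack.encode("utf-8"))
    return digest.hexdigest()[:length].upper()
```

Skipping the inner part when there is no inner exception keeps the hash for ordinary exceptions identical to an outer-stack-only hash, so existing buckets for non-wrapper crashes would not shift under this design.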
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Framework/Telemetry/CrashTelemetryRecorder.cs | Refactors EndBuild hang diagnostic emission to accept a pre-populated CrashTelemetry. |
| src/Framework/Telemetry/CrashTelemetry.cs | Adds new crash + EndBuild-hang properties and updates stack hashing and exception population logic. |
| src/Framework.UnitTests/CrashTelemetry_Tests.cs | Extends unit coverage for new telemetry fields and stack-hash behavior. |
| src/Build/BackEnd/Components/Scheduler/ScheduleTimeRecord.cs | Changes AccumulatedTime to return best-effort elapsed time while running instead of throwing. |
| src/Build/BackEnd/Components/Logging/LoggingService.cs | Exposes async logging queue depth via EventQueueCount. |
| src/Build/BackEnd/Components/Logging/ILoggingService.cs | Adds EventQueueCount to the logging service interface. |
| src/Build/BackEnd/BuildManager/BuildManager.cs | Populates and emits enriched EndBuild hang telemetry, including logging and submission state details. |
| src/Build.UnitTests/BackEnd/Scheduler_Tests.cs | Adds tests validating the new ScheduleTimeRecord.AccumulatedTime behavior. |
| src/Build.UnitTests/BackEnd/MockLoggingService.cs | Updates mock to implement the new EventQueueCount interface member. |
…ailures from crashing EndBuild
…nableNodeReuse, ActiveNodeDetails

For WaitingForNodes hangs where nodes refuse to shut down, the existing telemetry only reports the count of active nodes. Add:

- ActiveNodeIds: comma-separated list of stuck node IDs
- EnableNodeReuse: whether nodes were told to go idle or exit
- ActiveNodeDetails: per-node state showing what each node was last executing (nodeId:configId:projectFileName), idle, or error
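The node-state strings described in this commit can be sketched in Python as follows. Only the comma-separated `ActiveNodeIds` and the `nodeId:configId:projectFileName` shape come from the commit message; the `NodeState` type, the `error` flag, the `nodeId:idle` / `nodeId:error` spellings, and the semicolon between nodes in `ActiveNodeDetails` are assumptions.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Optional


@dataclass
class NodeState:
    """Hypothetical snapshot of one active node."""
    node_id: int
    config_id: Optional[int] = None      # configuration last executed, if any
    project_path: Optional[str] = None   # full path; only the file name is reported
    error: bool = False                  # assumed flag: the node's state could not be read


def format_active_node_ids(nodes):
    return ",".join(str(n.node_id) for n in nodes)


def format_active_node_details(nodes):
    parts = []
    for n in nodes:
        if n.error:
            parts.append(f"{n.node_id}:error")
        elif n.config_id is None or n.project_path is None:
            parts.append(f"{n.node_id}:idle")
        else:
            # PureWindowsPath accepts both \ and / separators, so .name works for either style.
            name = PureWindowsPath(n.project_path).name
            parts.append(f"{n.node_id}:{n.config_id}:{name}")
    return ";".join(parts)
```

Reporting only the project file name, not its full path, keeps the field in line with the path-sanitization goal stated for the other telemetry strings.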
MichalPavlik approved these changes Mar 16, 2026
This was referenced Mar 16, 2026
AR-May pushed a commit to AR-May/msbuild that referenced this pull request Mar 19, 2026
…state (dotnet#13385)

## Summary

When `EndBuild` hangs waiting for submissions to complete, the existing `EndBuildHang` crash telemetry captures basic counts (pending submissions, unmatched project started events) but lacks the information needed to determine *why* the hang is occurring. This PR adds diagnostic properties to narrow down the root cause.

## New EndBuild Hang Properties

| Property | Type | Description |
|---|---|---|
| `LoggingServiceState` | string | Whether the logging pipeline is alive, shutting down, or already shut down (`Initialized`, `ShuttingDown`, `Shutdown`) |
| `LoggingEventQueueDepth` | int | Number of events queued in the async logging pipeline. A large value indicates the pipeline is backed up. |
| `IsShuttingDown` | bool | Whether `BuildManager` shutdown has been initiated |
| `IsCancellationRequested` | bool | Whether the execution cancellation token was triggered |
| `WorkQueueDepth` | int | Pending items in the `BuildManager` work queue. `OnProjectFinished` posts to this queue, so a blocked queue prevents logging completion. |
| `SubmissionDetails` | string | Per-submission diagnostic state: `id:Started:HasResult:HasException:LoggingCompleted`, separated by semicolons |
| `RegisteredLoggerTypeNames` | string | Semicolon-separated list of registered logger type names, to identify which loggers could be blocking the pipeline |

## New Crash Telemetry Properties (all crash types)

| Property | Type | Description |
|---|---|---|
| `InnerExceptionStackTrace` | string | Sanitized stack trace of the inner exception. For wrapper exceptions like `InternalLoggerException`, the outer stack only shows MSBuild infrastructure; the inner stack reveals the actual faulting component. |
| `InnerExceptionMessage` | string | Truncated and path-sanitized inner exception message |
| `LoggerEventType` | string | The build event type name being processed when a logger faulted (extracted via reflection from `InternalLoggerException.BuildEventArgs`) |

## StackHash Improvement

`ComputeStackHash` now includes the inner exception's stack trace so that wrapper exceptions (e.g., all `InternalLoggerException` instances thrown from `EventSourceSink.Consume`) get different telemetry buckets based on which logger actually faulted.

## Interface Change

Added an `EventQueueCount` property to `ILoggingService` (internal interface) to expose the async event queue depth for hang diagnostics.
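The `SubmissionDetails` and `RegisteredLoggerTypeNames` strings described above can be sketched in Python. The field order and separators follow the PR text; the `SubmissionState` type and the function names are hypothetical, and rendering the flags as `True`/`False` (what both Python's `str(bool)` and .NET's `bool.ToString()` produce) is an assumption about the real output.

```python
from dataclasses import dataclass


@dataclass
class SubmissionState:
    """Hypothetical snapshot of one build submission's state."""
    submission_id: int
    started: bool
    has_result: bool
    has_exception: bool
    logging_completed: bool


def format_submission_details(submissions):
    # id:Started:HasResult:HasException:LoggingCompleted, one entry per submission.
    return ";".join(
        ":".join(str(field) for field in (
            s.submission_id, s.started, s.has_result,
            s.has_exception, s.logging_completed))
        for s in submissions)


def format_logger_type_names(loggers):
    # Semicolon-separated type names of the registered loggers.
    return ";".join(type(logger).__name__ for logger in loggers)
```

Read together with `WorkQueueDepth`, an entry such as `2:True:True:False:False` (result present, logging not completed) would hint that the hang sits on the logging-completion path rather than in the build itself, though that is an interpretation, not something the telemetry asserts.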
This was referenced May 13, 2026
This was referenced May 26, 2026
This was referenced Jun 2, 2026
Bump Microsoft.Build from 18.4.0 to 18.6.3 (SkylineCommunications/Skyline.DataMiner.CICD.Packages#159, Open)