Skip to content

feat(aspire): tear down Aspire on test-run abort via session cancellation token#6292

Merged
thomhurst merged 2 commits into
mainfrom
feat/aspire-engine-cancellation-token
Jun 24, 2026
Merged

feat(aspire): tear down Aspire on test-run abort via session cancellation token#6292
thomhurst merged 2 commits into
mainfrom
feat/aspire-engine-cancellation-token

Conversation

@thomhurst

@thomhurst thomhurst commented Jun 21, 2026

Copy link
Copy Markdown
Owner

@

Problem

AspireFixture startup (CreateAsync/BuildAsync/StartAsync and the resource-wait) only observed its own ResourceTimeout. It never saw the engines run-abort token, so a request to abort the test run (Ctrl+C, IDE stop, ProcessExit) could not:

  • interrupt an in-progress startup (it blocked for the full ResourceTimeout, e.g. 60s), and
  • stop/dispose the application before the forceful-exit timer killed the process — leaking containers.

Change

  1. Expose the engine token on TestSessionContext. TUnit.Core had no ambient session-scoped cancellation token (the engine token lived only in TUnitServiceProvider). Added public CancellationToken CancellationToken { get; internal set; } to TestSessionContext, populated once in TUnitTestFramework.ExecuteRequestAsync. Both reflection and source-generated modes funnel through that method, so a single wiring covers both. Session-scoped IAsyncInitializer fixtures (containers, distributed apps) can now observe run cancellation.

  2. AspireFixture honors it. New protected virtual RunCancellationToken (defaults to the session token). Linked into CreateAsync/BuildAsync/StartAsync and the resource-wait via CreateLinkedTokenSource(...) + CancelAfter(ResourceTimeout), so an abort interrupts startup promptly.

  3. Teardown on abort. RegisterAbortTeardown() fires a single-shot StopAndDisposeAsync() when the token trips, giving the app a head start on graceful shutdown within the forceful-exit window. DisposeAsync awaits the same memoized task, so teardown runs exactly once regardless of which path fires first.

An abort is distinguished from a genuine resource timeout (the linked token trips both) so it surfaces as cancellation rather than a misleading TimeoutException.

Why the session token (not a per-test one)

AspireFixture is used as [ClassDataSource<...>(Shared = SharedType.PerTestSession)] and outlives any single test. Its InitializeAsync runs lazily, triggered by the first consuming test. A per-test token would be cancelled (and disposed) when that first test ends, tearing the shared application down underneath every later test. The session tokens lifetime matches the fixtures. RunCancellationToken is virtual so a genuinely per-test fixture (Shared = None) can fold in its own token.

Notes

  • Public API change: new TestSessionContext.CancellationToken getter. All 4 Core PublicAPI snapshots updated; net8/9/10 pass.
  • Outside the engine the token defaults to None (CanBeCanceled == false) so the abort registration is a no-op and behavior is unchanged.
  • No integration test added — exercising real abort teardown needs live containers. The plumbing is covered by build + PublicAPI snapshots.

@codacy-production

codacy-production Bot commented Jun 21, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 6 complexity

Metric Results
Complexity 6

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — feat(aspire): tear down Aspire on test-run abort via engine cancellation token

The PR addresses a real gap: Aspire startup was invisible to the run-abort signal, leaving containers leaking and blocking shutdown for the full timeout. The approach is sound. A few concerns worth discussing before merging.


Medium: GlobalContext.CancellationToken naming creates ambiguity

The new public property (GlobalContext.CancellationToken) is named identically to TestContext.CancellationToken, which already exists for per-test cancellation. Both are CancellationToken properties on "context" objects, but with fundamentally different scopes — session vs. per-test.

Ironically, the fixture's own wrapper is called RunCancellationToken, which is already clearer. Leaking CancellationToken onto GlobalContext (vs. the fixture's RunCancellationToken) creates a naming inconsistency in the same PR. Users scanning both contexts will not immediately understand the scope difference.

Suggested names: SessionCancellationToken, RunAbortToken, or (to match the fixture) RunCancellationToken.

This is a public API change, and renaming after the fact requires a deprecation cycle, so it's worth getting right now.


Medium: Fire-and-forget in abort callback silently swallows teardown exceptions

_ = fixture.StopAndDisposeAsync();

The comment notes that DisposeAsync awaits the same memoized task, but that's only true if DisposeAsync is actually reached. Under a forced-kill scenario (e.g., the abort timeout fires before session cleanup completes), neither path observes the exception. Teardown errors disappear silently.

Consider logging errors within the callback instead of relying on the outer await:

_abortRegistration = RunCancellationToken.Register(static async state =>
{
    var fixture = (AspireFixture<TAppHost>)state!;
    fixture.LogProgress("Test run aborted — tearing down Aspire application...");
    try { await fixture.StopAndDisposeAsync(); }
    catch (Exception ex) { fixture.LogProgress($"Teardown error on abort: {ex.Message}"); }
}, this);

(Or use a non-async callback and Task.Run + logging — the key is not swallowing the exception.)


Minor: Abort teardown registered after BuildAsync, missing a window

RegisterAbortTeardown() is called after BuildAsync succeeds. If the run is aborted during BuildAsync, the linked RunCancellationToken will interrupt the build (good), but no abort callback is registered yet. Cleanup then depends entirely on TUnit calling DisposeAsync on the failed initializer.

StopAndDisposeCoreAsync() already handles _app == null gracefully, so RegisterAbortTeardown() could safely be moved to just after the OTLP receiver setup (before CreateAsync). This would eliminate the gap without introducing risk.


Positive observations

  • Memoized single-shot teardown (_teardownTask ??= StopAndDisposeCoreAsync() inside a lock) is the right pattern here. It correctly serialises both the abort-callback path and the DisposeAsync path onto the same work.
  • _abortRegistration.Dispose() before StopAndDisposeAsync() in DisposeAsync is correct — it prevents a racing late-firing callback while the memoised task already handles idempotency.
  • CancellationTokenSource.CreateLinkedTokenSource + CancelAfter is cleaner than new CancellationTokenSource(timeout) and correctly composes the two signals.
  • OperationCanceledException catch ordering — checking RunCancellationToken.IsCancellationRequested first so an abort is surfaced as cancellation rather than a misleading TimeoutException is the right call.
  • PublicAPI snapshots updated across all four target-framework variants — no gaps.
  • GlobalContext.CancellationToken defaults to None outside the engine, so the abort registration becomes a no-op and existing behaviour is unchanged. Well documented.

@claude

claude Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Code Review — feat(aspire): tear down Aspire on test-run abort via engine cancellation token

The PR addresses a real gap: Aspire startup was invisible to the run-abort signal, leaving containers leaking and blocking shutdown for the full timeout. The approach is sound. A few concerns worth discussing before merging.


Medium: GlobalContext.CancellationToken naming creates ambiguity

The new public property (GlobalContext.CancellationToken) is named identically to TestContext.CancellationToken, which already exists for per-test cancellation. Both are CancellationToken properties on "context" objects, but with fundamentally different scopes — session vs. per-test.

Ironically, the fixture's own wrapper is called RunCancellationToken, which is already clearer. Leaking CancellationToken onto GlobalContext (vs. the fixture's RunCancellationToken) creates a naming inconsistency in the same PR. Users scanning both contexts will not immediately understand the scope difference.

Suggested names: SessionCancellationToken, RunAbortToken, or (to match the fixture) RunCancellationToken.

This is a public API change, and renaming after the fact requires a deprecation cycle, so it's worth getting right now.


Medium: Fire-and-forget in abort callback silently swallows teardown exceptions

_ = fixture.StopAndDisposeAsync();

The comment notes that DisposeAsync awaits the same memoized task, but that's only true if DisposeAsync is actually reached. Under a forced-kill scenario (e.g., the abort timeout fires before session cleanup completes), neither path observes the exception. Teardown errors disappear silently.

Consider logging errors within the callback instead of relying on the outer await:

_abortRegistration = RunCancellationToken.Register(static state =>
{
    var fixture = (AspireFixture<TAppHost>)state!;
    fixture.LogProgress("Test run aborted — tearing down Aspire application...");
    _ = Task.Run(async () =>
    {
        try { await fixture.StopAndDisposeAsync(); }
        catch (Exception ex) { fixture.LogProgress($"Teardown error on abort: {ex.Message}"); }
    });
}, this);

Minor: Abort teardown registered after BuildAsync, missing a window

RegisterAbortTeardown() is called after BuildAsync succeeds. If the run is aborted during BuildAsync, the linked RunCancellationToken will interrupt the build (correct), but no abort callback is registered yet. Cleanup then depends entirely on TUnit calling DisposeAsync on the failed initializer.

StopAndDisposeCoreAsync() already handles _app == null gracefully, so RegisterAbortTeardown() could safely move to just after the OTLP receiver setup (before CreateAsync) to close this gap.


Positive observations

  • Memoized single-shot teardown (_teardownTask ??= StopAndDisposeCoreAsync() inside a lock) is the right pattern — it correctly serialises both the abort-callback path and the DisposeAsync path onto the same work.
  • _abortRegistration.Dispose() before StopAndDisposeAsync() in DisposeAsync is correct: prevents a racing late-firing callback while the memoised task handles idempotency.
  • CancellationTokenSource.CreateLinkedTokenSource + CancelAfter is cleaner than new CancellationTokenSource(timeout) and correctly composes both signals.
  • Catch ordering — checking RunCancellationToken.IsCancellationRequested first so an abort surfaces as cancellation rather than a misleading TimeoutException is the right call.
  • PublicAPI snapshots updated across all four target-framework variants — no gaps.
  • GlobalContext.CancellationToken defaults to None outside the engine, keeping existing behaviour unchanged.

@thomhurst thomhurst force-pushed the feat/aspire-engine-cancellation-token branch from caff369 to 9e1bbfc Compare June 21, 2026 23:11
@thomhurst thomhurst changed the title feat(aspire): tear down Aspire on test-run abort via engine cancellation token feat(aspire): tear down Aspire on test-run abort via session cancellation token Jun 21, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review — feat(aspire): tear down Aspire on test-run abort via session cancellation token

The implementation looks correct and the feature fills a real gap. However, the three issues raised in the two previous Claude reviews remain unaddressed. Noting them again here with updated context.


Still Open: TestSessionContext.CancellationToken naming ambiguity

The property is still named CancellationToken on TestSessionContext. TestContext already has a CancellationToken (per-test scope), so users who use both types will see identically-named properties with fundamentally different lifetimes and semantics.

This is a public API change — renaming it after release requires a deprecation cycle. The fixture's own RunCancellationToken (in the same PR) is already a clearer name. SessionCancellationToken would make the scope unambiguous at a glance.

Why it matters: A user who wires up TestSessionContext.Current?.CancellationToken in their own fixture but means the per-test token will get silent, hard-to-debug lifetime bugs because the name doesn't distinguish the scopes.


Still Open: Fire-and-forget in abort callback swallows teardown errors

_ = fixture.StopAndDisposeAsync();

The memoized task means DisposeAsync won't repeat the work — but if the process is killed before DisposeAsync is reached (the abort-timeout scenario this PR is specifically trying to defend against), exceptions from StopAndDisposeCoreAsync are lost completely. The user sees no trace of what went wrong with the container teardown.

Suggested fix:

_abortRegistration = RunCancellationToken.Register(static state =>
{
    var fixture = (AspireFixture<TAppHost>)state!;
    fixture.LogProgress("Test run aborted — tearing down Aspire application...");
    _ = Task.Run(async () =>
    {
        try { await fixture.StopAndDisposeAsync(); }
        catch (Exception ex) { fixture.LogProgress($"Teardown error on abort: {ex.Message}"); }
    });
}, this);

Why Task.Run here: CancellationToken.Register callbacks should not be async (the returned Task is discarded by the registration infrastructure). Wrapping in Task.Run keeps the fire-and-forget behaviour while giving teardown errors somewhere to surface.


Still Open: Abort teardown registration window after BuildAsync

RegisterAbortTeardown() is called after BuildAsync completes. If the run is aborted during BuildAsync, the RunCancellationToken correctly interrupts the build but no abort callback exists yet — cleanup then depends on TUnit calling DisposeAsync on the failed initializer, which may or may not happen promptly.

StopAndDisposeCoreAsync already null-checks _app, so moving RegisterAbortTeardown() to before the CreateAsync call is safe and closes the gap entirely. This is a low-risk, one-line move.


Positive observations (unchanged from prior reviews)

  • Memoized single-shot teardown via _teardownTask ??= StopAndDisposeCoreAsync() inside a lock is the right pattern — both the abort-callback and DisposeAsync paths converge correctly.
  • _abortRegistration.Dispose() before StopAndDisposeAsync() in DisposeAsync prevents a late-racing callback.
  • CreateLinkedTokenSource + CancelAfter is cleaner than new CancellationTokenSource(timeout) and correctly composes both signals.
  • Catch ordering (RunCancellationToken.IsCancellationRequested checked before the timeout branch) means abort surfaces as cancellation, not a misleading TimeoutException.
  • All four PublicAPI snapshot files updated — no gaps.
  • Token defaults to None outside the engine so existing tests are unaffected.

The core approach is solid. The naming concern is the one I'd most strongly recommend addressing before merge given it's a public API that can't be changed cleanly after the fact.

@thomhurst thomhurst force-pushed the feat/aspire-engine-cancellation-token branch from 9e1bbfc to 7596cef Compare June 22, 2026 00:51
@
feat(aspire): tear down Aspire on test-run abort via session token

AspireFixture startup (CreateAsync/BuildAsync/StartAsync and the
resource-wait) previously only observed its own ResourceTimeout, so a
run abort (Ctrl+C / IDE stop / ProcessExit) could neither interrupt a
stuck startup nor stop the application before the process was forcefully
killed, leaking containers.

Expose the engine-wide abort token on TestSessionContext (populated once
in TUnitTestFramework.ExecuteRequestAsync, covering both reflection and
source-generated modes) so session-scoped fixtures can observe run
cancellation. AspireFixture links it into all startup/wait operations
via a virtual RunCancellationToken and registers a single-shot teardown
that stops and disposes the application as soon as the run is aborted;
DisposeAsync awaits the same task so teardown runs exactly once.

The token lives on TestSessionContext (not a per-test context) because a
session- or class-scoped fixture outlives any single test; a per-test
token would tear the shared application down when its first consuming
test ends. RunCancellationToken is virtual for the rare per-test fixture.

Abort is distinguished from a genuine resource timeout so it surfaces as
cancellation rather than a misleading TimeoutException.
@
@thomhurst thomhurst force-pushed the feat/aspire-engine-cancellation-token branch from 7596cef to d85e4ac Compare June 22, 2026 00:57

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third Review — feat(aspire): tear down Aspire on test-run abort via session cancellation token

The current commit (7596cef) has been reviewed previously (twice). All three flagged issues remain unaddressed. Summarising concisely.


1. Naming ambiguity — still open

TestSessionContext.CancellationToken conflicts with the already-public TestContext.CancellationToken (per-test scope). Identically-named properties on parallel context objects with silently different lifetimes is a trap. The fixture's own RunCancellationToken is already the clearer name; SessionCancellationToken on TestSessionContext would also be unambiguous.

This is a public API — renaming it post-merge requires a deprecation cycle.


2. Fire-and-forget swallows teardown exceptions — still open

_ = fixture.StopAndDisposeAsync();  // abort callback

DisposeAsync awaits the memoised task, so teardown exceptions are observed — but only if DisposeAsync is called. This PR's core motivation is the forced-kill scenario where the abort-timeout fires before session cleanup completes. In exactly that scenario, DisposeAsync is never reached and teardown errors disappear silently. A try/catch inside a Task.Run within the callback would surface them via LogProgress.


3. Registration window after BuildAsync — still open

RegisterAbortTeardown() is called after BuildAsync returns. An abort during BuildAsync correctly interrupts the build via the linked RunCancellationToken, but has no callback yet to kick off StopAndDisposeAsync. StopAndDisposeCoreAsync already null-checks _app, so moving the registration to just before CreateAsync is a safe one-line change that closes this window.


Assessment

The overall design — memoised single-shot teardown, linked token + CancelAfter, abort-vs-timeout catch ordering, _abortRegistration.Dispose() before the shared path, CancellationToken.None default outside the engine — is solid. The implementation is correct in the common case.

Issue 1 is the only one I'd block on before merge, because it is an irreversible public API decision. Issues 2 and 3 are lower-risk improvements but would make the error-handling behaviour match the PR's stated goal (graceful teardown even under forced kill).

The engine cancellation token only listened to OS signals (Console.CancelKeyPress
for SIGINT, ProcessExit for SIGTERM). When the host drives cancellation itself --
IDE stop, --abort, or a CI runner cancelling the job -- it cancels MTP's
ExecuteRequestContext.CancellationToken, which the engine token never observed.

As a result, session-scoped fixtures that watch SessionCancellationToken (e.g.
AspireFixture) could keep booting after the run was cancelled instead of tearing
down promptly.

Link context.CancellationToken into the engine token in Initialise. When it
fires, the engine token (and thus SessionCancellationToken) fires too. The
platform path opts out of the forceful-exit timer via Cancel(armForcefulExit:
false) -- the platform owns process shutdown when it is the canceller, and
killing the process underneath it would truncate result reporting. The Ctrl+C
path keeps the timer (it suppresses default termination and must self-exit).

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fourth Review — feat(aspire): tear down Aspire on test-run abort via session cancellation token

All three issues raised in the previous three reviews have been addressed in the latest commits.


✅ Issue 1 Resolved: Naming ambiguity

TestSessionContext.SessionCancellationToken — the scope is now unambiguous at a glance. No conflict with TestContext.CancellationToken (per-test). This is the right name for a public API that can't be cleanly renamed post-release.


✅ Issue 2 Resolved: Fire-and-forget now logs teardown errors

_ = Task.Run(async () =>
{
    try { await fixture.StopAndDisposeAsync(); }
    catch (Exception ex) { fixture.LogProgress($"Teardown error on abort: {ex.Message}"); }
});

Teardown errors in the forced-kill path now surface via LogProgress rather than disappearing silently.


✅ Issue 3 Resolved: Registration window before CreateAsync closed

RegisterAbortTeardown() is now called before DistributedApplicationTestingBuilder.CreateAsync, so an abort during early AppHost creation will trigger the callback. The null-safe StopAndDisposeCoreAsync implementation handles the case where _app hasn't been assigned yet.


Second commit: engine-link of MTP platform token

The new EngineCancellationToken.Initialise(CancellationToken platformCancellationToken) change is clean:

  • _platformRegistration is correctly disposed in Dispose(). A default CancellationTokenRegistration (when CanBeCanceled == false) disposes as a no-op, so the unconditional dispose is safe.
  • The armForcefulExit: false path for platform-driven cancellation is well-reasoned — the platform owns process lifecycle when it is the canceller, and the forceful-exit timer is only necessary when the engine suppressed the OS default termination (Ctrl+C path).
  • The guard Interlocked.CompareExchange already prevents double-initialisation; the platform registration is set up within that guard, so there is no race.

Overall

This is a solid, well-motivated change. The design — memoized single-shot teardown, linked-token + CancelAfter, abort-vs-timeout catch ordering, SessionCancellationToken defaulting to None outside the engine — is correct throughout. All previously flagged issues are resolved. Ready to merge.

@thomhurst thomhurst merged commit 740f8b1 into main Jun 24, 2026
13 of 14 checks passed
@thomhurst thomhurst deleted the feat/aspire-engine-cancellation-token branch June 24, 2026 11:02
This was referenced Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant