Skip to content

Improve error handling and lifecycle management in application services#116

Open
jodavis wants to merge 2 commits intomainfrom
claude/fix-app-restart-crashes-PAQ9L
Open

Improve error handling and lifecycle management in application services#116
jodavis wants to merge 2 commits intomainfrom
claude/fix-app-restart-crashes-PAQ9L

Conversation

@jodavis
Copy link
Owner

@jodavis jodavis commented Mar 4, 2026

Summary

This PR enhances error handling, logging, and lifecycle management across the AdaptiveRemote application. It adds better exception handling during browser cleanup, improves test host initialization with settings overrides, and refines application lifecycle logging and cancellation token handling.

Key Changes

  • Browser Tracing Cleanup: Wrapped the browser tracing stop operation in a try-catch block to gracefully handle exceptions during cleanup and log them appropriately via Playwright_ErrorWhileCleaningUp.

  • Test Host Builder Improvements:

    • Added optional overrideSettings parameter to the Start() method to allow runtime configuration overrides
    • Refactored internal logic to StartWithSettings() for cleaner separation of concerns
    • Added process exit detection before attempting to connect to the test control endpoint, providing clearer error messages when the host process fails to start
  • Simulated Environment Fixes:

    • Changed SetLogLocation() to store the log location for use during the next host start rather than immediately reconfiguring the builder
    • Updated EnsureHostStarted() to properly dispose previous host instances and pass log location settings through the Start() method override
    • This ensures log configuration is applied consistently when hosts are restarted
  • Application Lifecycle Logging:

    • Added new logger methods for better visibility: ApplicationLifecycle_UnhandledError, ApplicationLifecycle_WaitingForScope, and ApplicationLifecycle_ScopeReleased
    • Enhanced exception handling in ExecuteAsync to log unhandled errors before cleanup
    • Improved logging to track scope acquisition and release
  • Cancellation Token Handling: Added WaitAsync() call when awaiting the application scope task to properly respect cancellation tokens and prevent indefinite waits during shutdown

Notable Implementation Details

The changes maintain backward compatibility while improving robustness. The test host builder changes allow for more flexible configuration during startup, and the lifecycle improvements provide better observability and graceful shutdown behavior.

https://claude.ai/code/session_0166sDkzTih5jmGpyeuYBBND

Root cause: When the host restarts after the startup/shutdown test, Playwright
can fail to navigate (e.g. due to an orphaned Chromium from the previous run).
This triggers BackgroundService to call StopApplication(), cancelling
stoppingToken. However, ApplicationScopeContainer.InvokeInScopeAsync was
awaiting the scope TaskCompletionSource without the cancellation token, so
ApplicationLifecycle hung indefinitely waiting for a Blazor circuit that would
never start. After 10 seconds the shutdown timeout killed the process with no
meaningful error logged.

Fixes:
- ApplicationScopeContainer.InvokeInScopeAsync: use WaitAsync(combinedToken)
  instead of bare await so the scope wait is cancelled when shutdown is
  requested, preventing the 10-second hang
- ApplicationLifecycle: log unhandled exceptions (previously swallowed
  silently) and add scope wait/release log messages for startup debugging
- PlaywrightBrowserLifetimeService: wrap Tracing.StopAsync in try/catch so
  CleanupAsync always runs even if saving the trace file throws, preventing
  Chromium from being orphaned between runs
- AdaptiveRemoteHost.Builder.Start: detect host process exit during the TCP
  retry loop and fail fast instead of waiting the full 120-second timeout
- SimulatedEnvironment.SetLogLocation: stop accumulating --log:FilePath args
  in the shared builder settings on every scenario; instead apply the log
  path only at start time via a per-start override
- SimulatedEnvironment.EnsureHostStarted: dispose the old (exited) host
  before starting a new one to avoid leaking the process handle

https://claude.ai/code/session_0166sDkzTih5jmGpyeuYBBND
@github-actions
Copy link

github-actions bot commented Mar 4, 2026

Test Results

312 tests  ±0   302 ✅  - 4   1m 3s ⏱️ -19s
  5 suites ±0     6 💤 ±0 
  5 files   ±0     4 ❌ +4 

For more details on these failures, see this check.

Results for commit b6643e2. ± Comparison against base commit 2d54cae.

This pull request removes 3 and adds 2 tests. Note that renamed tests count towards both.
,False)
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesKeyNameAsync (Hello
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesValueAsync (Invalid
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesKeyNameAsync (Hello
,False)
AdaptiveRemote.Services.ProgrammaticSettings.PersistSettingsTests ‑ PersistSettings_Set_ValidatesValueAsync (Invalid
,False)

♻️ This comment has been updated with latest results.

…essages

The WaitingForScope and ScopeReleased log messages added to
ApplicationLifecycle.ExecuteAsync need to be included in the strict
VerifyMessages assertions in the unit tests.

- WaitingForScope appears first in every test (logged at ExecuteAsync entry)
- ScopeReleased appears after InvokeInScopeAsync returns on the normal/
  handled-error path (service init failures are caught internally by
  ScopedLifecycleContainer and the scope call returns normally)
- ScopeReleased does NOT appear on the cancellation path
  (StopAsync_CancelsInitializeMethodsThatAreWaiting), where
  OperationCanceledException is caught instead
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants