fix(integration-tests): unblock OrderService integration tests in CI #16
The OrderService.Tests.Integration suite has hung in CI on every run since it was added 9 days ago (commit 76b2f2f, 2026-05-15). Step #6 ("Integration tests — Order") runs for ~19m14s before the job hits the runner's max and gets cancelled. The suite has never been observed green in CI. Two distinct bugs, in sequence; the first one hides the second.

ROOT CAUSE #1: Wolverine AutoProvision hangs against fake ASB endpoint

Each service's Program.cs calls `.AutoProvision()` on the Azure Service Bus transport:

    opts.UseAzureServiceBus(connectionString).AutoProvision();

AutoProvision is a host-startup hook that actively connects to the Service Bus namespace to create topics/subscriptions if they don't exist. The integration test factory uses a syntactically-valid-but-fake connection string ("sb://fake.servicebus.windows.net/...") so Wolverine can register without throwing, and calls `DisableAllExternalWolverineTransports()` via ConfigureTestServices to turn off the actual message routing.

But DisableAllExternalWolverineTransports() is a DI registration; AutoProvision runs at host startup via a transport bootstrapper that fires BEFORE ConfigureTestServices is applied. So AutoProvision tries to reach the fake endpoint, the Azure SDK retries with backoff, and the host startup hangs until the runner kills the job at 20 minutes.

Variant pattern: the same .AutoProvision() call exists in OrderService, PaymentService, ShippingService, and NotificationService. CatalogService is the working reference: it uses Wolverine but no Azure Service Bus transport, so no AutoProvision call, so no hang.

Fix: gate .AutoProvision() on a config flag (defaults to true so dev/prod are unchanged). The test factory sets the flag to false:

    builder.UseSetting("Wolverine:AutoProvision", "false");

Applied to all four services even though only OrderService has integration tests today, to prevent the same trap when integration tests get wired up for Payment / Shipping / Notification.
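A rough sketch of how the gating for root cause #1 might look in one service's Program.cs, not the exact diff. It assumes the WolverineFx and WolverineFx.AzureServiceBus packages and a web project with implicit usings enabled. The `"servicebus"` connection-string name is hypothetical. Only the `Wolverine:AutoProvision` key comes from the test-factory line above.

```csharp
// Program.cs (sketch). Assumes WolverineFx + WolverineFx.AzureServiceBus and
// implicit usings; the "servicebus" connection-string name is hypothetical.
using Wolverine;
using Wolverine.AzureServiceBus;

var builder = WebApplication.CreateBuilder(args);

// A missing key falls back to true, so dev/prod keep provisioning as before.
var autoProvision = builder.Configuration.GetValue("Wolverine:AutoProvision", true);

var connectionString = builder.Configuration.GetConnectionString("servicebus")
    ?? throw new InvalidOperationException("Service Bus connection string is not configured.");

builder.Host.UseWolverine(opts =>
{
    var serviceBus = opts.UseAzureServiceBus(connectionString);

    // Only contact the namespace to create topics/subscriptions when allowed.
    if (autoProvision)
    {
        serviceBus.AutoProvision();
    }
});

var app = builder.Build();
app.Run();

// Lets WebApplicationFactory<Program> in the test project see the entry point.
public partial class Program { }
```

The flag is read from configuration before the Wolverine options callback is registered. That is why the test factory sets it with `UseSetting` rather than through a service registration.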
ROOT CAUSE #2: Dispose order crashes the test host AFTER tests pass

With root cause #1 fixed, the 4 tests passed in 38 seconds, then the test host crashed because OrderApiFactory.DisposeAsync stops the SQL Server container BEFORE calling base.DisposeAsync. Wolverine's DurableReceiver (a BackgroundService that polls the wolverine.* outbox tables) is still running during that window; every heartbeat hits "connection refused" and the unhandled SqlException bubbles up:

    Total tests: Unknown
    Passed: 4
    Test Run Aborted.

Fix: dispose the host first (await base.DisposeAsync()), then the SQL container. This lets Wolverine's background services exit gracefully before the DB is yanked.

VERIFICATION

Locally with Docker Desktop:

    Test Run Successful.
    Total tests: 4
    Passed: 4
    Total time: 16.2220 Seconds

The full saga runs end-to-end: PlaceOrder → OrderPlacedEvent published, second test asserts no orphan row on validation failure, PaymentCompletedEvent transitions Placed → Paid (idempotently), RowVersion concurrency token fires.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
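For the dispose-order fix under root cause #2, a minimal sketch of the factory might look like the following. It assumes Microsoft.AspNetCore.Mvc.Testing and a visible `Program` entry point. The `_sqlContainer` field and the constructor that receives it are hypothetical stand-ins, since the way the SQL Server container is created is not shown here.

```csharp
// OrderApiFactory.cs (sketch). Assumes Microsoft.AspNetCore.Mvc.Testing and
// that the service exposes its entry point as `public partial class Program`.
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Mvc.Testing;

public sealed class OrderApiFactory : WebApplicationFactory<Program>
{
    // Hypothetical handle to the SQL Server container; the real field type
    // depends on how the container is started.
    private readonly IAsyncDisposable _sqlContainer;

    public OrderApiFactory(IAsyncDisposable sqlContainer) => _sqlContainer = sqlContainer;

    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        // From root cause #1: skip provisioning against the fake endpoint.
        builder.UseSetting("Wolverine:AutoProvision", "false");
    }

    public override async ValueTask DisposeAsync()
    {
        // 1. Stop the host first so its BackgroundServices shut down while
        //    SQL Server is still reachable.
        await base.DisposeAsync();

        // 2. Only then tear down the database they were polling.
        await _sqlContainer.DisposeAsync();
    }
}
```

Reversing the two awaits in `DisposeAsync` would reproduce the original failure window, in which the host's background polling is still active after the database has gone away.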