Skip to content

fix(integration-tests): unblock OrderService integration tests in CI#16

Merged
emeraldleaf merged 1 commit into
mainfrom
fix/orderservice-integration-test-hang
May 23, 2026
Merged

fix(integration-tests): unblock OrderService integration tests in CI#16
emeraldleaf merged 1 commit into
mainfrom
fix/orderservice-integration-test-hang

Conversation

@emeraldleaf

@emeraldleaf emeraldleaf commented May 23, 2026

Copy link
Copy Markdown
Owner

The OrderService.Tests.Integration suite has hung in CI on every run since it was added 9 days ago (commit 76b2f2f, 2026-05-15). Step #6 ("Integration tests — Order") runs for ~19m14s before the job hits the runner's max and gets cancelled. The suite has never been observed green in CI. Two distinct bugs, in sequence — the first one hides the second.

ROOT CAUSE #1 — Wolverine AutoProvision hangs against fake ASB endpoint

Each service's Program.cs calls .AutoProvision() on the Azure Service Bus transport:

opts.UseAzureServiceBus(connectionString).AutoProvision();

AutoProvision is a host-startup hook that actively connects to the Service Bus namespace to create topics/subscriptions if they don't exist. The integration test factory uses a syntactically-valid-but-fake connection string ("sb://fake.servicebus.windows.net/...") so Wolverine can register without throwing, and calls DisableAllExternalWolverineTransports() via ConfigureTestServices to turn off the actual message routing.

But — DisableAllExternalWolverineTransports() is a DI registration; AutoProvision runs at host startup via a transport bootstrapper that fires BEFORE ConfigureTestServices is applied. So AutoProvision tries to reach the fake endpoint, the Azure SDK retries with backoff, and the host startup hangs until the runner kills the job at 20 minutes.

Variant pattern: same .AutoProvision() call exists in OrderService, PaymentService, ShippingService, NotificationService. CatalogService is the working reference — it uses Wolverine but no Azure Service Bus transport, so no AutoProvision call, so no hang.

Fix: gate .AutoProvision() on a config flag (defaults to true so dev/prod are unchanged). Test factory sets the flag to false:

builder.UseSetting("Wolverine:AutoProvision", "false");

Applied to all four services even though only OrderService has integration tests today; preventing the same trap when integration tests get wired up for Payment / Shipping / Notification.

ROOT CAUSE #2 — Dispose order crashes the test host AFTER tests pass

With root cause #1 fixed, the 4 tests passed in 38 seconds — then the test host crashed because OrderApiFactory.DisposeAsync stops the SQL Server container BEFORE calling base.DisposeAsync. Wolverine's DurableReceiver (a BackgroundService that polls the wolverine.* outbox tables) is still running during that window; every heartbeat hits "connection refused" and the unhandled SqlException bubbles up:

Total tests: Unknown
     Passed: 4
Test Run Aborted.

Fix: dispose host first (await base.DisposeAsync()), then SQL container. Lets Wolverine's background services exit gracefully before the DB is yanked.

VERIFICATION

Locally with Docker Desktop:

Test Run Successful.
Total tests: 4
     Passed: 4
 Total time: 16.2220 Seconds

The full saga runs end-to-end: PlaceOrder → OrderPlacedEvent published, second test asserts no orphan row on validation failure, PaymentCompletedEvent transitions Placed → Paid (idempotently), RowVersion concurrency token fires.

What changed [required]

How it was built [required]

  • Pure AI-assisted, human-verified. Used Claude Code (or equivalent) to draft. I read every line, ran the build, and manually exercised the changed path before pushing.
  • AI-assisted with manual edits. Started from an AI draft, then refactored / fixed / added detail by hand. Verification as above.
  • Hand-written. No AI assistance on this change.
  • AI-generated, not yet verified. Marking as draft / WIP. Will verify before requesting review.

If AI was involved: link to the conversation transcript or commit messages that describe
the AI workflow (e.g. gh issue view N if the conversation is preserved in an issue).

Architecture impact [skip if N/A]

  • Adds a new domain entity / aggregate / value object
  • Adds a new event contract (in NextAurora.Contracts/Events)
  • Adds a new repository or port interface
  • Adds a new service or splits an existing one
  • Changes a published API surface (REST or gRPC)
  • Touches the outbox, sagas, or messaging topology
  • Modifies CLAUDE.md or a See CLAUDE.md paraphrase

For non-trivial architectural changes, consider invoking the architecture-reviewer agent
locally before requesting review.

Verification [required]

  • dotnet build clean (zero warnings — TreatWarningsAsErrors is on)
  • dotnet test passes locally
  • Manually exercised the changed code path:
    • <e.g. curl /api/v1/orders and observed expected response>
    • <e.g. published an OrderPlacedEvent and watched PaymentService consume it>
  • If this touches integration tests: ran the Testcontainers slice and confirmed Docker socket is reachable
  • If this touches AppHost: ran dotnet run --project NextAurora.AppHost and confirmed all services reach "Running" (not "Finished") in the dashboard

Deferred / known gaps [skip if N/A]

Linked docs / issues [skip if N/A]

Summary by CodeRabbit

  • Chores
    • Introduced configurable service auto-provisioning (defaults to enabled, preserving existing behavior) to provide better control across different deployment environments.
    • Enhanced service shutdown sequence to ensure graceful cleanup and prevent background operations from accessing infrastructure after shutdown.

Review Change Stack

The OrderService.Tests.Integration suite has hung in CI on every run since it was
added 9 days ago (commit 76b2f2f, 2026-05-15). Step #6 ("Integration tests — Order")
runs for ~19m14s before the job hits the runner's max and gets cancelled. The suite
has never been observed green in CI. Two distinct bugs, in sequence — the first one
hides the second.

ROOT CAUSE #1 — Wolverine AutoProvision hangs against fake ASB endpoint

Each service's Program.cs calls `.AutoProvision()` on the Azure Service Bus transport:

    opts.UseAzureServiceBus(connectionString).AutoProvision();

AutoProvision is a host-startup hook that actively connects to the Service Bus
namespace to create topics/subscriptions if they don't exist. The integration test
factory uses a syntactically-valid-but-fake connection string
("sb://fake.servicebus.windows.net/...") so Wolverine can register without throwing,
and calls `DisableAllExternalWolverineTransports()` via ConfigureTestServices to
turn off the actual message routing.

But — DisableAllExternalWolverineTransports() is a DI registration; AutoProvision
runs at host startup via a transport bootstrapper that fires BEFORE
ConfigureTestServices is applied. So AutoProvision tries to reach the fake
endpoint, the Azure SDK retries with backoff, and the host startup hangs until the
runner kills the job at 20 minutes.

Variant pattern: same .AutoProvision() call exists in OrderService, PaymentService,
ShippingService, NotificationService. CatalogService is the working reference —
it uses Wolverine but no Azure Service Bus transport, so no AutoProvision call,
so no hang.

Fix: gate .AutoProvision() on a config flag (defaults to true so dev/prod are
unchanged). Test factory sets the flag to false:

    builder.UseSetting("Wolverine:AutoProvision", "false");

Applied to all four services even though only OrderService has integration tests
today; preventing the same trap when integration tests get wired up for Payment /
Shipping / Notification.

ROOT CAUSE #2 — Dispose order crashes the test host AFTER tests pass

With root cause #1 fixed, the 4 tests passed in 38 seconds — then the test host
crashed because OrderApiFactory.DisposeAsync stops the SQL Server container BEFORE
calling base.DisposeAsync. Wolverine's DurableReceiver (a BackgroundService that
polls the wolverine.* outbox tables) is still running during that window; every
heartbeat hits "connection refused" and the unhandled SqlException bubbles up:

    Total tests: Unknown
         Passed: 4
    Test Run Aborted.

Fix: dispose host first (await base.DisposeAsync()), then SQL container. Lets
Wolverine's background services exit gracefully before the DB is yanked.

VERIFICATION

Locally with Docker Desktop:

    Test Run Successful.
    Total tests: 4
         Passed: 4
     Total time: 16.2220 Seconds

The full saga runs end-to-end: PlaceOrder → OrderPlacedEvent published, second
test asserts no orphan row on validation failure, PaymentCompletedEvent transitions
Placed → Paid (idempotently), RowVersion concurrency token fires.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 23, 2026

Copy link
Copy Markdown

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5cf777b9-3b36-408a-a46b-438988d9f0e4

📥 Commits

Reviewing files that changed from the base of the PR and between 615c090 and a142a3e.

📒 Files selected for processing (5)
  • NotificationService/Program.cs
  • OrderService/Program.cs
  • PaymentService/Program.cs
  • ShippingService/Program.cs
  • tests/OrderService.Tests.Integration/OrderApiFactory.cs

📝 Walkthrough

Walkthrough

Wolverine Azure Service Bus auto-provisioning is now gated by a Wolverine:AutoProvision configuration flag across four microservices, defaulting to enabled. Integration test setup disables auto-provisioning and corrects shutdown ordering to prevent background outbox polling from accessing a torn-down database.

Changes

Auto-provisioning configuration gating

Layer / File(s) Summary
Conditional auto-provisioning across services
NotificationService/Program.cs, OrderService/Program.cs, PaymentService/Program.cs, ShippingService/Program.cs
UseAzureServiceBus() result is now captured in an explicit variable, and AutoProvision() is invoked only when Wolverine:AutoProvision evaluates to true (defaults true), replacing previous unconditional fluent method chaining.
Integration test setup and shutdown ordering
tests/OrderService.Tests.Integration/OrderApiFactory.cs
Test configuration disables auto-provisioning via Wolverine:AutoProvision = false, and DisposeAsync now disposes the test host before the SQL container to prevent outbox polling from accessing a torn-down database during shutdown.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hops through service lanes,
Gates the broker's eager reins,
With config flags so clean and kind,
Tests no longer left behind,
Shutdown flows now intertwined. 🐰

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/orderservice-integration-test-hang

Comment @coderabbitai help to get the list of available commands and usage tips.

@emeraldleaf emeraldleaf merged commit fd27fd1 into main May 23, 2026
3 of 4 checks passed
@codecov

codecov Bot commented May 23, 2026

Copy link
Copy Markdown

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant