feat(aspire): dump resource error logs on test failure + warn on missing telemetry wiring#6323

Merged
thomhurst merged 5 commits into main from feat/aspire-log-correlation-dx on Jun 30, 2026

@thomhurst

Owner

What

Two opt-out DX features so a failing Aspire test surfaces SUT logs without any manual wiring.

1. Dump resource error logs on test failure (default on)

AspireFixture<T> now implements ITestEndEventReceiver. When a test that consumes the fixture fails, each waited-on resource's recent stderr lines are appended to that test's output.

  • Reuses the existing ResourceLoggerService.WatchAsync pattern; collected in parallel with a short per-resource timeout.
  • Resource-scoped, not request-correlated — on a shared (session/class) fixture the log buffer spans the whole run, so output can include lines from earlier tests. Precise per-request correlation already flows live via the OTLP receiver (trace-context based). This is the coarse fallback for resources that don't export OpenTelemetry logs.
  • Knobs: DumpResourceLogsOnFailure (default true), MaxFailureLogLinesPerResource (50), FailureLogCollectionTimeout (2s).

2. Warn on missing telemetry wiring (default on)

OtlpReceiver now records the service.name of every incoming OTLP log record (before the trace-id filter — presence means OTLP logging reached us at all). At session end the fixture emits a one-time hint for any project resource that produced console output but never sent correlated OTLP logs — the usual symptom of an SUT missing OpenTelemetry log export (e.g. Aspire ServiceDefaults not wired). Turns a silent gap into an actionable nudge.

  • Emitted after the receiver drain (seen-set complete) but before StopAsync (app still up to probe console output); skipped on a run-abort (incomplete data).
  • Name match relies on the fixture-injected OTEL_SERVICE_NAME=<resourceName>; an SUT that overrides its own service name may produce a false hint, hence the soft wording. Knob: WarnOnMissingTelemetry (default true).

Why

For a SUT wired with standard ServiceDefaults, correlated per-request logs already reach the owning test live. These two changes cover the two remaining gaps with zero user code: failure visibility for uncorrelated console logs, and a hint when correlation silently can't work because OTLP log export isn't on.

Tests

  • New Receiver_LogWithServiceName_RecordsSeenService (hand-built OTLP logs protobuf, unregistered trace id, asserts case-insensitive HasSeenLogsFrom + SeenLogServiceNames).
  • OtlpReceiverIngestionTests 8/8; pure Aspire TraceRegistryTests / ResourcesToRemoveTests pass; clean build on net10.0.
  • Docker-backed integration path is not covered here (no container runtime in this environment) — same gap as the existing Aspire startup-diagnostics tests.

Notes / landmines for reviewers

  • TestContext.Result is internal; the public read path is context.Execution.Result?.State.
  • OtlpReceiver.ProcessLogs changed from static to an instance method (needs the seen-services field).
  • Event receivers fire for [ClassDataSource]-injected fixtures because BuildEligibleEventObjects scans test class arguments / injected properties.

…ing telemetry wiring

Two DX features so a failing Aspire test surfaces SUT logs without manual setup:

1. AspireFixture<T> implements ITestEndEventReceiver. On a failed test it appends
   each waited-on resource's recent stderr lines to that test's output (resource-scoped,
   capped). Gated by DumpResourceLogsOnFailure (default on), MaxFailureLogLinesPerResource,
   FailureLogCollectionTimeout. Complements live OTLP per-request log correlation.

2. OtlpReceiver records the service.name of every incoming OTLP log record. At session end
   the fixture hints (WarnOnMissingTelemetry, default on) for any project resource that
   produced console output but never sent correlated OTLP logs - the usual symptom of an
   SUT missing OpenTelemetry log export (e.g. ServiceDefaults not wired). Emitted after the
   receiver drain and skipped on run-abort.

Adds Receiver_LogWithServiceName_RecordsSeenService.

@claude claude Bot left a comment

Contributor

Review: feat(aspire): dump resource error logs on test failure + warn on missing telemetry wiring

Good DX addition overall — the two-feature design is well-reasoned, the PR description is thorough, and the test is solid. A few things worth fixing before merge:


1. Potential null reference: _app passed to non-nullable parameter

In StopAndDisposeCoreAsync, _app (typed DistributedApplication?) is passed to EmitTelemetryWiringHintsAsync which takes DistributedApplication app (non-nullable):

await EmitTelemetryWiringHintsAsync(_app, _otlpReceiver);

If _app is null at that point (e.g. initialization failed before the app was assigned), this is either a CS8604 nullable warning or a runtime null dereference. The call site should guard:

if (WarnOnMissingTelemetry && _app is not null && !RunCancellationToken.IsCancellationRequested)
{
    await EmitTelemetryWiringHintsAsync(_app, _otlpReceiver);
}

Or make the parameter nullable and guard inside the method (similar to how OnTestEnd does var app = _app; if (... || app is null) early return).


2. Memory inefficiency: CollectResourceErrorLinesAsync collects unbounded lines before truncating

var errors = new List<string>();
// ... fills with ALL error lines from the backlog ...
if (max >= 0 && errors.Count > max)
{
    errors.RemoveRange(0, errors.Count - max);
}

On a long test run where a resource produced thousands of error lines, this allocates the full list before discarding all but the last max. For shared [ClassDataSource] or [SessionDataSource] fixtures this could be a significant allocation on every test failure.

The better approach is a bounded ring-buffer that never exceeds max entries:

var errors = new Queue<string>(max + 1);
// inside loop:
if (line.IsErrorMessage)
{
    errors.Enqueue(line.Content);
    if (errors.Count > max) errors.Dequeue();
}
// return errors.ToList()

This keeps memory proportional to MaxFailureLogLinesPerResource, not to the total number of error lines the resource has ever produced.
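A self-contained sketch of the sliding-window idea, assuming a plain in-memory sequence in place of the real resource log stream. TailWindow, LastErrors, and the tuple shape are hypothetical names for illustration, not the PR's code.

```csharp
using System;
using System.Collections.Generic;

public static class TailWindow
{
    // Keeps only the last `max` error lines. Memory is bounded by `max`,
    // not by the size of the backlog. Hypothetical helper for illustration.
    public static List<string> LastErrors(
        IEnumerable<(string Content, bool IsErrorMessage)> lines, int max)
    {
        if (max < 0) throw new ArgumentOutOfRangeException(nameof(max));
        if (max == 0) return new List<string>();

        var window = new Queue<string>();
        foreach (var line in lines)
        {
            if (!line.IsErrorMessage) continue;
            if (window.Count == max) window.Dequeue(); // evict the oldest before adding
            window.Enqueue(line.Content);
        }

        // Queue<T> enumerates oldest-first, so the result stays in arrival order.
        return new List<string>(window);
    }
}
```

Evicting before enqueueing means the queue never holds more than max entries, so no extra slot is needed.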


3. Hardcoded 2-second timeout in ResourceProducedConsoleOutputAsync

FailureLogCollectionTimeout is a configurable virtual property, but ResourceProducedConsoleOutputAsync bakes in TimeSpan.FromSeconds(2) with no hook for customization. These are separate concerns (failure dumps vs. the telemetry hint), but the inconsistency is surprising.

More importantly: if a project resource produced no output at all, the probe will wait the full 2 seconds before returning false. With N project resources in parallel, this is always 2s of teardown overhead on every run — even when all resources are perfectly wired with OTLP. Consider a short-circuit: if the receiver has already seen a service name for a resource, skip the probe entirely:

if (!receiver.HasSeenLogsFrom(p.Name) && await ResourceProducedConsoleOutputAsync(...))
{
    // emit hint
}

Reversing the check order avoids the I/O probe for resources that clearly sent OTLP logs.
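A minimal sketch of that ordering, assuming two hypothetical delegates (hasSeenLogsFrom, probeConsoleAsync) stand in for the receiver lookup and the console probe. It walks resources sequentially for simplicity, whereas the fixture is described as probing in parallel.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HintFilter
{
    // Returns the resources that would get a hint. Both delegates are
    // hypothetical stand-ins, not the fixture's real methods.
    public static async Task<List<string>> ResourcesNeedingHintAsync(
        IEnumerable<string> resourceNames,
        Func<string, bool> hasSeenLogsFrom,
        Func<string, Task<bool>> probeConsoleAsync)
    {
        var hints = new List<string>();
        foreach (var name in resourceNames)
        {
            // && short-circuits: the probe is never started for a resource
            // that already sent OTLP logs.
            if (!hasSeenLogsFrom(name) && await probeConsoleAsync(name))
            {
                hints.Add(name);
            }
        }
        return hints;
    }
}
```

With the cheap in-memory check first, a run where every resource has been seen opens no probes at all.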


4. Minor: unnecessary local copy in ProcessLogs

private void ProcessLogs(byte[] body)
{
    var diag = _diagnostics;

_diagnostics is a readonly field, so var diag = _diagnostics is a needless alias. Just use _diagnostics directly as before. The motivation for the change (making the method instance rather than static) is correct — the alias is just noise.


What's solid

  • The ITestEndEventReceiver integration is the right hook — cleaner than a global cleanup callback.
  • ObjectDisposedException guard in OnTestEnd is the right call: a race against run-abort teardown is expected and shouldn't poison the test result.
  • ConcurrentDictionary<string, byte> for the seen-services set is an appropriate, low-overhead pattern (the BCL has no built-in ConcurrentHashSet).
  • Case-insensitive service name matching via StringComparer.OrdinalIgnoreCase on the dictionary is correctly placed at construction time.
  • The new test Receiver_LogWithServiceName_RecordsSeenService tests both the happy path and case-insensitivity, and uses an unregistered trace ID to verify the seen-service recording is independent of trace correlation — good.
  • The PR description is unusually complete and the "landmines for reviewers" section is helpful context.
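For reference, the seen-services pattern praised above can be sketched in isolation. SeenServices is a hypothetical wrapper class; only the ConcurrentDictionary<string, byte> with StringComparer.OrdinalIgnoreCase comes from the discussion, and the member names merely echo those mentioned in the PR.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class SeenServices
{
    // The byte value is a throwaway; only the keys matter. The comparer is
    // fixed at construction, so every lookup is case-insensitive.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);

    // TryAdd is a no-op when the name (in any casing) is already present.
    public void Record(string serviceName) => _seen.TryAdd(serviceName, 0);

    public bool HasSeenLogsFrom(string serviceName) => _seen.ContainsKey(serviceName);

    public ICollection<string> Names => _seen.Keys;
}
```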

@codacy-production

codacy-production Bot commented Jun 29, 2026


Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 38 complexity

Metric Results
Complexity 38


… probes

Cleanups from /simplify review:
- Merge CollectResourceErrorLinesAsync into CollectResourceLogLinesAsync via
  optional (timeout, errorsOnly, max) params; defaults reproduce the existing
  startup-diagnostics caller's behavior (5s, all lines, E> prefix, no cap).
- EmitTelemetryWiringHintsAsync: filter project resources by !HasSeenLogsFrom
  before probing console output, so a correctly-wired SUT opens zero probes.
- OnTestEnd: build the waited-on resource name list with LINQ instead of a
  manual foreach.

@claude claude Bot left a comment

Contributor

Follow-up review (second commit: 016b04a3)

What the new commit addressed

The refactor(aspire) commit picked up two concerns from the prior review:

  • Pre-filter telemetry-hint probes: EmitTelemetryWiringHintsAsync now filters by !HasSeenLogsFrom before opening any console probe, so a correctly-wired SUT (all resources seen) takes zero I/O overhead at teardown. Good fix.
  • Method consolidation: merging CollectResourceErrorLinesAsync into CollectResourceLogLinesAsync via optional params reduces duplication. Clean.

Correction to my prior review — issue #1 was a misread

My prior comment about a potential null dereference on _app was wrong. The call to EmitTelemetryWiringHintsAsync(_app, _otlpReceiver) is nested inside:

if (_otlpReceiver is not null && _app is not null)

so _app is guaranteed non-null at that point. Apologies for the noise.


Still open from prior review

Issue #2: Memory — unbounded list before tail-truncation (still present)

CollectResourceLogLinesAsync still allocates a List<string> for every line in the backlog, then removes the head:

// line 1062
lines.RemoveRange(0, lines.Count - max);

On a long test run with a shared (session/class) fixture, a noisy resource can produce thousands of lines, all allocated before all-but-50 are discarded. A fixed-size ring-buffer keeps memory proportional to max throughout:

// Replace List<string> with a sliding window
var window = new Queue<string>(max + 1);
// inside loop, when errorsOnly:
if (line.IsErrorMessage)
{
    window.Enqueue(line.Content);
    if (window.Count > max) window.Dequeue();
}
// return window.ToList()

This is especially worth fixing here because OnTestEnd fires for every failing test that consumes the fixture — on a shared fixture, that means N calls per run, each potentially pulling a large backlog.

Issue #4: Unnecessary alias in ProcessLogs (still present)

// OtlpReceiver.cs line 554
private void ProcessLogs(byte[] body)
{
    var diag = _diagnostics;  // _diagnostics is a readonly field — this alias adds nothing

Just use _diagnostics directly. The alias was a remnant of when the method was static and took diag as a parameter; now that it's an instance method, the variable is dead weight.


Hardcoded timeout in ResourceProducedConsoleOutputAsync (FYI, not blocking)

// line 743
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));

FailureLogCollectionTimeout defaults to 2s too, so the values coincidentally match, but the probe timeout isn't wired to any configurable property. With the pre-filter now in place the probe is only opened for resources that haven't sent OTLP logs, so in practice this affects mis-wired SUTs only. Still, if someone extends the failure dump timeout they'd reasonably expect the hint probes to follow. A minor but clean fix would be passing FailureLogCollectionTimeout through to the probe (or a separate virtual TelemetryHintProbeTimeout). Not blocking.


Summary

Two real fixes landed in the second commit; issue #1 from my prior review was a misread (already safe). Memory inefficiency (#2) and the var diag alias (#4) still need attention. The timeout inconsistency is low priority but worth a note.

…lias

- CollectResourceLogLinesAsync: use a fixed-size sliding window (Queue) when
  `max` is set so memory stays proportional to MaxFailureLogLinesPerResource
  instead of buffering the whole backlog before tail-truncating (review #2).
- OtlpReceiver.ProcessLogs: drop the leftover `var diag = _diagnostics` alias
  from when the method was static; use the field directly (review #4).
- Wire the telemetry-hint console probe to FailureLogCollectionTimeout instead
  of a hardcoded 2s, so extending the dump timeout extends the probe too.
@thomhurst

Owner Author

Addressed the open items from the follow-up review in 90a36ff0a:

  • #2 (memory): CollectResourceLogLinesAsync now keeps a fixed-size sliding window (Queue<string> capped at max) instead of buffering the full backlog and tail-truncating. Memory stays proportional to MaxFailureLogLinesPerResource regardless of how noisy the resource is. The unbounded diagnostics path (no max) is unchanged.
  • #4 (alias): dropped the leftover var diag = _diagnostics in ProcessLogs (a remnant of the old static signature); the method uses the field directly.
  • FYI (hardcoded timeout): the telemetry-hint console probe now takes FailureLogCollectionTimeout rather than a baked-in 2s, so extending the dump timeout extends the probe too.

OtlpReceiverIngestionTests 8/8 green, clean build on net10.0.

@claude

claude Bot commented Jun 30, 2026

Contributor

Code Review

This PR adds two opt-out Aspire DX features to AspireFixture<T>: (1) dumping a failed test's waited-on resources' recent stderr lines via a new ITestEndEventReceiver.OnTestEnd, and (2) a session-end heuristic hint (EmitTelemetryWiringHintsAsync) for resources that produced console output but never sent correlated OTLP logs, backed by new service-name tracking in OtlpReceiver. The design is well-reasoned and the doc comments are unusually thorough about documented tradeoffs (resource-scoped vs request-correlated, name-match false positives). I fanned out 8 review angles and verified each surviving candidate against the actual PR-branch code; findings below are ranked most-severe first.

Findings

  1. TUnit.OpenTelemetry/Receiver/OtlpLogParser.cs:167 — the new "seen service" tracking never fires for the case it's meant to catch.
    ParseLogRecord returns null (dropping the whole record) whenever traceId is empty:

    if (string.IsNullOrEmpty(traceId)) return null;

    OtlpReceiver.ProcessLogs's new code (TUnit.OpenTelemetry/Receiver/OtlpReceiver.cs:452) only sees records that already survived this filter, so its comment — "Record the source service even when the record has no usable trace id" — is false for the actual no-trace-id case; it only fires for the unrelated all-zero-but-present-trace-id edge case. Failure scenario: a correctly-wired SUT that emits genuine untraced logs (e.g. startup/background logs with no active Activity) but never traced logs will still trigger a false-positive "missing telemetry" hint, which is exactly the false positive this PR is trying to avoid by recording presence "regardless of trace-id match." Worth either parsing service.name ahead of the trace-id filter (in OtlpLogParser, before returning null) or routing the resource-name signal back out of the dropped-record path.

  2. TUnit.Aspire.Core/AspireFixture.cs:450: OnTestEnd fires once per retry attempt, and a failed attempt's resource-log dump leaks into an eventually-passing test's output.
    Verified in TUnit.Engine: the retry loop (RetryHelper.cs) re-invokes the full test lifecycle, including InvokeTestEndEventReceiversAsync → OnTestEnd, on every attempt, and explicitly resets Result/TestStart/TestEnd/Timings between attempts but not context.Output (the ConcurrentStringWriter lives for the TestContext's lifetime across all attempts). So: test fails on attempt 1 → OnTestEnd dumps resource error logs into context.Output → test passes on attempt 2 (e.g. via [Retry(n)]) → the final "Passed" test's output still contains attempt-1's failure log dump. This is confusing output on a green test, not just noise on red ones. Worth gating on context.CurrentRetryAttempt == <final> or clearing/scoping output per attempt (if that's feasible given TestContext.Output's current semantics).

  3. TUnit.OpenTelemetry/Receiver/OtlpReceiver.cs:38-43, 69-75 — Aspire-specific bookkeeping leaks into the shared, multi-consumer OtlpReceiver.
    _seenLogServiceNames/HasSeenLogsFrom/SeenLogServiceNames are explicitly documented as "Used by TUnit.Aspire..." but OtlpReceiver has a second, non-Aspire consumer — TUnit.OpenTelemetry/AutoReceiver.cs (the generic WebApplicationFactory/TUNIT_OTEL_RECEIVER auto-start path) — which now carries this dictionary and a widened internal surface for no benefit. The ProcessLogs static→instance conversion (line 430) was made solely to reach this field, while the sibling ProcessTraces was deliberately left static with diag passed explicitly — i.e. this PR broke the file's own established convention for one consumer's feature. Consider keeping OtlpReceiver consumer-agnostic and tracking seen-service-names in a thin Aspire-side wrapper/callback instead (e.g. a delegate OtlpReceiver invokes on every parsed log, which AspireFixture subscribes to) — same effect for Aspire, zero footprint for AutoReceiver.

  4. TUnit.Aspire.Core/AspireFixture.cs:463 vs :699 — inconsistent resource filtering between the two new features.
    OnTestEnd only collects logs for resources passing ShouldWaitForResource (line 463), but EmitTelemetryWiringHintsAsync's candidate list (line 699) is model.Resources.OfType<ProjectResource>() with no ShouldWaitForResource filter. A resource a user has explicitly excluded via ShouldWaitForResource (e.g. resource.Name != "slow-service") is still probed for console output and can still trigger a "missing telemetry" warning about it. ShouldWaitForResource's doc is narrowly scoped to startup-wait semantics, so this may be intentional, but if the intent is "this resource is opted out of fixture diagnostics," both paths should agree.

  5. TUnit.Aspire.Core/AspireFixture.cs:741 reimplements :1023's WatchAsync/cancellation pattern, but isn't a drop-in merge.
    ResourceProducedConsoleOutputAsync and CollectResourceLogLinesAsync share the same GetRequiredService<ResourceLoggerService>() → CancellationTokenSource(timeout) → await foreach (... WithCancellation(cts.Token)) → catch (OperationCanceledException) scaffolding (~20 duplicated lines). Naively replacing the former with (await CollectResourceLogLinesAsync(app, name, timeout, max: 1)).Count > 0 would change behavior, though: CollectResourceLogLinesAsync has no early-exit once max is reached (it keeps draining via the sliding-window Queue until the full timeout fires), whereas ResourceProducedConsoleOutputAsync returns true the instant any non-empty batch arrives. A clean dedup would need an early-exit-on-max path added to CollectResourceLogLinesAsync first. That is worth doing since it would also make the failure-dump path return faster for resources with little log volume, not just the probe.

  6. TUnit.Aspire.Core/AspireFixture.cs:222: FailureLogCollectionTimeout's doc doesn't disclose its second use.
    The property is documented purely as "Time budget for collecting a resource's buffered logs on test failure," but line 709 reuses it as the session-end console-output probe timeout in EmitTelemetryWiringHintsAsync, with the dual-purpose only disclosed in an inline comment at the call site, not in the property's XML doc. A consumer raising this for a slow-CI failure dump won't realize via IntelliSense that they're also changing session-teardown probe latency (multiplied across every non-OTLP-seen resource). Either split into two properties or extend the XML doc to mention both consumers.

Not flagged (verified and refuted)

  • OnTestEnd's catch (ObjectDisposedException) only — checked whether other exception types are realistically reachable from GetRequiredService/WatchAsync during a teardown race; both only ever throw ObjectDisposedException or complete gracefully, so the existing catch is sufficient.
  • The errorsOnly || !line.IsErrorMessage ternary in CollectResourceLogLinesAsync — looked redundant at first glance, but it's load-bearing: it's what makes errorsOnly=true strip the E> prefix for raw stderr content, per the documented contract.

Nice test coverage on the new OtlpReceiver behavior (Receiver_LogWithServiceName_RecordsSeenService), and the call-site audit for CollectResourceLogLinesAsync's widened signature checked out — the one pre-existing caller keeps its original behavior unchanged.

Addresses the deep-review findings on the OTLP log-correlation DX PR.

- #1 false-positive hint: OtlpLogParser dropped every record without a trace id
  *before* OtlpReceiver could record its service.name, so a correctly-wired SUT
  emitting only untraced logs (startup/background, no active Activity) was never
  marked "seen" and triggered the missing-telemetry hint it was meant to avoid.
  Parse now invokes an onResourceSeen callback for each record ahead of the
  trace-id filter; the receiver records seen services via a cached delegate. The
  record-emission contract (untraced records excluded from results) is unchanged.
- #2 retry leak: OnTestEnd fires once per retry attempt (end-event receivers run
  inside the retried body), and TestContext.Output is shared across attempts, so
  an early failed attempt's log dump leaked into the output of a test that passed
  on retry. Gate the dump on the final attempt (CurrentRetryAttempt >= RetryLimit).
- #4 consistency: the telemetry-hint candidate list now applies ShouldWaitForResource,
  matching the on-failure dump's filtering, so a user-excluded resource isn't probed.
- #6 docs: FailureLogCollectionTimeout XML doc now discloses its reuse as the
  session-end console-output probe timeout.

New test Parse_UntracedLogRecord_StillReportsSeenService covers #1.
@thomhurst

Owner Author

Thanks — the deep pass caught two real bugs. Addressed in 061197f72:

#1 (false-positive hint) — fixed. Confirmed: OtlpLogParser.ParseLogRecord returns null for any record without a trace id before ProcessLogs runs, so the per-record _seenLogServiceNames.TryAdd only ever fired for records that already had a trace id — exactly inverting the intent. A correctly-wired SUT emitting only untraced logs (startup/background, no active Activity) would have triggered the very hint it was meant to avoid. Parse now takes an onResourceSeen callback invoked ahead of the trace-id filter; the receiver records via a cached delegate (no per-request alloc). The record-emission contract is unchanged (untraced records still excluded from results — two existing parser tests assert that). New test Parse_UntracedLogRecord_StillReportsSeenService locks it in.
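The ordering fix for #1 can be illustrated with a simplified, self-contained sketch. RawLog and LogParserSketch are hypothetical stand-ins (the real parser reads OTLP protobuf); only the idea of invoking an onResourceSeen callback ahead of the trace-id filter is taken from the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified record shape for illustration only.
public sealed record RawLog(string ServiceName, string? TraceId, string Body);

public static class LogParserSketch
{
    public static List<RawLog> Parse(
        IEnumerable<RawLog> records, Action<string>? onResourceSeen = null)
    {
        var results = new List<RawLog>();
        foreach (var record in records)
        {
            // Report presence first, so untraced records still mark the service as seen.
            onResourceSeen?.Invoke(record.ServiceName);

            // Emission contract unchanged: records without a trace id are excluded.
            if (string.IsNullOrEmpty(record.TraceId)) continue;
            results.Add(record);
        }
        return results;
    }
}
```

An untraced startup log then yields no parsed records but still reports its service name through the callback.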

#2 (retry leak) — fixed. Verified against RetryHelper.ExecuteWithRetry → ExecuteTestLifecycleAsync → TestExecutor.ExecuteAsync, whose finally fires the end-event receivers inside the retried body, and RetryHelper clears Result/TestStart/Timings but not Output. So a failed attempt-1 dump persisted into an eventually-passing test's output. Gated the dump on CurrentRetryAttempt >= RetryLimit (final attempt only). Caveat noted in a comment: a custom retry predicate that declines mid-way terminates as Failed before the limit, and in that rarer case we forgo the dump rather than risk a leak; the default per-exception retry always runs to the limit on failure, so it dumps exactly once on the terminal attempt.
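The gate itself reduces to a small predicate. This sketch assumes the attempt index runs from 0 up to the retry limit, as the discussion implies; RetryGate and ShouldDump are hypothetical names, not the fixture's members.

```csharp
public static class RetryGate
{
    // Dump only when the test failed and no retries remain.
    // Assumes `currentRetryAttempt` is zero-based and `retryLimit` is the
    // maximum number of retries (0 means the test runs once).
    public static bool ShouldDump(bool failed, int currentRetryAttempt, int retryLimit)
        => failed && currentRetryAttempt >= retryLimit;
}
```

With a limit of 2, a failure on attempt 0 or 1 stays silent, and only a failure on attempt 2 produces the dump.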

#4 (filter consistency) — fixed. EmitTelemetryWiringHintsAsync now applies ShouldWaitForResource before probing, matching the on-failure dump.

#6 (doc) — fixed. FailureLogCollectionTimeout's XML doc now discloses its reuse as the session-end probe timeout.

#3 (tracking on shared OtlpReceiver) — keeping as-is, with reasoning. OtlpReceiver is the component that receives OTLP, and it already owns received-signal observability (Diagnostics, RequestCount) as internal surface; SeenLogServiceNames is the same category, not Aspire-specific bookkeeping. AutoReceiver carries one extra ConcurrentDictionary that stays empty until a log arrives — negligible. On the static/instance asymmetry: ProcessLogs needs instance state regardless now (it passes the instance _recordSeenLogService to the parser), and ProcessTraces has no equivalent seen-set, so the asymmetry reflects a real difference between the two signals rather than a broken convention. A delegate-extraction layer would add indirection for marginal purity benefit. Happy to revisit if you'd still prefer the wrapper.

#5 (dedup the two WatchAsync helpers) — keeping separate, with reasoning. They share ~20 lines of scaffolding but have opposite termination semantics: the probe returns on the first line (early-exit), while the failure dump must drain the full backlog until the timeout to keep the most recent max lines — it can't early-exit on max or it'd capture the oldest lines instead. The proposed early-exit-on-max path would therefore change the dump's behavior, not just speed it up, so a clean merge isn't available for the lines saved. Left as two focused methods.
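The difference in termination semantics is easy to see on a plain sequence. This is an illustrative sketch using LINQ over an in-memory backlog, not the WatchAsync-based helpers; TerminationSemantics and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TerminationSemantics
{
    // Early exit: stops as soon as `max` lines have arrived,
    // so it keeps the OLDEST lines in the backlog.
    public static List<string> EarlyExit(IEnumerable<string> backlog, int max)
        => backlog.Take(max).ToList();

    // Drain: consumes the whole backlog and keeps the NEWEST `max` lines.
    public static List<string> DrainToTail(IEnumerable<string> backlog, int max)
        => backlog.TakeLast(max).ToList();
}
```

For a backlog of e1 through e5 and max = 2, the early-exit form keeps e1 and e2 while the drain form keeps e4 and e5, which is why the two helpers cannot share one termination rule.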

OtlpReceiverIngestionTests 8/8, OtlpLogParserTests (+1 new) and OtlpReceiverTests all green on net10.0.

@claude

claude Bot commented Jun 30, 2026

Contributor

Follow-up review (verifying fixes in 061197f72)

I re-verified every item from the deep-pass review (4848260427) against the current PR head, reading the actual diffs rather than trusting the commit message summary.

Confirmed fixed, correctly

  1. False-positive telemetry hint (#1). OtlpLogParser.ParseScopeLogs now invokes onResourceSeen per record before ParseLogRecord can drop it for lacking a trace id (OtlpLogParser.cs:122-127). OtlpReceiver wires this through a cached _recordSeenLogService delegate (no per-call allocation). New test Parse_UntracedLogRecord_StillReportsSeenService locks in the exact regression scenario (untraced record still marks the resource "seen"). Verified correct.

  2. Retry log-dump leak (#2). OnTestEnd now gates the dump on context.Execution.CurrentRetryAttempt < context.Metadata.TestDetails.RetryLimit → return (AspireFixture.cs:470). I traced this against RetryHelper.ExecuteWithRetry: CurrentRetryAttempt is set before each attempt, and the loop only reaches the final attempt index (== RetryLimit) when no more retries remain, so the dump now only ever fires on the terminal attempt, exactly closing the leak into a test that ultimately passes on retry. The documented caveat (custom retry predicates that decline mid-way forgo the dump) is correctly scoped and honestly disclosed.

  3. Filter consistency (#4). EmitTelemetryWiringHintsAsync's candidate list now applies .Where(ShouldWaitForResource) (AspireFixture.cs:717), matching OnTestEnd's resource list (AspireFixture.cs:480). A user-excluded resource is no longer probed by either path.

  4. Doc disclosure (#6). FailureLogCollectionTimeout's XML doc now explicitly documents its second use as the session-end console-probe timeout (AspireFixture.cs:222-225), and the probe itself now takes this value instead of a hardcoded 2s (AspireFixture.cs:727, 759).

Also re-confirmed the earlier round's ring-buffer fix is still solid: CollectResourceLogLinesAsync uses a Queue<string> sliding window when max >= 0, so memory stays bounded regardless of backlog size (AspireFixture.cs:1041-1096).

On the two deferred items (#3, #5)

Both rebuttals hold up:

  • #3 (seen-service tracking on shared OtlpReceiver): fair. Diagnostics/RequestCount already establish "received-signal observability" as in-scope for this class, and AutoReceiver's cost is one empty ConcurrentDictionary until first use. Not worth a wrapper layer for this.
  • #5 (dedup the two WatchAsync helpers): also fair. The early-exit-on-first-line vs. drain-to-timeout-for-tail semantics are genuinely different, so a merge would require building an early-exit-on-max path into the failure-dump collector first, which is out of scope for this PR. Reasonable to leave as two focused methods.

Net

All four real bugs from the deep-pass review are fixed and verified against the actual code (not just trusting the commit message). No new issues found in this pass. This is ready to merge from my side.

Quality cleanups from a /simplify pass (no behaviour change):

- OtlpLogParser: fire the onResourceSeen callback once per resource scope
  instead of once per log record. ParseScopeLogs now returns whether it saw any
  record; ParseResourceLogs signals after the loop, dropping the per-record
  ConcurrentDictionary.TryAdd churn for a noisy resource and removing the
  callback param from ParseScopeLogs's signature.
- AspireFixture: invert the log-line prefix ternary to test the special case
  directly (E> only when an error line and not errors-only).
- AspireFixture: merge the two telemetry-hint candidate .Where() filters into a
  single predicate.

@claude claude Bot left a comment

Contributor

Follow-up review (latest commit 1054d3fa: "tidy log-parser seen-service signal and hint filter")

All prior rounds' findings (the deep-pass review and the two follow-ups) are already confirmed fixed and verified. This commit is a small post-"ready to merge" cleanup on top of that state — reviewed it specifically since it landed after the last sign-off.

Changes in this commit

  1. AspireFixture.cs:714-717 — merged the two chained .Where(ShouldWaitForResource).Where(p => !receiver.HasSeenLogsFrom(p.Name)) calls into a single .Where(p => ShouldWaitForResource(p) && !receiver.HasSeenLogsFrom(p.Name)). Pure style change — same predicates, same short-circuit order, no behavior difference.

  2. AspireFixture.cs:1070-1072 — flipped the E> prefix ternary from errorsOnly || !line.IsErrorMessage ? Content : "E> " + Content to !errorsOnly && line.IsErrorMessage ? "E> " + Content : Content. Worked through the truth table for all four (errorsOnly, IsErrorMessage) combinations — identical output in every case. The new form reads more directly (positive condition for the prefix-added branch instead of a double-negative guard), so this is a legitimate readability win with no regression.

  3. OtlpLogParser.cs: reworks onResourceSeen from firing once per LogRecord (inside ParseScopeLogs, before the trace-id filter) to firing once per resource (ParseResourceLogs, via a sawLogRecord flag OR'd across all of a resource's ScopeLogs blocks). This avoids redundant onResourceSeen invocations for a noisy resource. The callback's consumer (OtlpReceiver's cached _recordSeenLogService delegate) is a ConcurrentDictionary.TryAdd, so the old per-record version wasn't wrong, just wasteful at scale. The new code explicitly documents the wire-format assumption it leans on ("Resource (field 1) precedes ScopeLogs (field 2) ... true for all known OTel SDK implementations"), which is the right way to disclose it given the parser has no way to enforce field ordering.
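The ternary equivalence claimed in item 2 can be checked exhaustively. In this sketch the two expressions are copied from the description, with isError standing in for line.IsErrorMessage; the PrefixTernary wrapper and its method names are hypothetical.

```csharp
public static class PrefixTernary
{
    // Original form: guard for the "no prefix" branch.
    public static string OldForm(bool errorsOnly, bool isError, string content)
        => errorsOnly || !isError ? content : "E> " + content;

    // Rewritten form: positive condition for the "add prefix" branch.
    public static string NewForm(bool errorsOnly, bool isError, string content)
        => !errorsOnly && isError ? "E> " + content : content;

    // Compares both forms over all four (errorsOnly, isError) combinations.
    public static bool AreEquivalent()
    {
        foreach (var errorsOnly in new[] { false, true })
        foreach (var isError in new[] { false, true })
        {
            if (OldForm(errorsOnly, isError, "x") != NewForm(errorsOnly, isError, "x"))
                return false;
        }
        return true;
    }
}
```

The two conditions are negations of each other by De Morgan's law, so swapping the branches leaves every output unchanged.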

Verification

  • Confirmed the ternary rewrite is logically equivalent by hand (see truth table above).
  • Built TUnit.OpenTelemetry and TUnit.Aspire.Core against the PR head (1054d3fa) — both succeed with 0 warnings/errors.
  • Ran the directly-affected suites against the PR head:
    • OtlpLogParserTests (incl. Parse_UntracedLogRecord_StillReportsSeenService, which exercises exactly this code path): 26/26 passed.
    • OtlpReceiverIngestionTests: 8/8 passed.

No new issues. This commit doesn't touch any of the previously-flagged areas and doesn't introduce new ones — the seen-service contract (presence-before-trace-filter, case-insensitive lookup) is preserved, just invoked more efficiently. Ready to merge from my side.

@thomhurst thomhurst enabled auto-merge (squash) June 30, 2026 22:36
@thomhurst thomhurst merged commit 6d3597c into main Jun 30, 2026
14 checks passed
This was referenced Jul 1, 2026