Skip to content

fix: benchmark reporting — detect timeouts, fix comparisons#331

Merged
strawgate merged 1 commit into
masterfrom
fix/bench-reporting-v2
Mar 31, 2026
Merged

fix: benchmark reporting — detect timeouts, fix comparisons#331
strawgate merged 1 commit into
masterfrom
fix/bench-reporting-v2

Conversation

@strawgate
Copy link
Copy Markdown
Owner

Summary

  • Timeout detection: BenchResult.timed_out flag set by wait_blackhole_done. Agents hitting the 120s timeout show TIMEOUT instead of "0 lines/sec".
  • Exclude timed-out agents from comparisons: No more "filebeat is 1.2x faster than otelcol" when otelcol didn't finish.
  • Runs column clarity: Shows 4203ms not 4203, and 120027ms(TIMEOUT) for timed-out runs.

Before (issue #322)

| otelcol | binary | 120027ms | 1ms | 0 lines/sec | 120027|120028|120027 |
> filebeat is 1.2x faster than otelcol

After

| otelcol | binary | **TIMEOUT** (120027ms) | 1ms | 0 lines (timed out) | 120027ms(TIMEOUT) | ... |

Comparisons only include agents that completed.

Test plan

  • cargo clippy clean
  • cargo fmt clean
  • CI passes

Fixes #322

🤖 Generated with Claude Code

- BenchResult tracks `timed_out` flag from wait_blackhole_done
- Timed-out agents show "TIMEOUT" instead of misleading "0 lines/sec"
- Timed-out agents excluded from speed comparisons
- Runs column shows units and timeout markers (e.g. "4203ms", "120027ms(TIMEOUT)")

Fixes #322

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: af7309d0-2b5e-47e0-86f1-ae5b0a28d62e

📥 Commits

Reviewing files that changed from the base of the PR and between 885922a and d830e9a.

📒 Files selected for processing (3)
  • crates/logfwd-competitive-bench/src/main.rs
  • crates/logfwd-competitive-bench/src/runner.rs
  • crates/logfwd-competitive-bench/src/summarize.rs

Walkthrough

This PR adds timeout detection and tracking to the competitive benchmarking framework. A new timed_out boolean field is added to BenchResult to indicate when a benchmark run exceeds its configured timeout. The timeout status is determined by wait_blackhole_done() which now returns a tuple containing both lines processed and a timeout flag. All runner functions are updated to destructure and propagate this value to the result struct. Summarization logic then uses this information to annotate timed-out runs in output and exclude them from performance comparisons.

Possibly related PRs


Comment @coderabbitai help to get the list of available commands and usage tips.

@strawgate strawgate merged commit ede6500 into master Mar 31, 2026
8 of 11 checks passed
@strawgate strawgate deleted the fix/bench-reporting-v2 branch March 31, 2026 04:30
let mut stable_count = 0u32;

loop {
if start.elapsed() > timeout {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout check runs before checking whether the expected line count has been reached, which can misclassify successful runs as timed out near the boundary.

Concrete case: if lines reaches expected just before 120s, the next loop iteration may start slightly after 120s and return (lines, true) immediately, even though the run actually completed.

Consider re-checking lines >= expected in the timeout branch (or reading stats first each loop) before returning timed_out = true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark results 2026-03-31 (a1019aa)

1 participant