fix: benchmark reporting (detect timeouts, fix comparisons) #331
- `BenchResult` tracks a `timed_out` flag from `wait_blackhole_done`
- Timed-out agents show "TIMEOUT" instead of a misleading "0 lines/sec"
- Timed-out agents are excluded from speed comparisons
- Runs column shows units and timeout markers (e.g. "4203ms", "120027ms(TIMEOUT)")

Fixes #322

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
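A minimal sketch of how the display rules above could fit together, assuming one `BenchResult` per run. Only `BenchResult` and `timed_out` come from this PR; the `lines` and `elapsed_ms` fields, the `speed_label`/`runs_label` helpers, and the ", " separator in the Runs column are hypothetical.

```rust
/// Hypothetical per-run result; only `timed_out` is named in the PR.
#[derive(Debug, Clone)]
struct BenchResult {
    lines: u64,
    elapsed_ms: u64,
    timed_out: bool,
}

/// Speed cell: "TIMEOUT" for timed-out runs instead of a misleading rate.
fn speed_label(r: &BenchResult) -> String {
    if r.timed_out {
        return "TIMEOUT".to_string();
    }
    // Clamp to 1ms so a zero duration cannot divide by zero.
    let secs = r.elapsed_ms.max(1) as f64 / 1000.0;
    format!("{:.0} lines/sec", r.lines as f64 / secs)
}

/// Runs cell: each duration with a "ms" unit, timed-out runs marked.
fn runs_label(runs: &[BenchResult]) -> String {
    runs.iter()
        .map(|r| {
            if r.timed_out {
                format!("{}ms(TIMEOUT)", r.elapsed_ms)
            } else {
                format!("{}ms", r.elapsed_ms)
            }
        })
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let runs = vec![
        BenchResult { lines: 1_000_000, elapsed_ms: 4203, timed_out: false },
        BenchResult { lines: 0, elapsed_ms: 120_027, timed_out: true },
    ];
    println!("{}", runs_label(&runs)); // prints "4203ms, 120027ms(TIMEOUT)"
    println!("{}", speed_label(&runs[1])); // prints "TIMEOUT"
}
```

Keeping the timeout check inside the formatting helpers means a timed-out run can never fall through to the throughput math that produced "0 lines/sec".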
Walkthrough
This PR adds timeout detection and tracking to the competitive benchmarking framework. A new …
```rust
let mut stable_count = 0u32;

loop {
    if start.elapsed() > timeout {
        // … (review excerpt ends here)
```
The timeout check runs before checking whether the expected line count has been reached, which can misclassify successful runs as timed out near the boundary.
Concrete case: if `lines` reaches `expected` just before 120s, the next loop iteration may start slightly after 120s and return `(lines, true)` immediately, even though the run actually completed.
Consider re-checking `lines >= expected` in the timeout branch (or reading stats first in each iteration) before returning `timed_out = true`.
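The reviewer's suggestion can be sketched as follows. This is not the PR's actual `wait_blackhole_done`: the name `wait_until_done`, the `read_lines` callback standing in for however the real code reads stats, and the poll interval are assumptions, and the `stable_count` logic from the real loop is omitted. The point is the ordering: fresh stats are read and compared with `expected` before the deadline is checked.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `wait_blackhole_done`: polls a line counter
/// until `expected` lines arrive or `timeout` elapses.
/// Returns (lines, timed_out).
fn wait_until_done<F>(
    mut read_lines: F,
    expected: u64,
    timeout: Duration,
    poll: Duration,
) -> (u64, bool)
where
    F: FnMut() -> u64,
{
    let start = Instant::now();
    loop {
        // Read fresh stats first, so a run that finished just before the
        // deadline is reported as completed, not timed out.
        let lines = read_lines();
        if lines >= expected {
            return (lines, false);
        }
        // Only now check the deadline; `lines` is the latest reading.
        if start.elapsed() > timeout {
            return (lines, true);
        }
        sleep(poll);
    }
}

fn main() {
    // Simulated counter that reaches the target on the third poll.
    let mut n = 0u64;
    let (lines, timed_out) = wait_until_done(
        || {
            n += 500;
            n
        },
        1500,
        Duration::from_secs(120),
        Duration::from_millis(1),
    );
    println!("lines={lines} timed_out={timed_out}"); // prints "lines=1500 timed_out=false"
}
```

Passing `Duration::ZERO` as the timeout with a counter already at `expected` returns `(expected, false)`, which is the boundary case described in the review.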
Summary
- `BenchResult.timed_out` flag set by `wait_blackhole_done`. Agents hitting the 120s timeout show `TIMEOUT` instead of "0 lines/sec".
- Runs column shows units: `4203ms` not `4203`, and `120027ms(TIMEOUT)` for timed-out runs.

Before (issue #322)
After
Comparisons only include agents that completed.
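A sketch of the comparison filter, assuming each agent is summarized by a throughput figure plus the `timed_out` flag. `AgentSummary`, `speed_comparisons`, and expressing results as a ratio against the fastest completed agent are hypothetical; the PR does not say how its comparisons are worded.

```rust
/// Hypothetical per-agent summary; field names are assumptions.
struct AgentSummary {
    name: &'static str,
    lines_per_sec: f64,
    timed_out: bool,
}

/// Returns (name, fastest / this agent's throughput) for completed agents.
/// Timed-out agents are skipped entirely, so their zero throughput cannot
/// skew the ratios; the `> 0.0` guard also avoids dividing by zero.
fn speed_comparisons(agents: &[AgentSummary]) -> Vec<(&'static str, f64)> {
    let completed: Vec<&AgentSummary> = agents
        .iter()
        .filter(|a| !a.timed_out && a.lines_per_sec > 0.0)
        .collect();
    let fastest = completed
        .iter()
        .map(|a| a.lines_per_sec)
        .fold(0.0_f64, f64::max);
    completed
        .iter()
        .map(|a| (a.name, fastest / a.lines_per_sec))
        .collect()
}

fn main() {
    let agents = [
        AgentSummary { name: "agent-a", lines_per_sec: 200_000.0, timed_out: false },
        AgentSummary { name: "agent-b", lines_per_sec: 100_000.0, timed_out: false },
        AgentSummary { name: "agent-c", lines_per_sec: 0.0, timed_out: true },
    ];
    // agent-c is absent from the output because it timed out.
    for (name, ratio) in speed_comparisons(&agents) {
        println!("{name}: {ratio:.1}x");
    }
}
```

Filtering before computing `fastest` matters: if timed-out agents stayed in the set, any ratio against them would either divide by zero or produce a meaningless "N times faster" claim.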
Test plan
- `cargo clippy` clean
- `cargo fmt` clean

Fixes #322
🤖 Generated with Claude Code