fix: benchmark timeouts and dashboard memory usage tracking by strawgate · Pull Request #426 · strawgate/fastforward

strawgate · 2026-03-31T17:52:44Z

Addresses #424 by improving the benchmark experience and dashboard formatting.

Key Changes

Longer Benchmarks: Increased default line count to 20M events to provide a more stable measurement for fast agents like logfwd, especially since it was finishing in ~5s previously.
Reliable Testing: Increased the blackhole polling timeout to 300s to prevent premature timeouts during high-throughput runs.
Fair Comparisons: Reverted OTel Collector agent configuration changes to keep 'standard' behavior while still fixing the harness-level timeout.
New Resource Metrics: The summarize tool now aggregates peak RSS and peak CPU % for each benchmarked agent.
Enhanced Dashboard:
- Added a Memory column to the Competitive Analysis table.
- Added Resource Usage horizontal bar charts to the Run Detail page, showing peak memory and CPU consumption across all scenarios for a specific run.
- Added responsive table containers (overflow-x: auto) to ensure the dashboard remains usable on smaller screens or when many agents are compared.

- Increase competitive benchmark lines from 5M to 20M for better stability - Increase blackhole wait timeout from 120s to 300s to prevent OTel timeouts - Add Peak RSS and Peak CPU tracking to competitive benchmarks - Add Memory Usage column to competitive dashboard table - Add Resource Usage charts (Peak RSS/CPU) to run detail page - Ensure rate-ingest benchmarks are correctly published to dashboard

coderabbitai · 2026-03-31T17:52:59Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74539a32-9a5c-4b45-bf13-96f611a8af1d

📥 Commits

Reviewing files that changed from the base of the PR and between 67c24eb and ab1c651.

📒 Files selected for processing (2)

bench/dashboard/competitive.html
bench/dashboard/run.html

Walkthrough

Increases the workflow-dispatch default and env fallback for benchmark LINES from 5,000,000 to 20,000,000; extends the wait_blackhole_done timeout from 120s to 300s in three runner flows; adds per-run aggregation helpers and formatting for peak RSS and peak CPU, exposes avg_max_rss/avg_max_cpu in dashboard JSON, and adds a “Peak RSS” column to CLI/table output; updates dashboard HTML/CSS to render per-agent Memory columns, add formatBytes, enable horizontal scrolling/min-width for tables, add a hidden resource-usage section with charts, and change run JSON loading to an inline raw GitHub URL.

Possibly related PRs

strawgate/memagent PR 426: Makes largely overlapping changes across workflow LINES, runner timeout, summarize aggregation helpers, and dashboard HTML/CSS (near-identical code-level edits).
strawgate/memagent PR 331: Modifies runner wait_blackhole_done usage and timed_out reporting; closely related to the timeout increases here.
strawgate/memagent PR 301: Introduced earlier dashboard/summarize/runner changes that this PR continues or adjusts (same files and fields touched).

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

bench/dashboard/competitive.html (1)

102-118: ⚠️ Potential issue | 🟠 Major

Normalize and reorder run IDs before treating them as chronological data.

The updater still prepends new entries with insert(0, entry_id) and tolerates legacy { id: ... } items in index.runs. Using that array directly here can fetch data/runs/[object Object].json, reverses the trend x-axis, and makes latestRun = allRuns[allRuns.length - 1] point at the oldest run.

Proposed fix

-  const runIds = index.competitive_runs || index.runs || [];
+  const runIds = (index.competitive_runs || index.runs || [])
+    .map(run => typeof run === 'string' ? run : run?.id)
+    .filter(Boolean)
+    .reverse();

   if (runIds.length === 0) {
     document.getElementById('loading').textContent = 'No competitive runs found.';
     return;
   }

   const allRuns = (await Promise.all(runIds.map(id => loadJSON('data/runs/' + id + '.json')))).filter(Boolean);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@bench/dashboard/competitive.html` around lines 102 - 118, Normalize run IDs
from index.competitive_runs/index.runs by mapping each item to a string id
(handle legacy objects with an id property: e.g., runIds.map(r => typeof r ===
'object' ? r.id : r)), filter out falsy values, then ensure chronological order
by reversing if the updater prepends entries (so oldest-first), before calling
loadJSON and computing allRuns and latestRun; update references to runIds,
loadJSON, allRuns, and latestRun accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/dashboard/run.html`:
- Around line 118-120: The page now loads run data from data/runs/${runId}.json
(via loadJSON and runId) but the deployment workflow only publishes
bench/dashboard/ to Pages, so those JSON files never reach the site and the
fallback message (document.getElementById('loading').textContent...) always
appears; fix this by updating the CI workflow (.github/workflows/bench.yml) to
include the generated data files in the Pages artifact (either copy
data/runs/*.json under bench/dashboard/ before upload or add data/runs/** to the
upload/publish step) so loadJSON can find data/runs/${runId}.json at runtime.
Ensure both bench/dashboard/ and data/runs/*.json are staged for the Pages
upload so the published site and competitive.html can load run JSON.
- Around line 190-230: The CPU chart never gets populated or instantiated:
inside the agents loop (where rssData/maxRss are computed) also compute maxCpu
from run.scenarios?.[sn]?.[agent]?.avg_max_cpu and push it into a cpuData array
and ensure colors/labels mirror rss (use AGENT_COLORS[agent] || '#64748b');
after rendering the RSS chart, check cpuData.length>0 and create a new Chart
targeting cpuCanvas (the chart-peak-cpu canvas) with the same horizontal bar
config but use formatPercent (or an existing percent formatter) for tooltip and
x-axis ticks and set datasets: [{ data: cpuData, backgroundColor: colors,
borderRadius: 3 }] so the “Peak CPU Usage” panel shows a real chart.

In `@crates/logfwd-competitive-bench/src/summarize.rs`:
- Around line 79-105: avg_max_cpu currently divides the CPU ms delta by a
hardcoded 10.0 (assuming 1s between samples); replace that with the actual
elapsed time between samples using the samples' elapsed_sec fields: compute
elapsed_delta = curr.elapsed_sec - prev.elapsed_sec, skip or treat as invalid
when elapsed_delta <= 0.0, and compute pct as (diff_ms as f64) / (elapsed_delta
* 1000.0) * 100.0 so the CPU percent uses the real time base; update the loop
inside avg_max_cpu (referencing self.runs, r.samples, cpu_user_ms, cpu_sys_ms,
and elapsed_sec) and keep the existing filter for v > 0.0.

---

Outside diff comments:
In `@bench/dashboard/competitive.html`:
- Around line 102-118: Normalize run IDs from index.competitive_runs/index.runs
by mapping each item to a string id (handle legacy objects with an id property:
e.g., runIds.map(r => typeof r === 'object' ? r.id : r)), filter out falsy
values, then ensure chronological order by reversing if the updater prepends
entries (so oldest-first), before calling loadJSON and computing allRuns and
latestRun; update references to runIds, loadJSON, allRuns, and latestRun
accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d303d9e-7d4e-4d6f-8c89-df69a4d1ed8a

📥 Commits

Reviewing files that changed from the base of the PR and between c42b4cf and bcb7a9f.

📒 Files selected for processing (6)

.github/workflows/bench.yml
bench/dashboard/competitive.html
bench/dashboard/run.html
bench/dashboard/style.css
crates/logfwd-competitive-bench/src/runner.rs
crates/logfwd-competitive-bench/src/summarize.rs

github-actions · 2026-03-31T18:08:36Z

Addressed all 4 unresolved review threads with targeted fixes and pushed commit 3a954a1.

Fixed dashboard data source mismatch in competitive.html and run.html by restoring fetches to the published bench-data source.
Implemented the missing Peak CPU chart rendering in run.html.
Corrected CPU percentage calculation in summarize.rs to use actual sample interval (elapsed_sec delta) instead of a fixed divisor.
Replied inline on each review comment and resolved all addressed threads.

Validation

just ci: could not run in this environment (just: command not found).
cargo test -p logfwd-competitive-bench --quiet (with RUSTC_WRAPPER=): passed (2 tests).

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

pr:fix: benchmark timeouts and dashboard memory usage tracking #426 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)

What is this? | From workflow: AI: Address PR Review Feedback

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/dashboard/competitive.html`:
- Around line 74-79: formatBytes currently formats 0 (and other falsy values) as
"0B", but the Rust backend's fmt_bytes returns "-" for zero/falsy values; update
the formatBytes function to return '-' when the input is falsy (e.g., if (!b)
return '-') before the existing size checks so that formatBytes aligns with the
backend behavior; reference the formatBytes function to locate and change this
behavior.

In `@bench/dashboard/run.html`:
- Around line 80-85: The formatBytes function returns "0B" for zero input but
should mirror Rust's fmt_bytes by returning "-" for zero/missing values; modify
the formatBytes(b) function to first check if b === 0 (or b == null/undefined if
you want to treat missing as zero) and return "-" before the size comparisons,
keeping the existing GB/MB/KB formatting logic intact so callers like
avg_max_rss display "-" instead of "0B".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0eb00a1e-95c9-4fb5-95cb-488b67eaaf38

📥 Commits

Reviewing files that changed from the base of the PR and between bcb7a9f and 67c24eb.

📒 Files selected for processing (3)

bench/dashboard/competitive.html
bench/dashboard/run.html
crates/logfwd-competitive-bench/src/summarize.rs

coderabbitai · 2026-03-31T18:12:42Z

+function formatBytes(b) {
+  if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
+  if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
+  if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
+  return b + 'B';
+}


⚠️ Potential issue | 🟡 Minor

Same zero-handling inconsistency as run.html.

This formatBytes should also return '-' for zero/falsy values to match the Rust backend's fmt_bytes behavior.

Proposed fix

function formatBytes(b) { + if (!b) return '-'; if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; return b + 'B'; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function formatBytes(b) {

if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';

if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';

if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';

return b + 'B';

}

function formatBytes(b) {

if (!b) return '-';

if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';

if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';

if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';

return b + 'B';

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench/dashboard/competitive.html` around lines 74 - 79, formatBytes currently formats 0 (and other falsy values) as "0B", but the Rust backend's fmt_bytes returns "-" for zero/falsy values; update the formatBytes function to return '-' when the input is falsy (e.g., if (!b) return '-') before the existing size checks so that formatBytes aligns with the backend behavior; reference the formatBytes function to locate and change this behavior.

coderabbitai · 2026-03-31T18:12:42Z

+function formatBytes(b) {
+  if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
+  if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
+  if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
+  return b + 'B';
+}


⚠️ Potential issue | 🟡 Minor

formatBytes lacks a zero check, causing 0B output instead of empty/dash.

The Rust fmt_bytes returns "-" for zero bytes, but this JS version returns "0B". When an agent has no RSS data (avg_max_rss: 0), the chart will show "0B" instead of indicating missing data.

Proposed fix

function formatBytes(b) { + if (!b) return '-'; if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; return b + 'B'; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

function formatBytes(b) {

if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';

if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';

if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';

return b + 'B';

}

function formatBytes(b) {

if (!b) return '-';

if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';

if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';

if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';

return b + 'B';

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@bench/dashboard/run.html` around lines 80 - 85, The formatBytes function returns "0B" for zero input but should mirror Rust's fmt_bytes by returning "-" for zero/missing values; modify the formatBytes(b) function to first check if b === 0 (or b == null/undefined if you want to treat missing as zero) and return "-" before the size comparisons, keeping the existing GB/MB/KB formatting logic intact so callers like avg_max_rss display "-" instead of "0B".

- Fix 'Best Competitive Ratio' bug by ignoring agents with 0/null LPS - Fix empty rate-ingest charts by updating rate_bench.rs output format - Add dynamic DATA_BASE fallback to all dashboard pages for local development - Add Peak RSS and Peak CPU % charts to individual run detail pages - Improve LPS display in competitive table (Avg ± Stddev) - Make dashboard tables responsive with overflow-x: auto

github-actions

Two blocking runtime regressions remain:

bench/dashboard/competitive.html line 96 references DATA_BASE (loadJSON(DATA_BASE + '/index.json')), but this constant is not declared anywhere in the file after the PR changes. In the browser this throws ReferenceError: DATA_BASE is not defined, so the page never renders.
bench/dashboard/run.html line 118 has the same issue (loadJSON(DATA_BASE + '/runs/' + runId + '.json')) with no in-file declaration of DATA_BASE. Opening a run detail page fails immediately with the same ReferenceError and no run data is shown.

Please restore a valid DATA_BASE definition or switch both call sites to a defined base URL source before merge.

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

pr:#426 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)

What is this? | From workflow: AI: PR Review

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

github-actions Bot reviewed Mar 31, 2026

View reviewed changes

Comment thread bench/dashboard/competitive.html Outdated

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

Comment thread bench/dashboard/run.html Outdated

Comment thread bench/dashboard/run.html

Comment thread crates/logfwd-competitive-bench/src/summarize.rs

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

strawgate force-pushed the fix/bench-reporting-v3 branch from 67c24eb to ab1c651 Compare March 31, 2026 18:46

strawgate merged commit c866429 into master Mar 31, 2026
9 of 11 checks passed

strawgate deleted the fix/bench-reporting-v3 branch March 31, 2026 18:47

github-actions Bot requested changes Mar 31, 2026

View reviewed changes

This was referenced Mar 31, 2026

feat: add pprof-rs CPU profiling and fix dhat profiler flushing #449

Merged

fix: dashboard source — timing fields, favicon, CPU chart consistency #624

Merged

coderabbitai Bot mentioned this pull request Apr 12, 2026

bench: canonical dashboard data paths + publish index to data/* #1828

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: benchmark timeouts and dashboard memory usage tracking#426

fix: benchmark timeouts and dashboard memory usage tracking#426
strawgate merged 2 commits into
masterfrom
fix/bench-reporting-v3

strawgate commented Mar 31, 2026

Uh oh!

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading

Review failed

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 31, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Uh oh!

coderabbitai Bot Mar 31, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

strawgate commented Mar 31, 2026

Key Changes

Uh oh!

coderabbitai Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Possibly related PRs

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 31, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading