Skip to content

fix: benchmark timeouts and dashboard memory usage tracking#426

Merged
strawgate merged 2 commits into
masterfrom
fix/bench-reporting-v3
Mar 31, 2026
Merged

fix: benchmark timeouts and dashboard memory usage tracking#426
strawgate merged 2 commits into
masterfrom
fix/bench-reporting-v3

Conversation

@strawgate
Copy link
Copy Markdown
Owner

Addresses #424 by improving the benchmark experience and dashboard formatting.

Key Changes

  • Longer Benchmarks: Increased default line count to 20M events to provide a more stable measurement for fast agents like logfwd, especially since it was finishing in ~5s previously.
  • Reliable Testing: Increased the blackhole polling timeout to 300s to prevent premature timeouts during high-throughput runs.
  • Fair Comparisons: Reverted OTel Collector agent configuration changes to keep 'standard' behavior while still fixing the harness-level timeout.
  • New Resource Metrics: The summarize tool now aggregates peak RSS and peak CPU % for each benchmarked agent.
  • Enhanced Dashboard:
    • Added a Memory column to the Competitive Analysis table.
    • Added Resource Usage horizontal bar charts to the Run Detail page, showing peak memory and CPU consumption across all scenarios for a specific run.
    • Added responsive table containers (overflow-x: auto) to ensure the dashboard remains usable on smaller screens or when many agents are compared.

- Increase competitive benchmark lines from 5M to 20M for better stability
- Increase blackhole wait timeout from 120s to 300s to prevent OTel timeouts
- Add Peak RSS and Peak CPU tracking to competitive benchmarks
- Add Memory Usage column to competitive dashboard table
- Add Resource Usage charts (Peak RSS/CPU) to run detail page
- Ensure rate-ingest benchmarks are correctly published to dashboard
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74539a32-9a5c-4b45-bf13-96f611a8af1d

📥 Commits

Reviewing files that changed from the base of the PR and between 67c24eb and ab1c651.

📒 Files selected for processing (2)
  • bench/dashboard/competitive.html
  • bench/dashboard/run.html

Walkthrough

Increases the workflow-dispatch default and env fallback for benchmark LINES from 5,000,000 to 20,000,000; extends the wait_blackhole_done timeout from 120s to 300s in three runner flows; adds per-run aggregation helpers and formatting for peak RSS and peak CPU, exposes avg_max_rss/avg_max_cpu in dashboard JSON, and adds a “Peak RSS” column to CLI/table output; updates dashboard HTML/CSS to render per-agent Memory columns, add formatBytes, enable horizontal scrolling/min-width for tables, add a hidden resource-usage section with charts, and change run JSON loading to an inline raw GitHub URL.

Possibly related PRs

  • strawgate/memagent PR 426: Makes largely overlapping changes across workflow LINES, runner timeout, summarize aggregation helpers, and dashboard HTML/CSS (near-identical code-level edits).
  • strawgate/memagent PR 331: Modifies runner wait_blackhole_done usage and timed_out reporting; closely related to the timeout increases here.
  • strawgate/memagent PR 301: Introduced earlier dashboard/summarize/runner changes that this PR continues or adjusts (same files and fields touched).

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread bench/dashboard/competitive.html Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bench/dashboard/competitive.html (1)

102-118: ⚠️ Potential issue | 🟠 Major

Normalize and reorder run IDs before treating them as chronological data.

The updater still prepends new entries with insert(0, entry_id) and tolerates legacy { id: ... } items in index.runs. Using that array directly here can fetch data/runs/[object Object].json, reverses the trend x-axis, and makes latestRun = allRuns[allRuns.length - 1] point at the oldest run.

Proposed fix
-  const runIds = index.competitive_runs || index.runs || [];
+  const runIds = (index.competitive_runs || index.runs || [])
+    .map(run => typeof run === 'string' ? run : run?.id)
+    .filter(Boolean)
+    .reverse();

   if (runIds.length === 0) {
     document.getElementById('loading').textContent = 'No competitive runs found.';
     return;
   }

   const allRuns = (await Promise.all(runIds.map(id => loadJSON('data/runs/' + id + '.json')))).filter(Boolean);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/dashboard/competitive.html` around lines 102 - 118, Normalize run IDs
from index.competitive_runs/index.runs by mapping each item to a string id
(handle legacy objects with an id property: e.g., runIds.map(r => typeof r ===
'object' ? r.id : r)), filter out falsy values, then ensure chronological order
by reversing if the updater prepends entries (so oldest-first), before calling
loadJSON and computing allRuns and latestRun; update references to runIds,
loadJSON, allRuns, and latestRun accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/dashboard/run.html`:
- Around line 118-120: The page now loads run data from data/runs/${runId}.json
(via loadJSON and runId) but the deployment workflow only publishes
bench/dashboard/ to Pages, so those JSON files never reach the site and the
fallback message (document.getElementById('loading').textContent...) always
appears; fix this by updating the CI workflow (.github/workflows/bench.yml) to
include the generated data files in the Pages artifact (either copy
data/runs/*.json under bench/dashboard/ before upload or add data/runs/** to the
upload/publish step) so loadJSON can find data/runs/${runId}.json at runtime.
Ensure both bench/dashboard/ and data/runs/*.json are staged for the Pages
upload so the published site and competitive.html can load run JSON.
- Around line 190-230: The CPU chart never gets populated or instantiated:
inside the agents loop (where rssData/maxRss are computed) also compute maxCpu
from run.scenarios?.[sn]?.[agent]?.avg_max_cpu and push it into a cpuData array
and ensure colors/labels mirror rss (use AGENT_COLORS[agent] || '#64748b');
after rendering the RSS chart, check cpuData.length>0 and create a new Chart
targeting cpuCanvas (the chart-peak-cpu canvas) with the same horizontal bar
config but use formatPercent (or an existing percent formatter) for tooltip and
x-axis ticks and set datasets: [{ data: cpuData, backgroundColor: colors,
borderRadius: 3 }] so the “Peak CPU Usage” panel shows a real chart.

In `@crates/logfwd-competitive-bench/src/summarize.rs`:
- Around line 79-105: avg_max_cpu currently divides the CPU ms delta by a
hardcoded 10.0 (assuming 1s between samples); replace that with the actual
elapsed time between samples using the samples' elapsed_sec fields: compute
elapsed_delta = curr.elapsed_sec - prev.elapsed_sec, skip or treat as invalid
when elapsed_delta <= 0.0, and compute pct as (diff_ms as f64) / (elapsed_delta
* 1000.0) * 100.0 so the CPU percent uses the real time base; update the loop
inside avg_max_cpu (referencing self.runs, r.samples, cpu_user_ms, cpu_sys_ms,
and elapsed_sec) and keep the existing filter for v > 0.0.

---

Outside diff comments:
In `@bench/dashboard/competitive.html`:
- Around line 102-118: Normalize run IDs from index.competitive_runs/index.runs
by mapping each item to a string id (handle legacy objects with an id property:
e.g., runIds.map(r => typeof r === 'object' ? r.id : r)), filter out falsy
values, then ensure chronological order by reversing if the updater prepends
entries (so oldest-first), before calling loadJSON and computing allRuns and
latestRun; update references to runIds, loadJSON, allRuns, and latestRun
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d303d9e-7d4e-4d6f-8c89-df69a4d1ed8a

📥 Commits

Reviewing files that changed from the base of the PR and between c42b4cf and bcb7a9f.

📒 Files selected for processing (6)
  • .github/workflows/bench.yml
  • bench/dashboard/competitive.html
  • bench/dashboard/run.html
  • bench/dashboard/style.css
  • crates/logfwd-competitive-bench/src/runner.rs
  • crates/logfwd-competitive-bench/src/summarize.rs

Comment thread bench/dashboard/run.html Outdated
Comment thread bench/dashboard/run.html
Comment thread crates/logfwd-competitive-bench/src/summarize.rs
@github-actions
Copy link
Copy Markdown
Contributor

Addressed all 4 unresolved review threads with targeted fixes and pushed commit 3a954a1.

  • Fixed dashboard data source mismatch in competitive.html and run.html by restoring fetches to the published bench-data source.
  • Implemented the missing Peak CPU chart rendering in run.html.
  • Corrected CPU percentage calculation in summarize.rs to use actual sample interval (elapsed_sec delta) instead of a fixed divisor.
  • Replied inline on each review comment and resolved all addressed threads.
Validation
  • just ci: could not run in this environment (just: command not found).
  • cargo test -p logfwd-competitive-bench --quiet (with RUSTC_WRAPPER=): passed (2 tests).

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.


What is this? | From workflow: AI: Address PR Review Feedback

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/dashboard/competitive.html`:
- Around line 74-79: formatBytes currently formats 0 (and other falsy values) as
"0B", but the Rust backend's fmt_bytes returns "-" for zero/falsy values; update
the formatBytes function to return '-' when the input is falsy (e.g., if (!b)
return '-') before the existing size checks so that formatBytes aligns with the
backend behavior; reference the formatBytes function to locate and change this
behavior.

In `@bench/dashboard/run.html`:
- Around line 80-85: The formatBytes function returns "0B" for zero input but
should mirror Rust's fmt_bytes by returning "-" for zero/missing values; modify
the formatBytes(b) function to first check if b === 0 (or b == null/undefined if
you want to treat missing as zero) and return "-" before the size comparisons,
keeping the existing GB/MB/KB formatting logic intact so callers like
avg_max_rss display "-" instead of "0B".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0eb00a1e-95c9-4fb5-95cb-488b67eaaf38

📥 Commits

Reviewing files that changed from the base of the PR and between bcb7a9f and 67c24eb.

📒 Files selected for processing (3)
  • bench/dashboard/competitive.html
  • bench/dashboard/run.html
  • crates/logfwd-competitive-bench/src/summarize.rs

Comment on lines +74 to +79
function formatBytes(b) {
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same zero-handling inconsistency as run.html.

This formatBytes should also return '-' for zero/falsy values to match the Rust backend's fmt_bytes behavior.

Proposed fix
 function formatBytes(b) {
+  if (!b) return '-';
   if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
   if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
   if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
   return b + 'B';
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function formatBytes(b) {
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
function formatBytes(b) {
if (!b) return '-';
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/dashboard/competitive.html` around lines 74 - 79, formatBytes currently
formats 0 (and other falsy values) as "0B", but the Rust backend's fmt_bytes
returns "-" for zero/falsy values; update the formatBytes function to return '-'
when the input is falsy (e.g., if (!b) return '-') before the existing size
checks so that formatBytes aligns with the backend behavior; reference the
formatBytes function to locate and change this behavior.

Comment thread bench/dashboard/run.html
Comment on lines +80 to +85
function formatBytes(b) {
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

formatBytes lacks a zero check, causing 0B output instead of empty/dash.

The Rust fmt_bytes returns "-" for zero bytes, but this JS version returns "0B". When an agent has no RSS data (avg_max_rss: 0), the chart will show "0B" instead of indicating missing data.

Proposed fix
 function formatBytes(b) {
+  if (!b) return '-';
   if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
   if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
   if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
   return b + 'B';
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function formatBytes(b) {
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
function formatBytes(b) {
if (!b) return '-';
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bench/dashboard/run.html` around lines 80 - 85, The formatBytes function
returns "0B" for zero input but should mirror Rust's fmt_bytes by returning "-"
for zero/missing values; modify the formatBytes(b) function to first check if b
=== 0 (or b == null/undefined if you want to treat missing as zero) and return
"-" before the size comparisons, keeping the existing GB/MB/KB formatting logic
intact so callers like avg_max_rss display "-" instead of "0B".

- Fix 'Best Competitive Ratio' bug by ignoring agents with 0/null LPS
- Fix empty rate-ingest charts by updating rate_bench.rs output format
- Add dynamic DATA_BASE fallback to all dashboard pages for local development
- Add Peak RSS and Peak CPU % charts to individual run detail pages
- Improve LPS display in competitive table (Avg ± Stddev)
- Make dashboard tables responsive with overflow-x: auto
@strawgate strawgate force-pushed the fix/bench-reporting-v3 branch from 67c24eb to ab1c651 Compare March 31, 2026 18:46
@strawgate strawgate merged commit c866429 into master Mar 31, 2026
9 of 11 checks passed
@strawgate strawgate deleted the fix/bench-reporting-v3 branch March 31, 2026 18:47
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blocking runtime regressions remain:

  1. bench/dashboard/competitive.html line 96 references DATA_BASE (loadJSON(DATA_BASE + '/index.json')), but this constant is not declared anywhere in the file after the PR changes. In the browser this throws ReferenceError: DATA_BASE is not defined, so the page never renders.

  2. bench/dashboard/run.html line 118 has the same issue (loadJSON(DATA_BASE + '/runs/' + runId + '.json')) with no in-file declaration of DATA_BASE. Opening a run detail page fails immediately with the same ReferenceError and no run data is shown.

Please restore a valid DATA_BASE definition or switch both call sites to a defined base URL source before merge.

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

  • pr:#426 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)

What is this? | From workflow: AI: PR Review

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant