fix: benchmark timeouts and dashboard memory usage tracking#426
Conversation
- Increase competitive benchmark lines from 5M to 20M for better stability - Increase blackhole wait timeout from 120s to 300s to prevent OTel timeouts - Add Peak RSS and Peak CPU tracking to competitive benchmarks - Add Memory Usage column to competitive dashboard table - Add Resource Usage charts (Peak RSS/CPU) to run detail page - Ensure rate-ingest benchmarks are correctly published to dashboard
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughIncreases the workflow-dispatch default and env fallback for benchmark Possibly related PRs
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bench/dashboard/competitive.html (1)
102-118:⚠️ Potential issue | 🟠 MajorNormalize and reorder run IDs before treating them as chronological data.
The updater still prepends new entries with
insert(0, entry_id)and tolerates legacy{ id: ... }items inindex.runs. Using that array directly here can fetchdata/runs/[object Object].json, reverses the trend x-axis, and makeslatestRun = allRuns[allRuns.length - 1]point at the oldest run.Proposed fix
- const runIds = index.competitive_runs || index.runs || []; + const runIds = (index.competitive_runs || index.runs || []) + .map(run => typeof run === 'string' ? run : run?.id) + .filter(Boolean) + .reverse(); if (runIds.length === 0) { document.getElementById('loading').textContent = 'No competitive runs found.'; return; } const allRuns = (await Promise.all(runIds.map(id => loadJSON('data/runs/' + id + '.json')))).filter(Boolean);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bench/dashboard/competitive.html` around lines 102 - 118, Normalize run IDs from index.competitive_runs/index.runs by mapping each item to a string id (handle legacy objects with an id property: e.g., runIds.map(r => typeof r === 'object' ? r.id : r)), filter out falsy values, then ensure chronological order by reversing if the updater prepends entries (so oldest-first), before calling loadJSON and computing allRuns and latestRun; update references to runIds, loadJSON, allRuns, and latestRun accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench/dashboard/run.html`:
- Around line 118-120: The page now loads run data from data/runs/${runId}.json
(via loadJSON and runId) but the deployment workflow only publishes
bench/dashboard/ to Pages, so those JSON files never reach the site and the
fallback message (document.getElementById('loading').textContent...) always
appears; fix this by updating the CI workflow (.github/workflows/bench.yml) to
include the generated data files in the Pages artifact (either copy
data/runs/*.json under bench/dashboard/ before upload or add data/runs/** to the
upload/publish step) so loadJSON can find data/runs/${runId}.json at runtime.
Ensure both bench/dashboard/ and data/runs/*.json are staged for the Pages
upload so the published site and competitive.html can load run JSON.
- Around line 190-230: The CPU chart never gets populated or instantiated:
inside the agents loop (where rssData/maxRss are computed) also compute maxCpu
from run.scenarios?.[sn]?.[agent]?.avg_max_cpu and push it into a cpuData array
and ensure colors/labels mirror rss (use AGENT_COLORS[agent] || '#64748b');
after rendering the RSS chart, check cpuData.length>0 and create a new Chart
targeting cpuCanvas (the chart-peak-cpu canvas) with the same horizontal bar
config but use formatPercent (or an existing percent formatter) for tooltip and
x-axis ticks and set datasets: [{ data: cpuData, backgroundColor: colors,
borderRadius: 3 }] so the “Peak CPU Usage” panel shows a real chart.
In `@crates/logfwd-competitive-bench/src/summarize.rs`:
- Around line 79-105: avg_max_cpu currently divides the CPU ms delta by a
hardcoded 10.0 (assuming 1s between samples); replace that with the actual
elapsed time between samples using the samples' elapsed_sec fields: compute
elapsed_delta = curr.elapsed_sec - prev.elapsed_sec, skip or treat as invalid
when elapsed_delta <= 0.0, and compute pct as (diff_ms as f64) / (elapsed_delta
* 1000.0) * 100.0 so the CPU percent uses the real time base; update the loop
inside avg_max_cpu (referencing self.runs, r.samples, cpu_user_ms, cpu_sys_ms,
and elapsed_sec) and keep the existing filter for v > 0.0.
---
Outside diff comments:
In `@bench/dashboard/competitive.html`:
- Around line 102-118: Normalize run IDs from index.competitive_runs/index.runs
by mapping each item to a string id (handle legacy objects with an id property:
e.g., runIds.map(r => typeof r === 'object' ? r.id : r)), filter out falsy
values, then ensure chronological order by reversing if the updater prepends
entries (so oldest-first), before calling loadJSON and computing allRuns and
latestRun; update references to runIds, loadJSON, allRuns, and latestRun
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d303d9e-7d4e-4d6f-8c89-df69a4d1ed8a
📒 Files selected for processing (6)
.github/workflows/bench.ymlbench/dashboard/competitive.htmlbench/dashboard/run.htmlbench/dashboard/style.csscrates/logfwd-competitive-bench/src/runner.rscrates/logfwd-competitive-bench/src/summarize.rs
|
Addressed all 4 unresolved review threads with targeted fixes and pushed commit
Validation
Note 🔒 Integrity filtering filtered 1 itemIntegrity filtering activated and filtered the following item during workflow execution.
What is this? | From workflow: AI: Address PR Review Feedback Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench/dashboard/competitive.html`:
- Around line 74-79: formatBytes currently formats 0 (and other falsy values) as
"0B", but the Rust backend's fmt_bytes returns "-" for zero/falsy values; update
the formatBytes function to return '-' when the input is falsy (e.g., if (!b)
return '-') before the existing size checks so that formatBytes aligns with the
backend behavior; reference the formatBytes function to locate and change this
behavior.
In `@bench/dashboard/run.html`:
- Around line 80-85: The formatBytes function returns "0B" for zero input but
should mirror Rust's fmt_bytes by returning "-" for zero/missing values; modify
the formatBytes(b) function to first check if b === 0 (or b == null/undefined if
you want to treat missing as zero) and return "-" before the size comparisons,
keeping the existing GB/MB/KB formatting logic intact so callers like
avg_max_rss display "-" instead of "0B".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0eb00a1e-95c9-4fb5-95cb-488b67eaaf38
📒 Files selected for processing (3)
bench/dashboard/competitive.htmlbench/dashboard/run.htmlcrates/logfwd-competitive-bench/src/summarize.rs
| function formatBytes(b) { | ||
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | ||
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | ||
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | ||
| return b + 'B'; | ||
| } |
There was a problem hiding this comment.
Same zero-handling inconsistency as run.html.
This formatBytes should also return '-' for zero/falsy values to match the Rust backend's fmt_bytes behavior.
Proposed fix
function formatBytes(b) {
+ if (!b) return '-';
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function formatBytes(b) { | |
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | |
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | |
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | |
| return b + 'B'; | |
| } | |
| function formatBytes(b) { | |
| if (!b) return '-'; | |
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | |
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | |
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | |
| return b + 'B'; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench/dashboard/competitive.html` around lines 74 - 79, formatBytes currently
formats 0 (and other falsy values) as "0B", but the Rust backend's fmt_bytes
returns "-" for zero/falsy values; update the formatBytes function to return '-'
when the input is falsy (e.g., if (!b) return '-') before the existing size
checks so that formatBytes aligns with the backend behavior; reference the
formatBytes function to locate and change this behavior.
| function formatBytes(b) { | ||
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | ||
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | ||
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | ||
| return b + 'B'; | ||
| } |
There was a problem hiding this comment.
formatBytes lacks a zero check, causing 0B output instead of empty/dash.
The Rust fmt_bytes returns "-" for zero bytes, but this JS version returns "0B". When an agent has no RSS data (avg_max_rss: 0), the chart will show "0B" instead of indicating missing data.
Proposed fix
function formatBytes(b) {
+ if (!b) return '-';
if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB';
if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB';
if (b >= 1024) return (b / 1024).toFixed(1) + 'KB';
return b + 'B';
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function formatBytes(b) { | |
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | |
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | |
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | |
| return b + 'B'; | |
| } | |
| function formatBytes(b) { | |
| if (!b) return '-'; | |
| if (b >= 1024 * 1024 * 1024) return (b / (1024 * 1024 * 1024)).toFixed(1) + 'GB'; | |
| if (b >= 1024 * 1024) return (b / (1024 * 1024)).toFixed(1) + 'MB'; | |
| if (b >= 1024) return (b / 1024).toFixed(1) + 'KB'; | |
| return b + 'B'; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bench/dashboard/run.html` around lines 80 - 85, The formatBytes function
returns "0B" for zero input but should mirror Rust's fmt_bytes by returning "-"
for zero/missing values; modify the formatBytes(b) function to first check if b
=== 0 (or b == null/undefined if you want to treat missing as zero) and return
"-" before the size comparisons, keeping the existing GB/MB/KB formatting logic
intact so callers like avg_max_rss display "-" instead of "0B".
- Fix 'Best Competitive Ratio' bug by ignoring agents with 0/null LPS - Fix empty rate-ingest charts by updating rate_bench.rs output format - Add dynamic DATA_BASE fallback to all dashboard pages for local development - Add Peak RSS and Peak CPU % charts to individual run detail pages - Improve LPS display in competitive table (Avg ± Stddev) - Make dashboard tables responsive with overflow-x: auto
67c24eb to
ab1c651
Compare
There was a problem hiding this comment.
Two blocking runtime regressions remain:
-
bench/dashboard/competitive.htmlline 96 referencesDATA_BASE(loadJSON(DATA_BASE + '/index.json')), but this constant is not declared anywhere in the file after the PR changes. In the browser this throwsReferenceError: DATA_BASE is not defined, so the page never renders. -
bench/dashboard/run.htmlline 118 has the same issue (loadJSON(DATA_BASE + '/runs/' + runId + '.json')) with no in-file declaration ofDATA_BASE. Opening a run detail page fails immediately with the sameReferenceErrorand no run data is shown.
Please restore a valid DATA_BASE definition or switch both call sites to a defined base URL source before merge.
Note
🔒 Integrity filtering filtered 1 item
Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
- pr:#426 (
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
What is this? | From workflow: AI: PR Review
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.
Addresses #424 by improving the benchmark experience and dashboard formatting.
Key Changes
logfwd, especially since it was finishing in ~5s previously.summarizetool now aggregates peak RSS and peak CPU % for each benchmarked agent.overflow-x: auto) to ensure the dashboard remains usable on smaller screens or when many agents are compared.