feat: rich diagnostics dashboard with live charts and system stats#454
feat: rich diagnostics dashboard with live charts and system stats#454strawgate wants to merge 9 commits into
Conversation
New dashboard (replaces the 138-line prototype): - Dark theme, tab-based UI (Overview / Pipelines / Config) - 6 live time-series charts: lines/sec, bytes/sec, errors/sec, CPU%, RSS memory, disk read/sec — rendered with Canvas API, ring buffer history from page open - Pipeline flow diagrams: input → transform → output with live counters, drop rate, error badges - Config viewer with YAML syntax highlighting - Status ribbon with uptime, version, PID, threads, CPU, RSS New API endpoints: - GET /api/system — CPU%, RSS, threads, disk I/O via getrusage (macOS) and /proc/self (Linux). No new deps beyond libc. - GET /api/config — raw YAML + path for the config viewer Architecture: - Single embedded HTML file (include_str!), no external deps - System stats use Cell<> for CPU delta tracking on the diagnostics thread — no locks on the hot path - libc added as direct dep (was already transitive) 3 new tests for the new endpoints (110 total). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds direct dependencies Possibly related PRs
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/dashboard.html`:
- Around line 563-575: In highlightYaml, the comment replacement currently
treats any '#' to EOL as a comment and can false-positive inside quoted strings
(e.g. "foo # bar"); change the comment handling so '#' is only highlighted when
not inside a quoted string — either replace the final .replace(/(#.*)$/gm, ...)
with a regex that uses negative lookbehind/alternation to ensure the '#' is not
inside single or double quotes, or implement a small scan in highlightYaml that
walks the line and skips over quoted segments (respecting escapes) before
wrapping a trailing unquoted '#' in '<span class="yc">...</span>' — update the
replacement logic in highlightYaml accordingly.
In `@src/diagnostics.rs`:
- Around line 324-338: The current serve_config builds JSON by hand and calls
esc() which only handles a few characters; update serve_config to produce valid
JSON by serializing the response object with a JSON serializer (e.g., call
serde_json::to_string on a struct or map containing config_path and config_yaml)
instead of manual formatting, or alternatively extend the esc() implementation
(the esc function referenced in this diff) to escape all control characters
U+0000–U+001F as \uXXXX in addition to the existing escapes; locate serve_config
and esc to apply the change and ensure the Response body is produced from the
serializer output so all control chars are safely encoded.
- Around line 142-145: Add a doc comment for the public method `set_config` in
`diagnostics.rs` describing its purpose (update stored config path and YAML),
the parameters (`path: &str` and `yaml: &str`), and any side effects (it mutates
`self.config_path` and `self.config_yaml`); place the doc comment immediately
above the `pub fn set_config(&mut self, path: &str, yaml: &str)` definition so
the public API is documented for callers such as `main.rs`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 10b1043d-337c-4955-9791-842a5552a582
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
Cargo.tomlsrc/dashboard.htmlsrc/diagnostics.rssrc/main.rs
| function highlightYaml(yaml) { | ||
| return esc(yaml).replace(/^(\s*)([\w._-]+)(:)/gm, function(m, ws, key, colon) { | ||
| return ws + '<span class="yk">' + key + '</span>' + colon; | ||
| }).replace(/:\s+("[^]*?"|'[^]*?')/g, function(m, val) { | ||
| return ': <span class="yv">' + val + '</span>'; | ||
| }).replace(/:\s+(\d+(?:\.\d+)?)\s*$/gm, function(m, num) { | ||
| return ': <span class="yn">' + num + '</span>'; | ||
| }).replace(/:\s+(true|false)\s*$/gm, function(m, b) { | ||
| return ': <span class="yn">' + b + '</span>'; | ||
| }).replace(/(#.*)$/gm, function(m, comment) { | ||
| return '<span class="yc">' + comment + '</span>'; | ||
| }).replace(/(\|)/g, '<span class="yp">$1</span>'); | ||
| } |
There was a problem hiding this comment.
Minor: comment highlighting may false-positive inside string values.
The # comment regex at line 572 runs after string highlighting, but a string like "foo # bar" would show # bar" as a comment since the regex matches any # to EOL. Edge case but worth noting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 563 - 575, In highlightYaml, the comment
replacement currently treats any '#' to EOL as a comment and can false-positive
inside quoted strings (e.g. "foo # bar"); change the comment handling so '#' is
only highlighted when not inside a quoted string — either replace the final
.replace(/(#.*)$/gm, ...) with a regex that uses negative lookbehind/alternation
to ensure the '#' is not inside single or double quotes, or implement a small
scan in highlightYaml that walks the line and skips over quoted segments
(respecting escapes) before wrapping a trailing unquoted '#' in '<span
class="yc">...</span>' — update the replacement logic in highlightYaml
accordingly.
| pub fn set_config(&mut self, path: &str, yaml: &str) { | ||
| self.config_path = path.to_string(); | ||
| self.config_yaml = yaml.to_string(); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Missing doc comment on public method.
Per coding guidelines, public APIs should have doc comments. set_config is public and called from main.rs.
📝 Suggested doc comment
+ /// Store the config file path and raw YAML content for the `/api/config` endpoint.
pub fn set_config(&mut self, path: &str, yaml: &str) {
self.config_path = path.to_string();
self.config_yaml = yaml.to_string();
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pub fn set_config(&mut self, path: &str, yaml: &str) { | |
| self.config_path = path.to_string(); | |
| self.config_yaml = yaml.to_string(); | |
| } | |
| /// Store the config file path and raw YAML content for the `/api/config` endpoint. | |
| pub fn set_config(&mut self, path: &str, yaml: &str) { | |
| self.config_path = path.to_string(); | |
| self.config_yaml = yaml.to_string(); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 142 - 145, Add a doc comment for the public
method `set_config` in `diagnostics.rs` describing its purpose (update stored
config path and YAML), the parameters (`path: &str` and `yaml: &str`), and any
side effects (it mutates `self.config_path` and `self.config_yaml`); place the
doc comment immediately above the `pub fn set_config(&mut self, path: &str,
yaml: &str)` definition so the public API is documented for callers such as
`main.rs`.
| fn serve_config( | ||
| &self, | ||
| request: tiny_http::Request, | ||
| ) -> Result<(), Box<dyn std::error::Error>> { | ||
| let body = format!( | ||
| r#"{{"path":"{}","raw_yaml":"{}"}}"#, | ||
| esc(&self.config_path), | ||
| esc(&self.config_yaml), | ||
| ); | ||
| let resp = tiny_http::Response::from_string(body).with_header( | ||
| tiny_http::Header::from_bytes(&b"Content-Type"[..], &b"application/json"[..]).unwrap(), | ||
| ); | ||
| request.respond(resp)?; | ||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
esc() doesn't escape all JSON control characters.
The esc() function (lines 461-474) handles \n, \r, \t, \\, and ", but JSON requires escaping all control characters (U+0000–U+001F). If YAML config contains e.g. form-feed or vertical tab, the response becomes invalid JSON.
🛠️ Proposed fix
fn esc(s: &str) -> String {
let mut out = String::with_capacity(s.len());
for c in s.chars() {
match c {
'"' => out.push_str("\\\""),
'\\' => out.push_str("\\\\"),
'\n' => out.push_str("\\n"),
'\r' => out.push_str("\\r"),
'\t' => out.push_str("\\t"),
- _ => out.push(c),
+ c if c.is_control() => {
+ // Escape other control chars as \uXXXX
+ for unit in c.encode_utf16(&mut [0; 2]) {
+ out.push_str(&format!("\\u{:04x}", unit));
+ }
+ }
+ _ => out.push(c),
}
}
out
}Also applies to: 460-474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 324 - 338, The current serve_config builds
JSON by hand and calls esc() which only handles a few characters; update
serve_config to produce valid JSON by serializing the response object with a
JSON serializer (e.g., call serde_json::to_string on a struct or map containing
config_path and config_yaml) instead of manual formatting, or alternatively
extend the esc() implementation (the esc function referenced in this diff) to
escape all control characters U+0000–U+001F as \uXXXX in addition to the
existing escapes; locate serve_config and esc to apply the change and ensure the
Response body is produced from the serializer output so all control chars are
safely encoded.
| request: tiny_http::Request, | ||
| ) -> Result<(), Box<dyn std::error::Error>> { | ||
| let body = format!( | ||
| r#"{{"path":"{}","raw_yaml":"{}"}}"#, |
There was a problem hiding this comment.
/api/config returns raw_yaml with the full loaded config and no access control in the route handler. In typical deployments where diagnostics is reachable beyond localhost, this exposes secrets (tokens/passwords/credentials in URLs, internal endpoints, file paths) to any caller.
Please gate this endpoint behind explicit auth/opt-in and redact sensitive keys before returning config content. If this is only for local debugging, consider disabling it by default and returning metadata only (for example path + checksum) in production mode.
Diagnostics: - Replace hand-rolled getrusage/procfs with `sysinfo` crate for cross-platform process metrics (CPU%, current RSS, disk I/O) - Add FD count (/proc/self/fd on Linux, /dev/fd on macOS) and soft FD limit via getrlimit - Expose system info: hostname, OS version, arch, CPU count, total memory — everything a support call needs at a glance - Return null for unavailable metrics instead of 0 Dashboard: - System Information panel: hostname, OS, arch, cores, total memory, FD limit, PID, version - Chart axis padding: Y-axis labels no longer overlap the line - X-axis time markers: 10s, 30s, 1m, 5m, "now" - Hover tooltip: shows value + "Xs ago" on mouseover - 5-minute rolling window with old points trimmed - Limits shown alongside values: RSS "15.9 MB / 32.0 GB", FDs "14 / 1.05 M" - Replaced disk chart with FD count chart (more useful, works cross-platform) - Fixed fmtBytesCompact for sub-KB values Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of stretching 2-3 data points across the full chart width, the X-axis is now anchored to a fixed window: [now - 5min, now]. Data appears on the right edge and grows leftward as it accumulates. Empty space on the left shows the full time range from the start. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layout stability: - Render ribbon, pipeline summaries, and detail tables once as stable DOM structure, then update only text content via IDs (setText checks for changes before writing to avoid reflow) - tabular-nums + min-width on all numeric displays so digit width is constant across value changes - table-layout:fixed with explicit column widths so columns don't resize as values change - white-space:nowrap on ribbon items Adaptive time window: - Charts start at 10s width, snap to 30s/1m/2m/5m as data accumulates (instead of fixed 5min with tiny data in corner) - Window grows with oldest data point, capped at 5 minutes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/dashboard.html`:
- Around line 75-77: The tab elements currently rendered as non-focusable <div
class="tab" data-tab="..."> must be converted to accessible tab controls: wrap
the tab set in an element with role="tablist", change each .tab to a focusable
control (preferably a <button> or an element with role="tab" + tabindex="0"),
add aria-selected="true"/"false" on the active/inactive tabs, set aria-controls
to the corresponding tabpanel id and ensure the panels have role="tabpanel" and
matching ids, and implement keyboard handling for Enter/Space to activate a tab
and ArrowLeft/ArrowRight/Home/End to move focus between tabs; apply the same
changes for the other tab group referenced (lines 259-267) so both groups
support focus, ARIA state, and keyboard navigation.
- Around line 405-410: The catch handler for the fetch("/api/config")
prematurely sets state.configLoaded = true, preventing retry attempts after a
transient failure; modify the catch callback so it does not set
state.configLoaded to true (either leave it unchanged or explicitly set it to
false) and only updates document.getElementById("configContent").innerHTML with
the error message, ensuring the config tab can recover and retry fetching later;
locate the fetch("/api/config") block and update its catch(function(){...}) to
remove or change the state.configLoaded assignment.
- Around line 374-385: The headline display uses raw lps/bps/eps while the
charts use Math.max(0, rate), causing negative labels after resets; update each
headline update to use the same clamped value (e.g. compute clampedLps =
Math.max(0, lps) and pass that to c.c_lps.add and to
document.getElementById("c_lps_val").innerHTML via fmt(clampedLps), and do the
same for bps (use fmtBytes on the clampedBps) and eps so chart samples and
displayed rates stay in sync; refer to variables lps, bps, eps and chart objects
c.c_lps, c.c_bps, c.c_errs and the DOM ids "c_lps_val", "c_bps_val",
"c_errs_val".
In `@src/diagnostics.rs`:
- Around line 769-771: The tests use server.start() followed by a fixed
std::thread::sleep(Duration::from_millis(100)), which is racy; replace each
sleep (the occurrences immediately after server.start()) with a real readiness
check that polls the server until it accepts connections (e.g., loop for a short
timeout calling std::net::TcpStream::connect(addr) or an HTTP probe to the test
endpoint, breaking on success and sleeping a few ms between attempts), and fail
the test if the timeout elapses; update the code paths around server.start() and
the three std::thread::sleep(...) occurrences so the test only proceeds once the
TCP/HTTP connection is established.
- Around line 322-326: The code currently collapses missing process metrics into
sentinel zeros for cpu_pct, rss_bytes, virt_bytes, disk_read, and disk_written;
change these variables and the corresponding struct fields to Option<f64> for
percentages and Option<u64> for byte counters (e.g., cpu_pct -> Option<f64>,
rss_bytes/virt_bytes/disk_read/disk_written -> Option<u64>) and replace
unwrap_or(0.0)/unwrap_or(0) with map(...) (leaving None when proc_info is None)
so they serialize to null; update any struct definitions (e.g., the
diagnostics/process metrics struct) and downstream usage to handle Option values
and ensure serde continues to serialize None as null.
- Around line 560-577: The unsafe FFI usage in get_fd_limit (the unsafe block
calling libc::getrlimit with a zeroed libc::rlimit) needs a SAFETY comment
explaining why the call is sound: that rlim is properly initialized with
std::mem::zeroed() to provide a valid out-pointer, that libc::getrlimit only
writes to that struct and expects a non-null pointer, and that the call is
guarded by #[cfg(unix)] so it's only compiled on platforms where
libc::RLIMIT_NOFILE and getrlimit exist; add this SAFETY comment immediately
above the unsafe block in get_fd_limit, or replace the manual FFI with a safe
wrapper (e.g., rustix) if you prefer to remove unsafe entirely.
- Around line 541-543: get_fd_count() currently opens a directory
("/proc/self/fd" on Linux or "/dev/fd" on macOS) and counts entries, but the
directory stream itself allocates a file descriptor so the result is off by one;
change the logic in get_fd_count() to try reading "/proc/self/fd" then fall back
to "/dev/fd", count the entries from the ReadDir iterator, and subtract one to
exclude the directory's own fd (use saturating_sub(1) to avoid underflow) before
returning the u64; ensure failures still return None.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cdee6a2c-80a4-49b3-904e-d2a490b55f6d
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
Cargo.tomlsrc/dashboard.htmlsrc/diagnostics.rs
| <div class="tab active" data-tab="overview">Overview</div> | ||
| <div class="tab" data-tab="pipelines">Pipelines</div> | ||
| <div class="tab" data-tab="config">Config</div> |
There was a problem hiding this comment.
Tabs need semantic controls and keyboard support.
These are clickable <div>s only. They aren't focusable and there is no keyboard/ARIA state management, so keyboard users can't switch views.
Also applies to: 259-267
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 75 - 77, The tab elements currently rendered
as non-focusable <div class="tab" data-tab="..."> must be converted to
accessible tab controls: wrap the tab set in an element with role="tablist",
change each .tab to a focusable control (preferably a <button> or an element
with role="tab" + tabindex="0"), add aria-selected="true"/"false" on the
active/inactive tabs, set aria-controls to the corresponding tabpanel id and
ensure the panels have role="tabpanel" and matching ids, and implement keyboard
handling for Enter/Space to activate a tab and ArrowLeft/ArrowRight/Home/End to
move focus between tabs; apply the same changes for the other tab group
referenced (lines 259-267) so both groups support focus, ARIA state, and
keyboard navigation.
| let cpu_pct = proc_info.map(|p| p.cpu_usage()).unwrap_or(0.0); | ||
| let rss_bytes = proc_info.map(|p| p.memory()).unwrap_or(0); | ||
| let virt_bytes = proc_info.map(|p| p.virtual_memory()).unwrap_or(0); | ||
| let disk_read = proc_info.map(|p| p.disk_usage().read_bytes).unwrap_or(0); | ||
| let disk_written = proc_info.map(|p| p.disk_usage().written_bytes).unwrap_or(0); |
There was a problem hiding this comment.
Use Option types for process metrics instead of sentinel zeros.
Lines 322-326 and 350-357 collapse missing process data to 0 and 0.0. This prevents the dashboard from distinguishing "unavailable" from actual zero readings, unlike fd_count/fd_limit which already use Option types that serialize to null. Change these to Option<f64> and Option<u64> respectively and serialize null on None.
Per coding guidelines: "Use Option type instead of sentinel values like 0, -1, or empty strings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 322 - 326, The code currently collapses
missing process metrics into sentinel zeros for cpu_pct, rss_bytes, virt_bytes,
disk_read, and disk_written; change these variables and the corresponding struct
fields to Option<f64> for percentages and Option<u64> for byte counters (e.g.,
cpu_pct -> Option<f64>, rss_bytes/virt_bytes/disk_read/disk_written ->
Option<u64>) and replace unwrap_or(0.0)/unwrap_or(0) with map(...) (leaving None
when proc_info is None) so they serialize to null; update any struct definitions
(e.g., the diagnostics/process metrics struct) and downstream usage to handle
Option values and ensure serde continues to serialize None as null.
| std::fs::read_dir("/proc/self/fd") | ||
| .ok() | ||
| .map(|entries| entries.count() as u64) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/diagnostics.rs | sed -n '535,560p'Repository: strawgate/memagent
Length of output: 1014
🌐 Web query:
/proc/self/fd read_dir file descriptor self-counting enumeration
💡 Result:
When you perform a read_dir (readdir) operation on /proc/self/fd (or /proc/[pid]/fd), which lists the open file descriptors of the current process as symbolic links, the act of opening the directory for reading allocates a new file descriptor (fd). This new fd is included in the directory listing itself, creating a self-referential or "self-counting" effect. For example, running ls -l /proc/self/fd opens the directory on some fd N (often 3 or 4, depending on existing fds), and the listing shows entries 0,1,2,...,N where N points back to /proc/[pid]/fd/. This is evident from: - Unix StackExchange explanation: When ls opens /proc/self/fd, it gets fd 4, shown as 4 -> /proc/29512/fd/. - SuperUser on wildcard expansion: Shell opens dir for * expansion, includes the fd (e.g., 3), closes after, but list has it transiently. - Kernel source fs/proc/fd.c: proc_readfd iterates over the process's fdtable, dynamically generating entries for all currently open fds, including any opened during the readdir (like the dir fd itself). To accurately count open fds using readdir on /proc/self/fd in C, subtract 3 from the readdir count (accounting for '.', '..', and the dir fd itself), as shown in code examples: int count_open_fds(void) { DIR *dp = opendir("/proc/self/fd"); struct dirent *de; int count = -3; // '.', '..', dp fd while ((de = readdir(dp)) != NULL) count++; closedir(dp); return count; } This adjustment handles the self-counting enumeration. Official docs (proc_pid_fd(5)) confirm /proc/pid/fd lists all open fds dynamically, without noting exclusion of the dir fd, consistent with behavior.
Citations:
- 1: https://unix.stackexchange.com/questions/676683/what-does-the-output-of-ll-proc-self-fd-from-ll-dev-fd-mean
- 2: http://www.man7.org/linux/man-pages/man5/proc_pid_fd.5.html
- 3: https://stackoverflow.com/questions/7976769/getting-count-of-current-used-file-descriptors-from-c-code
- 4: https://stackoverflow.com/questions/6583158/finding-open-file-descriptors-for-a-process-linux-c-code/9750021
- 5: https://superuser.com/questions/1820205/list-of-proc-fd-path
- 6: https://elixir.bootlin.com/linux/latest/source/fs/proc/fd.c
- 7: https://mechpen.github.io/posts/2023-07-01-proc-fd-files/
🌐 Web query:
macOS /dev/fd read_dir self-counting file descriptor enumeration
💡 Result:
On macOS, /dev/fd is a per-process virtual directory (fdescfs, borrowed from FreeBSD) that lists numeric entries corresponding to the open file descriptors (FDs) of the process accessing it. Each entry /dev/fd/N represents FD N, allowing access equivalent to duping that FD. When a process calls opendir or read_dir on /dev/fd (e.g., ls /dev/fd, std::fs::read_dir("/dev/fd"), Directory.EnumerateFileSystemEntries("/dev/fd")), it allocates a new directory FD, say M (typically the next available FD, like 3 or 4). This new FD M is included in the list of FDs enumerated by readdir, because /dev/fd reflects all open FDs of the current process at enumeration time, including the one just opened for the directory itself. Thus, std::fs::read_dir("/dev/fd").count returns (true open FDs before + 1), overcounting by 1 due to self-inclusion. This is the "self-counting" behavior. Examples: - Minimal process (stdin/stdout/stderr): ls /dev/fd shows 0 1 2 3 4; 3 is cwd FD, 4 is /dev/fd dir FD (dtruss confirms open_nocancel returns 3/4). - Rust: std::fs::read_dir("/dev/fd").unwrap.count == true FDs + 1. To accurately count open FDs: - Use proc_pidinfo(PROC_PIDLISTFDS) from libproc.h (counts p->p_fd->fd_nfiles, all types including sockets/pipes/kqueues). - Avoid /dev/fd for counting due to self-inclusion and limitations (e.g., misses some FD types; stat fails on kqueues causing errors in some APIs like ioutil.ReadDir). No . and .. entries in /dev/fd (per mount_fdesc manpage). Not NFS-exportable. XNU sources (devfs_vnops.c references fdesc.h) confirm integration, but full fdesc_vnops.c not public; behavior matches FreeBSD fdescfs where readdir iterates curproc->p_fd->fd_ofiles array. Sources confirm up to macOS 15+ (xnu-12377 Darwin 25). No changes noted post-2021.
Citations:
- 1: https://zameermanji.com/blog/2021/8/1/counting-open-file-descriptors-on-macos/
- 2: Getting file descriptor count on macOS throws dotnet/runtime#65254
- 3: https://stackoverflow.com/questions/48438404/cant-call-ioutil-readdir-on-dev-fd-on-macos
- 4: https://liamseanbrady.wordpress.com/2016/03/28/the-great-file-descriptor-mystery/
- 5: http://www.newosxbook.com/src.php?file=%2Fbsd%2Fman%2Fman4%2Ffd.4&tree=xnu
- 6: https://leopard-adc.pepas.com/documentation/Darwin/Reference/ManPages/man8/mount_fdesc.8.html
- 7: http://www.newosxbook.com/src.php?file=%2Fbsd%2Fmiscfs%2Fdevfs%2Fdevfs_vnops.c&tree=xnu
- 8: https://newosxbook.com/src.php?file=%2Fbsd%2Fmiscfs%2Fdevfs%2Fdevfs_vnops.c&tree=xnu
- 9: https://unix.superglobalmegacorp.com/NetBSD-0.8/newsrc/miscfs/fdesc/fdesc_vnops.c.html
- 10: https://stackoverflow.com/questions/44413909/vnode-and-file-descriptor-in-xnu-where-does-the-file-operation-vector-is-stored
Fix get_fd_count() self-counting on both Linux and macOS.
Opening /proc/self/fd (Linux) or /dev/fd (macOS) allocates a new file descriptor that gets included in the enumeration, causing the count to report N+1.
🛠️ Minimal fix
std::fs::read_dir("/proc/self/fd")
.ok()
- .map(|entries| entries.count() as u64)
+ .map(|entries| (entries.count() as u64).saturating_sub(1)) std::fs::read_dir("/dev/fd")
.ok()
- .map(|entries| entries.count() as u64)
+ .map(|entries| (entries.count() as u64).saturating_sub(1))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::fs::read_dir("/proc/self/fd") | |
| .ok() | |
| .map(|entries| entries.count() as u64) | |
| std::fs::read_dir("/proc/self/fd") | |
| .ok() | |
| .map(|entries| (entries.count() as u64).saturating_sub(1)) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 541 - 543, get_fd_count() currently opens a
directory ("/proc/self/fd" on Linux or "/dev/fd" on macOS) and counts entries,
but the directory stream itself allocates a file descriptor so the result is off
by one; change the logic in get_fd_count() to try reading "/proc/self/fd" then
fall back to "/dev/fd", count the entries from the ReadDir iterator, and
subtract one to exclude the directory's own fd (use saturating_sub(1) to avoid
underflow) before returning the u64; ensure failures still return None.
| /// Get the soft FD limit (ulimit -n). | ||
| fn get_fd_limit() -> Option<u64> { | ||
| #[cfg(unix)] | ||
| { | ||
| unsafe { | ||
| let mut rlim: libc::rlimit = std::mem::zeroed(); | ||
| if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) == 0 { | ||
| Some(rlim.rlim_cur as u64) | ||
| } else { | ||
| None | ||
| } | ||
| } | ||
| } | ||
| #[cfg(not(unix))] | ||
| { | ||
| None | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check the actual function context and usage
grep -n "get_fd_limit\|diagnostics" src/diagnostics.rs | head -20Repository: strawgate/memagent
Length of output: 446
🏁 Script executed:
# Check what dependencies are already in use
cat Cargo.toml | grep -A 50 "\[dependencies\]"Repository: strawgate/memagent
Length of output: 898
🏁 Script executed:
# Search for other unsafe blocks in the repository to understand the pattern
rg "unsafe\s*{" --type rust -B 2 -A 5 | head -100Repository: strawgate/memagent
Length of output: 145
🏁 Script executed:
# Check if there are existing safe wrappers being used for system calls
rg "libc::" --type rust | head -20Repository: strawgate/memagent
Length of output: 230
🏁 Script executed:
# Correct regex for finding unsafe blocks
rg "unsafe\s" --type rust -B 2 -A 5 | head -150Repository: strawgate/memagent
Length of output: 4285
🏁 Script executed:
# Check if there's a safe wrapper crate like rustix that's available
web_search "rust safe getrlimit wrapper crate"Repository: strawgate/memagent
Length of output: 112
🏁 Script executed:
# Look at the coding guidelines files mentioned in the learnings
cat .github/copilot-instructions.md CODE_STYLE.md 2>/dev/null | head -100Repository: strawgate/memagent
Length of output: 44
🏁 Script executed:
# Check the context around line 330 where get_fd_limit is called
sed -n '325,335p' src/diagnostics.rsRepository: strawgate/memagent
Length of output: 512
🏁 Script executed:
# Check if there's already a SAFETY comment for the getrlimit call
sed -n '560,570p' src/diagnostics.rsRepository: strawgate/memagent
Length of output: 405
🏁 Script executed:
# Search for rustix crate usage (safe getrlimit wrapper)
rg "rustix" --type rustRepository: strawgate/memagent
Length of output: 44
🏁 Script executed:
# Verify the performance-critical paths in guidelines
grep -r "SIMD\|wide\|performance" --include="*.md" | head -20Repository: strawgate/memagent
Length of output: 501
🏁 Script executed:
# Check for SAFETY comments pattern in the codebase
rg "SAFETY:" --type rust -A 1 | head -30Repository: strawgate/memagent
Length of output: 592
🏁 Script executed:
# Verify if this is indeed only called once (diagnostics, not hot path)
rg "get_fd_limit\|fd_limit" --type rust -B 2 -A 2Repository: strawgate/memagent
Length of output: 44
🏁 Script executed:
# Check dev-docs/DECISIONS.md for any guidance on unsafe
cat dev-docs/DECISIONS.md 2>/dev/null | head -100Repository: strawgate/memagent
Length of output: 44
🏁 Script executed:
# Check if the unsafe block needs a SAFETY comment per the pattern
sed -n '558,572p' src/diagnostics.rsRepository: strawgate/memagent
Length of output: 424
🏁 Script executed:
# Verify if this function is only called once and in initialization
rg "get_fd_limit" --type rust -B 5 -A 5Repository: strawgate/memagent
Length of output: 1200
Add a SAFETY comment justifying the FFI call, or use a safe wrapper crate.
The getrlimit call lacks a SAFETY comment explaining why the unsafe block is necessary. Following the codebase pattern (see batch_builder.rs), add a comment. Alternatively, consider using the rustix crate for a safe API if adding a dependency is acceptable.
Note: The guideline about "no hand-written unsafe code" specifically refers to performance-critical SIMD operations (via the wide crate), not diagnostics FFI. However, the missing safety documentation is still a gap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 560 - 577, The unsafe FFI usage in
get_fd_limit (the unsafe block calling libc::getrlimit with a zeroed
libc::rlimit) needs a SAFETY comment explaining why the call is sound: that rlim
is properly initialized with std::mem::zeroed() to provide a valid out-pointer,
that libc::getrlimit only writes to that struct and expects a non-null pointer,
and that the call is guarded by #[cfg(unix)] so it's only compiled on platforms
where libc::RLIMIT_NOFILE and getrlimit exist; add this SAFETY comment
immediately above the unsafe block in get_fd_limit, or replace the manual FFI
with a safe wrapper (e.g., rustix) if you prefer to remove unsafe entirely.
| let _handle = server.start(); | ||
|
|
||
| std::thread::sleep(std::time::Duration::from_millis(100)); |
There was a problem hiding this comment.
Replace the fixed sleeps with a real readiness check.
Each new endpoint test sleeps 100ms and assumes the server is already listening. That's racy on slower CI runners and will produce intermittent failures unrelated to the feature under test.
Also applies to: 786-788, 801-803
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/diagnostics.rs` around lines 769 - 771, The tests use server.start()
followed by a fixed std::thread::sleep(Duration::from_millis(100)), which is
racy; replace each sleep (the occurrences immediately after server.start()) with
a real readiness check that polls the server until it accepts connections (e.g.,
loop for a short timeout calling std::net::TcpStream::connect(addr) or an HTTP
probe to the test endpoint, breaking on success and sleeping a few ms between
attempts), and fail the test if the timeout elapses; update the code paths
around server.start() and the three std::thread::sleep(...) occurrences so the
test only proceeds once the TCP/HTTP connection is established.
Window grows continuously: max(10s, dataAge * 1.2), capped at 5min. The 20% headroom means the line never touches the left edge. No more jarring jumps from 10s→30s→1m. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/dashboard.html (1)
74-77:⚠️ Potential issue | 🟠 MajorUse real tab semantics here.
These are still click-only
<div>s, so keyboard users cannot focus or switch tabs, and the panels still have notab/tabpanelrelationship. Please convert them to focusable tab controls and add the ARIA + arrow-key behavior.Also applies to: 83-90, 266-275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 74 - 77, The current clickable tab DIVs (elements with class "tab" and container "tabs") must be converted to accessible tabs: make the container use role="tablist" and replace each clickable .tab DIV with a focusable control (preferably <button> or an element with role="tab", tabindex="0"/"-1") that has aria-selected, aria-controls pointing to its corresponding panel id, and ensure each panel has role="tabpanel" and aria-labelledby linking back. Update activation logic in the script that toggles the "active" state (look for handlers referencing data-tab/data-panel or class "active") to manage focus and states: when a tab is activated set aria-selected="true", tabindex="0", move focus as appropriate, and set other tabs to aria-selected="false" and tabindex="-1"; implement keyboard handling on the tab elements to support Left/Right arrow (move focus, optional activation), Home/End (go to first/last), and Enter/Space to activate the focused tab. Also ensure visual styling (.active) is preserved when switching to aria-driven attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/dashboard.html`:
- Around line 89-90: The dashboard currently renders raw_yaml from loadConfig()
into `#configContent` and the /api/config handler returns that payload unchanged,
potentially exposing secrets; update the server-side /api/config handler to
scrub/redact sensitive keys (e.g., api_key, secret, password, token,
private_key, aws_access_key_id, aws_secret_access_key) before returning the
config JSON, or alternatively require explicit auth/opt-in before serving this
endpoint; adjust loadConfig() client-side to expect redacted_config (instead of
raw_yaml) and ensure the UI element id="#tab-config" / `#configContent` only
displays sanitized content or an opt-in prompt.
- Around line 241-250: The CHART_DEFS entry currently using id "c_fds" and label
"Open FDs" is wrong for the new disk I/O metric; update the chart definition
(CHART_DEFS entry with id "c_fds") to represent disk read throughput: set label
to "Disk Read", unit to "/s", color/formatter to use fmtBytesCompact, and ensure
anywhere the dashboard feeds this chart (previously passing fd_count) is changed
to consume the backend's cumulative disk_read_bytes series and compute a
per-second rate (derivative/delta over time) to plot bytes/sec; also apply the
same fix to the corresponding duplicated definition referenced around lines
459-462.
- Around line 342-376: The current UI freezes DOM structure via
summariesBuilt/detailsBuilt and updates elements by numeric index (ids like
"ps_i{pi}_{i}", "ps_o{pi}_{i}", "ps_t{pi}", "ps_td{pi}"), causing
stale/misplaced stats if pipelines/inputs/outputs are added/removed/reordered;
fix by detecting topology changes and rebuilding the DOM: compute a stable
signature (e.g., join of pipeline names plus ordered input/output names and
presence of transform.sql) from data.pipelines and compare it to a stored
signature each poll, and if different set summariesBuilt/detailsBuilt=false (or
re-run the container.innerHTML rebuild) so the code that generates the markup
(the block using container.innerHTML=html and the id patterns above) is
re-executed; alternatively, change ids to stable keys using
pipeline/input/output names instead of indices so the update loop (where
setText(document.getElementById(...)) is called) always targets the correct
elements.
---
Duplicate comments:
In `@src/dashboard.html`:
- Around line 74-77: The current clickable tab DIVs (elements with class "tab"
and container "tabs") must be converted to accessible tabs: make the container
use role="tablist" and replace each clickable .tab DIV with a focusable control
(preferably <button> or an element with role="tab", tabindex="0"/"-1") that has
aria-selected, aria-controls pointing to its corresponding panel id, and ensure
each panel has role="tabpanel" and aria-labelledby linking back. Update
activation logic in the script that toggles the "active" state (look for
handlers referencing data-tab/data-panel or class "active") to manage focus and
states: when a tab is activated set aria-selected="true", tabindex="0", move
focus as appropriate, and set other tabs to aria-selected="false" and
tabindex="-1"; implement keyboard handling on the tab elements to support
Left/Right arrow (move focus, optional activation), Home/End (go to first/last),
and Enter/Space to activate the focused tab. Also ensure visual styling
(.active) is preserved when switching to aria-driven attributes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dab55642-e210-41cc-8f0c-21850b486094
📒 Files selected for processing (1)
src/dashboard.html
| <div class="tab-content" id="tab-config"> | ||
| <div class="panel"><div class="panel-title">Configuration</div><div id="configContent"><span style="color:var(--text3)">Loading...</span></div></div> |
There was a problem hiding this comment.
Don't expose raw config YAML here.
loadConfig() renders raw_yaml verbatim, and the paired /api/config handler returns that payload unchanged. If the config contains credentials or tokens, they are already disclosed to any dashboard viewer; client-side highlighting does not mitigate that. Please redact sensitive keys server-side before returning this payload, or gate the tab behind explicit auth/opt-in.
Also applies to: 469-474
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 89 - 90, The dashboard currently renders
raw_yaml from loadConfig() into `#configContent` and the /api/config handler
returns that payload unchanged, potentially exposing secrets; update the
server-side /api/config handler to scrub/redact sensitive keys (e.g., api_key,
secret, password, token, private_key, aws_access_key_id, aws_secret_access_key)
before returning the config JSON, or alternatively require explicit auth/opt-in
before serving this endpoint; adjust loadConfig() client-side to expect
redacted_config (instead of raw_yaml) and ensure the UI element id="#tab-config"
/ `#configContent` only displays sanitized content or an opt-in prompt.
| var state={prev:null,prevTime:null,prevSystem:null,charts:{},configLoaded:false,sysInfoShown:false}; | ||
|
|
||
| var CHART_DEFS=[ | ||
| {id:"c_lps",label:"Lines / sec",color:"#3b82f6",unit:"/s",fmtVal:fmtCompact}, | ||
| {id:"c_bps",label:"Throughput",color:"#8b5cf6",unit:"/s",fmtVal:fmtBytesCompact}, | ||
| {id:"c_errs",label:"Errors / sec",color:"#ef4444",unit:"/s",fmtVal:fmtCompact}, | ||
| {id:"c_cpu",label:"Process CPU",color:"#f59e0b",unit:"%",fmtVal:fmtCompact}, | ||
| {id:"c_rss",label:"Memory (RSS)",color:"#22c55e",unit:"",fmtVal:fmtBytesCompact}, | ||
| {id:"c_fds",label:"Open FDs",color:"#06b6d4",unit:"",fmtVal:fmtCompact}, | ||
| ]; |
There was a problem hiding this comment.
The sixth chart is plotting FDs, not disk read/sec.
The PR objective calls for a disk read throughput chart, and the backend already exposes cumulative disk_read_bytes, but this series is defined as Open FDs and fed fd_count. The dashboard never renders the advertised disk I/O metric.
🛠️ Minimal direction
- {id:"c_fds",label:"Open FDs",color:"#06b6d4",unit:"",fmtVal:fmtCompact},
+ {id:"c_diskr",label:"Disk read",color:"#06b6d4",unit:"/s",fmtVal:fmtBytesCompact},
...
- if(sys.fd_count!=null){c.c_fds.add(sys.fd_count);
- var fdStr=sys.fd_count;
- if(sys.fd_limit!=null)fdStr+='<span class="chart-unit"> / '+fmt(sys.fd_limit)+'</span>';
- document.getElementById("c_fds_val").innerHTML=fdStr;
- }
+ if(sys.disk_read_bytes!=null && state.prevSystem && dt>0){
+ var diskReadBps=Math.max(0,(sys.disk_read_bytes-(state.prevSystem.disk_read_bytes||0))/dt);
+ c.c_diskr.add(diskReadBps);
+ document.getElementById("c_diskr_val").innerHTML=
+ fmtBytes(diskReadBps)+'<span class="chart-unit">/s</span>';
+ }Also applies to: 459-462
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 241 - 250, The CHART_DEFS entry currently
using id "c_fds" and label "Open FDs" is wrong for the new disk I/O metric;
update the chart definition (CHART_DEFS entry with id "c_fds") to represent disk
read throughput: set label to "Disk Read", unit to "/s", color/formatter to use
fmtBytesCompact, and ensure anywhere the dashboard feeds this chart (previously
passing fd_count) is changed to consume the backend's cumulative disk_read_bytes
series and compute a per-second rate (derivative/delta over time) to plot
bytes/sec; also apply the same fix to the corresponding duplicated definition
referenced around lines 459-462.
| var now=Date.now(),dt=state.prevTime?(now-state.prevTime)/1000:0; | ||
| renderRibbon(pipe,sys);renderSystemInfo(sys); | ||
| updateCharts(pipe,sys,dt);renderPipelineSummaries(pipe);renderPipelineDetails(pipe,dt); | ||
| state.prev=pipe;state.prevTime=now;state.prevSystem=sys; |
There was a problem hiding this comment.
state.prev is assigned before validating the /api/pipelines payload shape. If the first successful fetch returns JSON without pipelines (e.g., transient proxy error body), this line stores an invalid snapshot; on the next poll, updateCharts() reaches state.prev.pipelines.forEach(...) and throws, so the dashboard stays disconnected and never recovers.
Please only update state.prev after validating pipe.pipelines is an array (and ideally check r.ok before r.json()).
Rewrote the dashboard from scratch with a fundamentally different information architecture. No more tabs or duplicated info. Design philosophy: pipeline diagram is the hero, everything else supports it. Like a fighter jet HUD — dark, dense, at-a-glance. Layout (single scrollable page): - Status bar: health indicator (green/red), key vitals, uptime - Pipeline diagram: clickable nodes (input → transform → output) with live stats. Click any node → inline detail panel expands showing that component's config + metrics - Metric sparkline cards: 6 compact cards with inline Canvas sparklines. Click to expand full chart below. - Collapsible System info: hostname, OS, arch, cores, memory all on the summary line — expand for full detail grid - Collapsible Config: file path on summary, expand for syntax-highlighted YAML Technical improvements: - Alpine.js (CDN) for all reactive data binding — eliminated 300+ lines of manual DOM manipulation - EMA smoothing (alpha=0.3) on all rate computations — no more zigzag sparklines from poll jitter - Sparklines show baseline at zero (not invisible) - All values use tabular-nums + nowrap — no layout twitching - Adaptive time window (10s → 5min) grows smoothly with data - Clean separation: formatters, rate tracking, chart engine, Alpine component Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
src/dashboard.html (2)
252-253:⚠️ Potential issue | 🟠 MajorDon't ship raw config YAML to every dashboard viewer.
Line 252 fetches the config during init, and
loadConfig()rendersd.raw_yamldirectly.src/diagnostics.rs:366-380returns that field verbatim, so any secrets in the YAML are disclosed in the browser even if the panel is never opened. Redact sensitive keys server-side or gate/api/configbehind explicit auth/opt-in before rendering this panel.Also applies to: 460-463
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 252 - 253, The page is currently invoking loadConfig() on init and rendering d.raw_yaml into cfgHtml (dashboard.html x-init and x-html), which exposes secrets returned verbatim by diagnostics.rs (d.raw_yaml) to any viewer; modify the flow so the client never fetches raw YAML by default — either gate /api/config behind proper authentication/explicit opt-in before loadConfig() runs (remove automatic x-init call) or change diagnostics.rs to return a sanitized version (redacting sensitive keys) instead of d.raw_yaml; ensure functions/handlers named loadConfig, cfgHtml, d.raw_yaml, and the /api/config endpoint are updated accordingly so the dashboard only receives and displays a redacted config or requires auth.
352-358:⚠️ Potential issue | 🟠 MajorThe sixth metric is still FDs, not disk read/sec.
src/diagnostics.rs:300-358already exposes cumulativedisk_read_bytes, but this dashboard still defines and feedsfds. That leaves the advertised disk-read chart missing. Track the previous system snapshot and plotmax(0, delta(disk_read_bytes) / dt)instead.Also applies to: 442-445
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 352 - 358, The dashboard still defines the sixth metric as 'fds' in metricDefs; replace that entry with an id like 'disk_rps' (label "Disk Read", unit '/s', appropriate color and formatter, e.g. fmtBC) and update the code that feeds this metric (the block around the existing feed logic for lines ~442-445) to compute a rate from the exposed cumulative disk_read_bytes: keep a previous snapshot for disk_read_bytes and timestamp, compute rate = max(0, (cur.disk_read_bytes - prev.disk_read_bytes) / (cur_ts - prev_ts)), and push that value to the 'disk_rps' series instead of sending fds; ensure prev is initialized on first sample and use the same metric id 'disk_rps' in both metricDefs and the feed code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Line 6: Remove the global '*.png' entry from .gitignore and replace it with a
scoped pattern that only ignores generated Playwright artifacts (for example use
'.playwright-mcp/**/*.png' or the specific generated path you expect) so UI/docs
images are not accidentally hidden; target the '*.png' entry for removal and add
the scoped pattern such as '.playwright-mcp/' or a generated-assets path
instead.
In `@src/dashboard.html`:
- Line 181: The selection key logic is ambiguous and leads to stale/mismatched
details: stop using prefix matching (sel.startsWith(String(pi))) and the cached
selData; instead build and store a stable composite key (e.g.,
"pipelineName:type:nodeName") when toggle() sets selection, and on each render
derive the selected data from this.pipes using that stable key (replace checks
at sel.startsWith(String(pi)) and the similar check around line 383), so the
detail panel always looks up fresh data rather than using cached selData.
- Line 139: The interactive pipe-node elements (e.g., the element with class
"pipe-node inp-border" that uses the click handler calling
toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp}) and the selection expression
sel===pi+'_i'+ii) must be made keyboard-accessible: replace the <div> with a
<button type="button"> or, if keeping a non-semantic element, add role="button",
tabindex="0", and a keydown handler that triggers the same toggle call on
Enter/Space and reflect selection with an appropriate ARIA state (e.g.,
aria-pressed or aria-selected). Apply the same change to the other similar nodes
referenced (lines around 153, 168, 213) so all metric cards/pipeline nodes
provide focus, keyboard activation, and ARIA state.
- Around line 397-404: compRate currently mutates prevSnap and computes dt using
the global prevTime at render time, causing tiny dt and spikes; move the
delta/ema computation out of the template into the poll handler so it uses the
poll timestamp (not Date.now() in the render) and update prevSnap there; also
include pipeline identity in the history key (e.g., use
comp_<pipelineId>_<obj.name>_<field> instead of comp_<obj.name>_<field>) and
keep emaRates keyed the same way; ensure prevTime is read/set only in the poll
logic (compute dt = (pollTimestamp - prevTime)/1000 before updating prevTime)
and have compRate only read the precomputed rate value for display.
- Line 7: The embedded dashboard currently pulls Alpine.js from jsDelivr with a
floating `@3` tag and no integrity check; replace this CDN dependency by vendoring
a pinned Alpine.js build into the app (e.g., add a local file like alpine.min.js
built from a specific version such as 3.x.y), update the <script> tag in
src/dashboard.html to reference that local bundle (or if you must use CDN, use
an exact version string and add an SRI integrity attribute and crossorigin
attribute), and ensure the vendored file is added to your static assets bundling
so the page no longer fails when external CDNs are unavailable.
---
Duplicate comments:
In `@src/dashboard.html`:
- Around line 252-253: The page is currently invoking loadConfig() on init and
rendering d.raw_yaml into cfgHtml (dashboard.html x-init and x-html), which
exposes secrets returned verbatim by diagnostics.rs (d.raw_yaml) to any viewer;
modify the flow so the client never fetches raw YAML by default — either gate
/api/config behind proper authentication/explicit opt-in before loadConfig()
runs (remove automatic x-init call) or change diagnostics.rs to return a
sanitized version (redacting sensitive keys) instead of d.raw_yaml; ensure
functions/handlers named loadConfig, cfgHtml, d.raw_yaml, and the /api/config
endpoint are updated accordingly so the dashboard only receives and displays a
redacted config or requires auth.
- Around line 352-358: The dashboard still defines the sixth metric as 'fds' in
metricDefs; replace that entry with an id like 'disk_rps' (label "Disk Read",
unit '/s', appropriate color and formatter, e.g. fmtBC) and update the code that
feeds this metric (the block around the existing feed logic for lines ~442-445)
to compute a rate from the exposed cumulative disk_read_bytes: keep a previous
snapshot for disk_read_bytes and timestamp, compute rate = max(0,
(cur.disk_read_bytes - prev.disk_read_bytes) / (cur_ts - prev_ts)), and push
that value to the 'disk_rps' series instead of sending fds; ensure prev is
initialized on first sample and use the same metric id 'disk_rps' in both
metricDefs and the feed code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 257234f3-6d05-4dac-af5c-1d50b7217181
📒 Files selected for processing (2)
.gitignoresrc/dashboard.html
| build | ||
| # Playwright MCP artifacts | ||
| .playwright-mcp/ | ||
| *.png |
There was a problem hiding this comment.
Avoid globally ignoring every PNG.
*.png will hide any new checked-in UI assets or docs images, not just generated Playwright output. If the intent is artifact cleanup, .playwright-mcp/ already covers that directory; otherwise scope the ignore to a generated path/pattern instead of the whole repo.
🛠️ Minimal fix
-*.png📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| *.png |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore at line 6, Remove the global '*.png' entry from .gitignore and
replace it with a scoped pattern that only ignores generated Playwright
artifacts (for example use '.playwright-mcp/**/*.png' or the specific generated
path you expect) so UI/docs images are not accidentally hidden; target the
'*.png' entry for removal and add the scoped pattern such as '.playwright-mcp/'
or a generated-assets path instead.
| <title>logfwd dashboard</title> | ||
| <meta name="viewport" content="width=device-width, initial-scale=1"> | ||
| <title>logfwd</title> | ||
| <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3/dist/cdn.min.js"></script> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if src/dashboard.html exists and read around line 7
cat -n src/dashboard.html | head -20Repository: strawgate/memagent
Length of output: 1012
🏁 Script executed:
# Search for other Alpine.js references or bundled versions
fd -i "alpine" --type fRepository: strawgate/memagent
Length of output: 44
🏁 Script executed:
# Check for any bundled assets or build configuration
git ls-files | grep -E "(package\.json|build|dist|assets)" | head -20Repository: strawgate/memagent
Length of output: 82
Don't make the embedded dashboard depend on a public CDN.
The application relies entirely on jsDelivr to deliver Alpine.js; if the CDN is unavailable or blocked, this page fails to load. The URL uses a floating @3 version tag instead of a pinned release, and lacks an integrity hash. Ship the bundle with the app, or at minimum pin an exact version with an SRI hash.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` at line 7, The embedded dashboard currently pulls
Alpine.js from jsDelivr with a floating `@3` tag and no integrity check; replace
this CDN dependency by vendoring a pinned Alpine.js build into the app (e.g.,
add a local file like alpine.min.js built from a specific version such as
3.x.y), update the <script> tag in src/dashboard.html to reference that local
bundle (or if you must use CDN, use an exact version string and add an SRI
integrity attribute and crossorigin attribute), and ensure the vendored file is
added to your static assets bundling so the page no longer fails when external
CDNs are unavailable.
| <template x-for="(inp,ii) in p.inputs" :key="inp.name"> | ||
| <div style="display:contents"> | ||
| <div class="pipe-conn" x-show="ii>0"><svg viewBox="0 0 32 16"><line x1="0" y1="8" x2="24" y2="8"/><polygon points="24,4 32,8 24,12"/></svg></div> | ||
| <div class="pipe-node inp-border" :class="sel===pi+'_i'+ii&&'selected'" @click="toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp})"> |
There was a problem hiding this comment.
Use real controls for the clickable nodes and metric cards.
These are interactive <div>s with only @click, so keyboard users cannot focus or activate them. Convert them to <button type="button"> elements, or add role="button", tabindex="0", Enter/Space handling, and the appropriate ARIA state.
Also applies to: 153-153, 168-168, 213-213
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` at line 139, The interactive pipe-node elements (e.g.,
the element with class "pipe-node inp-border" that uses the click handler
calling toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp}) and the selection
expression sel===pi+'_i'+ii) must be made keyboard-accessible: replace the <div>
with a <button type="button"> or, if keeping a non-semantic element, add
role="button", tabindex="0", and a keydown handler that triggers the same toggle
call on Enter/Space and reflect selection with an appropriate ARIA state (e.g.,
aria-pressed or aria-selected). Apply the same change to the other similar nodes
referenced (lines around 153, 168, 213) so all metric cards/pipeline nodes
provide focus, keyboard activation, and ARIA state.
| </template> | ||
| </div> | ||
| <!-- Detail panel (expanded when node clicked) --> | ||
| <template x-if="selData&&sel.startsWith(String(pi))"> |
There was a problem hiding this comment.
Selection keys are ambiguous and go stale.
Line 181 uses sel.startsWith(String(pi)), so selecting pipeline 10 also matches pi === 1. toggle() then caches the clicked object in selData, so after the next poll the detail panel can render under the wrong pipeline and keep stale stats. Store a stable key such as pipelineName:type:nodeName and derive the selected data from this.pipes on each render.
Also applies to: 383-383
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` at line 181, The selection key logic is ambiguous and
leads to stale/mismatched details: stop using prefix matching
(sel.startsWith(String(pi))) and the cached selData; instead build and store a
stable composite key (e.g., "pipelineName:type:nodeName") when toggle() sets
selection, and on each render derive the selected data from this.pipes using
that stable key (replace checks at sel.startsWith(String(pi)) and the similar
check around line 383), so the detail panel always looks up fresh data rather
than using cached selData.
Information architecture: System > Metrics > Pipelines > Config All sections visible (no collapsing), single scrollable page. Changes: - System: compact card grid (hostname, OS, arch, cores, memory, RSS, FDs/limit, PID) — always visible, no collapsing - Metrics: proper charts with Y-axis labels, time markers, and gradient fills. Restored the good chart rendering from v2. - Pipeline: flow diagram with clickable nodes. Click opens an inspector panel with blue accent border, shadow, close button — visually distinct from the pipeline row above it - Config: always visible with syntax-highlighted YAML - Status bar: simplified to brand + health pill + version/uptime (no duplicated metric numbers) - EMA smoothing on all rates prevents zigzag sparklines - Filed 4 instrumentation issues (#521-524) for batch latency, queue depth, schema info, and output send latency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/dashboard.html`:
- Line 168: The close button element with class "insp-close" and click handler
"@click=\"sel=null;selData=null\"" currently lacks an explicit type so it
defaults to type="submit"; update the button to include type="button" to prevent
accidental form submissions—locate the <button class="insp-close" ...> element
and add the type="button" attribute while keeping the existing `@click` handler
and text unchanged.
- Around line 315-317: The two fetch calls (fetch('/api/pipelines') and
fetch('/api/system')) currently call r.json() unconditionally; update both
promise chains to check the Response.ok flag first (e.g., inside the then
callback for each fetch) and handle non-OK responses by returning null or
throwing a descriptive error so r.json() is only called for successful
responses; ensure the fetch('/api/system') branch still has its outer catch to
return null and that the pipelines fetch similarly handles non-200 statuses to
avoid parsing unexpected content.
- Around line 347-352: The current loadConfig() sets self.cfgHtml to a truthy
error message in the catch, which prevents the existing "if(!self.cfgHtml)"
guard from retrying; change the catch handler to keep cfgHtml falsy (e.g., set
self.cfgHtml = '' or null) and clear cfgPath (self.cfgPath = '') so future
retries can run, optionally log the error for debugging; alternatively implement
a separate retry flag (e.g., self.cfgLoadFailed = true) and use that instead of
populating cfgHtml on failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 85c7c72c-10dc-4a59-ab3e-71038e24bd4f
📒 Files selected for processing (1)
src/dashboard.html
| <div class="inspector" x-transition.opacity> | ||
| <div class="insp-header"> | ||
| <span class="insp-title" x-text="selType+' \u2014 '+(selData.name||'SQL')"></span> | ||
| <button class="insp-close" @click="sel=null;selData=null">× close</button> |
There was a problem hiding this comment.
Add type="button" to the close button.
Static analysis flagged this. Without an explicit type, buttons default to type="submit" which can cause unintended form submissions if this is ever wrapped in a form.
Proposed fix
- <button class="insp-close" `@click`="sel=null;selData=null">× close</button>
+ <button type="button" class="insp-close" `@click`="sel=null;selData=null">× close</button>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <button class="insp-close" @click="sel=null;selData=null">× close</button> | |
| <button type="button" class="insp-close" `@click`="sel=null;selData=null">× close</button> |
🧰 Tools
🪛 HTMLHint (1.9.2)
[warning] 168-168: The type attribute must be present on elements.
(button-type-require)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` at line 168, The close button element with class
"insp-close" and click handler "@click=\"sel=null;selData=null\"" currently
lacks an explicit type so it defaults to type="submit"; update the button to
include type="button" to prevent accidental form submissions—locate the <button
class="insp-close" ...> element and add the type="button" attribute while
keeping the existing `@click` handler and text unchanged.
| Promise.all([ | ||
| fetch('/api/pipelines').then(function(r){return r.json()}), | ||
| fetch('/api/system').then(function(r){return r.json()}).catch(function(){return null}) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider checking r.ok before parsing JSON.
If the server returns a non-200 status (e.g., 502 from a proxy), r.json() may throw or return unexpected content. The outer .catch() handles this, but explicit status checking would be more robust.
Suggested improvement
Promise.all([
- fetch('/api/pipelines').then(function(r){return r.json()}),
- fetch('/api/system').then(function(r){return r.json()}).catch(function(){return null})
+ fetch('/api/pipelines').then(function(r){if(!r.ok)throw new Error('fetch failed');return r.json()}),
+ fetch('/api/system').then(function(r){return r.ok?r.json():null}).catch(function(){return null})
]).then(function(r){🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 315 - 317, The two fetch calls
(fetch('/api/pipelines') and fetch('/api/system')) currently call r.json()
unconditionally; update both promise chains to check the Response.ok flag first
(e.g., inside the then callback for each fetch) and handle non-OK responses by
returning null or throwing a descriptive error so r.json() is only called for
successful responses; ensure the fetch('/api/system') branch still has its outer
catch to return null and that the pipelines fetch similarly handles non-200
statuses to avoid parsing unexpected content.
| loadConfig(){ | ||
| var self=this; | ||
| fetch('/api/config').then(function(r){return r.json()}).then(function(d){ | ||
| self.cfgPath=d.path||'';self.cfgHtml=hlYaml(d.raw_yaml||'(no config)'); | ||
| }).catch(function(){self.cfgHtml='<span style="color:var(--t4)">could not load</span>'}); | ||
| } |
There was a problem hiding this comment.
Config fetch still won't retry after failure.
On error, cfgHtml is set to '<span>...</span>' (truthy), so the guard at line 324 (if(!self.cfgHtml)) prevents future retry attempts. The prior review suggested keeping cfgHtml falsy on error to allow recovery.
Proposed fix
loadConfig(){
var self=this;
fetch('/api/config').then(function(r){return r.json()}).then(function(d){
self.cfgPath=d.path||'';self.cfgHtml=hlYaml(d.raw_yaml||'(no config)');
- }).catch(function(){self.cfgHtml='<span style="color:var(--t4)">could not load</span>'});
+ }).catch(function(){/* leave cfgHtml falsy to retry on next poll */});
}Or add a separate flag for config load attempts with retry backoff.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/dashboard.html` around lines 347 - 352, The current loadConfig() sets
self.cfgHtml to a truthy error message in the catch, which prevents the existing
"if(!self.cfgHtml)" guard from retrying; change the catch handler to keep
cfgHtml falsy (e.g., set self.cfgHtml = '' or null) and clear cfgPath
(self.cfgPath = '') so future retries can run, optionally log the error for
debugging; alternatively implement a separate retry flag (e.g.,
self.cfgLoadFailed = true) and use that instead of populating cfgHtml on
failure.
Charts: - Lines now extend to right edge (removed 20% headroom that left a gap before "now") - Taller charts: 90px → 120px - 3-column grid with responsive breakpoints - "Throughput" renamed to "Input Bytes" (clearer meaning) - Y-axis padding increased for label clarity System panel: - Shows system-level memory (used/total) instead of duplicating RSS - Added swap usage, global CPU%, cgroup memory limit (for K8s) - Removed RSS from system grid (already in metrics chart) - Better contrast on section headings and chart labels Pipeline: - Nodes expand to fill page width (removed max-width constraint) API (/api/system): - Added: global_cpu_percent, available_memory_bytes, used_memory_bytes, total_swap_bytes, used_swap_bytes, cgroup_memory_limit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Triage: recommend closing as supersededThis PR is 151 commits behind master and modifies files that no longer exist (
Already on master:
Net-new from this PR that could be extracted into a fresh PR:
Review feedback to address in the fresh PR (from CodeRabbit):
Recommend closing and opening a fresh PR for the enhanced dashboard against the current codebase. |
Summary
GET /api/system(CPU/RSS/threads via getrusage) andGET /api/config(raw YAML)libcadded as direct dep (already transitive) forgetrusageon macOS/LinuxTest plan
logfwd --config test.yaml→ open http://localhost:9090 → verify all 3 tabs render/api/systemreturns valid CPU% on both macOS and Linux🤖 Generated with Claude Code