feat: rich diagnostics dashboard with live charts and system stats by strawgate · Pull Request #454 · strawgate/fastforward

strawgate · 2026-03-31T21:01:05Z

Summary

Complete rewrite of the embedded diagnostics dashboard (138 → 700 lines)
Dark theme, tab-based UI: Overview (charts + pipeline flow), Pipelines (detailed tables), Config (YAML viewer)
6 live Canvas time-series charts with ring buffer history: lines/sec, bytes/sec, errors/sec, CPU%, RSS, disk read/sec
Pipeline flow diagrams showing input → transform → output with live counters and drop rates
Two new API endpoints: GET /api/system (CPU/RSS/threads via getrusage) and GET /api/config (raw YAML)
libc added as direct dep (already transitive) for getrusage on macOS/Linux
3 new tests (110 total)

Test plan

All 110 tests pass locally
CI validates on Linux + macOS
Manual: logfwd --config test.yaml → open http://localhost:9090 → verify all 3 tabs render
Verify charts start graphing when page opens and accumulate history
Verify /api/system returns valid CPU% on both macOS and Linux

🤖 Generated with Claude Code

New dashboard (replaces the 138-line prototype): - Dark theme, tab-based UI (Overview / Pipelines / Config) - 6 live time-series charts: lines/sec, bytes/sec, errors/sec, CPU%, RSS memory, disk read/sec — rendered with Canvas API, ring buffer history from page open - Pipeline flow diagrams: input → transform → output with live counters, drop rate, error badges - Config viewer with YAML syntax highlighting - Status ribbon with uptime, version, PID, threads, CPU, RSS New API endpoints: - GET /api/system — CPU%, RSS, threads, disk I/O via getrusage (macOS) and /proc/self (Linux). No new deps beyond libc. - GET /api/config — raw YAML + path for the config viewer Architecture: - Single embedded HTML file (include_str!), no external deps - System stats use Cell<> for CPU delta tracking on the diagnostics thread — no locks on the hot path - libc added as direct dep (was already transitive) 3 new tests for the new endpoints (110 total). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-31T21:01:16Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds direct dependencies libc and sysinfo; introduces persistent diagnostics state (a sysinfo::System handle, server PID, and stored config_path/config_yaml) and a new public DiagnosticsServer::set_config method. Adds two HTTP endpoints: /api/system (process and host metrics) and /api/config (config path and raw YAML). main now reads the full config YAML, passes config_path/config_yaml into run_v2_pipelines, and sets diagnostics config. Replaces the server-rendered diagnostics page with a client-side Alpine.js dashboard that polls /api/pipelines and /api/system, renders pipeline views and charts, and shows a config panel and interactive node inspector.

Possibly related PRs

fix: Include bench dashboard in docs site deployment #310: Adds sysinfo dependency and extends diagnostics/server area with sysinfo-backed metrics and endpoints overlapping the new /api/system implementation.
feat: logfwd-core is now #![no_std] + #![forbid(unsafe_code)] #375: Modifies diagnostics and dashboard frontend, adding sysinfo/libc usage and client-side monitoring/dashboard updates that align with the new UI and API endpoints.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dashboard.html`:
- Around line 563-575: In highlightYaml, the comment replacement currently
treats any '#' to EOL as a comment and can false-positive inside quoted strings
(e.g. "foo # bar"); change the comment handling so '#' is only highlighted when
not inside a quoted string — either replace the final .replace(/(#.*)$/gm, ...)
with a regex that uses negative lookbehind/alternation to ensure the '#' is not
inside single or double quotes, or implement a small scan in highlightYaml that
walks the line and skips over quoted segments (respecting escapes) before
wrapping a trailing unquoted '#' in '<span class="yc">...</span>' — update the
replacement logic in highlightYaml accordingly.

In `@src/diagnostics.rs`:
- Around line 324-338: The current serve_config builds JSON by hand and calls
esc() which only handles a few characters; update serve_config to produce valid
JSON by serializing the response object with a JSON serializer (e.g., call
serde_json::to_string on a struct or map containing config_path and config_yaml)
instead of manual formatting, or alternatively extend the esc() implementation
(the esc function referenced in this diff) to escape all control characters
U+0000–U+001F as \uXXXX in addition to the existing escapes; locate serve_config
and esc to apply the change and ensure the Response body is produced from the
serializer output so all control chars are safely encoded.
- Around line 142-145: Add a doc comment for the public method `set_config` in
`diagnostics.rs` describing its purpose (update stored config path and YAML),
the parameters (`path: &str` and `yaml: &str`), and any side effects (it mutates
`self.config_path` and `self.config_yaml`); place the doc comment immediately
above the `pub fn set_config(&mut self, path: &str, yaml: &str)` definition so
the public API is documented for callers such as `main.rs`.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10b1043d-337c-4955-9791-842a5552a582

📥 Commits

Reviewing files that changed from the base of the PR and between c866429 and 97af2bc.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

Cargo.toml
src/dashboard.html
src/diagnostics.rs
src/main.rs

coderabbitai · 2026-03-31T21:07:22Z

+function highlightYaml(yaml) {
+  return esc(yaml).replace(/^(\s*)([\w._-]+)(:)/gm, function(m, ws, key, colon) {
+    return ws + '<span class="yk">' + key + '</span>' + colon;
+  }).replace(/:\s+(&quot;[^]*?&quot;|'[^]*?')/g, function(m, val) {
+    return ': <span class="yv">' + val + '</span>';
+  }).replace(/:\s+(\d+(?:\.\d+)?)\s*$/gm, function(m, num) {
+    return ': <span class="yn">' + num + '</span>';
+  }).replace(/:\s+(true|false)\s*$/gm, function(m, b) {
+    return ': <span class="yn">' + b + '</span>';
+  }).replace(/(#.*)$/gm, function(m, comment) {
+    return '<span class="yc">' + comment + '</span>';
+  }).replace(/(\|)/g, '<span class="yp">$1</span>');
+}


⚠️ Potential issue | 🟡 Minor

Minor: comment highlighting may false-positive inside string values.

The # comment regex at line 572 runs after string highlighting, but a string like "foo # bar" would show # bar" as a comment since the regex matches any # to EOL. Edge case but worth noting.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 563 - 575, In highlightYaml, the comment replacement currently treats any '#' to EOL as a comment and can false-positive inside quoted strings (e.g. "foo # bar"); change the comment handling so '#' is only highlighted when not inside a quoted string — either replace the final .replace(/(#.*)$/gm, ...) with a regex that uses negative lookbehind/alternation to ensure the '#' is not inside single or double quotes, or implement a small scan in highlightYaml that walks the line and skips over quoted segments (respecting escapes) before wrapping a trailing unquoted '#' in '<span class="yc">...</span>' — update the replacement logic in highlightYaml accordingly.

coderabbitai · 2026-03-31T21:07:22Z

+    pub fn set_config(&mut self, path: &str, yaml: &str) {
+        self.config_path = path.to_string();
+        self.config_yaml = yaml.to_string();
+    }


🧹 Nitpick | 🔵 Trivial

Missing doc comment on public method.

Per coding guidelines, public APIs should have doc comments. set_config is public and called from main.rs.

📝 Suggested doc comment

+ /// Store the config file path and raw YAML content for the `/api/config` endpoint. pub fn set_config(&mut self, path: &str, yaml: &str) { self.config_path = path.to_string(); self.config_yaml = yaml.to_string(); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pub fn set_config(&mut self, path: &str, yaml: &str) {

self.config_path = path.to_string();

self.config_yaml = yaml.to_string();

}

/// Store the config file path and raw YAML content for the `/api/config` endpoint.

pub fn set_config(&mut self, path: &str, yaml: &str) {

self.config_path = path.to_string();

self.config_yaml = yaml.to_string();

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 142 - 145, Add a doc comment for the public method `set_config` in `diagnostics.rs` describing its purpose (update stored config path and YAML), the parameters (`path: &str` and `yaml: &str`), and any side effects (it mutates `self.config_path` and `self.config_yaml`); place the doc comment immediately above the `pub fn set_config(&mut self, path: &str, yaml: &str)` definition so the public API is documented for callers such as `main.rs`.

coderabbitai · 2026-03-31T21:07:23Z

+    fn serve_config(
+        &self,
+        request: tiny_http::Request,
+    ) -> Result<(), Box<dyn std::error::Error>> {
+        let body = format!(
+            r#"{{"path":"{}","raw_yaml":"{}"}}"#,
+            esc(&self.config_path),
+            esc(&self.config_yaml),
+        );
+        let resp = tiny_http::Response::from_string(body).with_header(
+            tiny_http::Header::from_bytes(&b"Content-Type"[..], &b"application/json"[..]).unwrap(),
+        );
+        request.respond(resp)?;
+        Ok(())
+    }


⚠️ Potential issue | 🟡 Minor

esc() doesn't escape all JSON control characters.

The esc() function (lines 461-474) handles \n, \r, \t, \\, and ", but JSON requires escaping all control characters (U+0000–U+001F). If YAML config contains e.g. form-feed or vertical tab, the response becomes invalid JSON.

🛠️ Proposed fix

fn esc(s: &str) -> String { let mut out = String::with_capacity(s.len()); for c in s.chars() { match c { '"' => out.push_str("\\\""), '\\' => out.push_str("\\\\"), '\n' => out.push_str("\\n"), '\r' => out.push_str("\\r"), '\t' => out.push_str("\\t"), - _ => out.push(c), + c if c.is_control() => { + // Escape other control chars as \uXXXX + for unit in c.encode_utf16(&mut [0; 2]) { + out.push_str(&format!("\\u{:04x}", unit)); + } + } + _ => out.push(c), } } out }

Also applies to: 460-474

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 324 - 338, The current serve_config builds JSON by hand and calls esc() which only handles a few characters; update serve_config to produce valid JSON by serializing the response object with a JSON serializer (e.g., call serde_json::to_string on a struct or map containing config_path and config_yaml) instead of manual formatting, or alternatively extend the esc() implementation (the esc function referenced in this diff) to escape all control characters U+0000–U+001F as \uXXXX in addition to the existing escapes; locate serve_config and esc to apply the change and ensure the Response body is produced from the serializer output so all control chars are safely encoded.

github-actions · 2026-03-31T21:21:45Z

+        request: tiny_http::Request,
+    ) -> Result<(), Box<dyn std::error::Error>> {
+        let body = format!(
+            r#"{{"path":"{}","raw_yaml":"{}"}}"#,


/api/config returns raw_yaml with the full loaded config and no access control in the route handler. In typical deployments where diagnostics is reachable beyond localhost, this exposes secrets (tokens/passwords/credentials in URLs, internal endpoints, file paths) to any caller.

Please gate this endpoint behind explicit auth/opt-in and redact sensitive keys before returning config content. If this is only for local debugging, consider disabling it by default and returning metadata only (for example path + checksum) in production mode.

Diagnostics: - Replace hand-rolled getrusage/procfs with `sysinfo` crate for cross-platform process metrics (CPU%, current RSS, disk I/O) - Add FD count (/proc/self/fd on Linux, /dev/fd on macOS) and soft FD limit via getrlimit - Expose system info: hostname, OS version, arch, CPU count, total memory — everything a support call needs at a glance - Return null for unavailable metrics instead of 0 Dashboard: - System Information panel: hostname, OS, arch, cores, total memory, FD limit, PID, version - Chart axis padding: Y-axis labels no longer overlap the line - X-axis time markers: 10s, 30s, 1m, 5m, "now" - Hover tooltip: shows value + "Xs ago" on mouseover - 5-minute rolling window with old points trimmed - Limits shown alongside values: RSS "15.9 MB / 32.0 GB", FDs "14 / 1.05 M" - Replaced disk chart with FD count chart (more useful, works cross-platform) - Fixed fmtBytesCompact for sub-KB values Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Instead of stretching 2-3 data points across the full chart width, the X-axis is now anchored to a fixed window: [now - 5min, now]. Data appears on the right edge and grows leftward as it accumulates. Empty space on the left shows the full time range from the start. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Layout stability: - Render ribbon, pipeline summaries, and detail tables once as stable DOM structure, then update only text content via IDs (setText checks for changes before writing to avoid reflow) - tabular-nums + min-width on all numeric displays so digit width is constant across value changes - table-layout:fixed with explicit column widths so columns don't resize as values change - white-space:nowrap on ribbon items Adaptive time window: - Charts start at 10s width, snap to 30s/1m/2m/5m as data accumulates (instead of fixed 5min with tiny data in corner) - Window grows with oldest data point, capped at 5 minutes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dashboard.html`:
- Around line 75-77: The tab elements currently rendered as non-focusable <div
class="tab" data-tab="..."> must be converted to accessible tab controls: wrap
the tab set in an element with role="tablist", change each .tab to a focusable
control (preferably a <button> or an element with role="tab" + tabindex="0"),
add aria-selected="true"/"false" on the active/inactive tabs, set aria-controls
to the corresponding tabpanel id and ensure the panels have role="tabpanel" and
matching ids, and implement keyboard handling for Enter/Space to activate a tab
and ArrowLeft/ArrowRight/Home/End to move focus between tabs; apply the same
changes for the other tab group referenced (lines 259-267) so both groups
support focus, ARIA state, and keyboard navigation.
- Around line 405-410: The catch handler for the fetch("/api/config")
prematurely sets state.configLoaded = true, preventing retry attempts after a
transient failure; modify the catch callback so it does not set
state.configLoaded to true (either leave it unchanged or explicitly set it to
false) and only updates document.getElementById("configContent").innerHTML with
the error message, ensuring the config tab can recover and retry fetching later;
locate the fetch("/api/config") block and update its catch(function(){...}) to
remove or change the state.configLoaded assignment.
- Around line 374-385: The headline display uses raw lps/bps/eps while the
charts use Math.max(0, rate), causing negative labels after resets; update each
headline update to use the same clamped value (e.g. compute clampedLps =
Math.max(0, lps) and pass that to c.c_lps.add and to
document.getElementById("c_lps_val").innerHTML via fmt(clampedLps), and do the
same for bps (use fmtBytes on the clampedBps) and eps so chart samples and
displayed rates stay in sync; refer to variables lps, bps, eps and chart objects
c.c_lps, c.c_bps, c.c_errs and the DOM ids "c_lps_val", "c_bps_val",
"c_errs_val".

In `@src/diagnostics.rs`:
- Around line 769-771: The tests use server.start() followed by a fixed
std::thread::sleep(Duration::from_millis(100)), which is racy; replace each
sleep (the occurrences immediately after server.start()) with a real readiness
check that polls the server until it accepts connections (e.g., loop for a short
timeout calling std::net::TcpStream::connect(addr) or an HTTP probe to the test
endpoint, breaking on success and sleeping a few ms between attempts), and fail
the test if the timeout elapses; update the code paths around server.start() and
the three std::thread::sleep(...) occurrences so the test only proceeds once the
TCP/HTTP connection is established.
- Around line 322-326: The code currently collapses missing process metrics into
sentinel zeros for cpu_pct, rss_bytes, virt_bytes, disk_read, and disk_written;
change these variables and the corresponding struct fields to Option<f64> for
percentages and Option<u64> for byte counters (e.g., cpu_pct -> Option<f64>,
rss_bytes/virt_bytes/disk_read/disk_written -> Option<u64>) and replace
unwrap_or(0.0)/unwrap_or(0) with map(...) (leaving None when proc_info is None)
so they serialize to null; update any struct definitions (e.g., the
diagnostics/process metrics struct) and downstream usage to handle Option values
and ensure serde continues to serialize None as null.
- Around line 560-577: The unsafe FFI usage in get_fd_limit (the unsafe block
calling libc::getrlimit with a zeroed libc::rlimit) needs a SAFETY comment
explaining why the call is sound: that rlim is properly initialized with
std::mem::zeroed() to provide a valid out-pointer, that libc::getrlimit only
writes to that struct and expects a non-null pointer, and that the call is
guarded by #[cfg(unix)] so it's only compiled on platforms where
libc::RLIMIT_NOFILE and getrlimit exist; add this SAFETY comment immediately
above the unsafe block in get_fd_limit, or replace the manual FFI with a safe
wrapper (e.g., rustix) if you prefer to remove unsafe entirely.
- Around line 541-543: get_fd_count() currently opens a directory
("/proc/self/fd" on Linux or "/dev/fd" on macOS) and counts entries, but the
directory stream itself allocates a file descriptor so the result is off by one;
change the logic in get_fd_count() to try reading "/proc/self/fd" then fall back
to "/dev/fd", count the entries from the ReadDir iterator, and subtract one to
exclude the directory's own fd (use saturating_sub(1) to avoid underflow) before
returning the u64; ensure failures still return None.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cdee6a2c-80a4-49b3-904e-d2a490b55f6d

📥 Commits

Reviewing files that changed from the base of the PR and between 97af2bc and 7888f32.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

Cargo.toml
src/dashboard.html
src/diagnostics.rs

coderabbitai · 2026-03-31T21:55:13Z

+    <div class="tab active" data-tab="overview">Overview</div>
+    <div class="tab" data-tab="pipelines">Pipelines</div>
+    <div class="tab" data-tab="config">Config</div>


⚠️ Potential issue | 🟠 Major

Tabs need semantic controls and keyboard support.

These are clickable <div>s only. They aren't focusable and there is no keyboard/ARIA state management, so keyboard users can't switch views.

Also applies to: 259-267

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 75 - 77, The tab elements currently rendered as non-focusable <div class="tab" data-tab="..."> must be converted to accessible tab controls: wrap the tab set in an element with role="tablist", change each .tab to a focusable control (preferably a <button> or an element with role="tab" + tabindex="0"), add aria-selected="true"/"false" on the active/inactive tabs, set aria-controls to the corresponding tabpanel id and ensure the panels have role="tabpanel" and matching ids, and implement keyboard handling for Enter/Space to activate a tab and ArrowLeft/ArrowRight/Home/End to move focus between tabs; apply the same changes for the other tab group referenced (lines 259-267) so both groups support focus, ARIA state, and keyboard navigation.

coderabbitai · 2026-03-31T21:55:14Z

+        let cpu_pct = proc_info.map(|p| p.cpu_usage()).unwrap_or(0.0);
+        let rss_bytes = proc_info.map(|p| p.memory()).unwrap_or(0);
+        let virt_bytes = proc_info.map(|p| p.virtual_memory()).unwrap_or(0);
+        let disk_read = proc_info.map(|p| p.disk_usage().read_bytes).unwrap_or(0);
+        let disk_written = proc_info.map(|p| p.disk_usage().written_bytes).unwrap_or(0);


⚠️ Potential issue | 🟡 Minor

Use Option types for process metrics instead of sentinel zeros.

Lines 322-326 and 350-357 collapse missing process data to 0 and 0.0. This prevents the dashboard from distinguishing "unavailable" from actual zero readings, unlike fd_count/fd_limit which already use Option types that serialize to null. Change these to Option<f64> and Option<u64> respectively and serialize null on None.

Per coding guidelines: "Use Option type instead of sentinel values like 0, -1, or empty strings."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 322 - 326, The code currently collapses missing process metrics into sentinel zeros for cpu_pct, rss_bytes, virt_bytes, disk_read, and disk_written; change these variables and the corresponding struct fields to Option<f64> for percentages and Option<u64> for byte counters (e.g., cpu_pct -> Option<f64>, rss_bytes/virt_bytes/disk_read/disk_written -> Option<u64>) and replace unwrap_or(0.0)/unwrap_or(0) with map(...) (leaving None when proc_info is None) so they serialize to null; update any struct definitions (e.g., the diagnostics/process metrics struct) and downstream usage to handle Option values and ensure serde continues to serialize None as null.

coderabbitai · 2026-03-31T21:55:14Z

+        std::fs::read_dir("/proc/self/fd")
+            .ok()
+            .map(|entries| entries.count() as u64)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/diagnostics.rs | sed -n '535,560p'

Repository: strawgate/memagent

Length of output: 1014

🌐 Web query:

/proc/self/fd read_dir file descriptor self-counting enumeration

💡 Result:

When you perform a read_dir (readdir) operation on /proc/self/fd (or /proc/[pid]/fd), which lists the open file descriptors of the current process as symbolic links, the act of opening the directory for reading allocates a new file descriptor (fd). This new fd is included in the directory listing itself, creating a self-referential or "self-counting" effect. For example, running ls -l /proc/self/fd opens the directory on some fd N (often 3 or 4, depending on existing fds), and the listing shows entries 0,1,2,...,N where N points back to /proc/[pid]/fd/. This is evident from: - Unix StackExchange explanation: When ls opens /proc/self/fd, it gets fd 4, shown as 4 -> /proc/29512/fd/. - SuperUser on wildcard expansion: Shell opens dir for * expansion, includes the fd (e.g., 3), closes after, but list has it transiently. - Kernel source fs/proc/fd.c: proc_readfd iterates over the process's fdtable, dynamically generating entries for all currently open fds, including any opened during the readdir (like the dir fd itself). To accurately count open fds using readdir on /proc/self/fd in C, subtract 3 from the readdir count (accounting for '.', '..', and the dir fd itself), as shown in code examples: int count_open_fds(void) { DIR *dp = opendir("/proc/self/fd"); struct dirent *de; int count = -3; // '.', '..', dp fd while ((de = readdir(dp)) != NULL) count++; closedir(dp); return count; } This adjustment handles the self-counting enumeration. Official docs (proc_pid_fd(5)) confirm /proc/pid/fd lists all open fds dynamically, without noting exclusion of the dir fd, consistent with behavior.

Citations:

1: https://unix.stackexchange.com/questions/676683/what-does-the-output-of-ll-proc-self-fd-from-ll-dev-fd-mean

2: http://www.man7.org/linux/man-pages/man5/proc_pid_fd.5.html

3: https://stackoverflow.com/questions/7976769/getting-count-of-current-used-file-descriptors-from-c-code

4: https://stackoverflow.com/questions/6583158/finding-open-file-descriptors-for-a-process-linux-c-code/9750021

5: https://superuser.com/questions/1820205/list-of-proc-fd-path

6: https://elixir.bootlin.com/linux/latest/source/fs/proc/fd.c

7: https://mechpen.github.io/posts/2023-07-01-proc-fd-files/

🌐 Web query:

macOS /dev/fd read_dir self-counting file descriptor enumeration

💡 Result:

On macOS, /dev/fd is a per-process virtual directory (fdescfs, borrowed from FreeBSD) that lists numeric entries corresponding to the open file descriptors (FDs) of the process accessing it. Each entry /dev/fd/N represents FD N, allowing access equivalent to duping that FD. When a process calls opendir or read_dir on /dev/fd (e.g., ls /dev/fd, std::fs::read_dir("/dev/fd"), Directory.EnumerateFileSystemEntries("/dev/fd")), it allocates a new directory FD, say M (typically the next available FD, like 3 or 4). This new FD M is included in the list of FDs enumerated by readdir, because /dev/fd reflects all open FDs of the current process at enumeration time, including the one just opened for the directory itself. Thus, std::fs::read_dir("/dev/fd").count returns (true open FDs before + 1), overcounting by 1 due to self-inclusion. This is the "self-counting" behavior. Examples: - Minimal process (stdin/stdout/stderr): ls /dev/fd shows 0 1 2 3 4; 3 is cwd FD, 4 is /dev/fd dir FD (dtruss confirms open_nocancel returns 3/4). - Rust: std::fs::read_dir("/dev/fd").unwrap.count == true FDs + 1. To accurately count open FDs: - Use proc_pidinfo(PROC_PIDLISTFDS) from libproc.h (counts p->p_fd->fd_nfiles, all types including sockets/pipes/kqueues). - Avoid /dev/fd for counting due to self-inclusion and limitations (e.g., misses some FD types; stat fails on kqueues causing errors in some APIs like ioutil.ReadDir). No . and .. entries in /dev/fd (per mount_fdesc manpage). Not NFS-exportable. XNU sources (devfs_vnops.c references fdesc.h) confirm integration, but full fdesc_vnops.c not public; behavior matches FreeBSD fdescfs where readdir iterates curproc->p_fd->fd_ofiles array. Sources confirm up to macOS 15+ (xnu-12377 Darwin 25). No changes noted post-2021.

Citations:

1: https://zameermanji.com/blog/2021/8/1/counting-open-file-descriptors-on-macos/

2: Getting file descriptor count on macOS throws dotnet/runtime#65254

3: https://stackoverflow.com/questions/48438404/cant-call-ioutil-readdir-on-dev-fd-on-macos

4: https://liamseanbrady.wordpress.com/2016/03/28/the-great-file-descriptor-mystery/

5: http://www.newosxbook.com/src.php?file=%2Fbsd%2Fman%2Fman4%2Ffd.4&tree=xnu

6: https://leopard-adc.pepas.com/documentation/Darwin/Reference/ManPages/man8/mount_fdesc.8.html

7: http://www.newosxbook.com/src.php?file=%2Fbsd%2Fmiscfs%2Fdevfs%2Fdevfs_vnops.c&tree=xnu

8: https://newosxbook.com/src.php?file=%2Fbsd%2Fmiscfs%2Fdevfs%2Fdevfs_vnops.c&tree=xnu

9: https://unix.superglobalmegacorp.com/NetBSD-0.8/newsrc/miscfs/fdesc/fdesc_vnops.c.html

10: https://stackoverflow.com/questions/44413909/vnode-and-file-descriptor-in-xnu-where-does-the-file-operation-vector-is-stored

Fix get_fd_count() self-counting on both Linux and macOS.

Opening /proc/self/fd (Linux) or /dev/fd (macOS) allocates a new file descriptor that gets included in the enumeration, causing the count to report N+1.

🛠️ Minimal fix

std::fs::read_dir("/proc/self/fd") .ok() - .map(|entries| entries.count() as u64) + .map(|entries| (entries.count() as u64).saturating_sub(1))

std::fs::read_dir("/dev/fd") .ok() - .map(|entries| entries.count() as u64) + .map(|entries| (entries.count() as u64).saturating_sub(1))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

std::fs::read_dir("/proc/self/fd")

.ok()

.map(|entries| entries.count() as u64)

std::fs::read_dir("/proc/self/fd")

.ok()

.map(|entries| (entries.count() as u64).saturating_sub(1))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 541 - 543, get_fd_count() currently opens a directory ("/proc/self/fd" on Linux or "/dev/fd" on macOS) and counts entries, but the directory stream itself allocates a file descriptor so the result is off by one; change the logic in get_fd_count() to try reading "/proc/self/fd" then fall back to "/dev/fd", count the entries from the ReadDir iterator, and subtract one to exclude the directory's own fd (use saturating_sub(1) to avoid underflow) before returning the u64; ensure failures still return None.

coderabbitai · 2026-03-31T21:55:14Z

+/// Get the soft FD limit (ulimit -n).
+fn get_fd_limit() -> Option<u64> {
+    #[cfg(unix)]
+    {
+        unsafe {
+            let mut rlim: libc::rlimit = std::mem::zeroed();
+            if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) == 0 {
+                Some(rlim.rlim_cur as u64)
+            } else {
+                None
+            }
+        }
+    }
+    #[cfg(not(unix))]
+    {
+        None
+    }
+}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check the actual function context and usage grep -n "get_fd_limit\|diagnostics" src/diagnostics.rs | head -20

Repository: strawgate/memagent

Length of output: 446

🏁 Script executed:

# Check what dependencies are already in use cat Cargo.toml | grep -A 50 "\[dependencies\]"

Repository: strawgate/memagent

Length of output: 898

🏁 Script executed:

# Search for other unsafe blocks in the repository to understand the pattern rg "unsafe\s*{" --type rust -B 2 -A 5 | head -100

Repository: strawgate/memagent

Length of output: 145

🏁 Script executed:

# Check if there are existing safe wrappers being used for system calls rg "libc::" --type rust | head -20

Repository: strawgate/memagent

Length of output: 230

🏁 Script executed:

# Correct regex for finding unsafe blocks rg "unsafe\s" --type rust -B 2 -A 5 | head -150

Repository: strawgate/memagent

Length of output: 4285

🏁 Script executed:

# Check if there's a safe wrapper crate like rustix that's available web_search "rust safe getrlimit wrapper crate"

Repository: strawgate/memagent

Length of output: 112

🏁 Script executed:

# Look at the coding guidelines files mentioned in the learnings cat .github/copilot-instructions.md CODE_STYLE.md 2>/dev/null | head -100

Repository: strawgate/memagent

Length of output: 44

🏁 Script executed:

# Check the context around line 330 where get_fd_limit is called sed -n '325,335p' src/diagnostics.rs

Repository: strawgate/memagent

Length of output: 512

🏁 Script executed:

# Check if there's already a SAFETY comment for the getrlimit call sed -n '560,570p' src/diagnostics.rs

Repository: strawgate/memagent

Length of output: 405

🏁 Script executed:

# Search for rustix crate usage (safe getrlimit wrapper) rg "rustix" --type rust

Repository: strawgate/memagent

Length of output: 44

🏁 Script executed:

# Verify the performance-critical paths in guidelines grep -r "SIMD\|wide\|performance" --include="*.md" | head -20

Repository: strawgate/memagent

Length of output: 501

🏁 Script executed:

# Check for SAFETY comments pattern in the codebase rg "SAFETY:" --type rust -A 1 | head -30

Repository: strawgate/memagent

Length of output: 592

🏁 Script executed:

# Verify if this is indeed only called once (diagnostics, not hot path) rg "get_fd_limit\|fd_limit" --type rust -B 2 -A 2

Repository: strawgate/memagent

Length of output: 44

🏁 Script executed:

# Check dev-docs/DECISIONS.md for any guidance on unsafe cat dev-docs/DECISIONS.md 2>/dev/null | head -100

Repository: strawgate/memagent

Length of output: 44

🏁 Script executed:

# Check if the unsafe block needs a SAFETY comment per the pattern sed -n '558,572p' src/diagnostics.rs

Repository: strawgate/memagent

Length of output: 424

🏁 Script executed:

# Verify if this function is only called once and in initialization rg "get_fd_limit" --type rust -B 5 -A 5

Repository: strawgate/memagent

Length of output: 1200

Add a SAFETY comment justifying the FFI call, or use a safe wrapper crate.

The getrlimit call lacks a SAFETY comment explaining why the unsafe block is necessary. Following the codebase pattern (see batch_builder.rs), add a comment. Alternatively, consider using the rustix crate for a safe API if adding a dependency is acceptable.

Note: The guideline about "no hand-written unsafe code" specifically refers to performance-critical SIMD operations (via the wide crate), not diagnostics FFI. However, the missing safety documentation is still a gap.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 560 - 577, The unsafe FFI usage in get_fd_limit (the unsafe block calling libc::getrlimit with a zeroed libc::rlimit) needs a SAFETY comment explaining why the call is sound: that rlim is properly initialized with std::mem::zeroed() to provide a valid out-pointer, that libc::getrlimit only writes to that struct and expects a non-null pointer, and that the call is guarded by #[cfg(unix)] so it's only compiled on platforms where libc::RLIMIT_NOFILE and getrlimit exist; add this SAFETY comment immediately above the unsafe block in get_fd_limit, or replace the manual FFI with a safe wrapper (e.g., rustix) if you prefer to remove unsafe entirely.

coderabbitai · 2026-03-31T21:55:14Z

+        let _handle = server.start();
+
+        std::thread::sleep(std::time::Duration::from_millis(100));


⚠️ Potential issue | 🟠 Major

Replace the fixed sleeps with a real readiness check.

Each new endpoint test sleeps 100ms and assumes the server is already listening. That's racy on slower CI runners and will produce intermittent failures unrelated to the feature under test.

Also applies to: 786-788, 801-803

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/diagnostics.rs` around lines 769 - 771, The tests use server.start() followed by a fixed std::thread::sleep(Duration::from_millis(100)), which is racy; replace each sleep (the occurrences immediately after server.start()) with a real readiness check that polls the server until it accepts connections (e.g., loop for a short timeout calling std::net::TcpStream::connect(addr) or an HTTP probe to the test endpoint, breaking on success and sleeping a few ms between attempts), and fail the test if the timeout elapses; update the code paths around server.start() and the three std::thread::sleep(...) occurrences so the test only proceeds once the TCP/HTTP connection is established.

Window grows continuously: max(10s, dataAge * 1.2), capped at 5min. The 20% headroom means the line never touches the left edge. No more jarring jumps from 10s→30s→1m. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

src/dashboard.html (1)

74-77: ⚠️ Potential issue | 🟠 Major

Use real tab semantics here.

These are still click-only <div>s, so keyboard users cannot focus or switch tabs, and the panels still have no tab/tabpanel relationship. Please convert them to focusable tab controls and add the ARIA + arrow-key behavior.

Also applies to: 83-90, 266-275

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/dashboard.html` around lines 74 - 77, The current clickable tab DIVs
(elements with class "tab" and container "tabs") must be converted to accessible
tabs: make the container use role="tablist" and replace each clickable .tab DIV
with a focusable control (preferably <button> or an element with role="tab",
tabindex="0"/"-1") that has aria-selected, aria-controls pointing to its
corresponding panel id, and ensure each panel has role="tabpanel" and
aria-labelledby linking back. Update activation logic in the script that toggles
the "active" state (look for handlers referencing data-tab/data-panel or class
"active") to manage focus and states: when a tab is activated set
aria-selected="true", tabindex="0", move focus as appropriate, and set other
tabs to aria-selected="false" and tabindex="-1"; implement keyboard handling on
the tab elements to support Left/Right arrow (move focus, optional activation),
Home/End (go to first/last), and Enter/Space to activate the focused tab. Also
ensure visual styling (.active) is preserved when switching to aria-driven
attributes.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dashboard.html`:
- Around line 89-90: The dashboard currently renders raw_yaml from loadConfig()
into `#configContent` and the /api/config handler returns that payload unchanged,
potentially exposing secrets; update the server-side /api/config handler to
scrub/redact sensitive keys (e.g., api_key, secret, password, token,
private_key, aws_access_key_id, aws_secret_access_key) before returning the
config JSON, or alternatively require explicit auth/opt-in before serving this
endpoint; adjust loadConfig() client-side to expect redacted_config (instead of
raw_yaml) and ensure the UI element id="#tab-config" / `#configContent` only
displays sanitized content or an opt-in prompt.
- Around line 241-250: The CHART_DEFS entry currently using id "c_fds" and label
"Open FDs" is wrong for the new disk I/O metric; update the chart definition
(CHART_DEFS entry with id "c_fds") to represent disk read throughput: set label
to "Disk Read", unit to "/s", color/formatter to use fmtBytesCompact, and ensure
anywhere the dashboard feeds this chart (previously passing fd_count) is changed
to consume the backend's cumulative disk_read_bytes series and compute a
per-second rate (derivative/delta over time) to plot bytes/sec; also apply the
same fix to the corresponding duplicated definition referenced around lines
459-462.
- Around line 342-376: The current UI freezes DOM structure via
summariesBuilt/detailsBuilt and updates elements by numeric index (ids like
"ps_i{pi}_{i}", "ps_o{pi}_{i}", "ps_t{pi}", "ps_td{pi}"), causing
stale/misplaced stats if pipelines/inputs/outputs are added/removed/reordered;
fix by detecting topology changes and rebuilding the DOM: compute a stable
signature (e.g., join of pipeline names plus ordered input/output names and
presence of transform.sql) from data.pipelines and compare it to a stored
signature each poll, and if different set summariesBuilt/detailsBuilt=false (or
re-run the container.innerHTML rebuild) so the code that generates the markup
(the block using container.innerHTML=html and the id patterns above) is
re-executed; alternatively, change ids to stable keys using
pipeline/input/output names instead of indices so the update loop (where
setText(document.getElementById(...)) is called) always targets the correct
elements.

---

Duplicate comments:
In `@src/dashboard.html`:
- Around line 74-77: The current clickable tab DIVs (elements with class "tab"
and container "tabs") must be converted to accessible tabs: make the container
use role="tablist" and replace each clickable .tab DIV with a focusable control
(preferably <button> or an element with role="tab", tabindex="0"/"-1") that has
aria-selected, aria-controls pointing to its corresponding panel id, and ensure
each panel has role="tabpanel" and aria-labelledby linking back. Update
activation logic in the script that toggles the "active" state (look for
handlers referencing data-tab/data-panel or class "active") to manage focus and
states: when a tab is activated set aria-selected="true", tabindex="0", move
focus as appropriate, and set other tabs to aria-selected="false" and
tabindex="-1"; implement keyboard handling on the tab elements to support
Left/Right arrow (move focus, optional activation), Home/End (go to first/last),
and Enter/Space to activate the focused tab. Also ensure visual styling
(.active) is preserved when switching to aria-driven attributes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dab55642-e210-41cc-8f0c-21850b486094

📥 Commits

Reviewing files that changed from the base of the PR and between 7888f32 and 404bf2e.

📒 Files selected for processing (1)

src/dashboard.html

coderabbitai · 2026-03-31T22:03:22Z

+<div class="tab-content" id="tab-config">
+  <div class="panel"><div class="panel-title">Configuration</div><div id="configContent"><span style="color:var(--text3)">Loading...</span></div></div>


⚠️ Potential issue | 🟠 Major

Don't expose raw config YAML here.

loadConfig() renders raw_yaml verbatim, and the paired /api/config handler returns that payload unchanged. If the config contains credentials or tokens, they are already disclosed to any dashboard viewer; client-side highlighting does not mitigate that. Please redact sensitive keys server-side before returning this payload, or gate the tab behind explicit auth/opt-in.

Also applies to: 469-474

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 89 - 90, The dashboard currently renders raw_yaml from loadConfig() into `#configContent` and the /api/config handler returns that payload unchanged, potentially exposing secrets; update the server-side /api/config handler to scrub/redact sensitive keys (e.g., api_key, secret, password, token, private_key, aws_access_key_id, aws_secret_access_key) before returning the config JSON, or alternatively require explicit auth/opt-in before serving this endpoint; adjust loadConfig() client-side to expect redacted_config (instead of raw_yaml) and ensure the UI element id="#tab-config" / `#configContent` only displays sanitized content or an opt-in prompt.

coderabbitai · 2026-03-31T22:03:22Z

+var state={prev:null,prevTime:null,prevSystem:null,charts:{},configLoaded:false,sysInfoShown:false};
+
+var CHART_DEFS=[
+  {id:"c_lps",label:"Lines / sec",color:"#3b82f6",unit:"/s",fmtVal:fmtCompact},
+  {id:"c_bps",label:"Throughput",color:"#8b5cf6",unit:"/s",fmtVal:fmtBytesCompact},
+  {id:"c_errs",label:"Errors / sec",color:"#ef4444",unit:"/s",fmtVal:fmtCompact},
+  {id:"c_cpu",label:"Process CPU",color:"#f59e0b",unit:"%",fmtVal:fmtCompact},
+  {id:"c_rss",label:"Memory (RSS)",color:"#22c55e",unit:"",fmtVal:fmtBytesCompact},
+  {id:"c_fds",label:"Open FDs",color:"#06b6d4",unit:"",fmtVal:fmtCompact},
+];


⚠️ Potential issue | 🟠 Major

The sixth chart is plotting FDs, not disk read/sec.

The PR objective calls for a disk read throughput chart, and the backend already exposes cumulative disk_read_bytes, but this series is defined as Open FDs and fed fd_count. The dashboard never renders the advertised disk I/O metric.

🛠️ Minimal direction

- {id:"c_fds",label:"Open FDs",color:"#06b6d4",unit:"",fmtVal:fmtCompact}, + {id:"c_diskr",label:"Disk read",color:"#06b6d4",unit:"/s",fmtVal:fmtBytesCompact}, ... - if(sys.fd_count!=null){c.c_fds.add(sys.fd_count); - var fdStr=sys.fd_count; - if(sys.fd_limit!=null)fdStr+='<span class="chart-unit"> / '+fmt(sys.fd_limit)+'</span>'; - document.getElementById("c_fds_val").innerHTML=fdStr; - } + if(sys.disk_read_bytes!=null && state.prevSystem && dt>0){ + var diskReadBps=Math.max(0,(sys.disk_read_bytes-(state.prevSystem.disk_read_bytes||0))/dt); + c.c_diskr.add(diskReadBps); + document.getElementById("c_diskr_val").innerHTML= + fmtBytes(diskReadBps)+'<span class="chart-unit">/s</span>'; + }

Also applies to: 459-462

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 241 - 250, The CHART_DEFS entry currently using id "c_fds" and label "Open FDs" is wrong for the new disk I/O metric; update the chart definition (CHART_DEFS entry with id "c_fds") to represent disk read throughput: set label to "Disk Read", unit to "/s", color/formatter to use fmtBytesCompact, and ensure anywhere the dashboard feeds this chart (previously passing fd_count) is changed to consume the backend's cumulative disk_read_bytes series and compute a per-second rate (derivative/delta over time) to plot bytes/sec; also apply the same fix to the corresponding duplicated definition referenced around lines 459-462.

github-actions · 2026-03-31T22:26:30Z

+    var now=Date.now(),dt=state.prevTime?(now-state.prevTime)/1000:0;
+    renderRibbon(pipe,sys);renderSystemInfo(sys);
+    updateCharts(pipe,sys,dt);renderPipelineSummaries(pipe);renderPipelineDetails(pipe,dt);
+    state.prev=pipe;state.prevTime=now;state.prevSystem=sys;


state.prev is assigned before validating the /api/pipelines payload shape. If the first successful fetch returns JSON without pipelines (e.g., transient proxy error body), this line stores an invalid snapshot; on the next poll, updateCharts() reaches state.prev.pipelines.forEach(...) and throws, so the dashboard stays disconnected and never recovers.

Please only update state.prev after validating pipe.pipelines is an array (and ideally check r.ok before r.json()).

Rewrote the dashboard from scratch with a fundamentally different information architecture. No more tabs or duplicated info. Design philosophy: pipeline diagram is the hero, everything else supports it. Like a fighter jet HUD — dark, dense, at-a-glance. Layout (single scrollable page): - Status bar: health indicator (green/red), key vitals, uptime - Pipeline diagram: clickable nodes (input → transform → output) with live stats. Click any node → inline detail panel expands showing that component's config + metrics - Metric sparkline cards: 6 compact cards with inline Canvas sparklines. Click to expand full chart below. - Collapsible System info: hostname, OS, arch, cores, memory all on the summary line — expand for full detail grid - Collapsible Config: file path on summary, expand for syntax-highlighted YAML Technical improvements: - Alpine.js (CDN) for all reactive data binding — eliminated 300+ lines of manual DOM manipulation - EMA smoothing (alpha=0.3) on all rate computations — no more zigzag sparklines from poll jitter - Sparklines show baseline at zero (not invisible) - All values use tabular-nums + nowrap — no layout twitching - Adaptive time window (10s → 5min) grows smoothly with data - Clean separation: formatters, rate tracking, chart engine, Alpine component Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (2)

src/dashboard.html (2)
252-253: ⚠️ Potential issue | 🟠 Major

Don't ship raw config YAML to every dashboard viewer.

Line 252 fetches the config during init, and loadConfig() renders d.raw_yaml directly. src/diagnostics.rs:366-380 returns that field verbatim, so any secrets in the YAML are disclosed in the browser even if the panel is never opened. Redact sensitive keys server-side or gate /api/config behind explicit auth/opt-in before rendering this panel.

Also applies to: 460-463
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dashboard.html` around lines 252 - 253, The page is currently invoking
loadConfig() on init and rendering d.raw_yaml into cfgHtml (dashboard.html
x-init and x-html), which exposes secrets returned verbatim by diagnostics.rs
(d.raw_yaml) to any viewer; modify the flow so the client never fetches raw YAML
by default — either gate /api/config behind proper authentication/explicit
opt-in before loadConfig() runs (remove automatic x-init call) or change
diagnostics.rs to return a sanitized version (redacting sensitive keys) instead
of d.raw_yaml; ensure functions/handlers named loadConfig, cfgHtml, d.raw_yaml,
and the /api/config endpoint are updated accordingly so the dashboard only
receives and displays a redacted config or requires auth.
352-358: ⚠️ Potential issue | 🟠 Major

The sixth metric is still FDs, not disk read/sec.

src/diagnostics.rs:300-358 already exposes cumulative disk_read_bytes, but this dashboard still defines and feeds fds. That leaves the advertised disk-read chart missing. Track the previous system snapshot and plot max(0, delta(disk_read_bytes) / dt) instead.

Also applies to: 442-445
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dashboard.html` around lines 352 - 358, The dashboard still defines the
sixth metric as 'fds' in metricDefs; replace that entry with an id like
'disk_rps' (label "Disk Read", unit '/s', appropriate color and formatter, e.g.
fmtBC) and update the code that feeds this metric (the block around the existing
feed logic for lines ~442-445) to compute a rate from the exposed cumulative
disk_read_bytes: keep a previous snapshot for disk_read_bytes and timestamp,
compute rate = max(0, (cur.disk_read_bytes - prev.disk_read_bytes) / (cur_ts -
prev_ts)), and push that value to the 'disk_rps' series instead of sending fds;
ensure prev is initialized on first sample and use the same metric id 'disk_rps'
in both metricDefs and the feed code.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Line 6: Remove the global '*.png' entry from .gitignore and replace it with a
scoped pattern that only ignores generated Playwright artifacts (for example use
'.playwright-mcp/**/*.png' or the specific generated path you expect) so UI/docs
images are not accidentally hidden; target the '*.png' entry for removal and add
the scoped pattern such as '.playwright-mcp/' or a generated-assets path
instead.

In `@src/dashboard.html`:
- Line 181: The selection key logic is ambiguous and leads to stale/mismatched
details: stop using prefix matching (sel.startsWith(String(pi))) and the cached
selData; instead build and store a stable composite key (e.g.,
"pipelineName:type:nodeName") when toggle() sets selection, and on each render
derive the selected data from this.pipes using that stable key (replace checks
at sel.startsWith(String(pi)) and the similar check around line 383), so the
detail panel always looks up fresh data rather than using cached selData.
- Line 139: The interactive pipe-node elements (e.g., the element with class
"pipe-node inp-border" that uses the click handler calling
toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp}) and the selection expression
sel===pi+'_i'+ii) must be made keyboard-accessible: replace the <div> with a
<button type="button"> or, if keeping a non-semantic element, add role="button",
tabindex="0", and a keydown handler that triggers the same toggle call on
Enter/Space and reflect selection with an appropriate ARIA state (e.g.,
aria-pressed or aria-selected). Apply the same change to the other similar nodes
referenced (lines around 153, 168, 213) so all metric cards/pipeline nodes
provide focus, keyboard activation, and ARIA state.
- Around line 397-404: compRate currently mutates prevSnap and computes dt using
the global prevTime at render time, causing tiny dt and spikes; move the
delta/ema computation out of the template into the poll handler so it uses the
poll timestamp (not Date.now() in the render) and update prevSnap there; also
include pipeline identity in the history key (e.g., use
comp_<pipelineId>_<obj.name>_<field> instead of comp_<obj.name>_<field>) and
keep emaRates keyed the same way; ensure prevTime is read/set only in the poll
logic (compute dt = (pollTimestamp - prevTime)/1000 before updating prevTime)
and have compRate only read the precomputed rate value for display.
- Line 7: The embedded dashboard currently pulls Alpine.js from jsDelivr with a
floating `@3` tag and no integrity check; replace this CDN dependency by vendoring
a pinned Alpine.js build into the app (e.g., add a local file like alpine.min.js
built from a specific version such as 3.x.y), update the <script> tag in
src/dashboard.html to reference that local bundle (or if you must use CDN, use
an exact version string and add an SRI integrity attribute and crossorigin
attribute), and ensure the vendored file is added to your static assets bundling
so the page no longer fails when external CDNs are unavailable.

---

Duplicate comments:
In `@src/dashboard.html`:
- Around line 252-253: The page is currently invoking loadConfig() on init and
rendering d.raw_yaml into cfgHtml (dashboard.html x-init and x-html), which
exposes secrets returned verbatim by diagnostics.rs (d.raw_yaml) to any viewer;
modify the flow so the client never fetches raw YAML by default — either gate
/api/config behind proper authentication/explicit opt-in before loadConfig()
runs (remove automatic x-init call) or change diagnostics.rs to return a
sanitized version (redacting sensitive keys) instead of d.raw_yaml; ensure
functions/handlers named loadConfig, cfgHtml, d.raw_yaml, and the /api/config
endpoint are updated accordingly so the dashboard only receives and displays a
redacted config or requires auth.
- Around line 352-358: The dashboard still defines the sixth metric as 'fds' in
metricDefs; replace that entry with an id like 'disk_rps' (label "Disk Read",
unit '/s', appropriate color and formatter, e.g. fmtBC) and update the code that
feeds this metric (the block around the existing feed logic for lines ~442-445)
to compute a rate from the exposed cumulative disk_read_bytes: keep a previous
snapshot for disk_read_bytes and timestamp, compute rate = max(0,
(cur.disk_read_bytes - prev.disk_read_bytes) / (cur_ts - prev_ts)), and push
that value to the 'disk_rps' series instead of sending fds; ensure prev is
initialized on first sample and use the same metric id 'disk_rps' in both
metricDefs and the feed code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 257234f3-6d05-4dac-af5c-1d50b7217181

📥 Commits

Reviewing files that changed from the base of the PR and between 404bf2e and c9306b2.

📒 Files selected for processing (2)

.gitignore
src/dashboard.html

coderabbitai · 2026-03-31T22:39:32Z

+build
+# Playwright MCP artifacts
+.playwright-mcp/
+*.png


⚠️ Potential issue | 🟡 Minor

Avoid globally ignoring every PNG.

*.png will hide any new checked-in UI assets or docs images, not just generated Playwright output. If the intent is artifact cleanup, .playwright-mcp/ already covers that directory; otherwise scope the ignore to a generated path/pattern instead of the whole repo.

🛠️ Minimal fix

-*.png

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

*.png

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore at line 6, Remove the global '*.png' entry from .gitignore and replace it with a scoped pattern that only ignores generated Playwright artifacts (for example use '.playwright-mcp/**/*.png' or the specific generated path you expect) so UI/docs images are not accidentally hidden; target the '*.png' entry for removal and add the scoped pattern such as '.playwright-mcp/' or a generated-assets path instead.

coderabbitai · 2026-03-31T22:39:32Z

-<title>logfwd dashboard</title>
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<title>logfwd</title>
+<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3/dist/cdn.min.js"></script>


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if src/dashboard.html exists and read around line 7 cat -n src/dashboard.html | head -20

Repository: strawgate/memagent

Length of output: 1012

🏁 Script executed:

# Search for other Alpine.js references or bundled versions fd -i "alpine" --type f

Repository: strawgate/memagent

Length of output: 44

🏁 Script executed:

# Check for any bundled assets or build configuration git ls-files | grep -E "(package\.json|build|dist|assets)" | head -20

Repository: strawgate/memagent

Length of output: 82

Don't make the embedded dashboard depend on a public CDN.

The application relies entirely on jsDelivr to deliver Alpine.js; if the CDN is unavailable or blocked, this page fails to load. The URL uses a floating @3 version tag instead of a pinned release, and lacks an integrity hash. Ship the bundle with the app, or at minimum pin an exact version with an SRI hash.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` at line 7, The embedded dashboard currently pulls Alpine.js from jsDelivr with a floating `@3` tag and no integrity check; replace this CDN dependency by vendoring a pinned Alpine.js build into the app (e.g., add a local file like alpine.min.js built from a specific version such as 3.x.y), update the <script> tag in src/dashboard.html to reference that local bundle (or if you must use CDN, use an exact version string and add an SRI integrity attribute and crossorigin attribute), and ensure the vendored file is added to your static assets bundling so the page no longer fails when external CDNs are unavailable.

coderabbitai · 2026-03-31T22:39:32Z

+        <template x-for="(inp,ii) in p.inputs" :key="inp.name">
+          <div style="display:contents">
+            <div class="pipe-conn" x-show="ii>0"><svg viewBox="0 0 32 16"><line x1="0" y1="8" x2="24" y2="8"/><polygon points="24,4 32,8 24,12"/></svg></div>
+            <div class="pipe-node inp-border" :class="sel===pi+'_i'+ii&&'selected'" @click="toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp})">


⚠️ Potential issue | 🟠 Major

Use real controls for the clickable nodes and metric cards.

These are interactive <div>s with only @click, so keyboard users cannot focus or activate them. Convert them to <button type="button"> elements, or add role="button", tabindex="0", Enter/Space handling, and the appropriate ARIA state.

Also applies to: 153-153, 168-168, 213-213

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` at line 139, The interactive pipe-node elements (e.g., the element with class "pipe-node inp-border" that uses the click handler calling toggle(pi+'_i'+ii,{type:'input',cfg:inp,stats:inp}) and the selection expression sel===pi+'_i'+ii) must be made keyboard-accessible: replace the <div> with a <button type="button"> or, if keeping a non-semantic element, add role="button", tabindex="0", and a keydown handler that triggers the same toggle call on Enter/Space and reflect selection with an appropriate ARIA state (e.g., aria-pressed or aria-selected). Apply the same change to the other similar nodes referenced (lines around 153, 168, 213) so all metric cards/pipeline nodes provide focus, keyboard activation, and ARIA state.

coderabbitai · 2026-03-31T22:39:32Z

+        </template>
+      </div>
+      <!-- Detail panel (expanded when node clicked) -->
+      <template x-if="selData&&sel.startsWith(String(pi))">


⚠️ Potential issue | 🟠 Major

Selection keys are ambiguous and go stale.

Line 181 uses sel.startsWith(String(pi)), so selecting pipeline 10 also matches pi === 1. toggle() then caches the clicked object in selData, so after the next poll the detail panel can render under the wrong pipeline and keep stale stats. Store a stable key such as pipelineName:type:nodeName and derive the selected data from this.pipes on each render.

Also applies to: 383-383

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` at line 181, The selection key logic is ambiguous and leads to stale/mismatched details: stop using prefix matching (sel.startsWith(String(pi))) and the cached selData; instead build and store a stable composite key (e.g., "pipelineName:type:nodeName") when toggle() sets selection, and on each render derive the selected data from this.pipes using that stable key (replace checks at sel.startsWith(String(pi)) and the similar check around line 383), so the detail panel always looks up fresh data rather than using cached selData.

Information architecture: System > Metrics > Pipelines > Config All sections visible (no collapsing), single scrollable page. Changes: - System: compact card grid (hostname, OS, arch, cores, memory, RSS, FDs/limit, PID) — always visible, no collapsing - Metrics: proper charts with Y-axis labels, time markers, and gradient fills. Restored the good chart rendering from v2. - Pipeline: flow diagram with clickable nodes. Click opens an inspector panel with blue accent border, shadow, close button — visually distinct from the pipeline row above it - Config: always visible with syntax-highlighted YAML - Status bar: simplified to brand + health pill + version/uptime (no duplicated metric numbers) - EMA smoothing on all rates prevents zigzag sparklines - Filed 4 instrumentation issues (#521-524) for batch latency, queue depth, schema info, and output send latency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dashboard.html`:
- Line 168: The close button element with class "insp-close" and click handler
"@click=\"sel=null;selData=null\"" currently lacks an explicit type so it
defaults to type="submit"; update the button to include type="button" to prevent
accidental form submissions—locate the <button class="insp-close" ...> element
and add the type="button" attribute while keeping the existing `@click` handler
and text unchanged.
- Around line 315-317: The two fetch calls (fetch('/api/pipelines') and
fetch('/api/system')) currently call r.json() unconditionally; update both
promise chains to check the Response.ok flag first (e.g., inside the then
callback for each fetch) and handle non-OK responses by returning null or
throwing a descriptive error so r.json() is only called for successful
responses; ensure the fetch('/api/system') branch still has its outer catch to
return null and that the pipelines fetch similarly handles non-200 statuses to
avoid parsing unexpected content.
- Around line 347-352: The current loadConfig() sets self.cfgHtml to a truthy
error message in the catch, which prevents the existing "if(!self.cfgHtml)"
guard from retrying; change the catch handler to keep cfgHtml falsy (e.g., set
self.cfgHtml = '' or null) and clear cfgPath (self.cfgPath = '') so future
retries can run, optionally log the error for debugging; alternatively implement
a separate retry flag (e.g., self.cfgLoadFailed = true) and use that instead of
populating cfgHtml on failure.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85c7c72c-10dc-4a59-ab3e-71038e24bd4f

📥 Commits

Reviewing files that changed from the base of the PR and between c9306b2 and b458e34.

📒 Files selected for processing (1)

src/dashboard.html

coderabbitai · 2026-03-31T23:15:06Z

+    <div class="inspector" x-transition.opacity>
+      <div class="insp-header">
+        <span class="insp-title" x-text="selType+' \u2014 '+(selData.name||'SQL')"></span>
+        <button class="insp-close" @click="sel=null;selData=null">&times; close</button>


⚠️ Potential issue | 🟡 Minor

Add type="button" to the close button.

Static analysis flagged this. Without an explicit type, buttons default to type="submit" which can cause unintended form submissions if this is ever wrapped in a form.

Proposed fix

- <button class="insp-close" `@click`="sel=null;selData=null">× close</button> + <button type="button" class="insp-close" `@click`="sel=null;selData=null">× close</button>

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<button class="insp-close" @click="sel=null;selData=null">× close</button>

<button type="button" class="insp-close" `@click`="sel=null;selData=null">× close</button>

🧰 Tools

🪛 HTMLHint (1.9.2)

[warning] 168-168: The type attribute must be present on elements.

(button-type-require)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` at line 168, The close button element with class "insp-close" and click handler "@click=\"sel=null;selData=null\"" currently lacks an explicit type so it defaults to type="submit"; update the button to include type="button" to prevent accidental form submissions—locate the <button class="insp-close" ...> element and add the type="button" attribute while keeping the existing `@click` handler and text unchanged.

coderabbitai · 2026-03-31T23:15:06Z

+    Promise.all([
+      fetch('/api/pipelines').then(function(r){return r.json()}),
+      fetch('/api/system').then(function(r){return r.json()}).catch(function(){return null})


🧹 Nitpick | 🔵 Trivial

Consider checking r.ok before parsing JSON.

If the server returns a non-200 status (e.g., 502 from a proxy), r.json() may throw or return unexpected content. The outer .catch() handles this, but explicit status checking would be more robust.

Suggested improvement

Promise.all([ - fetch('/api/pipelines').then(function(r){return r.json()}), - fetch('/api/system').then(function(r){return r.json()}).catch(function(){return null}) + fetch('/api/pipelines').then(function(r){if(!r.ok)throw new Error('fetch failed');return r.json()}), + fetch('/api/system').then(function(r){return r.ok?r.json():null}).catch(function(){return null}) ]).then(function(r){

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 315 - 317, The two fetch calls (fetch('/api/pipelines') and fetch('/api/system')) currently call r.json() unconditionally; update both promise chains to check the Response.ok flag first (e.g., inside the then callback for each fetch) and handle non-OK responses by returning null or throwing a descriptive error so r.json() is only called for successful responses; ensure the fetch('/api/system') branch still has its outer catch to return null and that the pipelines fetch similarly handles non-200 statuses to avoid parsing unexpected content.

coderabbitai · 2026-03-31T23:15:07Z

+  loadConfig(){
+    var self=this;
+    fetch('/api/config').then(function(r){return r.json()}).then(function(d){
+      self.cfgPath=d.path||'';self.cfgHtml=hlYaml(d.raw_yaml||'(no config)');
+    }).catch(function(){self.cfgHtml='<span style="color:var(--t4)">could not load</span>'});
  }


⚠️ Potential issue | 🟠 Major

Config fetch still won't retry after failure.

On error, cfgHtml is set to '<span>...</span>' (truthy), so the guard at line 324 (if(!self.cfgHtml)) prevents future retry attempts. The prior review suggested keeping cfgHtml falsy on error to allow recovery.

Proposed fix

loadConfig(){ var self=this; fetch('/api/config').then(function(r){return r.json()}).then(function(d){ self.cfgPath=d.path||'';self.cfgHtml=hlYaml(d.raw_yaml||'(no config)'); - }).catch(function(){self.cfgHtml='<span style="color:var(--t4)">could not load</span>'}); + }).catch(function(){/* leave cfgHtml falsy to retry on next poll */}); }

Or add a separate flag for config load attempts with retry backoff.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/dashboard.html` around lines 347 - 352, The current loadConfig() sets self.cfgHtml to a truthy error message in the catch, which prevents the existing "if(!self.cfgHtml)" guard from retrying; change the catch handler to keep cfgHtml falsy (e.g., set self.cfgHtml = '' or null) and clear cfgPath (self.cfgPath = '') so future retries can run, optionally log the error for debugging; alternatively implement a separate retry flag (e.g., self.cfgLoadFailed = true) and use that instead of populating cfgHtml on failure.

Charts: - Lines now extend to right edge (removed 20% headroom that left a gap before "now") - Taller charts: 90px → 120px - 3-column grid with responsive breakpoints - "Throughput" renamed to "Input Bytes" (clearer meaning) - Y-axis padding increased for label clarity System panel: - Shows system-level memory (used/total) instead of duplicating RSS - Added swap usage, global CPU%, cgroup memory limit (for K8s) - Removed RSS from system grid (already in metrics chart) - Better contrast on section headings and chart labels Pipeline: - Nodes expand to fill page width (removed max-width constraint) API (/api/system): - Added: global_cpu_percent, available_memory_bytes, used_memory_bytes, total_swap_bytes, used_swap_bytes, cgroup_memory_limit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

strawgate · 2026-03-31T23:28:57Z

Triage: recommend closing as superseded

This PR is 151 commits behind master and modifies files that no longer exist (src/diagnostics.rs, src/main.rs, src/dashboard.html) — the codebase was restructured into crates/. The equivalent files are now:

crates/logfwd-io/src/diagnostics.rs (1126 lines with HTTP server + API endpoints)
crates/logfwd-io/src/dashboard.html (138-line basic dashboard already on master)
crates/logfwd/src/main.rs (completely rewritten)

Already on master:

Diagnostics HTTP server with /health, /api/pipelines, /api/config endpoints
Basic HTML dashboard served at /
Pipeline metrics (lines/bytes/errors per component)
Memory stats via jemalloc

Net-new from this PR that could be extracted into a fresh PR:

Rich dashboard — live charts with Canvas, inspector panel, pipeline-centric layout (357 lines)
sysinfo crate — system panel with CPU, memory, disk, FD metrics
Adaptive chart time window — smooth window growth instead of fixed sizes

Review feedback to address in the fresh PR (from CodeRabbit):

Sanitize/redact secrets in /api/config endpoint
Fix disk metrics computation (was using FD stats)
Vendor Alpine.js or pin with SRI hash (no bare CDN)
Fix timing bug in rate computation

Recommend closing and opening a fresh PR for the enhanced dashboard against the current codebase.

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

github-actions Bot reviewed Mar 31, 2026

View reviewed changes

strawgate and others added 3 commits March 31, 2026 16:42

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

github-actions Bot reviewed Mar 31, 2026

View reviewed changes

strawgate and others added 2 commits March 31, 2026 17:29

chore: gitignore Playwright MCP artifacts and screenshots

c9306b2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

strawgate closed this Apr 1, 2026

		let _handle = server.start();

		std::thread::sleep(std::time::Duration::from_millis(100));

		<div class="tab-content" id="tab-config">
		<div class="panel"><div class="panel-title">Configuration</div><div id="configContent"><span style="color:var(--text3)">Loading...</span></div></div>

	<button class="insp-close" @click="sel=null;selData=null">× close</button>
	<button type="button" class="insp-close" `@click`="sel=null;selData=null">× close</button>

Conversation

strawgate commented Mar 31, 2026

Summary

Test plan

Uh oh!

coderabbitai Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading