fix: NaN panic in metric sort + warn on unsupported input formats#572
fix: NaN panic in metric sort + warn on unsupported input formats#572strawgate wants to merge 5 commits into
Conversation
The JSON and console output sinks dispatched on the column name suffix
("_int", "_float", "_str") to decide how to serialize values. This was
fragile after SQL transforms (which produce columns without suffixes)
and caused silent data loss when a field had multiple type variants
across rows (the old dedup picked one column and dropped the rest).
Changes:
- ColInfo now groups all typed variants of a field (e.g. status_int +
status_str) instead of deduplicating to one winner
- write_row_json iterates variants per row, picks first non-null,
dispatches on Arrow DataType (Int64 → number, Float64 → number,
anything else → quoted string)
- write_console uses the same DataType dispatch
- Extracted write_json_string and write_json_value helpers for reuse
This matches how OtlpSink already works (DataType dispatch since day 1).
Fixes #430 (SQL computed columns output empty strings)
Fixes #444 (float 1.0 output as integer)
Fixes #428 (boolean values coerced to quoted strings)
Fixes #442 (JSON null values silently dropped from multi-type fields)
Fixes #404 (mixed int/float silently dropped)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add criterion benchmark for the JSON serialization hot path (build_col_infos + write_row_json on 10K rows). Measures 3.88M rows/sec on Apple Silicon, ~48% of end-to-end time. Make ColInfo, build_col_infos, write_row_json pub (was pub(crate)) so the bench crate can access them. Also removes stale RUSTSEC-2024-0437 advisory ignore from deny.toml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use itoa/ryu for zero-allocation numeric formatting in write_json_value and write_console (both already transitive deps via serde_json/ureq) - Fix write_console well-known field skip to check ALL variant indices, not just primary_idx — prevents duplicate emission when find_col matched a different variant than the primary - Remove dead primary_idx field from ColInfo (no longer referenced) - Make ColInfo fields pub for bench crate access Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix partial_cmp().unwrap() panic on NaN scores in metric_history.rs by falling back to Equal ordering (#502) - Explicitly match Json/Raw in pipeline input format handling; warn and treat as raw for unsupported formats (Logfmt, Syslog) instead of silently passing through to JSON scanner (#499) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WalkthroughThis PR adds a Criterion benchmark measuring JSON serialization, refactors JSON output to group multiple typed column variants per field and to pick the first non-null variant per row, adds RFC‑8259 string escaping and datatype-driven value serialization (including non-finite float -> null), introduces itoa/ryu for numeric formatting, updates stdout sink to select per-row variants and skip well‑known fields by variant indices, changes metric history sorting to use f64::total_cmp, and tightens pipeline input format handling by rejecting unsupported formats earlier. Possibly related PRs
✨ Finishing Touches⚔️ Resolve merge conflicts
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/logfwd-io/src/metric_history.rs`:
- Line 116: The current comparator uses a.t.partial_cmp(&b.t).unwrap_or(Equal)
which breaks sort's total-order requirement; replace it to use the IEEE-754
total ordering via f64::total_cmp (e.g., call a.t.total_cmp(&b.t) or
f64::total_cmp(&a.t, &b.t)) in the comparator passed to all.sort_by so NaN and
other edge cases are handled correctly while keeping the sort stable and
well-defined.
In `@crates/logfwd-output/src/stdout.rs`:
- Around line 207-213: The Float64 branch in the DataType::Float64 match uses
ryu::format_finite which is UB for NaN/Infinity; update this branch (the
DataType::Float64 match arm that calls
arr.as_primitive::<arrow::datatypes::Float64Type>().value(row) and then
self.buf.extend_from_slice(...format_finite...)) to check v.is_finite() first
and only call ryu::Buffer::new().format_finite(v) for finite values; for
non-finite values follow the same fallback used in write_json_value (e.g., write
"null" or another safe representation) instead of calling format_finite.
In `@crates/logfwd/src/pipeline.rs`:
- Around line 502-508: The eprintln! in input_poll_loop causes hot-path
allocations and stderr I/O for every chunk when unsupported formats like
Logfmt/Syslog/Console are configured; update build_input_state to validate the
parsed input format (in the function build_input_state) and return an Err(...)
if the format is Logfmt, Syslog, or Console so these cases fail at construction
time, then remove the catch-all `_` arm in input_poll_loop (the branch currently
extending input.buf and printing the warning) so runtime no longer does
per-chunk logging or allocation for unsupported formats.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f5aef861-2138-4ff0-aa24-d60ddaa3a491
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
crates/logfwd-bench/benches/pipeline.rscrates/logfwd-io/src/metric_history.rscrates/logfwd-output/Cargo.tomlcrates/logfwd-output/src/lib.rscrates/logfwd-output/src/stdout.rscrates/logfwd/src/pipeline.rs
There was a problem hiding this comment.
Two correctness issues should be fixed before merge:
-
crates/logfwd-io/src/metric_history.rs:116— usinga.t.partial_cmp(&b.t).unwrap_or(Ordering::Equal)creates a non-total comparator when NaN is present, sosort_bycan leave finite timestamps out of order. This breaks the intended chronological merge beforededup_by. Use a total order (for examplef64::total_cmp) and explicitly decide NaN placement. -
crates/logfwd-output/src/stdout.rs:212—ryu::Buffer::format_finite(v)is called withoutv.is_finite()check forFloat64. For NaN/±Inf, ryu produces an unspecified numeric string, so console output can silently emit incorrect values. Guard non-finite values and emit a stable representation (null/NaN/inf) consistent with your output policy.
Optional follow-up: crates/logfwd/src/pipeline.rs:503 currently warns on every chunk for unsupported formats, which can flood stderr under load; consider warning once per input source or rate-limiting.
Warning
⚠️ Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
docs.rs
To allow these domains, add them to the network.allowed list in your workflow frontmatter:
network:
allowed:
- defaults
- "docs.rs"See Network Configuration for more information.
Note
🔒 Integrity filtering filtered 2 items
Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
What is this? | From workflow: AI: PR Review
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.
- use f64::total_cmp for metric history sorting with NaN - guard stdout Float64 formatting for non-finite values - reject unsupported input formats at pipeline build time Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed all 3 unresolved review threads and pushed commit
Validation
Note 🔒 Integrity filtering filtered 1 itemIntegrity filtering activated and filtered the following item during workflow execution.
What is this? | From workflow: AI: Address PR Review Feedback Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/logfwd-output/src/stdout.rs`:
- Around line 166-172: The console output skips entire grouped columns when any
variant matches ts_idx/level_idx/msg_idx, causing mixed-type fields (e.g.,
msg_str vs msg_int) to be dropped; update the logic in stdout.rs around the
col.field_name == "_raw" guard to resolve well-known fields by scanning cols for
the first non-null variant the same way as in crates/logfwd-output/src/lib.rs
(lines ~226-255) — pick the first valid variant index for timestamp, level and
message from the group and use those resolved indices for rendering, then skip
by the group name (column.field_name) instead of by individual variant; apply
the same fix to the similar block at the other occurrence (around lines
177-185).
In `@crates/logfwd/src/pipeline.rs`:
- Around line 499-507: Add a regression test that asserts the
constructor/runtime format invariant by verifying build_input_state rejects
Logfmt, Syslog, and Console formats so input_poll_loop cannot hit the
unreachable! branch; specifically, write a small config-level test that calls
build_input_state (and any constructor wrapper used at runtime) with
input.format set to Format::Logfmt, Format::Syslog, and Format::Console and
expects an Err for each case, and mirror the same assertions for the second site
referenced around the other unreachable! (the block around the lines 628-633) to
ensure both code paths fail fast rather than panicking in input_poll_loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6b789098-fc7b-4a2f-bca0-50037dd5ed6c
📒 Files selected for processing (3)
crates/logfwd-io/src/metric_history.rscrates/logfwd-output/src/stdout.rscrates/logfwd/src/pipeline.rs
| // Skip the well-known columns and _raw. Check ALL variant | ||
| // indices — find_col may have matched a different variant | ||
| // (e.g. message_str vs message_int). | ||
| if col.field_name == "_raw" | ||
| || col.variants.iter().any(|(idx, _)| { | ||
| Some(*idx) == ts_idx || Some(*idx) == level_idx || Some(*idx) == msg_idx | ||
| }) |
There was a problem hiding this comment.
Console output can drop mixed-type well-known fields.
This now skips the whole grouped field when any variant matches ts_idx / level_idx / msg_idx, but the timestamp/level/message rendering above still reads only one fixed column index. In a batch with both msg_str and msg_int, rows that populate only msg_int will print neither the main message nor msg=.... Resolve the well-known fields from cols with the same first-non-null-variant logic used in crates/logfwd-output/src/lib.rs:226-255, then skip by group.
Also applies to: 177-185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-output/src/stdout.rs` around lines 166 - 172, The console
output skips entire grouped columns when any variant matches
ts_idx/level_idx/msg_idx, causing mixed-type fields (e.g., msg_str vs msg_int)
to be dropped; update the logic in stdout.rs around the col.field_name == "_raw"
guard to resolve well-known fields by scanning cols for the first non-null
variant the same way as in crates/logfwd-output/src/lib.rs (lines ~226-255) —
pick the first valid variant index for timestamp, level and message from the
group and use those resolved indices for rendering, then skip by the group name
(column.field_name) instead of by individual variant; apply the same fix to the
similar block at the other occurrence (around lines 177-185).
| Format::Json | Format::Raw => { | ||
| input.buf.extend_from_slice(&chunk); | ||
| } | ||
| _ => { | ||
| unreachable!( | ||
| "unsupported format should be rejected in build_input_state: {:?}", | ||
| input.format | ||
| ); | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add regression coverage for the constructor/runtime format invariant.
input_poll_loop now depends on build_input_state rejecting Logfmt / Syslog / Console; if that guard regresses later, the failure mode becomes an input-thread panic via unreachable!. Please lock this down with a small config-level test that these input formats return Err.
Also applies to: 628-633
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd/src/pipeline.rs` around lines 499 - 507, Add a regression test
that asserts the constructor/runtime format invariant by verifying
build_input_state rejects Logfmt, Syslog, and Console formats so input_poll_loop
cannot hit the unreachable! branch; specifically, write a small config-level
test that calls build_input_state (and any constructor wrapper used at runtime)
with input.format set to Format::Logfmt, Format::Syslog, and Format::Console and
expects an Err for each case, and mirror the same assertions for the second site
referenced around the other unreachable! (the block around the lines 628-633) to
ensure both code paths fail fast rather than panicking in input_poll_loop.
Summary
Two bug fixes:
partial_cmp().unwrap()in metric_history.rs panics if a NaN timestamp appears. Fixed withunwrap_or(Ordering::Equal).Json/Rawin the passthrough arm and warns on unsupported formats.Also closes via audit verification:
write_json_valueunwrap_or(0)no longer exists after PipelineMachine rewriteTest plan
cargo clippy -- -D warningscleanFixes #502, #499
🤖 Generated with Claude Code