
audit: fix scan_error metrics, write_row_json bugs, strengthen OTLP proofs#416

Merged
strawgate merged 4 commits into master from audit/dead-code-cleanup on Mar 31, 2026

Conversation

@strawgate
Owner

@strawgate strawgate commented Mar 31, 2026

Fixes found during codebase audit

scan_error metrics lost (#339)

inc_scan_error() and inc_dropped_batch() were missing from the async flush_batch path; the calls were lost when the sync run() was deleted.

write_row_json RFC 8259 violations (#405, #406)

  • Control characters (0x00-0x1F) now escaped as \uXXXX
  • f64 infinity/NaN now emit null instead of invalid inf/nan
  • 9 unit tests added covering every code path
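
The two rules above can be sketched as follows. This is a minimal illustration, not the actual write_row_json code: the function names `write_json_string` and `write_json_f64` are hypothetical, and the sketch writes into a `String` for brevity, which may differ from the buffer type the real function uses.

```rust
use std::fmt::Write;

/// Append `s` to `out` as a JSON string literal (quotes included),
/// escaping the characters RFC 8259 requires to be escaped.
/// Hypothetical helper; not the function from the PR.
fn write_json_string(out: &mut String, s: &str) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters U+0000..=U+001F become \uXXXX.
            c if (c as u32) < 0x20 => {
                // Writing to a String cannot fail, so the result is ignored.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Append an f64 as a JSON number, or `null` for NaN and the infinities,
/// which have no JSON number representation.
/// Hypothetical helper; not the function from the PR.
fn write_json_f64(out: &mut String, value: f64) {
    if value.is_finite() {
        let _ = write!(out, "{}", value);
    } else {
        out.push_str("null");
    }
}
```

Rust's `Display` for `f64` would otherwise print `inf`, `-inf`, or `NaN`, so the `is_finite` check is what keeps the output parseable.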

OTLP encoder proofs strengthened (#365)

  • encode_fixed64 proof verifies tag bytes, not just size
  • encode_varint_field proof verifies both tag and value bytes

🤖 Generated with Claude Code

The async flush_batch path was missing inc_scan_error() and
inc_dropped_batch() on scan failure — the old sync run() had them
but they were lost when we deleted it. Found during dead code audit
(inc_scan_error appeared unused but should have been called).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 823566fc-d269-4838-9c22-a80ada66d44e

📥 Commits

Reviewing files that changed from the base of the PR and between d9f9df2 and 663d40c.

📒 Files selected for processing (1)
  • crates/logfwd/src/pipeline.rs

Walkthrough

  • Pipeline flush behavior: Pipeline::flush_batch now increments scan_error and dropped_batch when scanner scan(...) fails inside block_in_place, and it also increments dropped_batch when output.send_batch(...) fails (in addition to existing output error metrics).
  • crates/logfwd-output: adds serde_json as a dev-dependency and changes JSON rendering so non-finite floats emit null, string escaping is expanded to RFC 8259-compliant escapes for control characters, and a new test module verifies JSON rendering/escaping.
  • crates/logfwd-core: Kani verification tests now validate exact encoded tag and varint bytes.
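
The flush-path metric behavior described in the walkthrough can be pictured with the sketch below. It rests on assumptions: the `Metrics` struct, its field names, and the synchronous closure-based `flush_batch` signature are hypothetical stand-ins, since the real async method and metrics type are not shown on this page. Only the method names `inc_scan_error`, `inc_dropped_batch`, and `output_error` come from the PR text.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters standing in for the real metrics type.
#[derive(Default)]
struct Metrics {
    scan_errors: AtomicU64,
    dropped_batches: AtomicU64,
    output_errors: AtomicU64,
}

impl Metrics {
    fn inc_scan_error(&self) {
        self.scan_errors.fetch_add(1, Ordering::Relaxed);
    }
    fn inc_dropped_batch(&self) {
        self.dropped_batches.fetch_add(1, Ordering::Relaxed);
    }
    fn output_error(&self) {
        self.output_errors.fetch_add(1, Ordering::Relaxed);
    }
}

/// Simplified, synchronous stand-in for the async flush path.
/// Every early return that loses a batch also counts it as dropped.
fn flush_batch<B>(
    metrics: &Metrics,
    scan: impl FnOnce() -> Result<B, String>,
    send: impl FnOnce(B) -> Result<(), String>,
) -> Result<(), String> {
    let batch = match scan() {
        Ok(batch) => batch,
        Err(err) => {
            // Scan failure: record the error and the lost batch.
            metrics.inc_scan_error();
            metrics.inc_dropped_batch();
            return Err(err);
        }
    };
    if let Err(err) = send(batch) {
        // Send failure: the output error alone does not count the drop.
        metrics.output_error();
        metrics.inc_dropped_batch();
        return Err(err);
    }
    Ok(())
}
```

Pairing the drop counter with every failing exit is the point of the sketch; whether the real method returns an error or only logs it is not stated here.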

Possibly related PRs

  • strawgate/memagent PR 208: Modifies crates/logfwd/src/pipeline.rs scanner error handling and metrics updates in flush_batch, directly overlapping the same error paths.
  • strawgate/memagent PR 221: Adjusts async flush_batch behavior and error handling for scan/transform/send failures, intersecting with dropped-batch and metric changes.
  • strawgate/memagent PR 399: Refactors batching/flush logic in pipeline.rs, touching the same flush and batching code paths modified here.


Fix two RFC 8259 violations in write_row_json:
- Control characters (0x00-0x1F) now escaped as \uXXXX instead of
  emitted raw (#405)
- f64 infinity and NaN now emit null instead of 'inf'/'-inf'/'nan'
  which are not valid JSON (#406)

Add 9 unit tests covering every write_row_json code path:
- basic string, integer, float fields
- null values skipped
- string escaping: quotes, backslash, newline/tab/cr, control chars
- float infinity/NaN → null
- multiple fields produce valid JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

No actionable review feedback was found for this PR.

  • Unresolved review threads: 0
  • Outdated unresolved threads requiring action: 0
  • Code changes made: none
  • Thread replies posted: none
Validation
  • Attempted just ci, but just is not available in this environment (command not found).
  • Since no code changes were required, no further test execution was needed.

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.


What is this? | From workflow: AI: Address PR Review Feedback


…ze (#365)

encode_fixed64 proof now verifies:
- Size = tag_len + 8 (was already verified)
- Tag bytes match encode_varint(field_number << 3 | wire_type) (NEW)
- Payload bytes decode to original value (was already verified)

encode_varint_field proof now verifies:
- Size = tag_len + val_len (was already verified)
- Tag bytes match expected encoding (NEW)
- Value bytes match expected encoding (NEW)

Closes the "length-only" gap identified in #365.
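
To make the difference between a length-only check and a byte-level check concrete, here is a small sketch of the fixed64 case. The function names mirror those mentioned above, but their signatures are assumptions, and `check_fixed64` is a hypothetical helper that tests one concrete input. A Kani harness would presumably assert the same properties over symbolic inputs instead.

```rust
/// Protobuf wire type for 64-bit fixed-width fields.
const WIRE_TYPE_FIXED64: u64 = 1;

/// Number of bytes needed to encode `value` as a base-128 varint.
fn varint_len(mut value: u64) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Append `value` as a protobuf base-128 varint (low 7 bits first).
fn encode_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push(((value as u8) & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Append a fixed64 field: varint tag, then 8 little-endian payload bytes.
/// Assumed signature; the real encoder may differ.
fn encode_fixed64(buf: &mut Vec<u8>, field_number: u32, value: u64) {
    encode_varint(buf, (u64::from(field_number) << 3) | WIRE_TYPE_FIXED64);
    buf.extend_from_slice(&value.to_le_bytes());
}

/// Check size, tag bytes, and payload bytes for one concrete input.
fn check_fixed64(field_number: u32, value: u64) -> bool {
    let mut buf = Vec::new();
    encode_fixed64(&mut buf, field_number, value);

    let tag_value = (u64::from(field_number) << 3) | WIRE_TYPE_FIXED64;
    let tag_len = varint_len(tag_value);
    let mut expected_tag = Vec::new();
    encode_varint(&mut expected_tag, tag_value);

    // The size check comes first so the slices below cannot go out of bounds.
    buf.len() == tag_len + 8
        && buf[..tag_len] == expected_tag[..]
        && buf[tag_len..] == value.to_le_bytes()
}
```

A size-only proof would still pass if the encoder wrote the wrong wire type into the tag, because the tag length would be unchanged; comparing the tag bytes catches that case.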

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate strawgate changed the title from "fix: restore scan_error metrics lost during sync→async migration" to "audit: fix scan_error metrics, write_row_json bugs, strengthen OTLP proofs" on Mar 31, 2026

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/logfwd-output/src/lib.rs (2)

172-174: ⚠️ Potential issue | 🟠 Major

Escape JSON object keys as well as values.

Values are now RFC-compliant, but keys are still emitted unescaped (field_name.as_bytes()), which can still generate invalid JSON for special characters in column names. Reuse the same escaping path for both key and string value emission.

Also applies to: 199-210
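
A sketch of what the suggestion amounts to, under stated assumptions: `push_escaped` is a hypothetical stand-in for the existing value-escaping routine (which is not shown on this page), and `write_string_member` is an illustrative name for the code that emits one key/value pair into a byte buffer.

```rust
/// Minimal RFC 8259 string escaper over bytes. Hypothetical stand-in for
/// the routine already used for values. Bytes >= 0x80 belong to multi-byte
/// UTF-8 sequences and are copied through unchanged.
fn push_escaped(out: &mut Vec<u8>, s: &str) {
    out.push(b'"');
    for &byte in s.as_bytes() {
        match byte {
            b'"' => out.extend_from_slice(b"\\\""),
            b'\\' => out.extend_from_slice(b"\\\\"),
            // All control characters are written in the \uXXXX form here.
            0x00..=0x1f => out.extend_from_slice(format!("\\u{:04x}", byte).as_bytes()),
            _ => out.push(byte),
        }
    }
    out.push(b'"');
}

/// Emit `"key":"value"` with both sides going through the same escaper,
/// so a quote or control character in a column name cannot break the JSON.
fn write_string_member(out: &mut Vec<u8>, key: &str, value: &str) {
    push_escaped(out, key);
    out.push(b':');
    push_escaped(out, value);
}
```

With a column named `user"name`, writing the key via `as_bytes()` would close the string early; routing it through the escaper yields `"user\"name"` instead.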

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-output/src/lib.rs` around lines 172 - 174, The code currently
writes column keys with out.extend_from_slice(col.field_name.as_bytes()) which
emits unescaped JSON keys; update the serializer to run col.field_name through
the same escaping path used for string values (i.e., call the existing
string-escaping/emit routine used elsewhere for values instead of directly using
as_bytes()), and replace the analogous occurrences (including the other block
referencing col.field_name) so both object keys and values use the identical
RFC-compliant escaping routine.

177-193: ⚠️ Potential issue | 🔴 Critical

Prevent panic on suffix/type mismatch in write_row_json.

as_primitive::<Int64Type/Float64Type>() panics when suffix-based dispatch from field names (e.g., status_int) disagrees with actual Arrow type. Enrichment creates all JSON key fields as Utf8, so names like status_int encode the desired type in the string, not the schema. When write_row_json dispatches on "int" but the array is Utf8, the downcast panics. This violates: "Public APIs must return Result, never panic on user input".

Add a runtime type check before downcasting:

-        match col.type_suffix.as_str() {
-            "int" => {
+        match (col.type_suffix.as_str(), arr.data_type()) {
+            ("int", DataType::Int64) => {
                 let arr = arr.as_primitive::<arrow::datatypes::Int64Type>();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-output/src/lib.rs` around lines 177 - 193, In write_row_json,
avoid panics from `arr.as_primitive::<...>()` by checking the runtime Arrow type
before downcasting: when handling the "int" arm of `col.type_suffix` verify
`arr.data_type()` is Int64 (or downcast via
`as_any().downcast_ref::<PrimitiveArray<Int64Type>>()`) and only then call
`as_primitive::<arrow::datatypes::Int64Type>()` and write the numeric value; if
the runtime type mismatches (e.g. Utf8 produced by enrichment) fall back to a
safe path such as treating the cell as a string/nullable value (downcast to
`StringArray` or emit JSON null when invalid) instead of unconditionally
downcasting and panicking; do the analogous runtime-type check for the "float"
arm (Float64) and preserve the existing is_finite logic when the type matches.
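
The suggested dispatch on both the suffix and the runtime type can be illustrated without Arrow. In the sketch below, the `Column` enum is a hypothetical stand-in for an Arrow array plus its `DataType`, and `render_numeric` is an illustrative name. It returns `None` when the suffix and the runtime type disagree, so the caller can take a string fallback rather than panic.

```rust
/// Hypothetical stand-in for an Arrow column and its runtime type.
enum Column {
    Int64(Vec<Option<i64>>),
    Float64(Vec<Option<f64>>),
    Utf8(Vec<Option<String>>),
}

/// Render one cell as JSON number text.
/// - `Some(text)` when the suffix matches the runtime type
///   (`"null"` for missing cells, out-of-range rows, and non-finite floats);
/// - `None` when suffix and runtime type disagree, leaving the fallback
///   (for example, emitting the cell as a string) to the caller.
fn render_numeric(type_suffix: &str, column: &Column, row: usize) -> Option<String> {
    match (type_suffix, column) {
        ("int", Column::Int64(values)) => {
            let cell = values.get(row).copied().flatten();
            Some(cell.map_or_else(|| "null".to_string(), |v| v.to_string()))
        }
        ("float", Column::Float64(values)) => {
            // Keep the is_finite rule: NaN and infinities render as null.
            let cell = values.get(row).copied().flatten().filter(|v| v.is_finite());
            Some(cell.map_or_else(|| "null".to_string(), |v| v.to_string()))
        }
        // Any other combination, such as suffix "int" on a Utf8 column.
        _ => None,
    }
}
```

Matching on the pair makes the mismatch an ordinary branch, which is the same shape as the `(col.type_suffix.as_str(), arr.data_type())` match proposed in the diff above.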

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 330ec3f9-457d-44fa-9ef4-dea9a45b9ca0

📥 Commits

Reviewing files that changed from the base of the PR and between a42e06b and c19fd84.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • crates/logfwd-output/Cargo.toml
  • crates/logfwd-output/src/lib.rs

Same bug pattern as scan_error: a failing output.send_batch() called
output_error() but not inc_dropped_batch(). Found during robustness
audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd-core/src/otlp.rs`:
- Around line 755-756: Rename the abbreviated identifiers to conform to naming
rules: change tag_val to tag_value and val_buf to value_buf (and any other
occurrences of *_val or val_buf in this module, e.g., around the code near the
other mention at the 799-800 region). Update all usages in the surrounding
functions/blocks (e.g., where tag_val is computed and passed to varint_len and
where val_buf is created/used) to the new names so compilation still succeeds
and retain existing behavior; ensure imports or pattern matches referencing
these identifiers are also updated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84bd45fb-842a-40c2-80ce-e8f6e5e2b1fa

📥 Commits

Reviewing files that changed from the base of the PR and between c19fd84 and d9f9df2.

📒 Files selected for processing (1)
  • crates/logfwd-core/src/otlp.rs

Comment on lines +755 to +756
let tag_val = ((field_number as u64) << 3) | 1;
let tag_len = varint_len(tag_val);

🧹 Nitpick | 🔵 Trivial

Rename *_val identifiers to satisfy repository naming rules.

Please replace tag_val/val_buf with non-abbreviated names (for example, tag_value/value_buf).

As per coding guidelines: "Allow only specific abbreviations in names: buf, pos, len, idx, cfg, ctx. No other abbreviations."

Also applies to: 799-800


@strawgate strawgate merged commit c42b4cf into master Mar 31, 2026
13 of 14 checks passed
@strawgate strawgate deleted the audit/dead-code-cleanup branch March 31, 2026 17:05
strawgate added a commit that referenced this pull request Mar 31, 2026
As per PR feedback, this issue is already addressed in the audit branch (PR #416).
Restoring files to their original state and closing this effort.

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Contributor

@github-actions github-actions Bot left a comment


Requesting changes due to one correctness issue that can emit invalid JSON output.

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

  • pr:#416 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)

What is this? | From workflow: AI: PR Review

