fix: correct 6 documentation accuracy bugs by strawgate · Pull Request #786 · strawgate/fastforward

strawgate · 2026-04-03T02:58:20Z

Summary

Six cases where the docs say one thing and the runtime does another.

Bugs fixed

| Issue | Bug | Fix |
|-------|-----|-----|
| #718 | README marks elasticsearch as ✅ Implemented but config rejects it | Changed to 🔜 Planned |
| #713 | README advanced-layout uses singular `input:`/`output:` keys that silently fail | Fixed to plural `inputs:`/`outputs:` |
| #706 | CONFIG_REFERENCE documents `_file_str` column that doesn't exist at runtime | Removed the row |
| #714 | `generator` input type completely undocumented | Added to CONFIG_REFERENCE |
| #719 | CRI timestamp documented as `_time_ns_int` (int64 ns) but actual is `_timestamp_str` (RFC 3339) | Fixed docs to match code |
| #711 | COLUMN_NAMING.md implies C3 cross-batch stability is available (only C1 is implemented) | Clarified status, referenced #625 |
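Issue #713 says the singular keys "silently fail". One plausible mechanism, sketched here under stated assumptions: a loader that looks up only the keys it knows (`inputs`, `outputs`) and never rejects anything else will drop `input:`/`output:` without reporting an error. The `KNOWN_KEYS` list and the `ignored_keys` and `pipeline` helpers are hypothetical illustrations, not fastforward's actual config code, and the real set of accepted pipeline keys is likely larger.

```rust
use std::collections::BTreeMap;

// Hypothetical: the only pipeline keys this toy loader reads.
// fastforward's real config schema is not reproduced here.
const KNOWN_KEYS: [&str; 2] = ["inputs", "outputs"];

/// Returns the keys that a loader reading only `KNOWN_KEYS` would skip
/// without reporting an error.
fn ignored_keys(pipeline: &BTreeMap<String, String>) -> Vec<String> {
    pipeline
        .keys()
        .filter(|k| !KNOWN_KEYS.contains(&k.as_str()))
        .cloned()
        .collect()
}

/// Hypothetical helper: builds a pipeline entry with placeholder values.
fn pipeline(keys: &[&str]) -> BTreeMap<String, String> {
    keys.iter()
        .map(|k| (k.to_string(), "placeholder".to_string()))
        .collect()
}

fn main() {
    // Singular keys, as in the README example before #713 was fixed.
    let old = pipeline(&["input", "output"]);
    // Plural keys, as in the corrected example.
    let fixed = pipeline(&["inputs", "outputs"]);

    println!("{:?}", ignored_keys(&old)); // ["input", "output"]
    println!("{:?}", ignored_keys(&fixed)); // []
}
```

If the real loader deserializes with serde, the `#[serde(deny_unknown_fields)]` container attribute is one way to turn this class of typo into a hard error instead of a silent no-op.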

Files changed

- README.md: elasticsearch status, advanced-layout example keys
- docs/CONFIG_REFERENCE.md: removed phantom column, added generator, fixed CRI timestamp
- docs/COLUMN_NAMING.md: clarified C1 vs C3 stability status

Fixes #718, #713, #706, #714, #719, #711

- #718: README marks elasticsearch as '✅ Implemented' but config rejects it as 'not yet implemented' → change to '🔜 Planned'
- #713: README advanced-layout examples use singular input:/output: keys which silently fail → fix to plural inputs:/outputs:
- #706: CONFIG_REFERENCE documents _file_str column that doesn't exist at runtime → remove the row
- #714: generator input type completely undocumented → add to CONFIG_REFERENCE input types table
- #719: CRI timestamp documented as _time_ns_int (int64 ns) but actual column is _timestamp_str (RFC 3339 string) → fix docs to match code
- #711: COLUMN_NAMING.md implies cross-batch schema stability (C3) is available when only per-batch fidelity (C1) is implemented → clarify and reference #625

Fixes #718, #713, #706, #714, #719, #711

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-04-03T02:58:31Z

Walkthrough

Updates only documentation across three files. README.md: renamed pipeline keys in the “Advanced — multiple named pipelines” YAML example from input → inputs and output → outputs for errors and debug, and added a trailing space to the elasticsearch status cell (✅ Implemented ). docs/COLUMN_NAMING.md: clarifies that only per-batch type fidelity (C1) is implemented and that full cross-batch schema stability (C3) is not yet implemented, with notes on the TableProvider requirements. docs/CONFIG_REFERENCE.md: documents a new generator input type, removes _file_str from special columns, replaces _time_ns_int with _timestamp_str, and updates example SQL to use _source_path_str with a note about source path injection not yet implemented.

Possibly related PRs

fix: address post-merge CodeRabbit feedback on #684 #759: Edits docs/COLUMN_NAMING.md and the TableProvider/C3 discussion, matching the C1 vs C3 clarification in this PR.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| High-Quality Rust Practices | ✅ Passed | The PR contains no Rust source code changes (.rs files), consisting entirely of documentation updates to README.md and docs/ markdown files. The High-Quality Rust Practices check does not apply to documentation-only modifications. |
| Formal Verification Coverage | ✅ Passed | PR modifies only documentation files with no changes to Rust source code. No new public functions introduced, so formal verification coverage requirements do not apply. |
| Documentation Thoroughly Updated | ✅ Passed | PR updates CONFIG_REFERENCE.md and COLUMN_NAMING.md per guidance; no conditions triggered for other documentation files. |
| Crate Boundary And Dependency Integrity | ✅ Passed | PR contains only documentation updates to README.md and docs files with no modifications to Cargo.toml, Rust source code, crate structures, or dependency declarations. |
| Maintainer Fitness | ✅ Passed | Documentation-only PR fixing six tightly-related bugs across README, COLUMN_NAMING.md, and CONFIG_REFERENCE.md with clear issue traceability and no semantic code changes. |


coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

README.md (1)
192-194: ⚠️ Potential issue | 🔴 Critical

Remove incorrect CRI column documentation from README.md.

Lines 192-194 document non-existent Arrow columns. CRI input processing injects _timestamp and _stream as JSON metadata (not as separate Arrow columns), and _file_str is not created as an Arrow column by the scanner. The column name _time_ns_int does not exist anywhere in the codebase. Refer to CONFIG_REFERENCE.md lines 316-317 for accurate documentation of actual CRI columns.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 192 - 194, Remove the three incorrect README table
rows that document `_file_str`, `_time_ns_int`, and `_stream_str`; instead
update the CRI section to state that CRI processing injects `_timestamp` and
`_stream` as JSON metadata (not Arrow columns) and that `_file_str` and
`_time_ns_int` are not created by the scanner, and point readers to the
CONFIG_REFERENCE.md CRI section for the authoritative column/metadata list.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/CONFIG_REFERENCE.md`:
- Around line 316-317: The docs still reference and join on the removed
`_file_str`; update the examples at the affected snippets to stop using
`_file_str` and instead use the correct remaining fields (e.g., `_stream_str`
for stream name or `_timestamp_str` for time) or construct paths using the new
canonical field(s) used elsewhere in the doc; search for occurrences of
`_file_str` in the examples and replace the join logic with usage of
`_stream_str` (or the appropriate field) and adjust surrounding text to remove
mention of `_file_str`.

In `@README.md`:
- Line 216: The README row incorrectly labels Elasticsearch as "🔜 Planned"
despite implemented code; update the table entry for `elasticsearch` to indicate
it is implemented (e.g., "✅ Implemented" or "Implemented") and keep or slightly
adjust the description to match existing behavior (mentioning Elasticsearch bulk
API with retry logic and per-document error handling) to reflect that
`ElasticsearchSink`, `OutputType::Elasticsearch`, and deserialization are
present and active.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a1984235-7a75-4447-9df6-9a6fbaed76b0

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9c452 and 0aa0337.

📒 Files selected for processing (3)

README.md
docs/COLUMN_NAMING.md
docs/CONFIG_REFERENCE.md

- Replace _file_str with _source_path_str in k8s enrichment SQL examples and note the column requires source path injection (not yet implemented)
- Mark Elasticsearch as Implemented in README (ElasticsearchSink exists with bulk API, retry logic, and per-document error handling)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

README.md (1)
192-202: ⚠️ Potential issue | 🔴 Critical

Critical: CRI column names remain incorrect in README.

PR objectives state that issues #706 and #719 fixed documentation bugs where _file_str doesn't exist at runtime and _time_ns_int should be _timestamp_str (RFC 3339 format). Those fixes were applied to CONFIG_REFERENCE.md but not here. Lines 192-194 and the SQL example on line 199 still document the incorrect column names, leaving README.md inconsistent with the corrected CONFIG_REFERENCE.md.
📋 Proposed fix based on PR objectives `#706` and `#719`
````diff
 | Column | Description |
 |--------|-------------|
-| `_file_str` | Absolute path of the source log file |
-| `_time_ns_int` | CRI timestamp in nanoseconds |
+| `_timestamp_str` | CRI timestamp in RFC 3339 format |
 | `_stream_str` | `stdout` or `stderr` |
 
-Use `_file_str` to identify which pod and container generated a record, or filter by stream:
+Filter by stream or timestamp:
 
 ```sql
-SELECT _time_ns_int, _stream_str, level_str, message_str
+SELECT _timestamp_str, _stream_str, level_str, message_str
 FROM logs
 WHERE _stream_str = 'stderr'
   AND level_str = 'ERROR'
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 192 - 202, Update README.md to match
CONFIG_REFERENCE.md by replacing incorrect CRI column names: change `_file_str`
and `_time_ns_int` usages to the runtime-correct `_timestamp_str` (RFC 3339) and
the actual file/identifier column used in runtime (remove or replace `_file_str`
with the correct column documented elsewhere), and update the SQL example SELECT
(replace `_time_ns_int` with `_timestamp_str`) so the `SELECT _timestamp_str,
_stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND
level_str = 'ERROR'` example reflects the corrected column names.

♻️ Duplicate comments (2)

docs/CONFIG_REFERENCE.md (2)

492-492: ⚠️ Potential issue | 🟠 Major

Duplicate issue: unavailable column in SQL example.

Same problem as line 372-373. The inline comment warns about "not yet implemented" but the SQL still uses _source_path_str, which will fail or return unexpected results.

📝 Suggested documentation fix

```diff
-      LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix  -- requires source path injection
+      -- LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix  -- source path injection not yet implemented
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/CONFIG_REFERENCE.md` at line 492, The SQL example uses a non-existent
column `_source_path_str` in the LEFT JOIN condition (LEFT JOIN k8s k ON
l._source_path_str = k.log_path_prefix) which will fail; edit the example to
either remove that join until source path injection is implemented or replace
`_source_path_str` with the actual available column (e.g., `source_path` or
whichever column exists on alias `l`), and keep the inline comment about
"requires source path injection" so the example is accurate and doesn't
reference a missing symbol (`_source_path_str`) while still showing the intended
k8s join (`k.log_path_prefix`).

372-373: ⚠️ Potential issue | 🟠 Major

SQL examples use unavailable column, will mislead users.

The note states "not yet implemented" but then immediately shows SQL using _source_path_str. Users will copy these queries expecting them to work. Either:

Remove the SQL examples entirely and replace with a placeholder comment, OR
If _source_path_str exists but is empty/null, clarify that explicitly

As written, this creates confusion about whether the column exists at all.

📝 Suggested documentation fix

```diff
--- Requires source path column injection (not yet implemented).
--- Once available, join on the source file path:
-SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
-FROM logs l
-JOIN k8s k ON l._source_path_str = k.log_path_prefix
+-- Source path column injection is not yet implemented.
+-- Future join syntax will be:
+--   SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
+--   FROM logs l
+--   JOIN k8s k ON l._source_path_str = k.log_path_prefix
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/CONFIG_REFERENCE.md` around lines 372 - 373, The SQL snippet references
a not-yet-implemented column `_source_path_str` and will mislead users; update
the documentation around the join/SQL examples to either remove the example SQL
entirely and replace it with a short placeholder mentioning "column not yet
implemented", or explicitly state that `_source_path_str` does not exist / will
be null/empty until implemented and show a non-working illustrative example with
a clear warning; ensure the text around the join on source file path and any
example queries reference the symbol `_source_path_str` so readers know exactly
which column is affected.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1d41827-71dc-4bde-a442-423c7206b155

📥 Commits

Reviewing files that changed from the base of the PR and between 0aa0337 and 0145e26.

📒 Files selected for processing (2)

README.md
docs/CONFIG_REFERENCE.md

coderabbitai · 2026-04-03T03:29:11Z

```diff
 | `http`          | ✅ Implemented  | JSON lines over HTTP POST, optional zstd compression |
 | `stdout`        | ✅ Implemented  | JSON or colored console output — great for local debugging |
-| `elasticsearch` | ✅ Implemented  | Elasticsearch bulk API with retry logic, per-document error handling |
+| `elasticsearch` | ✅ Implemented   | Elasticsearch bulk API with retry logic, per-document error handling |
```


🧹 Nitpick | 🔵 Trivial

Remove trailing whitespace.

The elasticsearch status cell has an unnecessary trailing space after "Implemented".

🧹 Remove trailing space

```diff
-| `elasticsearch` | ✅ Implemented   | Elasticsearch bulk API with retry logic, per-document error handling |
+| `elasticsearch` | ✅ Implemented  | Elasticsearch bulk API with retry logic, per-document error handling |
```


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@README.md` at line 216, Remove the trailing space after "Implemented" in the Markdown table cell for the `elasticsearch` row (the line containing "| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |"); edit that cell so "Implemented" has no trailing spaces between it and the closing pipe.

coderabbitai Bot requested changes Apr 3, 2026


Comment thread docs/CONFIG_REFERENCE.md

Comment thread README.md Outdated

coderabbitai Bot requested changes Apr 3, 2026


strawgate merged commit 429fac8 into master Apr 3, 2026
9 checks passed

strawgate deleted the fix/doc-accuracy branch April 3, 2026 03:35

coderabbitai mentioned this pull request on Apr 4, 2026:

- docs: align seven doc/runtime accuracy gaps (#706, #711, #713, #714, #718, #719, #733) #852 (Merged)

This was referenced on Apr 6, 2026:

- work-unit: file/CRI source metadata — contract and docs alignment #1356 (Closed)
- docs: make config reference canonical and remove config status drift #1444 (Closed)

coderabbitai mentioned this pull request on Apr 7, 2026:

- docs: make config reference canonical #1481 (Merged)

This was referenced on Apr 15, 2026:

- Fix GitHub issue #2095: work-unit: docs — examples and reference (second batch) #2122 (Merged)
- docs: fix examples and reference (second batch) #2112 (Merged)

strawgate merged 2 commits into master from fix/doc-accuracy.
