
Update metrics#148

Merged
lzxddz merged 1 commit into main from update-metrics on Nov 24, 2025

Conversation

Collaborator

@lzxddz lzxddz commented Nov 20, 2025

Summary by CodeRabbit

  • Chores

    • Added per-table and archive flush metrics (including per-batch counts) to improve operational visibility.
    • Emit accumulated table flush totals after table processing completes.
  • Bug Fixes

    • Adjusted timing capture for data fetches to start just before the actual read, improving timing accuracy.


Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

Adds per-table and per-archive row counting and runtime flush/write metrics in PutAll/PutArchivesAll, and moves FetchRecord timing capture to just before the actual read path for non-snapshot/non-archives-only fetches. No public APIs changed.

Changes

Cohort / File(s): Summary
  • Metric & batching changes (data_store_service_client.cpp): PutAll: initialize a per-table records_count, accumulate by batch.size() per processed batch, and emit NAME_KV_FLUSH_ROWS_TOTAL after all partitions for a table complete; add per-batch runtime metrics for flush/write activity.
  • Archive metrics (data_store_service_client.cpp): PutArchivesAll: after each archive partition batch group, emit NAME_KV_FLUSH_ROWS_TOTAL with recs_cnt and "archive" scope; add per-batch runtime metrics.
  • Timing adjustment (data_store_service_client.cpp): FetchRecord / snapshot fetch: remove the early start-time capture at function entry; record start_ immediately before invoking the read path for non-snapshot, non-archives-only fetches so timing reflects the actual fetch.
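The PutAll pattern described above (per-table accumulation, single emission) can be sketched in isolation. FakeMeter, FlushTable, and the batch representation below are hypothetical stand-ins for illustration; the real code calls metrics::kv_meter->Collect(metrics::NAME_KV_FLUSH_ROWS_TOTAL, ...).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the metrics sink used in the PR.
struct FakeMeter
{
    std::size_t last_value = 0;
    std::size_t emit_count = 0;
    void Collect(std::size_t value, const std::string & /*scope*/)
    {
        last_value = value;
        ++emit_count;
    }
};

// Sketch of the PutAll pattern: accumulate per batch, emit once per table.
std::size_t FlushTable(const std::vector<std::vector<int>> &batches,
                       FakeMeter &meter)
{
    std::size_t records_count = 0;  // initialized per table
    for (const auto &batch : batches)
    {
        // ... write the batch to storage ...
        records_count += batch.size();  // accumulate by batch.size()
    }
    // Emit the accumulated total once, after all partitions complete.
    meter.Collect(records_count, "base");
    return records_count;
}
```

The key property is that the meter sees exactly one data point per table, regardless of how many batches were written.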

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Storage

    rect rgb(242, 247, 255)
    note right of Client: PutAll / PutArchivesAll flow (per-table/archive)
    Caller->>Client: PutAll(table)
    Client->>Client: init records_count = 0
    loop per batch
        Client->>Storage: write batch
        Storage-->>Client: ack
        Client->>Client: records_count += batch.size()
        Client-->>Client: emit per-batch runtime metric
    end
    Client-->>Caller: all partitions done
    Client->>Client: emit NAME_KV_FLUSH_ROWS_TOTAL (records_count)
    end
sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Storage

    rect rgb(255, 249, 242)
    note right of Client: FetchRecord timing moved
    Caller->>Client: FetchRecord(request)
    alt snapshot or archives-only
        Client->>Storage: perform snapshot/archive read
    else normal fetch
        Client->>Client: record start_ (just before read)
        Client->>Storage: perform read
    end
    Storage-->>Client: response
    Client-->>Caller: return record
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Check correctness of per-table records_count initialization and accumulation across partition/batch boundaries.
  • Verify metric emission timing (after partitions cleanup) and label/scope for archive metrics.
  • Confirm FetchRecord timing placement does not change error paths or observable behavior.

Possibly related PRs

Poem

🐇 I hop through rows and counts with glee,
Batches tallied, flushes set free,
I time the read at the very start,
So metrics sing true, not just by heart,
A tiny rabbit cheering telemetry.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
  • Title check: ❓ Inconclusive. The title "Update metrics" is vague and generic. A more specific title, such as "Add per-batch KV flush metrics and adjust timing for fetch operations", would better describe the changes.
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 008259e and 1005e76.

📒 Files selected for processing (1)
  • data_store_service_client.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • data_store_service_client.cpp


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
data_store_service_client.cpp (2)

2324-2329: Consider aggregating archive metrics across partitions for consistency.

Currently, NAME_KV_FLUSH_ROWS_TOTAL with scope "archive" is emitted once per partition (inside the loop at line 2152). This differs from PutAll, which accumulates counts across all partitions and emits once per table (lines 465-469).

If you process 10 partitions, this emits 10 separate metric data points instead of 1 aggregated point. Consider:

  • Aggregate approach (like PutAll): Move recs_cnt accumulation outside the partition loop and emit once per PutArchivesAll call for consistency.
  • Document the intent: If per-partition granularity is preferred for archive observability, add a comment explaining why this differs from the base table pattern.

Example aggregation:

+    size_t total_archive_recs_cnt = 0;
+
     // Send the batch request
     for (auto &[partition_id, archive_ptrs] : partitions_map)
     {
+        total_archive_recs_cnt += archive_ptrs.size();
         ...
-        if (metrics::enable_kv_metrics)
-        {
-            metrics::kv_meter->Collect(
-                metrics::NAME_KV_FLUSH_ROWS_TOTAL, recs_cnt, "archive");
-        }
     }
+
+    if (metrics::enable_kv_metrics)
+    {
+        metrics::kv_meter->Collect(
+            metrics::NAME_KV_FLUSH_ROWS_TOTAL, total_archive_recs_cnt, "archive");
+    }

3299-3302: Timing capture inconsistency across fetch paths.

The timing is now captured just before the Read call for non-snapshot, non-archives-only fetches, which is more accurate. However:

  1. FetchSnapshot inconsistency: FetchSnapshot (lines 3414-3417) still captures timing at function entry, not right before the actual fetch operation. Consider moving it closer to the read for consistency.

  2. Missing archives-only timing: When only_fetch_archives_ is true (lines 3294-3297), FetchArchives is called but no timing is captured. If archive fetch performance is important, consider capturing timing in the FetchArchives path as well.

If these timing differences are intentional due to different performance characteristics of snapshot vs. base vs. archive reads, please add comments explaining the rationale.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca67b9 and 008259e.

📒 Files selected for processing (1)
  • data_store_service_client.cpp (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.

Applied to files:

  • data_store_service_client.cpp
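The learning above describes partition-level concurrency accounting. A minimal sketch of that contract, with hypothetical Coordinator and ProcessPartition names (the real coordinator is SyncPutAllData):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: the coordinator's outstanding-request
// counter tracks partitions, not individual batches.
struct Coordinator
{
    std::size_t unfinished_request_cnt = 0;

    void StartPartitions(std::size_t n) { unfinished_request_cnt = n; }

    // Called once per *partition*, after every batch in it completes.
    void OnPartitionDone() { --unfinished_request_cnt; }
};

// Each partition may contain many batches; only the partition as a
// whole decrements the counter, never an individual batch.
void ProcessPartition(Coordinator &coord, std::size_t batch_count)
{
    for (std::size_t i = 0; i < batch_count; ++i)
    {
        // ... write one batch; no counter change here ...
    }
    coord.OnPartitionDone();
}
```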
🔇 Additional comments (1)
data_store_service_client.cpp (1)

294-294: Metrics collection for base table flush looks good.

The per-table record count accumulation and metric emission is correctly placed after error checking and cleanup. The scope "base" appropriately distinguishes base table flushes from archive flushes.

Minor suggestion: Consider whether emitting metrics with records_count = 0 (when a table has no records to flush) adds observability value or just metric noise.

Also applies to: 310-310, 464-469
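One way to act on the zero-count suggestion is a simple guard before emission. Recorder and EmitFlushTotal are hypothetical names for illustration, not the PR's API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical recorder standing in for metrics::kv_meter.
struct Recorder
{
    std::size_t emits = 0;
    void Collect(std::size_t /*value*/, const std::string & /*scope*/)
    {
        ++emits;
    }
};

// Skip emission when a table flushed no rows, so empty tables do not
// add zero-valued data points to the metric stream.
void EmitFlushTotal(std::size_t records_count, Recorder &meter)
{
    if (records_count > 0)
    {
        meter.Collect(records_count, "base");
    }
}
```

Whether zero-valued points are noise or useful (e.g. to confirm the flush path ran) depends on how the metric is consumed.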

@lzxddz lzxddz linked an issue Nov 21, 2025 that may be closed by this pull request
@lzxddz lzxddz merged commit a322a76 into main Nov 24, 2025
1 check passed
@lzxddz lzxddz deleted the update-metrics branch November 24, 2025 08:07


Development

Successfully merging this pull request may close these issues.

Update kv metrics

2 participants