
Update metrics#148

Merged
lzxddz merged 1 commit into main from update-metrics on Nov 24, 2025

Conversation

Collaborator

@lzxddz lzxddz commented Nov 20, 2025

Summary by CodeRabbit

  • Chores

    • Added per-table and archive flush metrics (including per-batch counts) to improve operational visibility.
    • Emit accumulated table flush totals after table processing completes.
  • Bug Fixes

    • Adjusted timing capture for data fetches to start just before the actual read, improving timing accuracy.


Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

Adds per-table and per-archive row counting and runtime flush/write metrics in PutAll/PutArchivesAll, and moves FetchRecord timing capture to just before the actual read path for non-snapshot/non-archives-only fetches. No public APIs changed.

Changes

Cohort / File(s): Summary
  • Metric & batching changes (data_store_service_client.cpp): PutAll: initialize a per-table records_count, accumulate by batch.size() per processed batch, and emit NAME_KV_FLUSH_ROWS_TOTAL after all partitions for a table complete; add per-batch runtime metrics for flush/write activity.
  • Archive metrics (data_store_service_client.cpp): PutArchivesAll: after each archive partition batch group, emit NAME_KV_FLUSH_ROWS_TOTAL with recs_cnt and "archive" scope; add per-batch runtime metrics.
  • Timing adjustment (data_store_service_client.cpp): FetchRecord / snapshot fetch: remove the early start-time capture at function entry; record start_ immediately before invoking the read path for non-snapshot, non-archives-only fetches so timing reflects the actual fetch.
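The PutAll pattern described above (per-table accumulation, single emission) can be sketched in isolation. FakeMeter, FlushTable, and the batch representation below are hypothetical stand-ins for illustration; the real code calls metrics::kv_meter->Collect(metrics::NAME_KV_FLUSH_ROWS_TOTAL, ...).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the metrics sink used in the PR.
struct FakeMeter
{
    std::size_t last_value = 0;
    std::size_t emit_count = 0;
    void Collect(std::size_t value, const std::string & /*scope*/)
    {
        last_value = value;
        ++emit_count;
    }
};

// Sketch of the PutAll pattern: accumulate per batch, emit once per table.
std::size_t FlushTable(const std::vector<std::vector<int>> &batches,
                       FakeMeter &meter)
{
    std::size_t records_count = 0;  // initialized per table
    for (const auto &batch : batches)
    {
        // ... write the batch to storage ...
        records_count += batch.size();  // accumulate by batch.size()
    }
    // Emit the accumulated total once, after all partitions complete.
    meter.Collect(records_count, "base");
    return records_count;
}
```

The key property is that the meter sees exactly one data point per table, regardless of how many batches were written.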

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Storage

    rect rgb(242, 247, 255)
    note right of Client: PutAll / PutArchivesAll flow (per-table/archive)
    Caller->>Client: PutAll(table)
    Client->>Client: init records_count = 0
    loop per batch
        Client->>Storage: write batch
        Storage-->>Client: ack
        Client->>Client: records_count += batch.size()
        Client-->>Client: emit per-batch runtime metric
    end
    Client-->>Caller: all partitions done
    Client->>Client: emit NAME_KV_FLUSH_ROWS_TOTAL (records_count)
    end
sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Storage

    rect rgb(255, 249, 242)
    note right of Client: FetchRecord timing moved
    Caller->>Client: FetchRecord(request)
    alt snapshot or archives-only
        Client->>Storage: perform snapshot/archive read
    else normal fetch
        Client->>Client: record start_ (just before read)
        Client->>Storage: perform read
    end
    Storage-->>Client: response
    Client-->>Caller: return record
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Check correctness of per-table records_count initialization and accumulation across partition/batch boundaries.
  • Verify metric emission timing (after partitions cleanup) and label/scope for archive metrics.
  • Confirm FetchRecord timing placement does not change error paths or observable behavior.

Possibly related PRs

Poem

🐇 I hop through rows and counts with glee,
Batches tallied, flushes set free,
I time the read at the very start,
So metrics sing true, not just by heart,
A tiny rabbit cheering telemetry.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
  • Title check: ❓ Inconclusive. The title "Update metrics" is vague and generic. A more specific title, such as "Add per-batch KV flush metrics and adjust timing for fetch operations", would better describe the changes.
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 008259e and 1005e76.

📒 Files selected for processing (1)
  • data_store_service_client.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • data_store_service_client.cpp


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
data_store_service_client.cpp (2)

2324-2329: Consider aggregating archive metrics across partitions for consistency.

Currently, NAME_KV_FLUSH_ROWS_TOTAL with scope "archive" is emitted once per partition (inside the loop at line 2152). This differs from PutAll, which accumulates counts across all partitions and emits once per table (lines 465-469).

If you process 10 partitions, this emits 10 separate metric data points instead of 1 aggregated point. Consider:

  • Aggregate approach (like PutAll): Move recs_cnt accumulation outside the partition loop and emit once per PutArchivesAll call for consistency.
  • Document the intent: If per-partition granularity is preferred for archive observability, add a comment explaining why this differs from the base table pattern.

Example aggregation:

+    size_t total_archive_recs_cnt = 0;
+
     // Send the batch request
     for (auto &[partition_id, archive_ptrs] : partitions_map)
     {
+        total_archive_recs_cnt += archive_ptrs.size();
         ...
-        if (metrics::enable_kv_metrics)
-        {
-            metrics::kv_meter->Collect(
-                metrics::NAME_KV_FLUSH_ROWS_TOTAL, recs_cnt, "archive");
-        }
     }
+
+    if (metrics::enable_kv_metrics)
+    {
+        metrics::kv_meter->Collect(
+            metrics::NAME_KV_FLUSH_ROWS_TOTAL, total_archive_recs_cnt, "archive");
+    }

3299-3302: Timing capture inconsistency across fetch paths.

The timing is now captured just before the Read call for non-snapshot, non-archives-only fetches, which is more accurate. However:

  1. FetchSnapshot inconsistency: FetchSnapshot (lines 3414-3417) still captures timing at function entry, not right before the actual fetch operation. Consider moving it closer to the read for consistency.

  2. Missing archives-only timing: When only_fetch_archives_ is true (lines 3294-3297), FetchArchives is called but no timing is captured. If archive fetch performance is important, consider capturing timing in the FetchArchives path as well.

If these timing differences are intentional due to different performance characteristics of snapshot vs. base vs. archive reads, please add comments explaining the rationale.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca67b9 and 008259e.

📒 Files selected for processing (1)
  • data_store_service_client.cpp (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-17T11:08:35.322Z
Learnt from: liunyl
Repo: eloqdata/store_handler PR: 83
File: data_store_service_client_closure.cpp:408-448
Timestamp: 2025-09-17T11:08:35.322Z
Learning: In the DataStoreServiceClient batch write system, the global coordinator (SyncPutAllData) manages concurrency at the partition level, not at the individual batch level. The global unfinished_request_cnt_ is only decremented when an entire partition is fully completed or failed, not after each individual batch within a partition.

Applied to files:

  • data_store_service_client.cpp
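The learning above describes partition-level concurrency accounting. A minimal sketch of that contract, with hypothetical Coordinator and ProcessPartition names (the real coordinator is SyncPutAllData):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: the coordinator's outstanding-request
// counter tracks partitions, not individual batches.
struct Coordinator
{
    std::size_t unfinished_request_cnt = 0;

    void StartPartitions(std::size_t n) { unfinished_request_cnt = n; }

    // Called once per *partition*, after every batch in it completes.
    void OnPartitionDone() { --unfinished_request_cnt; }
};

// Each partition may contain many batches; only the partition as a
// whole decrements the counter, never an individual batch.
void ProcessPartition(Coordinator &coord, std::size_t batch_count)
{
    for (std::size_t i = 0; i < batch_count; ++i)
    {
        // ... write one batch; no counter change here ...
    }
    coord.OnPartitionDone();
}
```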
🔇 Additional comments (1)
data_store_service_client.cpp (1)

294-294: Metrics collection for base table flush looks good.

The per-table record count accumulation and metric emission is correctly placed after error checking and cleanup. The scope "base" appropriately distinguishes base table flushes from archive flushes.

Minor suggestion: Consider whether emitting metrics with records_count = 0 (when a table has no records to flush) adds observability value or just metric noise.

Also applies to: 310-310, 464-469
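One way to act on the zero-count suggestion is a simple guard before emission. Recorder and EmitFlushTotal are hypothetical names for illustration, not the PR's API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical recorder standing in for metrics::kv_meter.
struct Recorder
{
    std::size_t emits = 0;
    void Collect(std::size_t /*value*/, const std::string & /*scope*/)
    {
        ++emits;
    }
};

// Skip emission when a table flushed no rows, so empty tables do not
// add zero-valued data points to the metric stream.
void EmitFlushTotal(std::size_t records_count, Recorder &meter)
{
    if (records_count > 0)
    {
        meter.Collect(records_count, "base");
    }
}
```

Whether zero-valued points are noise or useful (e.g. to confirm the flush path ran) depends on how the metric is consumed.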

@lzxddz lzxddz linked an issue Nov 21, 2025 that may be closed by this pull request
@lzxddz lzxddz merged commit a322a76 into main Nov 24, 2025
1 check passed
@lzxddz lzxddz deleted the update-metrics branch November 24, 2025 08:07


Development

Successfully merging this pull request may close these issues.

Update kv metrics

2 participants