Fix 2 bugs by liunyl · Pull Request #137 · eloqdata/store_handler

liunyl · 2025-11-06T12:02:32Z

copybasetoarchive read err code is ignored
data store close didnot reset pointer to null

Summary by CodeRabbit

Bug Fixes
- More robust archive operations: reads now complete before error aggregation and reporting, reducing premature failures during bulk copy.
- Safer shutdown behavior: internal data references are cleared after shutdown to prevent potential post-shutdown access issues.

coderabbitai · 2025-11-06T12:02:41Z

Walkthrough

CopyBaseToArchive now stops per-item loop on a read-callback error, waits for in-flight reads to finish, then checks an aggregated error code and returns false if set. UpsertTable received non-functional formatting changes. SwitchReadOnlyToClosed sets data_store_ to nullptr after shutdown.

Changes

Cohort / File(s)	Summary
Copy flow and post-wait error handling `data_store_service_client.cpp`	On per-item read callback error, break out of the per-item loop (instead of returning immediately), wait for all in-flight reads, then check an aggregated `error_code`; if non-zero, log error and return false.
Lambda formatting (no logic change) `data_store_service_client.cpp`	Re-formatted the lambda inside `UpsertTable`'s `AlterTableInfo` handling (spacing/braces only).
Shutdown pointer cleanup `eloq_data_store_service/data_store_service.cpp`	After `data_store_->Shutdown()` in `SwitchReadOnlyToClosed`, set `data_store_ = nullptr` (no other control-flow changes).

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Reader as PerItemRead
    Caller->>Client: Request CopyBaseToArchive
    Client->>Reader: enqueue per-item read (loop)
    alt per-item read callback succeeds
        Reader-->>Client: callback (success)
    else per-item read callback reports error
        Reader-->>Client: callback (error) 
        Client->>Client: break out of per-item loop
    end
    Client->>Client: wait for all in-flight reads to finish
    Client->>Client: check aggregated error_code
    alt error_code == 0
        Client-->>Caller: return true / continue
    else error_code != 0
        Client-->>Caller: log error and return false
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Review the aggregation and synchronization around error_code and any concurrency primitives (barrier/wait) for race conditions.
Verify that breaking the loop plus waiting correctly handles all in-flight operations and that no early-return paths remain.
Confirm data_store_ = nullptr is safe: search for subsequent dereferences or assumptions about non-null.

Possibly related PRs

Fix handling putall failure #133 — Shares formatting change to the UpsertTable lambda in data_store_service_client.cpp.
Send out batchwrite req concurrenctly and wait all at once #83 — Previously modified CopyBaseToArchive; relates to concurrent batch/archival read handling.

Suggested reviewers

lzxddz

Poem

🐰
I hopped through bytes and busy queues,
Broke the loop when errors choose,
Waited till all reads were done,
Then I logged and weighed each one.
Paws clean, pointer tucked—now code is snug and true.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'Fix 2 bugs' is vague and generic, lacking specificity about what bugs are being fixed or which components are affected.	Consider a more descriptive title that references the specific issues, such as 'Fix CopyBaseToArchive error handling and data store pointer reset' or similar.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch bug

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b9dc7a and 3f78faf.

📒 Files selected for processing (2)

data_store_service_client.cpp (2 hunks)
eloq_data_store_service/data_store_service.cpp (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

eloq_data_store_service/data_store_service.cpp

🧰 Additional context used

🧬 Code graph analysis (1)

data_store_service_client.cpp (1)

bigtable_handler.cpp (2)

InitTableRanges (2361-2396)

InitTableRanges (2361-2362)

🔇 Additional comments (3)

data_store_service_client.cpp (3)

2409-2426: LGTM! Critical fix for error handling.

The change correctly addresses Issue 1 by:

Breaking the loop instead of returning immediately when an error is detected (line 2409)

Waiting for all in-flight read requests to complete (lines 2414-2420) to prevent use-after-free issues where callbacks might reference freed memory

Checking the aggregated error_code after all reads complete (lines 2422-2426)

This ensures that all callbacks finish and any errors from in-flight requests are properly captured before returning.

4385-4394: Formatting-only changes.

These changes adjust indentation and spacing in the lambda expression with no behavioral modifications.

1-1: Note: Issue 2 fix not visible in this file.

The PR description mentions fixing "data store close did not reset pointer to null" (Issue 2). According to the AI summary, this fix is implemented in eloq_data_store_service/data_store_service.cpp by setting data_store_ to nullptr in SwitchReadOnlyToClosed. That file is not included in this review, so I cannot verify the second fix.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

eloq_data_store_service/data_store_service.cpp (1)

2291-2297: Potential UAF: nulling data_store_ while reads may still be in flight

After CAS to Closed, in-flight Read/Scan RPCs that already passed the status check can still hit assert(data_store_ != nullptr); data_store_->.... Setting data_store_ = nullptr here risks a use‑after‑free. Writes are explicitly drained, but reads aren’t.

Consider one of:

Drain outstanding reads before destruction (add an ongoing_read_requests_ counter, increment in all read/scan entry points, and wait to zero here before Shutdown() and null).

Protect data_store_ with a shared lock or use std::shared_ptr and pass weak_ptr to closures so late callers won’t dereference freed memory.

If you can’t drain now, keep the pointer non‑null post‑Shutdown and defer nulling to teardown where no RPCs run.
data_store_service_client.cpp (1)
2399-2411: Do not early-return before draining in-flight reads (UAF risk)

The return false inside the scheduling loop can exit while many reads are still in flight; callback_datas then go out of scope and callbacks may dereference freed memory.

Replace early return with “stop scheduling, then drain all outstanding reads, then fail”:
-                if (callback_data->GetErrorCode() != 0)
-                {
-                    LOG(ERROR)
-                        << "CopyBaseToArchive failed for read base table.";
-                    return false;
-                }
+                if (callback_data->GetErrorCode() != 0)
+                {
+                    LOG(ERROR)
+                        << "CopyBaseToArchive read error detected; stop scheduling new reads and drain.";
+                    // Stop scheduling further reads for this batch
+                    // (break out of the base_vec loop; outstanding reads will still complete).
+                    break;
+                }
And after the drain (the “Wait the result all return” block), decide based on the aggregated error:
             {
                 std::unique_lock<bthread::Mutex> lk(mtx);
                 while (flying_cnt > 0) {
                     cv.wait(lk);
                 }
             }
+            if (error_code != 0) {
+                LOG(ERROR) << "CopyBaseToArchive failed for read base table.";
+                return false;
+            }
This preserves safety while still failing fast post‑drain.

🧹 Nitpick comments (1)

data_store_service_client.cpp (1)

2425-2429: Post‑drain error check: good safeguard

This second‑stage check (after all reads have completed) is safe and fixes the “ignored read error” bug. If you adopt the “no early return” change above, you can also rely on the shared error_code to avoid per‑callback checks in this loop.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7468860 and 8b9dc7a.

📒 Files selected for processing (2)

data_store_service_client.cpp (1 hunks)
eloq_data_store_service/data_store_service.cpp (1 hunks)

1. copybasetoarchive read err code is ignored 2. data store close didnot reset pointer to null

coderabbitai bot reviewed Nov 6, 2025

View reviewed changes

zhangh43 approved these changes Nov 6, 2025

View reviewed changes

Fix 2 bugs

3f78faf

1. copybasetoarchive read err code is ignored 2. data store close didnot reset pointer to null

liunyl force-pushed the bug branch from 8b9dc7a to 3f78faf Compare November 6, 2025 13:17

liunyl merged commit 5fc1890 into main Nov 6, 2025
1 check passed

liunyl deleted the bug branch November 6, 2025 15:42

liunyl linked an issue Nov 6, 2025 that may be closed by this pull request

[Bug]: DataStoreServiceClient::CopyBaseToArchive deserialize empty record #126

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix 2 bugs#137

Fix 2 bugs#137
liunyl merged 1 commit intomainfrom
bug

liunyl commented Nov 6, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 6, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

liunyl commented Nov 6, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

liunyl commented Nov 6, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 6, 2025 •

edited

Loading