Skip to content

Fix 2 bugs#137

Merged
liunyl merged 1 commit intomainfrom
bug
Nov 6, 2025
Merged

Fix 2 bugs#137
liunyl merged 1 commit intomainfrom
bug

Conversation

@liunyl
Copy link
Copy Markdown
Contributor

@liunyl liunyl commented Nov 6, 2025

  1. copybasetoarchive read err code is ignored
  2. data store close didnot reset pointer to null

Summary by CodeRabbit

  • Bug Fixes
    • More robust archive operations: reads now complete before error aggregation and reporting, reducing premature failures during bulk copy.
    • Safer shutdown behavior: internal data references are cleared after shutdown to prevent potential post-shutdown access issues.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 6, 2025

Walkthrough

CopyBaseToArchive now stops per-item loop on a read-callback error, waits for in-flight reads to finish, then checks an aggregated error code and returns false if set. UpsertTable received non-functional formatting changes. SwitchReadOnlyToClosed sets data_store_ to nullptr after shutdown.

Changes

Cohort / File(s) Summary
Copy flow and post-wait error handling
data_store_service_client.cpp
On per-item read callback error, break out of the per-item loop (instead of returning immediately), wait for all in-flight reads, then check an aggregated error_code; if non-zero, log error and return false.
Lambda formatting (no logic change)
data_store_service_client.cpp
Re-formatted the lambda inside UpsertTable's AlterTableInfo handling (spacing/braces only).
Shutdown pointer cleanup
eloq_data_store_service/data_store_service.cpp
After data_store_->Shutdown() in SwitchReadOnlyToClosed, set data_store_ = nullptr (no other control-flow changes).

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client as DataStoreServiceClient
    participant Reader as PerItemRead
    Caller->>Client: Request CopyBaseToArchive
    Client->>Reader: enqueue per-item read (loop)
    alt per-item read callback succeeds
        Reader-->>Client: callback (success)
    else per-item read callback reports error
        Reader-->>Client: callback (error) 
        Client->>Client: break out of per-item loop
    end
    Client->>Client: wait for all in-flight reads to finish
    Client->>Client: check aggregated error_code
    alt error_code == 0
        Client-->>Caller: return true / continue
    else error_code != 0
        Client-->>Caller: log error and return false
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review the aggregation and synchronization around error_code and any concurrency primitives (barrier/wait) for race conditions.
  • Verify that breaking the loop plus waiting correctly handles all in-flight operations and that no early-return paths remain.
  • Confirm data_store_ = nullptr is safe: search for subsequent dereferences or assumptions about non-null.

Possibly related PRs

Suggested reviewers

  • lzxddz

Poem

🐰
I hopped through bytes and busy queues,
Broke the loop when errors choose,
Waited till all reads were done,
Then I logged and weighed each one.
Paws clean, pointer tucked—now code is snug and true.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Fix 2 bugs' is vague and generic, lacking specificity about what bugs are being fixed or which components are affected. Consider a more descriptive title that references the specific issues, such as 'Fix CopyBaseToArchive error handling and data store pointer reset' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bug

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b9dc7a and 3f78faf.

📒 Files selected for processing (2)
  • data_store_service_client.cpp (2 hunks)
  • eloq_data_store_service/data_store_service.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • eloq_data_store_service/data_store_service.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
data_store_service_client.cpp (1)
bigtable_handler.cpp (2)
  • InitTableRanges (2361-2396)
  • InitTableRanges (2361-2362)
🔇 Additional comments (3)
data_store_service_client.cpp (3)

2409-2426: LGTM! Critical fix for error handling.

The change correctly addresses Issue 1 by:

  1. Breaking the loop instead of returning immediately when an error is detected (line 2409)
  2. Waiting for all in-flight read requests to complete (lines 2414-2420) to prevent use-after-free issues where callbacks might reference freed memory
  3. Checking the aggregated error_code after all reads complete (lines 2422-2426)

This ensures that all callbacks finish and any errors from in-flight requests are properly captured before returning.


4385-4394: Formatting-only changes.

These changes adjust indentation and spacing in the lambda expression with no behavioral modifications.


1-1: Note: Issue 2 fix not visible in this file.

The PR description mentions fixing "data store close did not reset pointer to null" (Issue 2). According to the AI summary, this fix is implemented in eloq_data_store_service/data_store_service.cpp by setting data_store_ to nullptr in SwitchReadOnlyToClosed. That file is not included in this review, so I cannot verify the second fix.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
eloq_data_store_service/data_store_service.cpp (1)

2291-2297: Potential UAF: nulling data_store_ while reads may still be in flight

After CAS to Closed, in-flight Read/Scan RPCs that already passed the status check can still hit assert(data_store_ != nullptr); data_store_->.... Setting data_store_ = nullptr here risks a use‑after‑free. Writes are explicitly drained, but reads aren’t.

Consider one of:

  • Drain outstanding reads before destruction (add an ongoing_read_requests_ counter, increment in all read/scan entry points, and wait to zero here before Shutdown() and null).
  • Protect data_store_ with a shared lock or use std::shared_ptr and pass weak_ptr to closures so late callers won’t dereference freed memory.
  • If you can’t drain now, keep the pointer non‑null post‑Shutdown and defer nulling to teardown where no RPCs run.
data_store_service_client.cpp (1)

2399-2411: Do not early-return before draining in-flight reads (UAF risk)

The return false inside the scheduling loop can exit while many reads are still in flight; callback_datas then go out of scope and callbacks may dereference freed memory.

Replace early return with “stop scheduling, then drain all outstanding reads, then fail”:

-                if (callback_data->GetErrorCode() != 0)
-                {
-                    LOG(ERROR)
-                        << "CopyBaseToArchive failed for read base table.";
-                    return false;
-                }
+                if (callback_data->GetErrorCode() != 0)
+                {
+                    LOG(ERROR)
+                        << "CopyBaseToArchive read error detected; stop scheduling new reads and drain.";
+                    // Stop scheduling further reads for this batch
+                    // (break out of the base_vec loop; outstanding reads will still complete).
+                    break;
+                }

And after the drain (the “Wait the result all return” block), decide based on the aggregated error:

             {
                 std::unique_lock<bthread::Mutex> lk(mtx);
                 while (flying_cnt > 0) {
                     cv.wait(lk);
                 }
             }
+            if (error_code != 0) {
+                LOG(ERROR) << "CopyBaseToArchive failed for read base table.";
+                return false;
+            }

This preserves safety while still failing fast post‑drain.

🧹 Nitpick comments (1)
data_store_service_client.cpp (1)

2425-2429: Post‑drain error check: good safeguard

This second‑stage check (after all reads have completed) is safe and fixes the “ignored read error” bug. If you adopt the “no early return” change above, you can also rely on the shared error_code to avoid per‑callback checks in this loop.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7468860 and 8b9dc7a.

📒 Files selected for processing (2)
  • data_store_service_client.cpp (1 hunks)
  • eloq_data_store_service/data_store_service.cpp (1 hunks)

1. copybasetoarchive read err code is ignored
2. data store close didnot reset pointer to null
@liunyl liunyl merged commit 5fc1890 into main Nov 6, 2025
1 check passed
@liunyl liunyl deleted the bug branch November 6, 2025 15:42
@liunyl liunyl linked an issue Nov 6, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: DataStoreServiceClient::CopyBaseToArchive deserialize empty record

2 participants