Skip to content

Revert coroutine stack#236

Merged
xiexiaoy merged 2 commits intoeloqdata:mainfrom
xiexiaoy:revert_coro_stack
Oct 13, 2025
Merged

Revert coroutine stack#236
xiexiaoy merged 2 commits intoeloqdata:mainfrom
xiexiaoy:revert_coro_stack

Conversation

@xiexiaoy
Copy link
Copy Markdown
Collaborator

@xiexiaoy xiexiaoy commented Oct 13, 2025

This PR reverts the use of boost::context::protected_fixedsize_stack, because memory is not reused even though it is a class member variable. Each callcc will allocate memory by calling mmap/mprotect.

Summary by CodeRabbit

  • Bug Fixes

    • Reduced rare crashes during background task execution and improved cleanup to prevent memory/resource leaks.
  • Refactor

    • Reworked coroutine stack handling with stronger memory protections, deterministic teardown, and improved coroutine lifecycle management for more reliable scheduling across platforms.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 13, 2025

Walkthrough

Adds mmap-backed coroutine stack allocation with a guard page, integrates that stack into Boost.Context via a preallocated stack and NoopAllocator for callcc, introduces a destructor to unmap the stack, and adds coroutine state/migration helpers to ServiceStateMachine.

Changes

Cohort / File(s) Summary
Coroutine stack allocation and lifecycle
src/mongo/transport/service_state_machine.cpp
Allocates coroutine stack with mmap (PROT_READ
Header interface and coroutine helpers
src/mongo/transport/service_state_machine.h
Replaces Boost protected stack include with boost/context/stack_context.hpp; declares ~ServiceStateMachine(); adds nested NoopAllocator; declares _coroStackContext() and _coroStack pointer; introduces CoroStatus enum and coroutine-related members (_coroStatus, _coroYield, _coroResume, _coroLongResume, _coroMigrateThreadGroup, _resumeTask, _migrating, _threadGroupId) and _source continuation field.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant S as ServiceStateMachine
    participant OS as OS (mmap/mprotect/munmap)
    participant BC as Boost.Context

    rect rgba(230,240,255,0.6)
    note over S: Construction
    S->>OS: mmap(stack_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS)
    OS-->>S: addr / failure
    S->>OS: mprotect(guard_page, PROT_NONE)
    OS-->>S: ok / failure
    end

    rect rgba(230,255,230,0.6)
    note over S,BC: Coroutine setup & run
    S->>S: _coroStackContext() -> stack_context(sp, size)
    S->>BC: callcc(preallocated{sp,size}, NoopAllocator, coroutine_fn)
    BC-->>S: continuation / control transfers
    end

    rect rgba(255,235,230,0.6)
    note over S: Destruction
    S->>OS: munmap(stack_addr, size)
    OS-->>S: ok
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I dug a hole of mmaped ground,
a guarded page keeps troubles bound.
callcc hops where contexts play,
Noop paves paths and saves the day.
Munmap waves — the rabbit’s gone, well-found. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly states that the pull request’s primary purpose is to revert the coroutine stack changes, directly reflecting the core action taken in the changeset without extraneous detail. It is concise, specific, and fully related to the main change.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/mongo/transport/service_state_machine.h (1)

56-58: Unconditional POSIX headers break Windows builds

Including <sys/mman.h>/<unistd.h> without guarding them (and relying on mmap/mprotect/munmap) breaks every Windows compilation unit that includes this header. Please restore a portable path—e.g., keep the Boost allocator on non-POSIX platforms or add proper #ifdefs with a Windows implementation—otherwise the server cannot compile on Windows.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a02751 and f88d2d9.

📒 Files selected for processing (2)
  • src/mongo/transport/service_state_machine.cpp (3 hunks)
  • src/mongo/transport/service_state_machine.h (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.h (1)
src/mongo/transport/service_state_machine.cpp (4)
  • ServiceStateMachine (245-272)
  • ServiceStateMachine (274-276)
  • _migrateThreadGroup (731-742)
  • _migrateThreadGroup (731-731)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/transport/service_state_machine.h (1)
  • NoopAllocator (267-279)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/mongo/transport/service_state_machine.h (1)

283-290: Guard page excluded from usable stack — good

sc.size = kCoroStackSize - pageSize; sc.sp = _coroStack + kCoroStackSize; correctly keeps the guard page outside the advertised stack. This addresses the prior crash risk.

🧹 Nitpick comments (4)
src/mongo/transport/service_state_machine.h (2)

268-282: Make allocator methods noexcept and silence unused parameter

Tighten the allocator to meet typical Boost.Context expectations and avoid warnings.

 class NoopAllocator {
 public:
     NoopAllocator() = default;

-    boost::context::stack_context allocate() {
-        boost::context::stack_context sc;
-        return sc;
-    }
+    boost::context::stack_context allocate() noexcept { return {}; }

-    void deallocate(boost::context::stack_context& sc) {
-        // no-op
-    }
+    void deallocate(boost::context::stack_context&) noexcept {
+        // no-op
+    }
 };

295-297: Default-initialize _coroStack

Prevents accidental use of an indeterminate pointer in error paths or future changes.

-    char* _coroStack;
+    char* _coroStack{};
src/mongo/transport/service_state_machine.cpp (2)

261-272: mmap + guard page OK; consider MAP_STACK and avoid dumping stack in core files

Current mapping and single low guard page are fine. Optionally:

  • Add MAP_STACK (where available) to hint stack usage.
  • Call madvise(..., MADV_DONTDUMP) to keep these stacks out of core dumps.

Example:

-    _coroStack = (char*)::mmap(
-        nullptr, kCoroStackSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    _coroStack = static_cast<char*>(::mmap(nullptr,
+                                           kCoroStackSize,
+                                           PROT_READ | PROT_WRITE,
+                                           MAP_PRIVATE | MAP_ANONYMOUS
+#ifdef MAP_STACK
+                                               | MAP_STACK
+#endif
+                                               ,
+                                           -1,
+                                           0));
     if (_coroStack == MAP_FAILED) {
         error() << "mmap allocate coroutine stack failed, " << ::strerror(errno);
         std::abort();
     }
 
     if (::mprotect(_coroStack, getpagesize(), PROT_NONE) != 0) {
         error() << "mprotect coroutine stack failed, " << ::strerror(errno);
         std::abort();
     }
+#ifdef MADV_DONTDUMP
+    ::madvise(_coroStack, kCoroStackSize, MADV_DONTDUMP);
+#endif

568-574: Avoid capturing ThreadGuard by reference; move-capture it into the coroutine

Referencing a local by &guard across coroutine setup is brittle. Move-capture it to bind lifetime clearly to the coroutine frame.

-                        _source = boost::context::callcc(
+                        _source = boost::context::callcc(
                             std::allocator_arg,
                             prealloc,
                             NoopAllocator(),
-                            [wssm, &guard](boost::context::continuation&& sink) {
+                            [wssm, guard = std::move(guard)](boost::context::continuation&& sink) {
                                 auto ssm = wssm.lock();
                                 if (!ssm) {
                                     return std::move(sink);
                                 }
                                 ssm->_coroYield = [ssm = ssm.get(), &sink]() {
                                     MONGO_LOG(3) << "call yield";
                                     ssm->_dbClient = Client::releaseCurrent();
                                     sink = sink.resume();
                                 };
-                                ssm->_processMessage(std::move(guard));
+                                ssm->_processMessage(std::move(guard));
 
                                 return std::move(sink);
                             });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f88d2d9 and 83b7294.

📒 Files selected for processing (2)
  • src/mongo/transport/service_state_machine.cpp (3 hunks)
  • src/mongo/transport/service_state_machine.h (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/transport/service_state_machine.h (1)
  • NoopAllocator (269-281)
src/mongo/transport/service_state_machine.h (1)
src/mongo/transport/service_state_machine.cpp (4)
  • ServiceStateMachine (245-272)
  • ServiceStateMachine (274-276)
  • _migrateThreadGroup (731-742)
  • _migrateThreadGroup (731-731)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/mongo/transport/service_state_machine.cpp (1)

277-279: Destroy continuation before unmapping the stack.

The _source continuation may hold references to the stack and accessing it after munmap causes undefined behavior. Reset or destroy _source before unmapping to ensure safe cleanup.

Apply this diff:

 ServiceStateMachine::~ServiceStateMachine() {
+    // Ensure continuation releases any association with the stack first.
+    _source = {};
     ::munmap(_coroStack, kCoroStackSize);
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83b7294 and ddd4de9.

📒 Files selected for processing (1)
  • src/mongo/transport/service_state_machine.cpp (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/transport/service_state_machine.h (1)
  • NoopAllocator (269-281)
🔇 Additional comments (2)
src/mongo/transport/service_state_machine.cpp (2)

56-61: Includes are now complete.

All required headers for the mmap-based stack allocation have been added, including the previously flagged <cerrno>, <cstring>, and <boost/context/preallocated.hpp>.


570-576: No action required: _coroStackContext() correctly sets sc.sp = _coroStack + kCoroStackSize and sc.size = kCoroStackSize - getpagesize(), ensuring guard-page exclusion and downward stack growth.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/mongo/transport/service_state_machine.cpp (1)

264-274: Clean up mmap'd memory before aborting on mprotect failure.

If mprotect fails after mmap succeeds (Line 271), the allocated memory remains unmapped before std::abort(). While the process terminates, cleaner resource management would call ::munmap(_coroStack, kCoroStackSize) before aborting.

Apply this diff:

     if (::mprotect(_coroStack, getpagesize(), PROT_NONE) != 0) {
         error() << "mprotect coroutine stack failed, " << ::strerror(errno);
+        ::munmap(_coroStack, kCoroStackSize);
         std::abort();
     }
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.h (1)

267-279: Consider adding documentation for NoopAllocator.

The NoopAllocator serves a specific purpose: it prevents Boost.Context from managing the stack lifecycle since we're providing a preallocated stack. Adding a brief comment explaining this design choice would improve code clarity.

Consider adding a comment:

+    // NoopAllocator is used with preallocated stacks to prevent Boost.Context
+    // from attempting its own allocation/deallocation of stack memory.
     class NoopAllocator {
     public:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddd4de9 and e32aea9.

📒 Files selected for processing (2)
  • src/mongo/transport/service_state_machine.cpp (4 hunks)
  • src/mongo/transport/service_state_machine.h (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.h (1)
src/mongo/transport/service_state_machine.cpp (6)
  • ServiceStateMachine (248-275)
  • ServiceStateMachine (277-280)
  • _coroStackContext (735-741)
  • _coroStackContext (735-735)
  • _migrateThreadGroup (743-754)
  • _migrateThreadGroup (743-743)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/transport/service_state_machine.h (1)
  • NoopAllocator (267-279)
🔇 Additional comments (5)
src/mongo/transport/service_state_machine.cpp (3)

56-60: Includes addressed from previous review.

The missing includes for errno, strerror, and preallocated have been properly added.


277-280: Proper cleanup order maintained.

The destructor correctly destroys _source before unmapping the stack, avoiding potential undefined behavior if Boost.Context accesses the stack during continuation destruction.


572-577: Verify preallocated integration with Boost.Context.

The integration of the custom mmap'd stack with boost::context::preallocated and NoopAllocator appears correct. The preallocated object receives the stack pointer and size, and the NoopAllocator prevents Boost from attempting its own allocation/deallocation.

Ensure that the Boost.Context version in use properly supports this pattern and that the NoopAllocator's empty deallocate won't cause issues during continuation destruction.

src/mongo/transport/service_state_machine.h (2)

84-84: Destructor properly declared.

The public destructor is correctly declared to handle cleanup of the mmap'd coroutine stack.


287-287: Raw pointer for mmap'd memory is appropriate.

Using a raw char* pointer for the mmap'd stack is correct here since the lifecycle is manually managed with explicit mmap/munmap calls in the constructor/destructor. Smart pointers would add unnecessary complexity for this system-level memory management.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/mongo/transport/service_state_machine.cpp (1)

264-274: Clean up mmap allocation before aborting on mprotect failure.

If mprotect fails at Line 271 after mmap succeeds at Line 264, the allocated memory is not unmapped before calling std::abort(). While the process terminates anyway, cleaner resource management would call ::munmap(_coroStack, kCoroStackSize) before aborting.

Apply this diff:

     if (::mprotect(_coroStack, getpagesize(), PROT_NONE) != 0) {
         error() << "mprotect coroutine stack failed, " << ::strerror(errno);
+        ::munmap(_coroStack, kCoroStackSize);
         std::abort();
     }
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.cpp (1)

264-274: Consider adding platform guards for POSIX-specific stack management.

The code uses POSIX-only APIs (mmap, mprotect, getpagesize, munmap) without platform guards, which will break Windows builds. If cross-platform support is needed, consider wrapping these calls in #ifndef _WIN32 and providing Windows equivalents (e.g., VirtualAlloc/VirtualProtect or boost::context::stack_traits::page_size()).

If the codebase is intentionally POSIX-only, this is acceptable as-is.

Also applies to: 735-742

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e32aea9 and 04c9f5a.

📒 Files selected for processing (2)
  • src/mongo/transport/service_state_machine.cpp (4 hunks)
  • src/mongo/transport/service_state_machine.h (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.h (1)
src/mongo/transport/service_state_machine.cpp (6)
  • ServiceStateMachine (248-275)
  • ServiceStateMachine (277-280)
  • _coroStackContext (735-742)
  • _coroStackContext (735-735)
  • _migrateThreadGroup (744-755)
  • _migrateThreadGroup (744-744)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/transport/service_state_machine.h (1)
  • NoopAllocator (267-279)
🔇 Additional comments (9)
src/mongo/transport/service_state_machine.cpp (4)

56-61: LGTM! Includes are complete.

The addition of <boost/context/preallocated.hpp>, <cerrno>, and <cstring> properly supports the mmap-based stack allocation and error reporting.


277-280: LGTM! Continuation is destroyed before unmapping.

The destructor correctly destroys _source before unmapping the stack, avoiding any potential use-after-free if Boost touches the stack during continuation destruction.


572-577: LGTM! Preallocated stack setup is correct.

The use of boost::context::preallocated with the manually allocated stack and NoopAllocator correctly prevents Boost.Context from allocating its own stack while leveraging the mmap-based allocation.


735-742: LGTM! Guard page is correctly excluded from usable stack size.

The implementation correctly subtracts the guard page from the advertised stack size, preventing Boost from writing into the protected region.

src/mongo/transport/service_state_machine.h (5)

34-34: LGTM! Stack context header is included.

The addition of <boost/context/stack_context.hpp> supports the new _coroStackContext() method and stack management.


84-85: LGTM! Destructor is declared.

The destructor declaration properly supports cleanup of the manually allocated coroutine stack.


266-279: LGTM! NoopAllocator design is correct for preallocated usage.

The NoopAllocator correctly provides stub methods since the stack is manually allocated via mmap and passed to Boost.Context through boost::context::preallocated. This prevents double allocation.


281-281: LGTM! Stack context accessor is declared.

The _coroStackContext() helper method properly encapsulates the logic for computing the stack context with guard page adjustments.


287-287: LGTM! Stack pointer member is added.

The _coroStack raw pointer replaces the previous allocator-based approach. Ownership is clear from the constructor (allocates via mmap) and destructor (deallocates via munmap).

@xiexiaoy xiexiaoy requested a review from lzxddz October 13, 2025 05:36
@xiexiaoy xiexiaoy merged commit 2efa0de into eloqdata:main Oct 13, 2025
3 checks passed
@xiexiaoy xiexiaoy deleted the revert_coro_stack branch October 13, 2025 05:40
xiexiaoy added a commit that referenced this pull request Oct 13, 2025
This PR reverts the use of boost::context::protected_fixedsize_stack, because
memory is not reused even though it is a class member variable. Each callcc will
allocate memory by calling mmap/mprotect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants