fix(cuda_helpers): clear sticky error and avoid cache poisoning in set_shmem_of_kernel#1095
np96 wants to merge 2 commits into NVIDIA:main
Conversation
…in set_shmem_of_kernel

When cudaFuncSetAttribute fails (e.g. the requested size exceeds the device limit), the previous implementation stored the failed size in the shmem_sizes cache and left a sticky CUDA error in the last-error slot. Subsequent calls for the same kernel would see the cached (invalid) size and skip the attribute call, silently proceeding without the required shared memory. The sticky error would later be caught by an unrelated RAFT_CHECK_CUDA, producing a confusing cudaErrorInvalidValue crash.

Fix:
- Only update the cache on success.
- On failure, consume the error with cudaGetLastError() so it cannot surface later, then return false.
- Add five unit tests in ROUTING_UNIT_TEST covering: zero request, normal request, too-large returns false, cache not poisoned on failure, and no sticky error after failure.

Reproducer: routing.Solve crashes with cudaErrorInvalidValue at N_VEHICLES >= 157 on V100 (sharedMemPerBlockOptin = 98304 B).
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tests/routing/unit_tests/set_shmem_of_kernel.cu`:
- Around line 68-69: The test currently only checks that no CUDA error was
recorded after calling set_shmem_of_kernel(kernel_sticky_error, too_large); add
an assertion that the call actually failed by expecting a non-success error
(e.g., EXPECT_NE(cudaSuccess, cudaGetLastError()) or a specific error like
EXPECT_EQ(cudaErrorInvalidConfiguration, cudaGetLastError())) immediately after
set_shmem_of_kernel to ensure the sticky-error branch is exercised; locate the
call to set_shmem_of_kernel and replace or augment the following
EXPECT_EQ(cudaSuccess, cudaGetLastError()) accordingly.
- Around line 41-42: The cudaDeviceGetAttribute calls (used to set shmem_max and
derive too_large) are unchecked and may leave shmem_max uninitialized; update
each call to capture the cudaError_t return, verify it equals cudaSuccess, and
on failure fail the test or abort with a clear error message referencing the
call (e.g., the cudaDeviceGetAttribute for
cudaDevAttrMaxSharedMemoryPerBlockOptin) so downstream assertions (and variables
like shmem_max and too_large) are never used when the query failed.
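The checked-query pattern the second comment asks for can be sketched in plain C++. The status enum, `get_device_attribute`, and `query_shmem_max_or_throw` below are illustrative stand-ins for `cudaDeviceGetAttribute` and the test's variables, not actual cuOpt or CUDA code:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a CUDA-style status-returning attribute query.
enum status_t { success = 0, error_invalid_device = 101 };

status_t get_device_attribute(int* value, bool device_ok) {
    if (!device_ok) return error_invalid_device;  // query failed; *value untouched
    *value = 98304;                               // e.g. sharedMemPerBlockOptin
    return success;
}

// Capture the return status and fail loudly BEFORE the value is ever used,
// so downstream assertions never read an uninitialized attribute.
int query_shmem_max_or_throw(bool device_ok) {
    int shmem_max = 0;
    status_t st = get_device_attribute(&shmem_max, device_ok);
    if (st != success) {
        throw std::runtime_error("get_device_attribute failed: status " +
                                 std::to_string(static_cast<int>(st)));
    }
    return shmem_max;
}
```

In the real test this would use gtest's `ASSERT_EQ(cudaSuccess, ...)` rather than an exception, but the shape is the same: check the status before deriving `too_large` from `shmem_max`.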
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 515e20d6-05ab-4e9c-b9c7-3216ae006fb9
📒 Files selected for processing (3)
cpp/src/utilities/cuda_helpers.cuh
cpp/tests/routing/CMakeLists.txt
cpp/tests/routing/unit_tests/set_shmem_of_kernel.cu
Also realized there's a race in the unordered_map operator[] access: https://github.com/NVIDIA/cuopt/blob/main/cpp/src/utilities/cuda_helpers.cuh#L182 I see two options to fix it: either pre-initialize the entries up front for all accessed functions (more efficient, but it requires collecting all operators per solver type and maintaining that initial setting), or use a correct double-checked locking pattern. I'll use the second option since I'm not familiar enough with the codebase. Would be happy to hear feedback and collaborate; I have a strong interest in contributing to this project.
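For the second option, a correct double-checked pattern for a read-mostly unordered_map cache might look like the sketch below (plain C++17; the class and member names are illustrative, not the actual cuda_helpers.cuh code): a shared lock on the lookup fast path, then an exclusive lock with a re-check (`try_emplace`) on insert, so two racing threads never mutate the map unguarded.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Sketch of a thread-safe, read-mostly shmem-size cache (illustrative names).
class ShmemCache {
  public:
    // Fast path: shared lock for the common cache-hit case.
    std::optional<int> find(const void* kernel) const {
        std::shared_lock lock(mtx_);
        auto it = sizes_.find(kernel);
        if (it == sizes_.end()) return std::nullopt;
        return it->second;
    }

    // Slow path: exclusive lock. try_emplace re-checks under the lock, so if
    // another thread inserted between our shared-lock miss and this point,
    // its value wins and we never overwrite it.
    int get_or_insert(const void* kernel, int size) {
        std::unique_lock lock(mtx_);
        return sizes_.try_emplace(kernel, size).first->second;
    }

  private:
    mutable std::shared_mutex mtx_;
    std::unordered_map<const void*, int> sizes_;
};
```

The caller would do `find()` first and fall back to `get_or_insert()` only on a miss; note that plain double-checked locking on the map without the shared lock is UB, because unordered_map reads racing with writes are data races.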
Description
When cudaFuncSetAttribute fails (e.g. requested size exceeds device limit),
the previous implementation stored the failed size in the shmem_sizes cache
and left a sticky CUDA error in the last-error slot. Subsequent calls for
the same kernel would see the cached (invalid) size and skip the attribute
call, silently proceeding without the required shared memory. The sticky
error would later be caught by an unrelated RAFT_CHECK_CUDA, producing a
confusing cudaErrorInvalidValue crash.
Fix:
- Only update the cache on success.
- On failure, consume the error with cudaGetLastError() so it cannot surface later, then return false.
- Add five unit tests in ROUTING_UNIT_TEST covering: zero request, normal request, too-large returns false, cache not poisoned on failure, and no sticky error after failure.
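The fixed control flow can be sketched in plain C++ with the CUDA runtime stubbed out, so the cache behaviour is shown in isolation (`fake_func_set_attribute` and the error plumbing below are stand-ins, not the real cudaFuncSetAttribute or the actual cuOpt helper):

```cpp
#include <cstddef>
#include <unordered_map>

// Minimal stand-ins for the CUDA last-error slot; the real code calls
// cudaFuncSetAttribute and the real cudaGetLastError.
enum cudaError_t { cudaSuccess = 0, cudaErrorInvalidValue = 1 };
static cudaError_t g_last_error = cudaSuccess;

cudaError_t cudaGetLastError() {  // returns AND clears the sticky error
    cudaError_t e = g_last_error;
    g_last_error = cudaSuccess;
    return e;
}

cudaError_t fake_func_set_attribute(std::size_t bytes, std::size_t limit) {
    if (bytes > limit) {
        g_last_error = cudaErrorInvalidValue;  // attribute call failed
        return cudaErrorInvalidValue;
    }
    return cudaSuccess;
}

static std::unordered_map<const void*, std::size_t> shmem_sizes;  // per-kernel cache

// Sketch of the fix: cache only on success; consume the error on failure.
bool set_shmem_of_kernel(const void* kernel, std::size_t bytes, std::size_t limit) {
    auto it = shmem_sizes.find(kernel);
    if (it != shmem_sizes.end() && it->second == bytes) return true;  // cache hit
    if (fake_func_set_attribute(bytes, limit) != cudaSuccess) {
        (void)cudaGetLastError();  // consume: nothing sticky for a later RAFT_CHECK_CUDA
        return false;              // and do NOT record the failed size
    }
    shmem_sizes[kernel] = bytes;   // only sizes that actually took effect are cached
    return true;
}
```

With this shape, a too-large request returns false, leaves the last-error slot clean, and leaves the previously cached (valid) size in place, which is exactly what the new unit tests assert.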
Issue
#1094
Checklist