Incorporate determinism into GPU heuristics + parallel deterministic B&B+GPU exploration #986
aliceb-nv wants to merge 339 commits into NVIDIA:release/26.04 from
Conversation
Forward-merge release/26.04 into main
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
These tests are expensive, currently broken, and aren't covered under CI. New contributors will be confused if they see failing tests, so they should be excluded from the instructions. Authors: - Miles Lubin (https://github.com/mlubin) Approvers: - Hugo Linsenmaier (https://github.com/hlinsen) - Ramakrishnap (https://github.com/rgsl888prabhu) URL: NVIDIA#1034
…26.04_2 Main merge release/26.04 2
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (1)
1484-1494: ⚠️ Potential issue | 🟠 Major: A finite work budget still overruns by one sampling window.
Line 1492 only executes inside the `iterations % 100 == 0` block, so `work_budget == 0` or any small finite budget still allows up to one extra 100-iteration batch before the loop stops. If this limit is meant to be authoritative, it needs an upfront check or a tighter sampling interval while a finite budget is active.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1484 - 1494, The loop currently only checks the finite budget every 100 iterations (the `if (fj_cpu.iterations % 100 == 0 ...)` block), allowing up to one extra 100-iteration batch to overrun a small work_budget; fix this by adding an authoritative budget check at the start of the loop using `fj_cpu.work_units_elapsed.load(std::memory_order_relaxed) >= fj_cpu.work_budget` and break early if true (so the budget is enforced on every iteration), and keep the existing 100-iteration sampling (memory_aggregator.collect(), biased_work update, producer_sync->notify_progress()) as-is.
cpp/src/mip_heuristics/local_search/local_search.cu (1)
264-317: ⚠️ Potential issue | 🟠 Major: Return the merged solution's feasibility, not just `cpu_feasible`.
After the merge, `solution` can be feasible because the GPU path found one even when `cpu_feasible` is false. `run_fj_on_zero()` forwards this return value directly, so it can miss a valid zero-seeded incumbent and keep going into FP unnecessarily.
💡 Minimal fix
-  solution.compute_feasibility();
-
-  return cpu_feasible;
+  const bool merged_feasible = solution.compute_feasibility();
+  return merged_feasible;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 264 - 317, The function currently returns cpu_feasible which ignores the merged solution state after possibly copying the CPU solution into solution; change the return to reflect the final merged feasibility by returning solution.get_feasible() (or assign a merged_feasible = solution.get_feasible() after the final solution.compute_feasibility() and return that). Update the final return statement (replace "return cpu_feasible;") in the local_search.cu block handling CPU/GPU merge so callers receive the merged solution's true feasibility.
🧹 Nitpick comments (15)
cpp/CMakeLists.txt (2)
546-556: Convert the commented PAPI block into an option-gated implementation or remove it.
This dead commented block will drift and confuse future maintainers. PAPI is not referenced elsewhere in the CMake configuration, so converting it behind an explicit `option(...)` or deleting it entirely is safe.
Proposed option-gated implementation
-# find_path(PAPI_INCLUDE_DIR papi.h)
-# find_library(PAPI_LIBRARY papi)
-#
-# if (PAPI_INCLUDE_DIR AND PAPI_LIBRARY)
-#   message(STATUS "Found PAPI in ${PAPI_INCLUDE_DIR}")
-#   target_include_directories(cuopt PRIVATE ${PAPI_INCLUDE_DIR})
-#   target_link_libraries(cuopt PRIVATE ${PAPI_LIBRARY})
-# else()
-#   message(FATAL_ERROR "Could not find PAPI")
-# endif()
+option(CUOPT_ENABLE_PAPI "Enable PAPI integration" OFF)
+if(CUOPT_ENABLE_PAPI)
+  find_path(PAPI_INCLUDE_DIR papi.h)
+  find_library(PAPI_LIBRARY papi)
+  if(PAPI_INCLUDE_DIR AND PAPI_LIBRARY)
+    message(STATUS "Found PAPI in ${PAPI_INCLUDE_DIR}")
+    target_include_directories(cuopt PRIVATE ${PAPI_INCLUDE_DIR})
+    target_link_libraries(cuopt PRIVATE ${PAPI_LIBRARY})
+  else()
+    message(FATAL_ERROR "CUOPT_ENABLE_PAPI=ON but PAPI was not found")
+  endif()
+endif()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/CMakeLists.txt` around lines 546 - 556, The commented PAPI detection block should be either removed or converted to an option-gated implementation: add a CMake option like ENABLE_PAPI (default OFF) and wrap the existing find_path/find_library logic in if(ENABLE_PAPI) ... endif(), using the same variables PAPI_INCLUDE_DIR and PAPI_LIBRARY and applying target_include_directories(cuopt PRIVATE ${PAPI_INCLUDE_DIR}) and target_link_libraries(cuopt PRIVATE ${PAPI_LIBRARY}) when found, and emit a clear message(FATAL_ERROR "...") only when ENABLE_PAPI is ON and the libraries are missing; alternatively delete the commented block entirely if PAPI support is not needed.
666-667: Guard `-pie` with platform/capability checks and use CMake's linker form.
The unconditional `-pie` flag (line 681) lacks platform guards. The `-pie` flag is primarily for Linux ELF binaries; on Windows and macOS, it may be ineffective or cause issues. Use `CheckPIESupported`, guard with `CMAKE_SYSTEM_NAME STREQUAL "Linux"`, and switch to `LINKER:-pie` for proper linker driver handling.
Proposed change
+include(CheckPIESupported)
+check_pie_supported()
+
 set_target_properties(cuopt_cli PROPERTIES
   CXX_STANDARD 20
   CXX_STANDARD_REQUIRED ON
   CXX_SCAN_FOR_MODULES OFF
   POSITION_INDEPENDENT_CODE ON
 )
-target_link_options(cuopt_cli PRIVATE -pie)
+if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
+  target_link_options(cuopt_cli PRIVATE "LINKER:-pie")
+endif()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/CMakeLists.txt` around lines 666 - 667, The CMakeLists currently adds an unconditional -pie flag; update it to only add PIE when supported and appropriate by calling CMake's CheckPIESupported (or using CMake's PIE support macros), guard with a platform check such as CMAKE_SYSTEM_NAME STREQUAL "Linux" (or the CheckPIESupported result), and add the flag via the target_link_options/linker form (e.g., LINKER:-pie) instead of raw compile flags; locate the section that mentions "-pie" and replace the unconditional addition with a conditional that queries CheckPIESupported and CMAKE_SYSTEM_NAME and then appends LINKER:-pie to the target_link_options for the relevant targets.
cpp/src/mip_heuristics/problem/presolve_data.cu (1)
225-228: Make the `problem` argument const here.
This implementation only reads `problem.handle_ptr`, but the non-const reference now forces `problem_t<i_t, f_t>::papilo_uncrush_assignment(...)` to `const_cast` `*this` at the call site. Changing this to `const problem_t<i_t, f_t>&` keeps the API const-correct and removes that footgun.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/problem/presolve_data.cu` around lines 225 - 228, Change the parameter type of presolve_data_t<i_t,f_t>::papilo_uncrush_assignment from a non-const reference to a const reference (i.e., const problem_t<i_t,f_t>& problem) so the method is const-correct; update the function signature accordingly and ensure all uses inside (e.g., accessing problem.handle_ptr) remain reads-only so no other changes are required, then remove any call-site const_casts that were compensating for the non-const parameter.
cpp/src/mip_heuristics/solver_solution.cu (1)
249-255: Make `pool_hash` reflect the pool order.
XOR collapses permutations and duplicate pairs, so distinct solution pools can emit the same provenance hash. If this field is meant to fingerprint the exact pool state, hash the sequence of per-solution hashes instead.
Possible adjustment
 uint32_t pool_hash = 0;
+std::vector<uint32_t> pool_hashes;
+pool_hashes.reserve(solution_pool_.size());
 for (const auto& pool_sol : solution_pool_) {
   if (pool_sol.size() > 0) {
     auto host_pool_sol = cuopt::host_copy(pool_sol, rmm::cuda_stream_default);
-    pool_hash ^= detail::compute_hash(host_pool_sol);
+    pool_hashes.push_back(detail::compute_hash(host_pool_sol));
+  } else {
+    pool_hashes.push_back(0);
   }
 }
+if (!pool_hashes.empty()) { pool_hash = detail::compute_hash(pool_hashes); }
Also applies to: 261-274
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solver_solution.cu` around lines 249 - 255, The current XOR of per-solution hashes (pool_hash ^= detail::compute_hash(...)) collapses order and duplicates; instead compute a sequence-aware combined hash by iterating solution_pool_, calling cuopt::host_copy for each pool_sol and computing per_hash = detail::compute_hash(host_pool_sol), then fold per_hash into pool_hash using an order-sensitive combiner (e.g., pool_hash = pool_hash * PRIME + per_hash or use a std::hash_combine-style mix) so the final pool_hash reflects the sequence and content; apply the same change to the analogous block around the region noted (lines 261-274) so both places use the ordered combination instead of XOR.
cpp/src/mip_heuristics/presolve/bounds_presolve.cu (1)
170-177: Avoid making the deterministic cap sticky across future calls.
Clamping `settings.iteration_limit` in-place mutates the stored solver settings, so later invocations can no longer distinguish the user-configured limit from this deterministic cap. Use a local loop bound instead.
♻️ Suggested refactor
-  if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) {
-    settings.iteration_limit = std::min(settings.iteration_limit, 50);
-  }
+  i_t iter_limit = settings.iteration_limit;
+  if ((context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS)) {
+    iter_limit = std::min(iter_limit, i_t{50});
+  }
   i_t iter;
   upd.init_changed_constraints(pb.handle_ptr);
-  for (iter = 0; iter < settings.iteration_limit; ++iter) {
+  for (iter = 0; iter < iter_limit; ++iter) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/bounds_presolve.cu` around lines 170 - 177, The code mutates the stored solver settings by clamping settings.iteration_limit in-place; instead, compute a local loop bound and leave settings untouched. In bound_presolve_t<i_t, f_t>::bound_update_loop, check context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS and set a local variable (e.g. max_iterations or loop_limit) to std::min(settings.iteration_limit, 50) and use that local variable for the loop/termination checks rather than assigning back to settings.iteration_limit so future calls see the original user-configured value.
cpp/include/cuopt/linear_programming/cuopt_c.h (1)
786-797: Document the extended callback contract in the public header.
Please spell out whether `cuOptSetMIPGetSolutionCallbackExt` replaces or coexists with `cuOptSetMIPGetSolutionCallback`, and which thread invokes it. Users need that to avoid duplicate processing and to synchronize shared state safely.
As per coding guidelines, "cpp/include/cuopt/**/*: Flag API changes that may need corresponding docs/ updates" and "Suggest documenting thread-safety, GPU requirements, and numerical behavior".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/cuopt_c.h` around lines 786 - 797, Update the public header documentation for cuOptSetMIPGetSolutionCallbackExt to explicitly state its contract: clarify whether cuOptSetMIPGetSolutionCallbackExt replaces or coexists with cuOptSetMIPGetSolutionCallback (e.g., "replaces existing callback" or "coexists; both may be invoked"), name which thread/context invokes the callback (e.g., solver main thread, solver worker thread, or a dedicated callback thread), and enumerate thread-safety and synchronization requirements (reentrancy, whether the callback may be called concurrently, and whether callers must protect shared state). Also document GPU/CPU context and numerical behavior expectations (e.g., pointers/arrays are host memory, any lifetime guarantees for solution buffers, floating-point formats/scaling), and add a brief note that API changes need corresponding docs updates per the repository guidelines; reference cuOptSetMIPGetSolutionCallbackExt and cuOptSetMIPGetSolutionCallback in the wording so readers can find both symbols.
cpp/src/branch_and_bound/pseudo_costs.hpp (1)
362-367: Document the distinction from `queue_update`.
`record_update()` only records the update without mutating the snapshot's `sum_*`/`num_*` arrays, unlike `queue_update()` which does both. A brief doc comment would clarify this intentional difference.
 // Record an update that was already applied to the arrays (e.g. by strong branching).
+// Unlike queue_update(), this does NOT modify sum_*/num_* arrays - only records for later replay.
 void record_update(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 362 - 367, Add a short doc comment above record_update explaining it only appends an update to updates_ without modifying the snapshot's aggregation arrays (the sum_*/num_* members), in contrast to queue_update which both records and applies the change to those arrays; mention the parameters (variable, direction, delta, clock, worker_id) to make the intent explicit so callers know to use queue_update when they need the snapshot's sums/counts updated.
cpp/tests/mip/local_search_test.cu (2)
68-68: Empty stub function.
`setup_device_symbols` is an empty function that casts its parameter to void. If this is a placeholder for future implementation, consider adding a TODO comment. If it's no longer needed, remove it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/local_search_test.cu` at line 68, The function setup_device_symbols(rmm::cuda_stream_view stream_view) is an empty stub; either implement the intended device-symbol initialization logic or remove the function and its declarations/usages; if you intend to keep it as a placeholder, add a clear TODO comment explaining what must be implemented and why the parameter is currently unused (or mark it [[maybe_unused]]), and update any callers or headers referencing setup_device_symbols accordingly to avoid dead code or unused-parameter warnings.
196-197: Clarify purpose of `spin_stream_raii_t` objects.
Two `spin_stream_raii_t` objects are created but their relationship to the determinism test isn't obvious from the code. Consider adding a comment explaining their purpose (e.g., stress-testing CUDA stream scheduling for determinism).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/local_search_test.cu` around lines 196 - 197, The two spin_stream_raii_t instances (spin_stream_1 and spin_stream_2) are created without explanation; add a concise comment above their declarations clarifying they spawn additional CUDA streams that spin to stress test stream scheduling and ensure the determinism test exercises concurrent stream behavior (i.e., stress-testing CUDA stream scheduling for determinism), so future readers understand why these dummy/spinning streams are present.
cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh (2)
71-72: Consider documenting or extracting the work-unit scaling constant.
The `1e8` divisor is a magic number. The TODO comment suggests this is provisional. Consider extracting this to a named constant or documenting the rationale for this specific scale factor to aid future maintenance.
+// Work unit scale: one work unit per 100M variables from the other parent
+constexpr double SUB_MIP_WORK_SCALE = 1e8;
 // TODO: CHANGE
-double work = static_cast<double>(n_vars_from_other) / 1e8;
+double work = static_cast<double>(n_vars_from_other) / SUB_MIP_WORK_SCALE;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh` around lines 71 - 72, Replace the magic divisor 1e8 used when computing work (double work = static_cast<double>(n_vars_from_other) / 1e8) with a named constant (e.g., kWorkUnitScale or WORK_UNIT_SCALE) and add a short comment explaining the rationale or units for the scale; update any surrounding TODO to remove ambiguity and use the new constant where work or similar scaling is needed referencing the symbol n_vars_from_other and the variable work.
203-203: Using `std::isnan` to detect valid solutions may be fragile.
Relying on `!std::isnan(branch_and_bound_solution.objective)` to determine success assumes the objective is initialized to NaN when no solution is found. Consider checking `branch_and_bound_status` directly for a more explicit success condition.
-return std::make_tuple(offspring, !std::isnan(branch_and_bound_solution.objective), work);
+bool found_solution = (branch_and_bound_status != dual_simplex::mip_status_t::UNSET &&
+                       branch_and_bound_status != dual_simplex::mip_status_t::INFEASIBLE);
+return std::make_tuple(offspring, found_solution, work);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh` at line 203, The return currently uses !std::isnan(branch_and_bound_solution.objective) to indicate success; instead use the explicit branch_and_bound_status value to determine success. Update the return in the function that returns std::make_tuple(offspring, ..., work) to replace the second element with a boolean derived from branch_and_bound_status (e.g., branch_and_bound_status == <success-state> or simply branch_and_bound_status) so it no longer relies on NaN-initialization of branch_and_bound_solution.objective; reference the variables branch_and_bound_solution, branch_and_bound_status, offspring, and work when making this change.
cpp/src/mip_heuristics/solver_context.cuh (1)
72-74: Null `termination` pointer could cause issues if not set before use.
The `termination` pointer is initialized to `nullptr` and expected to be set by `mip_solver_t` after construction. Consider adding a getter that asserts non-null, or document more prominently that callers must set this before using any timer functionality that depends on it.
+ cuopt::termination_checker_t& get_termination() const
+ {
+   cuopt_assert(termination != nullptr, "termination must be set before use");
+   return *termination;
+ }
+
 // Root termination checker — set by mip_solver_t after construction.
 // All sub-timers should use this as parent for wall-clock safety.
 cuopt::termination_checker_t* termination{nullptr};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solver_context.cuh` around lines 72 - 74, The termination pointer (termination in solver_context.cuh) may remain nullptr until mip_solver_t sets it, so add a safe accessor on the solver context (e.g., get_termination() or termination_checked()) that asserts or throws if termination is null and returns the non-null pointer; update any callers that read termination (timers or sub-timers) to use this accessor instead of accessing the raw termination member directly, or alternatively change the member to a non-null reference/init-on-construction pattern so the compiler enforces it. Ensure the accessor name (get_termination/termination_checked) and mip_solver_t usage are the unique symbols referenced so the fix is easy to locate.
cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu (1)
32-39: Disabled determinism logging macro.
The `#if 0` block disables verbose determinism logging. Consider using a runtime flag or build option instead of commented-out code, or remove if this is only for temporary debugging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu` around lines 32 - 39, The disabled determinism logging block leaving CUOPT_DETERMINISM_LOG wrapped in `#if` 0 should be replaced with a controllable mechanism: either remove the block if it's temporary, or guard the macro with a meaningful compile-time flag (e.g., replace `#if` 0 with `#ifdef` ENABLE_CUOPT_DETERMINISM_LOG) and document/emit that flag from your build system, or implement a runtime check that forwards to CUOPT_LOG_INFO only when a global/flag (e.g., bool enableDeterminismLog) is true; update the macro definition for CUOPT_DETERMINISM_LOG and references to CUOPT_LOG_INFO accordingly so enabling determinism logging is controlled by a build or runtime option rather than commented-out code.
cpp/src/mip_heuristics/diversity/population.cuh (1)
127-127: Make `population.cuh` self-contained for `work_limit_timer_t`.
`adjust_threshold(...)` and the `timer` member now expose `cuopt::work_limit_timer_t`, but this header still only includes `utilities/timer.hpp`. Please add the direct `work_limit_timer.hpp` include here so this header does not depend on transitive include order.
💡 Minimal fix
 #include <utilities/timer.hpp>
+#include <utilities/work_limit_timer.hpp>
Keep `utilities/timer.hpp` as well if `external_solution_t::timer` still needs `timer_t`.
Also applies to: 216-216
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/population.cuh` at line 127, The header exposes cuopt::work_limit_timer_t (in adjust_threshold and the timer member) but only includes utilities/timer.hpp; add a direct include of the work_limit_timer header (work_limit_timer.hpp) to make population.cuh self-contained and not rely on transitive includes, while keeping utilities/timer.hpp if external_solution_t::timer still requires timer_t.
cpp/src/branch_and_bound/branch_and_bound.cpp (1)
1130-1148: Consider extracting common feasibility validation logic.
The `check_guess()` + rejection logging pattern is duplicated across `nondeterministic_policy_t::handle_integer_solution`, `deterministic_bfs_policy_t::handle_integer_solution`, and `deterministic_diving_policy_t::handle_integer_solution`. A shared helper would reduce maintenance burden.
♻️ Example helper signature
// In branch_and_bound_t or as a free function
template <typename i_t, typename f_t>
bool validate_integer_solution(const lp_problem_t<i_t, f_t>& lp,
                               const simplex_solver_settings_t<i_t, f_t>& settings,
                               const std::vector<variable_type_t>& var_types,
                               const std::vector<f_t>& x,
                               int worker_id,
                               int node_id,
                               int depth,
                               f_t obj,
                               logger_t& log);
Then each policy's `handle_integer_solution` would call this helper and return early if it returns false.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 1130 - 1148, Extract the duplicated check_guess + rejection logging into a helper (e.g., validate_integer_solution) and have each policy's handle_integer_solution call it and return early on false; specifically move the call to check_guess(...) plus the bnb.settings_.log.printf(...) path into the helper (accepting bnb.original_lp_, bnb.settings_, bnb.var_types_, x, node->node_id, node->depth, obj, and logger) and return a bool so existing callers (nondeterministic_policy_t::handle_integer_solution, deterministic_bfs_policy_t::handle_integer_solution, deterministic_diving_policy_t::handle_integer_solution) simply call validate_integer_solution(...) and only call bnb.add_feasible_solution(...) when it returns true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ci/release/update-version.sh`:
- Around line 155-157: Wrap the call to ci/utils/update_doc_versions.py in a
RUN_CONTEXT check and add error handling: only invoke the Python script when
RUN_CONTEXT equals "release" (or update if policy says run in both contexts) and
ensure the command's failure is caught and causes the script to exit with a
non‑zero status; reference the RUN_CONTEXT variable and the
ci/utils/update_doc_versions.py invocation so you update that section to
conditionally execute and append a failure handler that echoes an error and
exits.
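A minimal sketch of that guard, assuming the comment's `RUN_CONTEXT` variable; `run_doc_update` is an illustrative stand-in for the real `python3 ci/utils/update_doc_versions.py` call so the snippet is self-contained:

```shell
# Sketch of the suggested RUN_CONTEXT guard for ci/release/update-version.sh.
# run_doc_update stands in for: python3 ci/utils/update_doc_versions.py
RUN_CONTEXT="${RUN_CONTEXT:-branch}"

run_doc_update() {
  true  # the real script would invoke the Python helper here
}

if [ "${RUN_CONTEXT}" = "release" ]; then
  if ! run_doc_update; then
    echo "Error: failed to update doc versions" >&2
    exit 1
  fi
  echo "doc versions updated"
else
  echo "skipping doc version update (RUN_CONTEXT=${RUN_CONTEXT})"
fi
```

The failure branch both reports the error and propagates a non-zero exit status, which is the behavior the prompt asks for.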
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1422-1425: The two local variables previous_local_upper and
previous_seq are assigned but never used; remove their declarations and
assignments to clean up dead code around the call that updates
this->worker.local_upper_bound and calls
this->worker.queue_integer_solution(obj, x, node->depth). Ensure only the
necessary state updates (this->worker.local_upper_bound = obj and the
queue_integer_solution call) remain and do not alter any other logic in the
surrounding function.
- Around line 640-642: The local variable heuristic_queue_size is computed but
never used; remove the unused declaration to avoid dead code by deleting the
line that assigns const size_t heuristic_queue_size =
heuristic_solution_queue_.size() (located near the call to
heuristic_solution_queue_.push_back and before mutex_heuristic_queue_.unlock()),
or if the queue size is intended to be used later, replace the removal by using
heuristic_queue_size where needed—otherwise simply delete that variable
declaration.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1690-1701: The pseudo-cost update divides by frac (computed from
root_soln[j]) which may be extremely small near integer boundaries; modify the
loops around strong_branch_down[k] and strong_branch_up[k] to guard against
near-zero frac by computing a safe divisor: set an epsilon (e.g. using
std::numeric_limits<f_t>::epsilon() scaled or a small constant like 1e-8) and
either skip the update when frac < eps or replace frac with std::max(frac, eps)
before dividing; apply this change to the fractions computed for down (frac =
root_soln[j] - std::floor(...)) and up (frac = std::ceil(...) - root_soln[j]) so
updates to pseudo_cost_sum_down[j], pseudo_cost_sum_up[j] and increments of
pseudo_cost_num_down[j]/pseudo_cost_num_up[j] only occur with a numerically safe
divisor.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 206-207: The default for
simplex_solver_settings_t::gpu_heur_wait_for_exploration is inconsistent with
the public MIP default; change the member declaration (currently next to
bb_work_unit_scale in simplex_solver_settings_t) to default to false so internal
constructions get the same scheduling behavior as mip_solver_settings_t (ensure
the bool gpu_heur_wait_for_exploration{...} initialization is set to false).
In `@cpp/src/mip_heuristics/diversity/population.cu`:
- Around line 226-231: The code incorrectly updates new_best_feasible_objective
when a non-CPU_FEASIBILITY_JUMP entry is worse (h_entry.objective >
new_best_feasible_objective); change the condition in the else-if that
references h_entry and internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP so
it only updates new_best_feasible_objective when the incoming objective is
better (use h_entry.objective < new_best_feasible_objective) to tighten the
drain threshold and keep later CPU_FEASIBILITY_JUMP entries correctly skipped.
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu`:
- Around line 1037-1041: deterministic_work_estimate and use_graph currently
allow launching a full graph batch even when the remaining deterministic budget
is less than a full batch; change the logic so when deterministic_work_estimate
is true you compute the remaining deterministic budget and cap
iterations_per_batch (and the decision to use the graph) to not exceed
deterministic_batch_work — i.e., replace the unconditional use_graph=true and
iterations_per_batch = iterations_per_graph with logic that if
deterministic_work_estimate && deterministic_batch_work < iterations_per_graph
then set iterations_per_batch = deterministic_batch_work (or fall back to
non-graph single-iteration mode) before calling run_step_device(); apply the
same fix at the other similar sites referenced (the blocks around the other
occurrences) so no full batch is launched when budget would be overshot.
- Around line 995-1017: The new kernel launch
init_lhs_and_violated_constraints<i_t, f_t><<<4096, 256, 0, stream>>>(v) lacks
immediate CUDA error checking; add a CUDA_CHECK (or the project's equivalent)
right after that launch to call cudaGetLastError()/cudaStreamSynchronize as
appropriate using the same stream so failures are reported at the kernel launch
site before the subsequent thrust::transform_reduce calls (refer to
init_lhs_and_violated_constraints and the local variable stream for placement).
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1333-1337: The direct use of
context.settings.cpufj_work_unit_scale can be zero, negative, or NaN and must be
validated before applying to cpu_work_unit_scale and fj_cpu->work_unit_bias;
update the selection logic around cpu_work_unit_scale (which now chooses between
context.settings.cpufj_work_unit_scale and read_positive_work_unit_scale) to
explicitly check that context.settings.cpufj_work_unit_scale is finite and > 0
(and not NaN) and only use it if valid, otherwise call
read_positive_work_unit_scale(...) or clamp to a safe minimum, ensuring
fj_cpu->work_unit_bias is only multiplied by a strictly positive finite scale.
In `@cpp/src/mip_heuristics/local_search/local_search.cu`:
- Around line 233-249: The helpers are being given independent deterministic
budgets (cpu_fj.fj_cpu->work_budget = time_limit) which lets multiple helpers
consume more than the single deterministic budget; instead, register each
created CPU climber with the shared preempt/work context (use the same
registration mechanism as start_cpufj_deterministic) so they account against one
deterministic budget—replace the per-worker work_budget assignment for entries
in ls_cpu_fj with a call that registers the created cpu_fj.fj_cpu with
context.preempt_heuristic_solver_ (or invoke the same start_cpufj_deterministic
helper) so work_units_elapsed and work_budget are tracked by the shared
producer/consumer context rather than set independently.
In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu`:
- Around line 945-947: The II-repair call uses the broader
`timer`/`*context.termination` instead of the local per-call budget created by
`apply_round()` (`max_timer`), which lets repair outlive
`max_time_for_bounds_prop`; update `find_integer()` (and the other occurrences
around `bounds_repair.repair_problem`) to construct the `repair_timer` from the
active parent/local `max_timer` (or derive it from the `timer` object passed
down by `apply_round()`), and pass that localized timer into
`bounds_repair.repair_problem(*sol.problem_ptr, *orig_sol.problem_ptr,
repair_timer, sol.handle_ptr)` so repair work is bounded by the per-round
`max_timer` hierarchy rather than `*context.termination`.
In `@cpp/tests/mip/diversity_test.cu`:
- Around line 100-103: The helpers create an ad-hoc work_limit_context_t and
attach a work_limit_timer_t to diversity_manager.timer instead of using the
shared deterministic context; change them to use solver.context.gpu_heur_loop as
the deterministic work context (set solver.context.gpu_heur_loop.deterministic =
true where needed) and construct the work_limit_timer_t and any nested timers
with solver.context.gpu_heur_loop and the existing timer variable so the
nested/local-search/FJ timers read the same deterministic context as in
test_full_run_determinism(). Ensure you update the corresponding locations where
diversity_manager, work_limit_timer_t, and timer are set (also apply the same
change to the other two helper blocks mentioned).
In `@cpp/tests/mip/feasibility_jump_tests.cu`:
- Around line 193-198: The current check using first_val_map and EXPECT_NEAR
only asserts similar numeric objective, not exact determinism; change the test
to record and compare a full deterministic serialization of the solution
(include solution.is_feasible() flag, solution.get_user_objective(), and the
produced assignment/variable values) for each test_instance. Replace
first_val_map with a map from test_instance to the serialized result (e.g.,
string/byte vector) and on subsequent runs serialize the current solution the
same way and ASSERT_EQ the stored bytes with the current bytes. Use the existing
symbols solution.get_user_objective(), the feasible flag accessor (e.g.,
solution.is_feasible()), and however assignments are exposed to build a
byte-for-byte representation to compare exact equality rather than EXPECT_NEAR.
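The byte-for-byte comparison this prompt asks for can be sketched as below. `serialize_solution` and its flat `std::vector<double>` assignment are hypothetical stand-ins for however the real `solution` object exposes its feasibility flag, objective, and variable values:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical byte-for-byte serialization of a MIP solution: feasibility
// flag, objective bits, then every assignment value. Raw IEEE-754 bit
// patterns are stored, so two runs compare equal only if they are truly
// identical -- no EXPECT_NEAR-style tolerance.
inline std::string serialize_solution(bool feasible,
                                      double objective,
                                      const std::vector<double>& assignment)
{
  std::string bytes;
  bytes.push_back(feasible ? 1 : 0);
  auto append_double = [&bytes](double v) {
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits));  // exact bits, no rounding
    bytes.append(reinterpret_cast<const char*>(&bits), sizeof(bits));
  };
  append_double(objective);
  for (double v : assignment) { append_double(v); }
  return bytes;
}
```

The test would then store this string per `test_instance` on the first run and `ASSERT_EQ` it against a freshly serialized solution on every subsequent run.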
In `@cpp/tests/mip/presolve_test.cu`:
- Around line 138-144: The determinism serialization currently only reads
a.second[0].var_to_cached_bound_map into sorted_map, so nondeterminism from the
[1] probe is ignored; update the logic that builds
var_to_cached_bound_keys/var_to_cached_bound_lb/var_to_cached_bound_ub to
include both a.second[0].var_to_cached_bound_map and
a.second[1].var_to_cached_bound_map (e.g., iterate index 0 and 1, or merge both
maps before creating sorted_map) so that both cached-bound maps are serialized
into the hash; ensure you use the same sorting/iteration approach and push_back
the corresponding var_id, cached_bound.lb and cached_bound.ub for entries from
both maps.
- Around line 110-117: The root termination_checker_t is initialized with 0.0
which makes it immediately expired and causes work_limit_timer_t (probing_timer)
to abort before compute_probing_cache runs; update the root timer construction
in this test (the termination_checker_t instance used to create probing_timer)
to use a very large limit (e.g., std::numeric_limits<double>::max()) so the root
check does not short-circuit compute_probing_cache and the probing_cache
computation actually executes.
---
Outside diff comments:
In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1484-1494: The loop currently only checks the finite budget every
100 iterations (the if (fj_cpu.iterations % 100 == 0 ...) block), allowing up to
one extra 100-iteration batch to overrun a small work_budget; fix this by adding
an authoritative budget check at the start of the loop using
fj_cpu.work_units_elapsed.load(std::memory_order_relaxed) >= fj_cpu.work_budget
and break early if true (so the budget is enforced on every iteration), and keep
the existing 100-iteration sampling (memory_aggregator.collect(), biased_work
update, producer_sync->notify_progress()) as-is.
In `@cpp/src/mip_heuristics/local_search/local_search.cu`:
- Around line 264-317: The function currently returns cpu_feasible which ignores
the merged solution state after possibly copying the CPU solution into solution;
change the return to reflect the final merged feasibility by returning
solution.get_feasible() (or assign a merged_feasible = solution.get_feasible()
after the final solution.compute_feasibility() and return that). Update the
final return statement (replace "return cpu_feasible;") in the local_search.cu
block handling CPU/GPU merge so callers receive the merged solution's true
feasibility.
---
Nitpick comments:
In `@cpp/CMakeLists.txt`:
- Around line 546-556: The commented PAPI detection block should be either
removed or converted to an option-gated implementation: add a CMake option like
ENABLE_PAPI (default OFF) and wrap the existing find_path/find_library logic in
if(ENABLE_PAPI) ... endif(), using the same variables PAPI_INCLUDE_DIR and
PAPI_LIBRARY and applying target_include_directories(cuopt PRIVATE
${PAPI_INCLUDE_DIR}) and target_link_libraries(cuopt PRIVATE ${PAPI_LIBRARY})
when found, and emit a clear message(FATAL_ERROR "...") only when ENABLE_PAPI is
ON and the libraries are missing; alternatively delete the commented block
entirely if PAPI support is not needed.
- Around line 666-667: The CMakeLists currently adds an unconditional -pie flag;
update it to only add PIE when supported and appropriate by calling CMake's
CheckPIESupported (or using CMake's PIE support macros), guard with a platform
check such as CMAKE_SYSTEM_NAME STREQUAL "Linux" (or the CheckPIESupported
result), and add the flag via the target_link_options/linker form (e.g.,
LINKER:-pie) instead of raw compile flags; locate the section that mentions
"-pie" and replace the unconditional addition with a conditional that queries
CheckPIESupported and CMAKE_SYSTEM_NAME and then appends LINKER:-pie to the
target_link_options for the relevant targets.
In `@cpp/include/cuopt/linear_programming/cuopt_c.h`:
- Around line 786-797: Update the public header documentation for
cuOptSetMIPGetSolutionCallbackExt to explicitly state its contract: clarify
whether cuOptSetMIPGetSolutionCallbackExt replaces or coexists with
cuOptSetMIPGetSolutionCallback (e.g., "replaces existing callback" or "coexists;
both may be invoked"), name which thread/context invokes the callback (e.g.,
solver main thread, solver worker thread, or a dedicated callback thread), and
enumerate thread-safety and synchronization requirements (reentrancy, whether
the callback may be called concurrently, and whether callers must protect shared
state). Also document GPU/CPU context and numerical behavior expectations (e.g.,
pointers/arrays are host memory, any lifetime guarantees for solution buffers,
floating-point formats/scaling), and add a brief note that API changes need
corresponding docs updates per the repository guidelines; reference
cuOptSetMIPGetSolutionCallbackExt and cuOptSetMIPGetSolutionCallback in the
wording so readers can find both symbols.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1130-1148: Extract the duplicated check_guess + rejection logging
into a helper (e.g., validate_integer_solution) and have each policy's
handle_integer_solution call it and return early on false; specifically move the
call to check_guess(...) plus the bnb.settings_.log.printf(...) path into the
helper (accepting bnb.original_lp_, bnb.settings_, bnb.var_types_, x,
node->node_id, node->depth, obj, and logger) and return a bool so existing
callers (nondeterministic_policy_t::handle_integer_solution,
deterministic_bfs_policy_t::handle_integer_solution,
deterministic_diving_policy_t::handle_integer_solution) simply call
validate_integer_solution(...) and only call bnb.add_feasible_solution(...) when
it returns true.
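A minimal sketch of the extracted helper's shape; the simplified signature and the `check_guess` callback type here are assumptions for illustration, not the real cuOpt types:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical shape of the shared helper: run check_guess once and log the
// rejection in one place, so each policy's handle_integer_solution simply
// early-returns on false before calling bnb.add_feasible_solution(...).
template <typename f_t>
bool validate_integer_solution(const std::vector<f_t>& x,
                               int node_id,
                               int depth,
                               f_t obj,
                               bool (*check_guess)(const std::vector<f_t>&, f_t))
{
  if (!check_guess(x, obj)) {
    std::printf("rejecting integer solution at node %d depth %d obj %g\n",
                node_id, depth, static_cast<double>(obj));
    return false;
  }
  return true;
}
```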
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 362-367: Add a short doc comment above record_update explaining it
only appends an update to updates_ without modifying the snapshot's aggregation
arrays (the sum_*/num_* members), in contrast to queue_update which both records
and applies the change to those arrays; mention the parameters (variable,
direction, delta, clock, worker_id) to make the intent explicit so callers know
to use queue_update when they need the snapshot's sums/counts updated.
In `@cpp/src/mip_heuristics/diversity/population.cuh`:
- Line 127: The header exposes cuopt::work_limit_timer_t (in adjust_threshold
and the timer member) but only includes utilities/timer.hpp; add a direct
include of the work_limit_timer header (work_limit_timer.hpp) to make
population.cuh self-contained and not rely on transitive includes, while keeping
utilities/timer.hpp if external_solution_t::timer still requires timer_t.
In `@cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh`:
- Around line 71-72: Replace the magic divisor 1e8 used when computing work
(double work = static_cast<double>(n_vars_from_other) / 1e8) with a named
constant (e.g., kWorkUnitScale or WORK_UNIT_SCALE) and add a short comment
explaining the rationale or units for the scale; update any surrounding TODO to
remove ambiguity and use the new constant where work or similar scaling is
needed referencing the symbol n_vars_from_other and the variable work.
- Line 203: The return currently uses
!std::isnan(branch_and_bound_solution.objective) to indicate success; instead
use the explicit branch_and_bound_status value to determine success. Update the
return in the function that returns std::make_tuple(offspring, ..., work) to
replace the second element with a boolean derived from branch_and_bound_status
(e.g., branch_and_bound_status == <success-state> or simply
branch_and_bound_status) so it no longer relies on NaN-initialization of
branch_and_bound_solution.objective; reference the variables
branch_and_bound_solution, branch_and_bound_status, offspring, and work when
making this change.
In `@cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu`:
- Around line 32-39: The disabled determinism logging block leaving
CUOPT_DETERMINISM_LOG wrapped in `#if 0` should be replaced with a controllable
mechanism: either remove the block if it's temporary, or guard the macro with a
meaningful compile-time flag (e.g., replace `#if 0` with
`#ifdef ENABLE_CUOPT_DETERMINISM_LOG`) and document/emit that flag from your build

system, or implement a runtime check that forwards to CUOPT_LOG_INFO only when a
global/flag (e.g., bool enableDeterminismLog) is true; update the macro
definition for CUOPT_DETERMINISM_LOG and references to CUOPT_LOG_INFO
accordingly so enabling determinism logging is controlled by a build or runtime
option rather than commented-out code.
In `@cpp/src/mip_heuristics/presolve/bounds_presolve.cu`:
- Around line 170-177: The code mutates the stored solver settings by clamping
settings.iteration_limit in-place; instead, compute a local loop bound and leave
settings untouched. In bound_presolve_t<i_t, f_t>::bound_update_loop, check
context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS and set a
local variable (e.g. max_iterations or loop_limit) to
std::min(settings.iteration_limit, 50) and use that local variable for the
loop/termination checks rather than assigning back to settings.iteration_limit
so future calls see the original user-configured value.
In `@cpp/src/mip_heuristics/problem/presolve_data.cu`:
- Around line 225-228: Change the parameter type of
presolve_data_t<i_t,f_t>::papilo_uncrush_assignment from a non-const reference
to a const reference (i.e., const problem_t<i_t,f_t>& problem) so the method is
const-correct; update the function signature accordingly and ensure all uses
inside (e.g., accessing problem.handle_ptr) remain reads-only so no other
changes are required, then remove any call-site const_casts that were
compensating for the non-const parameter.
In `@cpp/src/mip_heuristics/solver_context.cuh`:
- Around line 72-74: The termination pointer (termination in solver_context.cuh)
may remain nullptr until mip_solver_t sets it, so add a safe accessor on the
solver context (e.g., get_termination() or termination_checked()) that asserts
or throws if termination is null and returns the non-null pointer; update any
callers that read termination (timers or sub-timers) to use this accessor
instead of accessing the raw termination member directly, or alternatively
change the member to a non-null reference/init-on-construction pattern so the
compiler enforces it. Ensure the accessor name
(get_termination/termination_checked) and mip_solver_t usage are the unique
symbols referenced so the fix is easy to locate.
In `@cpp/src/mip_heuristics/solver_solution.cu`:
- Around line 249-255: The current XOR of per-solution hashes (pool_hash ^=
detail::compute_hash(...)) collapses order and duplicates; instead compute a
sequence-aware combined hash by iterating solution_pool_, calling
cuopt::host_copy for each pool_sol and computing per_hash =
detail::compute_hash(host_pool_sol), then fold per_hash into pool_hash using an
order-sensitive combiner (e.g., pool_hash = pool_hash * PRIME + per_hash or use
a std::hash_combine-style mix) so the final pool_hash reflects the sequence and
content; apply the same change to the analogous block around the region noted
(lines 261-274) so both places use the ordered combination instead of XOR.
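One way to sketch the order-sensitive combiner: the mixing constant follows the common boost-style `hash_combine`, and `pool_hash` here folds pre-computed per-solution hashes rather than calling the real `detail::compute_hash`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Order-sensitive combiner: unlike XOR, swapping or duplicating entries
// changes the result. The constant is the 64-bit golden-ratio mix used by
// boost::hash_combine-style implementations.
inline std::size_t hash_combine(std::size_t seed, std::size_t value)
{
  seed ^= value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
  return seed;
}

// Fold per-solution hashes in pool order (a stand-in for iterating
// solution_pool_, host-copying each entry, and hashing it).
inline std::size_t pool_hash(const std::vector<std::size_t>& per_solution_hashes)
{
  std::size_t h = 0;
  for (std::size_t per_hash : per_solution_hashes) {
    h = hash_combine(h, per_hash);
  }
  return h;
}
```

With this, reordering the pool or duplicating a solution yields a different hash, which a plain XOR fold cannot guarantee.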
In `@cpp/tests/mip/local_search_test.cu`:
- Line 68: The function setup_device_symbols(rmm::cuda_stream_view stream_view)
is an empty stub; either implement the intended device-symbol initialization
logic or remove the function and its declarations/usages; if you intend to keep
it as a placeholder, add a clear TODO comment explaining what must be
implemented and why the parameter is currently unused (or mark it
[[maybe_unused]]), and update any callers or headers referencing
setup_device_symbols accordingly to avoid dead code or unused-parameter
warnings.
- Around line 196-197: The two spin_stream_raii_t instances (spin_stream_1 and
spin_stream_2) are created without explanation; add a concise comment above
their declarations clarifying they spawn additional CUDA streams that spin to
stress test stream scheduling and ensure the determinism test exercises
concurrent stream behavior (i.e., stress-testing CUDA stream scheduling for
determinism), so future readers understand why these dummy/spinning streams are
present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bddac155-630c-4b07-9087-ad7acabbd05b
📒 Files selected for processing (139)
- .claude-plugin/marketplace.json
- .cursor-plugin/plugin.json
- .github/CODEOWNERS
- .github/workflows/build.yaml
- .github/workflows/build_test_publish_images.yaml
- .github/workflows/pr.yaml
- .github/workflows/test.yaml
- .github/workflows/trigger-breaking-change-alert.yaml
- CONTRIBUTING.md
- RAPIDS_BRANCH
- README.md
- VERSION
- ci/release/update-version.sh
- conda/environments/all_cuda-129_arch-aarch64.yaml
- conda/environments/all_cuda-129_arch-x86_64.yaml
- conda/environments/all_cuda-131_arch-aarch64.yaml
- conda/environments/all_cuda-131_arch-x86_64.yaml
- cpp/CMakeLists.txt
- cpp/include/cuopt/linear_programming/constants.h
- cpp/include/cuopt/linear_programming/cuopt_c.h
- cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
- cpp/include/cuopt/linear_programming/utilities/internals.hpp
- cpp/src/barrier/barrier.cu
- cpp/src/barrier/iterative_refinement.hpp
- cpp/src/branch_and_bound/branch_and_bound.cpp
- cpp/src/branch_and_bound/branch_and_bound.hpp
- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/branch_and_bound/pseudo_costs.hpp
- cpp/src/dual_simplex/phase2.cpp
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/diversity/diversity_config.hpp
- cpp/src/mip_heuristics/diversity/diversity_manager.cu
- cpp/src/mip_heuristics/diversity/diversity_manager.cuh
- cpp/src/mip_heuristics/diversity/lns/rins.cu
- cpp/src/mip_heuristics/diversity/population.cu
- cpp/src/mip_heuristics/diversity/population.cuh
- cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh
- cpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuh
- cpp/src/mip_heuristics/early_heuristic.cuh
- cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cuh
- cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cu
- cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cuh
- cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
- cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
- cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
- cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
- cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuh
- cpp/src/mip_heuristics/local_search/local_search.cu
- cpp/src/mip_heuristics/local_search/local_search.cuh
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
- cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
- cpp/src/mip_heuristics/presolve/bounds_presolve.cu
- cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu
- cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve_helpers.cuh
- cpp/src/mip_heuristics/presolve/multi_probe.cu
- cpp/src/mip_heuristics/presolve/probing_cache.cu
- cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
- cpp/src/mip_heuristics/presolve/third_party_presolve.hpp
- cpp/src/mip_heuristics/presolve/trivial_presolve.cuh
- cpp/src/mip_heuristics/problem/presolve_data.cu
- cpp/src/mip_heuristics/problem/presolve_data.cuh
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/problem/problem.cuh
- cpp/src/mip_heuristics/problem/problem_helpers.cuh
- cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/solution/solution.cuh
- cpp/src/mip_heuristics/solution_callbacks.cuh
- cpp/src/mip_heuristics/solve.cu
- cpp/src/mip_heuristics/solver.cu
- cpp/src/mip_heuristics/solver.cuh
- cpp/src/mip_heuristics/solver_context.cuh
- cpp/src/mip_heuristics/solver_solution.cu
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/pdlp/utils.cuh
- cpp/src/routing/local_search/compute_compatible.cu
- cpp/src/routing/route/break_route.cuh
- cpp/src/routing/route/capacity_route.cuh
- cpp/src/routing/route/dimensions_route.cuh
- cpp/src/routing/route/distance_route.cuh
- cpp/src/routing/route/mismatch_route.cuh
- cpp/src/routing/route/pdp_route.cuh
- cpp/src/routing/route/prize_route.cuh
- cpp/src/routing/route/route.cuh
- cpp/src/routing/route/service_time_route.cuh
- cpp/src/routing/route/tasks_route.cuh
- cpp/src/routing/route/time_route.cuh
- cpp/src/routing/route/tsp_route.cuh
- cpp/src/routing/route/vehicle_fixed_cost_route.cuh
- cpp/src/routing/solution/route_node_map.cuh
- cpp/src/routing/utilities/check_input.cu
- cpp/src/utilities/copy_helpers.hpp
- cpp/tests/mip/CMakeLists.txt
- cpp/tests/mip/determinism_test.cu
- cpp/tests/mip/diversity_test.cu
- cpp/tests/mip/feasibility_jump_tests.cu
- cpp/tests/mip/load_balancing_test.cu
- cpp/tests/mip/local_search_test.cu
- cpp/tests/mip/mip_utils.cuh
- cpp/tests/mip/multi_probe_test.cu
- cpp/tests/mip/presolve_test.cu
- datasets/mip/download_miplib_test_dataset.sh
- dependencies.yaml
- docs/cuopt/source/versions1.json
- gemini-extension.json
- helmchart/cuopt-server/Chart.yaml
- helmchart/cuopt-server/values.yaml
- merge_review_findings_release_26_04.md
- python/cuopt/pyproject.toml
- python/cuopt_self_hosted/pyproject.toml
- python/cuopt_server/pyproject.toml
- python/libcuopt/pyproject.toml
- skills/cuopt-developer/SKILL.md
- skills/cuopt-installation-api-c/SKILL.md
- skills/cuopt-installation-api-python/SKILL.md
- skills/cuopt-installation-common/SKILL.md
- skills/cuopt-installation-developer/SKILL.md
- skills/cuopt-lp-milp-api-c/SKILL.md
- skills/cuopt-lp-milp-api-cli/SKILL.md
- skills/cuopt-lp-milp-api-python/SKILL.md
- skills/cuopt-qp-api-c/SKILL.md
- skills/cuopt-qp-api-cli/SKILL.md
- skills/cuopt-qp-api-python/SKILL.md
- skills/cuopt-routing-api-python/SKILL.md
- skills/cuopt-server-api-python/SKILL.md
- skills/cuopt-server-common/SKILL.md
✅ Files skipped from review due to trivial changes (66)
- RAPIDS_BRANCH
- cpp/src/routing/route/break_route.cuh
- cpp/src/pdlp/utils.cuh
- .github/workflows/build_test_publish_images.yaml
- README.md
- .cursor-plugin/plugin.json
- .claude-plugin/marketplace.json
- conda/environments/all_cuda-129_arch-aarch64.yaml
- cpp/src/routing/route/capacity_route.cuh
- cpp/src/mip_heuristics/problem/problem_helpers.cuh
- cpp/src/routing/route/tsp_route.cuh
- gemini-extension.json
- cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
- cpp/src/routing/route/service_time_route.cuh
- cpp/src/routing/route/pdp_route.cuh
- cpp/src/pdlp/pdhg.cu
- cpp/src/routing/route/tasks_route.cuh
- skills/cuopt-lp-milp-api-cli/SKILL.md
- cpp/src/routing/utilities/check_input.cu
- cpp/src/routing/local_search/compute_compatible.cu
- .github/CODEOWNERS
- skills/cuopt-installation-common/SKILL.md
- skills/cuopt-installation-developer/SKILL.md
- datasets/mip/download_miplib_test_dataset.sh
- skills/cuopt-installation-api-c/SKILL.md
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve_helpers.cuh
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/routing/route/vehicle_fixed_cost_route.cuh
- conda/environments/all_cuda-131_arch-x86_64.yaml
- skills/cuopt-qp-api-cli/SKILL.md
- cpp/src/pdlp/termination_strategy/convergence_information.cu
- helmchart/cuopt-server/Chart.yaml
- VERSION
- skills/cuopt-installation-api-python/SKILL.md
- skills/cuopt-lp-milp-api-c/SKILL.md
- skills/cuopt-qp-api-c/SKILL.md
- .github/workflows/trigger-breaking-change-alert.yaml
- cpp/src/routing/route/time_route.cuh
- cpp/src/routing/route/route.cuh
- python/cuopt_self_hosted/pyproject.toml
- skills/cuopt-qp-api-python/SKILL.md
- skills/cuopt-server-common/SKILL.md
- cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu
- cpp/src/routing/route/mismatch_route.cuh
- skills/cuopt-lp-milp-api-python/SKILL.md
- helmchart/cuopt-server/values.yaml
- cpp/src/pdlp/pdlp.cu
- cpp/src/barrier/barrier.cu
- conda/environments/all_cuda-129_arch-x86_64.yaml
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- skills/cuopt-developer/SKILL.md
- cpp/src/mip_heuristics/diversity/diversity_config.hpp
- python/cuopt_server/pyproject.toml
- cpp/src/mip_heuristics/presolve/trivial_presolve.cuh
- cpp/src/barrier/iterative_refinement.hpp
- skills/cuopt-routing-api-python/SKILL.md
- cpp/src/routing/solution/route_node_map.cuh
- skills/cuopt-server-api-python/SKILL.md
- python/cuopt/pyproject.toml
- cpp/src/routing/route/prize_route.cuh
- cpp/src/routing/route/dimensions_route.cuh
- docs/cuopt/source/versions1.json
- cpp/src/routing/route/distance_route.cuh
- python/libcuopt/pyproject.toml
- cpp/src/mip_heuristics/solution_callbacks.cuh
- cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu
🚧 Files skipped from review as they are similar to previous changes (24)
- cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
- cpp/src/mip_heuristics/diversity/lns/rins.cu
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu
- cpp/src/mip_heuristics/presolve/third_party_presolve.hpp
- cpp/tests/mip/load_balancing_test.cu
- cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuh
- cpp/src/utilities/copy_helpers.hpp
- cpp/tests/mip/CMakeLists.txt
- cpp/src/mip_heuristics/solver.cuh
- cpp/src/mip_heuristics/problem/presolve_data.cuh
- cpp/src/mip_heuristics/solution/solution.cu
- cpp/src/mip_heuristics/problem/problem.cuh
- cpp/src/mip_heuristics/presolve/multi_probe.cu
- cpp/tests/mip/mip_utils.cuh
- cpp/src/mip_heuristics/diversity/diversity_manager.cuh
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
- cpp/src/branch_and_bound/branch_and_bound.hpp
- cpp/src/mip_heuristics/solver.cu
- cpp/src/mip_heuristics/local_search/local_search.cuh
- cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
- cpp/tests/mip/determinism_test.cu
- cpp/src/mip_heuristics/feasibility_jump/feasibility_jump_kernels.cu
```sh
# Update docs version switcher to include the new version
python ci/utils/update_doc_versions.py
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify the Python script exists and inspect its contents
# Check if the file exists
if [[ -f "ci/utils/update_doc_versions.py" ]]; then
  echo "✓ File exists: ci/utils/update_doc_versions.py"
  echo ""
  echo "=== File contents ==="
  cat ci/utils/update_doc_versions.py
else
  echo "✗ File NOT found: ci/utils/update_doc_versions.py"
  echo ""
  echo "=== Searching for similar files ==="
  fd -t f "update.*version" ci/
fi
```

Repository: NVIDIA/cuopt
Length of output: 2433
🏁 Script executed:
```bash
sed -n '140,160p' ci/release/update-version.sh
```

Repository: NVIDIA/cuopt
Length of output: 935
🏁 Script executed:
```bash
cat -n ci/release/update-version.sh | head -80 && echo "..." && cat -n ci/release/update-version.sh | tail -20
```

Repository: NVIDIA/cuopt
Length of output: 4490
Conditionally execute the version switcher update and add error handling.
The Python script should be conditional on RUN_CONTEXT, similar to the documentation reference updates (lines 147-154). Running in main context adds development versions to the public version switcher, which is likely unintended. Additionally, add error handling to catch failures in the Python script:
```sh
# Update docs version switcher to include the new version (release context only)
if [[ "${RUN_CONTEXT}" == "release" ]]; then
  python ci/utils/update_doc_versions.py || { echo "Error: Failed to update version switcher"; exit 1; }
fi
```

Or, if this should run in both contexts, at minimum add error handling to prevent silent failures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/release/update-version.sh` around lines 155 - 157, Wrap the call to
ci/utils/update_doc_versions.py in a RUN_CONTEXT check and add error handling:
only invoke the Python script when RUN_CONTEXT equals "release" (or update if
policy says run in both contexts) and ensure the command's failure is caught and
causes the script to exit with a non‑zero status; reference the RUN_CONTEXT
variable and the ci/utils/update_doc_versions.py invocation so you update that
section to conditionally execute and append a failure handler that echoes an
error and exits.
```cpp
heuristic_solution_queue_.push_back({solution, user_objective, work_unit_ts, origin});
const size_t heuristic_queue_size = heuristic_solution_queue_.size();
mutex_heuristic_queue_.unlock();
```
Unused variable heuristic_queue_size.
This variable is computed but never used.
🧹 Suggested fix
```diff
 mutex_heuristic_queue_.lock();
 heuristic_solution_queue_.push_back({solution, user_objective, work_unit_ts, origin});
-const size_t heuristic_queue_size = heuristic_solution_queue_.size();
 mutex_heuristic_queue_.unlock();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 640 - 642, The
local variable heuristic_queue_size is computed but never used; remove the
unused declaration to avoid dead code by deleting the line that assigns const
size_t heuristic_queue_size = heuristic_solution_queue_.size() (located near the
call to heuristic_solution_queue_.push_back and before
mutex_heuristic_queue_.unlock()), or if the queue size is intended to be used
later, replace the removal by using heuristic_queue_size where needed—otherwise
simply delete that variable declaration.
```cpp
const f_t previous_local_upper = this->worker.local_upper_bound;
const int previous_seq = this->worker.next_solution_seq;
this->worker.local_upper_bound = obj;
this->worker.queue_integer_solution(obj, x, node->depth);
```
Unused variables previous_local_upper and previous_seq.
These variables are assigned but never referenced. They appear to be leftover debug code.
🧹 Suggested fix
```diff
 }
-const f_t previous_local_upper = this->worker.local_upper_bound;
-const int previous_seq = this->worker.next_solution_seq;
 this->worker.local_upper_bound = obj;
```
this->worker.queue_integer_solution(obj, x, node->depth);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-const f_t previous_local_upper = this->worker.local_upper_bound;
-const int previous_seq = this->worker.next_solution_seq;
 this->worker.local_upper_bound = obj;
 this->worker.queue_integer_solution(obj, x, node->depth);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 1422 - 1425, The
two local variables previous_local_upper and previous_seq are assigned but never
used; remove their declarations and assignments to clean up dead code around the
call that updates this->worker.local_upper_bound and calls
this->worker.queue_integer_solution(obj, x, node->depth). Ensure only the
necessary state updates (this->worker.local_upper_bound = obj and the
queue_integer_solution call) remain and do not alter any other logic in the
surrounding function.
```diff
 for (i_t k = 0; k < fractional.size(); k++) {
   const i_t j = fractional[k];
-  for (i_t branch = 0; branch < 2; branch++) {
-    if (branch == 0) {
-      f_t change_in_obj = strong_branch_down[k];
-      if (std::isnan(change_in_obj)) { continue; }
-      f_t frac = root_soln[j] - std::floor(root_soln[j]);
-      pseudo_cost_sum_down[j] += change_in_obj / frac;
-      pseudo_cost_num_down[j]++;
-    } else {
-      f_t change_in_obj = strong_branch_up[k];
-      if (std::isnan(change_in_obj)) { continue; }
-      f_t frac = std::ceil(root_soln[j]) - root_soln[j];
-      pseudo_cost_sum_up[j] += change_in_obj / frac;
-      pseudo_cost_num_up[j]++;
-    }
-  }
+  if (!std::isnan(strong_branch_down[k])) {
+    const f_t frac = root_soln[j] - std::floor(root_soln[j]);
+    pseudo_cost_sum_down[j] += strong_branch_down[k] / frac;
+    pseudo_cost_num_down[j]++;
+  }
+  if (!std::isnan(strong_branch_up[k])) {
+    const f_t frac = std::ceil(root_soln[j]) - root_soln[j];
+    pseudo_cost_sum_up[j] += strong_branch_up[k] / frac;
+    pseudo_cost_num_up[j]++;
+  }
```
Consider guarding against near-zero fractions in pseudo-cost updates.
When root_soln[j] is very close to an integer boundary, frac can be extremely small (e.g., 1e-10), causing the pseudo-cost update to be numerically unstable due to division amplification.
🛡️ Suggested guard
```diff
 if (!std::isnan(strong_branch_down[k])) {
   const f_t frac = root_soln[j] - std::floor(root_soln[j]);
+  if (frac > 1e-6) {  // guard against near-integer values
   pseudo_cost_sum_down[j] += strong_branch_down[k] / frac;
   pseudo_cost_num_down[j]++;
+  }
 }
 if (!std::isnan(strong_branch_up[k])) {
   const f_t frac = std::ceil(root_soln[j]) - root_soln[j];
+  if (frac > 1e-6) {
   pseudo_cost_sum_up[j] += strong_branch_up[k] / frac;
   pseudo_cost_num_up[j]++;
+  }
 }
```

As per coding guidelines: "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1690 - 1701, The
pseudo-cost update divides by frac (computed from root_soln[j]) which may be
extremely small near integer boundaries; modify the loops around
strong_branch_down[k] and strong_branch_up[k] to guard against near-zero frac by
computing a safe divisor: set an epsilon (e.g. using
std::numeric_limits<f_t>::epsilon() scaled or a small constant like 1e-8) and
either skip the update when frac < eps or replace frac with std::max(frac, eps)
before dividing; apply this change to the fractions computed for down (frac =
root_soln[j] - std::floor(...)) and up (frac = std::ceil(...) - root_soln[j]) so
updates to pseudo_cost_sum_down[j], pseudo_cost_sum_up[j] and increments of
pseudo_cost_num_down[j]/pseudo_cost_num_up[j] only occur with a numerically safe
divisor.
f_t bb_work_unit_scale{1.0};
bool gpu_heur_wait_for_exploration{true};
Align this default with the public MIP setting.
simplex_solver_settings_t::gpu_heur_wait_for_exploration defaults to true here, but mip_solver_settings_t now defaults the same knob to false. Any internal path that constructs simplex_solver_settings_t directly will silently get different scheduling behavior unless every translation layer overwrites it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp` around lines 206 - 207, The
default for simplex_solver_settings_t::gpu_heur_wait_for_exploration is
inconsistent with the public MIP default; change the member declaration
(currently next to bb_work_unit_scale in simplex_solver_settings_t) to default
to false so internal constructions get the same scheduling behavior as
mip_solver_settings_t (ensure the bool gpu_heur_wait_for_exploration{...}
initialization is set to false).
cpp/tests/mip/diversity_test.cu
Outdated
detail::diversity_manager_t<int, double> diversity_manager(solver.context);
solver.context.gpu_heur_loop.deterministic = true;
diversity_manager.timer =
  work_limit_timer_t(solver.context.gpu_heur_loop, settings.work_limit, timer);
Use solver.context.gpu_heur_loop as the deterministic work context in these helpers too.
The nested timers inside local search/FJ read solver.context.gpu_heur_loop, not the ad-hoc work_limit_context_t created here. Unlike test_full_run_determinism(), these helpers never make that shared context deterministic, so they can bypass the timer/work-accounting path the test is supposed to validate.
💡 Minimal fix
Apply the same pattern used by test_full_run_determinism() in both helpers:
- work_limit_context_t work_limit_context("DiversityManager");
- work_limit_context.deterministic = true;
- diversity_manager.timer = work_limit_timer_t(work_limit_context, 60000, timer);
+ solver.context.gpu_heur_loop.deterministic = true;
+ diversity_manager.timer =
+  work_limit_timer_t(solver.context.gpu_heur_loop, 60000, timer);
Also applies to: 147-153, 196-202
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/mip/diversity_test.cu` around lines 100 - 103, The helpers create
an ad-hoc work_limit_context_t and attach a work_limit_timer_t to
diversity_manager.timer instead of using the shared deterministic context;
change them to use solver.context.gpu_heur_loop as the deterministic work
context (set solver.context.gpu_heur_loop.deterministic = true where needed) and
construct the work_limit_timer_t and any nested timers with
solver.context.gpu_heur_loop and the existing timer variable so the
nested/local-search/FJ timers read the same deterministic context as in
test_full_run_determinism(). Ensure you update the corresponding locations where
diversity_manager, work_limit_timer_t, and timer are set (also apply the same
change to the other two helper blocks mentioned).
static std::unordered_map<std::string, double> first_val_map;
if (first_val_map.count(test_instance) == 0) {
  first_val_map[test_instance] = solution.get_user_objective();
}
EXPECT_NEAR(solution.get_user_objective(), first_val_map[test_instance], 1.0)
  << test_instance << " determinism objective mismatch";
This does not verify determinism, only “similar objective.”
EXPECT_NEAR(..., 1.0) will still pass for different incumbents, feasibility states, or assignments as long as the user objective stays within 1. For the determinism regression in this PR, compare the exact feasible flag plus the produced assignment/objective byte-for-byte across runs.
As per coding guidelines, "Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/mip/feasibility_jump_tests.cu` around lines 193 - 198, The current
check using first_val_map and EXPECT_NEAR only asserts similar numeric
objective, not exact determinism; change the test to record and compare a full
deterministic serialization of the solution (include solution.is_feasible()
flag, solution.get_user_objective(), and the produced assignment/variable
values) for each test_instance. Replace first_val_map with a map from
test_instance to the serialized result (e.g., string/byte vector) and on
subsequent runs serialize the current solution the same way and ASSERT_EQ the
stored bytes with the current bytes. Use the existing symbols
solution.get_user_objective(), the feasible flag accessor (e.g.,
solution.is_feasible()), and however assignments are exposed to build a
byte-for-byte representation to compare exact equality rather than EXPECT_NEAR.
auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
  a.second[0].var_to_cached_bound_map.begin(), a.second[0].var_to_cached_bound_map.end());
for (const auto& [var_id, cached_bound] : sorted_map) {
  var_to_cached_bound_keys.push_back(var_id);
  var_to_cached_bound_lb.push_back(cached_bound.lb);
  var_to_cached_bound_ub.push_back(cached_bound.ub);
}
Include both cached-bound maps in the determinism hash.
The composite hash only serializes a.second[0].var_to_cached_bound_map. Any nondeterminism in the [1] probe direction is invisible to the test.
💡 Minimal fix
- auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
- a.second[0].var_to_cached_bound_map.begin(), a.second[0].var_to_cached_bound_map.end());
- for (const auto& [var_id, cached_bound] : sorted_map) {
- var_to_cached_bound_keys.push_back(var_id);
- var_to_cached_bound_lb.push_back(cached_bound.lb);
- var_to_cached_bound_ub.push_back(cached_bound.ub);
- }
+ for (int side = 0; side < 2; ++side) {
+ auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
+ a.second[side].var_to_cached_bound_map.begin(),
+ a.second[side].var_to_cached_bound_map.end());
+ for (const auto& [var_id, cached_bound] : sorted_map) {
+ var_to_cached_bound_keys.push_back(var_id);
+ var_to_cached_bound_lb.push_back(cached_bound.lb);
+ var_to_cached_bound_ub.push_back(cached_bound.ub);
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/mip/presolve_test.cu` around lines 138 - 144, The determinism
serialization currently only reads a.second[0].var_to_cached_bound_map into
sorted_map, so nondeterminism from the [1] probe is ignored; update the logic
that builds
var_to_cached_bound_keys/var_to_cached_bound_lb/var_to_cached_bound_ub to
include both a.second[0].var_to_cached_bound_map and
a.second[1].var_to_cached_bound_map (e.g., iterate index 0 and 1, or merge both
maps before creating sorted_map) so that both cached-bound maps are serialized
into the hash; ensure you use the same sorting/iteration approach and push_back
the corresponding var_id, cached_bound.lb and cached_bound.ub for entries from
both maps.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
1184-1202: ⚠️ Potential issue | 🟠 Major
Forward `work_unit_context` into the `strong_branch_helper(...)` call.
At Line 1184, `strong_branch_helper(...)` is invoked without the new `work_unit_context` parameter, so Line 307 falls back to `nullptr` and deterministic work accounting is dropped in this DS path.
Suggested fix
  strong_branch_helper(start, end, start_time, original_lp, settings, var_types,
                       fractional, root_solution.x, root_vstatus, edge_norms,
                       root_obj, upper_bound, simplex_iteration_limit, pc,
                       dual_simplex_obj_down, dual_simplex_obj_up,
                       dual_simplex_status_down, dual_simplex_status_up,
-                      sb_view);
+                      sb_view,
+                      work_unit_context);
As per coding guidelines: "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1184 - 1202, The call to strong_branch_helper(...) at the shown site omits the new work_unit_context parameter so the helper receives nullptr (losing deterministic work accounting); update the invocation to pass the existing work_unit_context variable into strong_branch_helper, and ensure the helper's signature and all other call sites accept the extra parameter (verify function strong_branch_helper and its declaration are updated accordingly) so deterministic work accounting is preserved on this DS path.
♻️ Duplicate comments (7)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
2086-2094: ⚠️ Potential issue | 🟡 Minor
Guard pseudo-cost updates against near-zero fractional distances.
Line 2088 and Line 2093 divide by `frac` without a minimum threshold; near-integer values can amplify noise or spike pseudo-costs.
Suggested guard
  for (i_t k = 0; k < fractional.size(); k++) {
    const i_t j = fractional[k];
+   constexpr f_t frac_eps = 1e-8;
    if (!std::isnan(strong_branch_down[k])) {
      const f_t frac = root_soln[j] - std::floor(root_soln[j]);
-     pseudo_cost_sum_down[j] += strong_branch_down[k] / frac;
-     pseudo_cost_num_down[j]++;
+     if (frac > frac_eps) {
+       pseudo_cost_sum_down[j] += strong_branch_down[k] / frac;
+       pseudo_cost_num_down[j]++;
+     }
    }
    if (!std::isnan(strong_branch_up[k])) {
      const f_t frac = std::ceil(root_soln[j]) - root_soln[j];
-     pseudo_cost_sum_up[j] += strong_branch_up[k] / frac;
-     pseudo_cost_num_up[j]++;
+     if (frac > frac_eps) {
+       pseudo_cost_sum_up[j] += strong_branch_up[k] / frac;
+       pseudo_cost_num_up[j]++;
+     }
    }
  }
As per coding guidelines: "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 2086 - 2094, The pseudo-cost updates in the block that updates pseudo_cost_sum_down[j]/pseudo_cost_sum_up[j] (using strong_branch_down[k], strong_branch_up[k] and root_soln[j]) divide by frac computed as root_soln[j]-floor(root_soln[j]) and ceil(root_soln[j])-root_soln[j] without guarding tiny denominators; change the logic in that section to protect against near-zero frac by either clamping frac = std::max(frac, eps) or skipping the update when frac < eps (choose a small constant eps, e.g. 1e-12 or a configurable constant), so that pseudo_cost_num_down/up are only incremented when a safe division occurred and no large spikes are introduced.
cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuh (1)
103-103: ⚠️ Potential issue | 🟡 Minor
Latent issue: `termination_checker_t` member lacks an initializer.
The `termination_checker_t max_timer;` declaration has no initializer, but `termination_checker_t` has no default constructor (per context snippet 1, showing only explicit constructors). This would cause a compilation error.
However, based on learnings, this `lb_constraint_prop` path is not currently compiled, so this is a latent issue that will only surface if the path is re-enabled. When that happens, the member will need explicit initialization similar to `constraint_prop_t`:
  termination_checker_t max_timer{0.0, cuopt::termination_checker_t::root_tag_t{}};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuh` at line 103, The member declaration termination_checker_t max_timer in lb_constraint_prop lacks an initializer and will fail to compile because termination_checker_t has no default constructor; initialize max_timer explicitly (same pattern used for constraint_prop_t) by constructing it with a zero timeout and the root_tag_t sentinel (use termination_checker_t{0.0, cuopt::termination_checker_t::root_tag_t{}} or equivalent) inside the lb_constraint_prop class so the member is properly initialized when that path is enabled.
cpp/src/mip_heuristics/local_search/local_search.cu (1)
233-249: ⚠️ Potential issue | 🟠 Major
Account the helper CPUFJ threads against one deterministic budget.
In deterministic mode, each `ls_cpu_fj` worker gets an independent `work_budget = time_limit` (line 247). This means with multiple helpers, the aggregate work consumed can exceed the caller's budget, and timestamps won't align with the single work-total that B&B synchronizes against.
Unlike `start_cpufj_deterministic()` (lines 148-190), which registers with `producer_sync`, these helpers operate with isolated budgets.
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 233 - 249, The helper CPUFJ threads in ls_cpu_fj are being given independent deterministic budgets (setting cpu_fj.fj_cpu->work_units_elapsed = 0.0 and cpu_fj.fj_cpu->work_budget = time_limit), which allows aggregate work to exceed the single deterministic budget; instead, mirror start_cpufj_deterministic() and register each helper with the shared producer_sync so they are charged against the common deterministic budget (or otherwise attach the same shared budget handle) after create_cpu_climber(...) rather than assigning an isolated work_budget, ensuring helpers use the synchronized producer_sync accounting.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu (2)
1037-1041: ⚠️ Potential issue | 🟠 Major
Graph batch can overshoot deterministic budget.
When `deterministic_work_estimate` is true, `use_graph` remains true, so `run_step_device()` executes a full 50-iteration batch. The `deterministic_batch_work` is only charged to the timer after the batch completes (lines 1107-1112). This allows a tiny positive budget to overshoot by an entire batch, shifting work timestamps used for deterministic publication/sync.
Consider disabling graph mode when the remaining deterministic budget is less than a full batch's estimated work.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 1037 - 1041, The current logic keeps use_graph=true even when deterministic_work_estimate is set, allowing run_step_device() to execute a full iterations_per_graph batch and overshoot the deterministic budget; modify the decision so that when deterministic_work_estimate is true and the remaining deterministic budget (compare against deterministic_batch_work) is less than the estimated work for a full graph batch, disable graph mode or force iterations_per_batch=1 (i.e., set use_graph=false or set iterations_per_batch = 1) before calling run_step_device(), referencing the variables deterministic_work_estimate, use_graph, iterations_per_batch, run_step_device, and deterministic_batch_work so the small remaining budget is respected and work is charged before any multi-iteration graph run.
995-1017: ⚠️ Potential issue | 🟠 Major
Add CUDA error checking after the kernel launch.
The `init_lhs_and_violated_constraints` kernel launch at line 996 lacks immediate error checking. Any failure will only surface later from the subsequent `thrust::transform_reduce` calls, making the fault harder to localize.
💡 Suggested fix
  data.violated_constraints.clear(stream);
  init_lhs_and_violated_constraints<i_t, f_t><<<4096, 256, 0, stream>>>(v);
+ RAFT_CHECK_CUDA(stream);
  // both transform_reduce could be fused; but oh well, hardly a bottleneck
As per coding guidelines, "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 995 - 1017, The kernel launch init_lhs_and_violated_constraints<i_t, f_t><<<4096, 256, 0, stream>>>(v) lacks immediate CUDA error checking; add a post-launch check using the project's CUDA_CHECK (or equivalent) macro/function right after that launch (and before any thrust::transform_reduce calls) to capture launch failures on stream, e.g., call CUDA_CHECK(cudaGetLastError()) and/or CUDA_CHECK(cudaPeekAtLastError()/cudaStreamSynchronize(stream)) as appropriate so failures are reported at the kernel boundary; update surrounding code in feasibility_jump.cu near init_lhs_and_violated_constraints, data.violated_constraints.clear, and before the transform_reduce invocations to perform this check.cpp/tests/mip/diversity_test.cu (2)
147-153: ⚠️ Potential issue | 🟠 Major
Use `solver.context.gpu_heur_loop` as the deterministic work context.
This helper creates an ad-hoc `work_limit_context_t` instead of using the shared context from `solver.context.gpu_heur_loop`. The nested timers inside local search/FJ read `solver.context.gpu_heur_loop`, not this ad-hoc context, so they can bypass the work-accounting path the test is supposed to validate.
💡 Suggested fix
  detail::diversity_manager_t<int, double> diversity_manager(solver.context);
  solver.context.diversity_manager_ptr = &diversity_manager;
- work_limit_context_t work_limit_context("DiversityManager");
- work_limit_context.deterministic = true;
- diversity_manager.timer = termination_checker_t(work_limit_context, 60000, timer);
+ solver.context.gpu_heur_loop.deterministic = true;
+ diversity_manager.timer =
+   termination_checker_t(solver.context.gpu_heur_loop, 60000, timer);
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 147 - 153, The test creates an ad-hoc work_limit_context_t and assigns diversity_manager.timer from it, which bypasses the shared GPU heuristic work accounting; replace that ad-hoc context with the shared solver.context.gpu_heur_loop: remove the local work_limit_context_t creation and set diversity_manager.timer = termination_checker_t(solver.context.gpu_heur_loop, 60000, timer) so nested timers inside local search/FJ use solver.context.gpu_heur_loop; keep the rest of the setup (diversity_manager, diversity_manager.diversity_config.initial_solution_only, solver.context.diversity_manager_ptr and diversity_manager.run_solver()) unchanged.
196-202: ⚠️ Potential issue | 🟠 Major
Use `solver.context.gpu_heur_loop` as the deterministic work context.
Same issue as `test_initial_solution_determinism` - the ad-hoc `work_limit_context_t` is not connected to the nested timers in FJ/local search.
💡 Suggested fix
  detail::diversity_manager_t<int, double> diversity_manager(solver.context);
  solver.context.diversity_manager_ptr = &diversity_manager;
- work_limit_context_t work_limit_context("DiversityManager");
- work_limit_context.deterministic = true;
- diversity_manager.timer = termination_checker_t(work_limit_context, 60000, timer);
+ solver.context.gpu_heur_loop.deterministic = true;
+ diversity_manager.timer =
+   termination_checker_t(solver.context.gpu_heur_loop, 60000, timer);
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 196 - 202, The test creates an ad-hoc work_limit_context_t and uses it to build the termination_checker_t for diversity_manager (detail::diversity_manager_t<int,double> diversity_manager(...)), but this isn’t hooked into nested timers; replace the local work_limit_context_t with the existing solver.context.gpu_heur_loop: set solver.context.gpu_heur_loop.deterministic = true (or ensure it is true), construct diversity_manager.timer = termination_checker_t(solver.context.gpu_heur_loop, 60000, timer), remove the standalone work_limit_context_t, and keep diversity_manager.diversity_config.dry_run = true before calling diversity_manager.run_solver() so the nested FJ/local search timers use the correct deterministic context.
🧹 Nitpick comments (5)
cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu (2)
74-90: Redundant condition and dead code in determinism mode block.
The check `!std::isinf(work_limit)` on line 79 is redundant because `determinism_mode` on line 74 is only true when `std::isfinite(work_limit)` (line 68), which already excludes infinity. Consequently, the `else` branch (lines 88-90) setting `estim_iters = std::numeric_limits<int>::max()` is dead code.
♻️ Proposed simplification
  if (determinism_mode) {
    // try to estimate the iteration count based on the requested work limit
    // TODO: replace with an actual model. this is a rather ugly hack to avoid having
    // to touch the PDLP code for this initial PR
    estim_iters = 100;
-   if (!std::isinf(work_limit)) {
-     do {
-       // TODO: use an actual predictor model here
-       double estim_ms = 313 + 200 * op_problem.n_variables - 400 * op_problem.n_constraints +
-                         600 * op_problem.coefficients.size() + 7100 * estim_iters;
-       estim_ms = std::max(0.0, estim_ms);
-       if (estim_ms > work_limit * 1000) { break; }
-       estim_iters += 100;
-     } while (true);
-   } else {
-     estim_iters = std::numeric_limits<int>::max();
-   }
+   do {
+     // TODO: use an actual predictor model here
+     double estim_ms = 313 + 200 * op_problem.n_variables - 400 * op_problem.n_constraints +
+                       600 * op_problem.coefficients.size() + 7100 * estim_iters;
+     estim_ms = std::max(0.0, estim_ms);
+     if (estim_ms > work_limit * 1000) { break; }
+     estim_iters += 100;
+   } while (true);
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` around lines 74 - 90, The determinism_mode block contains a redundant check for !std::isinf(work_limit) and a dead else branch; remove the inner if/else and always run the estimators loop because determinism_mode is only set when std::isfinite(work_limit) earlier, so delete the else that sets estim_iters = std::numeric_limits<int>::max() and collapse the loop to run unconditionally inside the determinism_mode branch, keeping the estim_iters initialization and the loop that computes estim_ms using op_problem.n_variables, op_problem.n_constraints, op_problem.coefficients.size(), and updating estim_iters.
23-23: Unused include: `<atomic>` is not used in this file.
The `#include <atomic>` header was added but no `std::atomic` types are used. The static counter on line 53 uses plain `uint64_t`. Consider removing the unused include or converting `lp_call_counter` to `std::atomic<uint64_t>` if thread-safety is intended.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` at line 23, The include <atomic> is unused or lp_call_counter should be atomic for thread-safety; either remove the unused `#include` <atomic> or change the static counter lp_call_counter (currently a uint64_t) to std::atomic<uint64_t> and update its increment/use sites accordingly (use fetch_add or ++ semantics) so thread-safe increments are correct; ensure to include <atomic> only when converting lp_call_counter to std::atomic<uint64_t>.
cpp/tests/mip/local_search_test.cu (2)
196-197: Unused `spin_stream_raii_t` variables.
These RAII objects are declared but appear unused in the test. If they're intended to keep CUDA streams alive during the test, consider adding a comment explaining their purpose. Otherwise, they can be removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/local_search_test.cu` around lines 196 - 197, The two RAII stream objects spin_stream_raii_t spin_stream_1 and spin_stream_raii_t spin_stream_2 are declared but never used; either remove these unused declarations or explicitly document their purpose if they are intended to keep CUDA streams alive (add a comment above spin_stream_1/spin_stream_2 stating they intentionally hold/flush streams for the test), and if they must be used, ensure they're constructed with the appropriate stream handles where the test creates CUDA streams so the RAII actually manages those resources.
231-236: Consider enabling `FJ_ON_ZERO` mode in the test suite.
The `FJ_ON_ZERO` mode is defined in the enum (line 74) and has a corresponding execution path (lines 162-164), but it's commented out in the test instantiation. If this mode is intended to be deterministic, it should be tested.
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/local_search_test.cu` around lines 231 - 236, Enable the commented-out FJ_ON_ZERO test case in the test instantiation: update the INSTANTIATE_TEST_SUITE_P call for LocalSearchTests/LocalSearchTestParams to include std::make_tuple(local_search_mode_t::FJ_ON_ZERO) alongside FP, FJ_LINE_SEGMENT and FJ_ANNEALING so the FJ_ON_ZERO execution path is covered by the test suite; ensure you remove the comment markers around the std::make_tuple(local_search_mode_t::FJ_ON_ZERO) entry so the test framework will instantiate it.
cpp/tests/mip/diversity_test.cu (1)
240-241: Structured binding discards the third return value.
The `recombine` function returns `std::tuple<solution, bool, double>` but only two elements are captured. While this works in C++, explicitly naming the third element makes intent clearer (a structured binding cannot omit a slot or use `std::ignore`, so it needs a named placeholder).
💡 Suggested fix
- auto [offspring, success] =
+ auto [offspring, success, work_done /* unused */] =
    diversity_manager.recombine(pop_vector[i], pop_vector[j], recombiner);
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 240 - 241, The structured binding is discarding the third value from diversity_manager.recombine which returns std::tuple<solution, bool, double>; update the capture to explicitly bind the third element (for example capture as a named variable like "metric" or use std::ignore if you intend to discard it) so the intent is clear when calling recombine and avoid silent discarding of the double return; update the line that currently binds "offspring" and "success" to also bind the third value from recombine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh`:
- Around line 68-69: The current TODO placeholder sets work to a small static
value and then returns 0.0 on LP-based early exits, losing the work already
spent by get_relaxed_lp_solution; update the code in fp_recombiner.cuh so the
work variable (currently computed from n_vars_from_other) accumulates the LP
pass cost and is returned on all early-exit paths. Concretely: in the scope
around get_relaxed_lp_solution(...) ensure you add the LP work contribution to
the existing work variable (reference n_vars_from_other and work) and return
that accumulated work instead of 0.0; apply the same change to the other
early-exit sites mentioned (around the lines corresponding to 100-105) so
recombine_stats.set_recombiner_work(work) receives the correct nonzero value.
In `@cpp/src/mip_heuristics/problem/presolve_data.cu`:
- Around line 236-241: The code synchronizes the wrong handle's stream: replace
the call to problem.handle_ptr->sync_stream() with a sync on the actual handle
used for subsequent GPU work (the local pointer h), e.g. call h->sync_stream()
or otherwise synchronize h->get_stream() before invoking cuopt::host_copy and
other GPU operations; update any related comments to reflect that
synchronization targets the chosen handle (h) to avoid mismatched streams when
handle_override is provided.
In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu`:
- Around line 53-54: The static non-atomic lp_call_counter causes a data race
when multiple threads call get_relaxed_lp_solution; change lp_call_counter to an
atomic (e.g., std::atomic<uint64_t>) and obtain lp_call_id using an atomic
fetch-add (or equivalent) so increments are thread-safe, updating references to
lp_call_counter and lp_call_id in get_relaxed_lp_solution accordingly.
In `@cpp/src/mip_heuristics/solver.cu`:
- Around line 491-496: The async lambda that calls
RAFT_CUDA_TRY(cudaSetDevice(bb_device_id)) must verify the result and fail fast
on error: replace the bare cudaSetDevice call with the project’s CUDA
error-checking helper (e.g., RAFT_CUDA_TRY or CUDA_CHECK) or explicitly check
cudaError_t and handle failures before calling branch_and_bound->solve; update
the lambda that captures bb_device_id and calls
branch_and_bound->solve(branch_and_bound_solution) so it logs/propagates the
cudaSetDevice error (using the same error handling pattern used elsewhere) and
returns/throws an error instead of proceeding to solve when device
initialization fails.
In `@cpp/tests/mip/diversity_test.cu`:
- Around line 267-279: The block that collects hashes and computes final_hash
(using diversity_manager.get_population_pointer(), pop->population_to_vector(),
detail::compute_hash(hashes), and the printf that references path and
final_hash) is dead/unreachable because an earlier return happens; remove this
unreachable code or move its logic before the earlier return so only reachable
code remains—specifically delete the loop that pushes sol.get_hash(), the
compute_hash call, the printf, and the trailing return final_hash; ensure no
references to final_hash or hashes remain after the earlier return.
---
Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1184-1202: The call to strong_branch_helper(...) at the shown site
omits the new work_unit_context parameter so the helper receives nullptr (losing
deterministic work accounting); update the invocation to pass the existing
work_unit_context variable into strong_branch_helper, and ensure the helper's
signature and all other call sites accept the extra parameter (verify function
strong_branch_helper and its declaration are updated accordingly) so
deterministic work accounting is preserved on this DS path.
---
Duplicate comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 2086-2094: The pseudo-cost updates in the block that updates
pseudo_cost_sum_down[j]/pseudo_cost_sum_up[j] (using strong_branch_down[k],
strong_branch_up[k] and root_soln[j]) divide by frac computed as
root_soln[j]-floor(root_soln[j]) and ceil(root_soln[j])-root_soln[j] without
guarding tiny denominators; change the logic in that section to protect against
near-zero frac by either clamping frac = std::max(frac, eps) or skipping the
update when frac < eps (choose a small constant eps, e.g. 1e-12 or a
configurable constant), so that pseudo_cost_num_down/up are only incremented
when a safe division occurred and no large spikes are introduced.
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu`:
- Around line 1037-1041: The current logic keeps use_graph=true even when
deterministic_work_estimate is set, allowing run_step_device() to execute a full
iterations_per_graph batch and overshoot the deterministic budget; modify the
decision so that when deterministic_work_estimate is true and the remaining
deterministic budget (compare against deterministic_batch_work) is less than the
estimated work for a full graph batch, disable graph mode or force
iterations_per_batch=1 (i.e., set use_graph=false or set iterations_per_batch =
1) before calling run_step_device(), referencing the variables
deterministic_work_estimate, use_graph, iterations_per_batch, run_step_device,
and deterministic_batch_work so the small remaining budget is respected and work
is charged before any multi-iteration graph run.
- Around line 995-1017: The kernel launch init_lhs_and_violated_constraints<i_t,
f_t><<<4096, 256, 0, stream>>>(v) lacks immediate CUDA error checking; add a
post-launch check using the project's CUDA_CHECK (or equivalent) macro/function
right after that launch (and before any thrust::transform_reduce calls) to
capture launch failures on stream, e.g., call CUDA_CHECK(cudaGetLastError())
and/or CUDA_CHECK(cudaPeekAtLastError()/cudaStreamSynchronize(stream)) as
appropriate so failures are reported at the kernel boundary; update surrounding
code in feasibility_jump.cu near init_lhs_and_violated_constraints,
data.violated_constraints.clear, and before the transform_reduce invocations to
perform this check.
In `@cpp/src/mip_heuristics/local_search/local_search.cu`:
- Around line 233-249: The helper CPUFJ threads in ls_cpu_fj are being given
independent deterministic budgets (setting cpu_fj.fj_cpu->work_units_elapsed =
0.0 and cpu_fj.fj_cpu->work_budget = time_limit), which allows aggregate work to
exceed the single deterministic budget; instead, mirror
start_cpufj_deterministic() and register each helper with the shared
producer_sync so they are charged against the common deterministic budget (or
otherwise attach the same shared budget handle) after create_cpu_climber(...)
rather than assigning an isolated work_budget, ensuring helpers use the
synchronized producer_sync accounting.
In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuh`:
- Line 103: The member declaration termination_checker_t max_timer in
lb_constraint_prop lacks an initializer and will fail to compile because
termination_checker_t has no default constructor; initialize max_timer
explicitly (same pattern used for constraint_prop_t) by constructing it with a
zero timeout and the root_tag_t sentinel (use termination_checker_t{0.0,
cuopt::termination_checker_t::root_tag_t{}} or equivalent) inside the
lb_constraint_prop class so the member is properly initialized when that path is
enabled.
In `@cpp/tests/mip/diversity_test.cu`:
- Around line 147-153: The test creates an ad-hoc work_limit_context_t and
assigns diversity_manager.timer from it, which bypasses the shared GPU heuristic
work accounting; replace that ad-hoc context with the shared
solver.context.gpu_heur_loop: remove the local work_limit_context_t creation and
set diversity_manager.timer =
termination_checker_t(solver.context.gpu_heur_loop, 60000, timer) so nested
timers inside local search/FJ use solver.context.gpu_heur_loop; keep the rest of
the setup (diversity_manager,
diversity_manager.diversity_config.initial_solution_only,
solver.context.diversity_manager_ptr and diversity_manager.run_solver())
unchanged.
- Around line 196-202: The test creates an ad-hoc work_limit_context_t and uses
it to build the termination_checker_t for diversity_manager
(detail::diversity_manager_t<int,double> diversity_manager(...)), but this isn’t
hooked into nested timers; replace the local work_limit_context_t with the
existing solver.context.gpu_heur_loop: set
solver.context.gpu_heur_loop.deterministic = true (or ensure it is true),
construct diversity_manager.timer =
termination_checker_t(solver.context.gpu_heur_loop, 60000, timer), remove the
standalone work_limit_context_t, and keep
diversity_manager.diversity_config.dry_run = true before calling
diversity_manager.run_solver() so the nested FJ/local search timers use the
correct deterministic context.
---
Nitpick comments:
In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu`:
- Around line 74-90: The determinism_mode block contains a redundant check for
!std::isinf(work_limit) and a dead else branch; remove the inner if/else and
always run the estimators loop because determinism_mode is only set when
std::isfinite(work_limit) earlier, so delete the else that sets estim_iters =
std::numeric_limits<int>::max() and collapse the loop to run unconditionally
inside the determinism_mode branch, keeping the estim_iters initialization and
the loop that computes estim_ms using op_problem.n_variables,
op_problem.n_constraints, op_problem.coefficients.size(), and updating
estim_iters.
- Line 23: The `#include <atomic>` is unused, or lp_call_counter should be made atomic for thread safety; either remove the unused `#include <atomic>`, or change the static counter lp_call_counter (currently a uint64_t) to std::atomic<uint64_t> and update its increment/use sites accordingly (using fetch_add or ++ semantics) so increments are thread-safe; keep `#include <atomic>` only if converting lp_call_counter to std::atomic<uint64_t>.
In `@cpp/tests/mip/diversity_test.cu`:
- Around line 240-241: The structured binding is discarding the third value from
diversity_manager.recombine which returns std::tuple<solution, bool, double>;
update the capture to explicitly bind the third element (for example capture as
a named variable like "metric" or use std::ignore if you intend to discard it)
so the intent is clear when calling recombine and avoid silent discarding of the
double return; update the line that currently binds "offspring" and "success" to
also bind the third value from recombine.
In `@cpp/tests/mip/local_search_test.cu`:
- Around line 196-197: The two RAII stream objects spin_stream_raii_t
spin_stream_1 and spin_stream_raii_t spin_stream_2 are declared but never used;
either remove these unused declarations or explicitly document their purpose if
they are intended to keep CUDA streams alive (add a comment above
spin_stream_1/spin_stream_2 stating they intentionally hold/flush streams for
the test), and if they must be used, ensure they're constructed with the
appropriate stream handles where the test creates CUDA streams so the RAII
actually manages those resources.
- Around line 231-236: Enable the commented-out FJ_ON_ZERO test case in the test
instantiation: update the INSTANTIATE_TEST_SUITE_P call for
LocalSearchTests/LocalSearchTestParams to include
std::make_tuple(local_search_mode_t::FJ_ON_ZERO) alongside FP, FJ_LINE_SEGMENT
and FJ_ANNEALING so the FJ_ON_ZERO execution path is covered by the test suite;
ensure you remove the comment markers around the
std::make_tuple(local_search_mode_t::FJ_ON_ZERO) entry so the test framework
will instantiate it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 85b5b753-f0c9-41b8-be6a-0d56784ce4d5
📒 Files selected for processing (55)
- cpp/CMakeLists.txt
- cpp/include/cuopt/linear_programming/constants.h
- cpp/include/cuopt/linear_programming/utilities/internals.hpp
- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/mip_heuristics/diversity/diversity_manager.cu
- cpp/src/mip_heuristics/diversity/diversity_manager.cuh
- cpp/src/mip_heuristics/diversity/population.cu
- cpp/src/mip_heuristics/diversity/population.cuh
- cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh
- cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh
- cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh
- cpp/src/mip_heuristics/early_heuristic.cuh
- cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuh
- cpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cu
- cpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cuh
- cpp/src/mip_heuristics/local_search/local_search.cu
- cpp/src/mip_heuristics/local_search/local_search.cuh
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cuh
- cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
- cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cuh
- cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu
- cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cuh
- cpp/src/mip_heuristics/presolve/bounds_presolve.cu
- cpp/src/mip_heuristics/presolve/bounds_presolve.cuh
- cpp/src/mip_heuristics/presolve/bounds_update_data.cu
- cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cuh
- cpp/src/mip_heuristics/presolve/multi_probe.cu
- cpp/src/mip_heuristics/presolve/multi_probe.cuh
- cpp/src/mip_heuristics/presolve/probing_cache.cu
- cpp/src/mip_heuristics/presolve/probing_cache.cuh
- cpp/src/mip_heuristics/problem/presolve_data.cu
- cpp/src/mip_heuristics/problem/presolve_data.cuh
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu
- cpp/src/mip_heuristics/solve.cu
- cpp/src/mip_heuristics/solver.cu
- cpp/src/mip_heuristics/solver_context.cuh
- cpp/src/utilities/copy_helpers.hpp
- cpp/src/utilities/cuda_helpers.cuh
- cpp/src/utilities/termination_checker.hpp
- cpp/tests/CMakeLists.txt
- cpp/tests/mip/CMakeLists.txt
- cpp/tests/mip/determinism_test.cu
- cpp/tests/mip/diversity_test.cu
- cpp/tests/mip/incumbent_callback_test.cu
- cpp/tests/mip/local_search_test.cu
- cpp/tests/mip/presolve_test.cu
- skills/cuopt-developer/SKILL.md
✅ Files skipped from review due to trivial changes (3)
- skills/cuopt-developer/SKILL.md
- cpp/tests/CMakeLists.txt
- cpp/src/mip_heuristics/diversity/diversity_manager.cu
🚧 Files skipped from review as they are similar to previous changes (19)
- cpp/src/mip_heuristics/presolve/bounds_update_data.cu
- cpp/src/mip_heuristics/presolve/multi_probe.cuh
- cpp/src/mip_heuristics/presolve/load_balanced_bounds_presolve.cu
- cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cuh
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cuh
- cpp/src/mip_heuristics/presolve/lb_probing_cache.cu
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cuh
- cpp/src/mip_heuristics/presolve/bounds_presolve.cu
- cpp/src/mip_heuristics/local_search/rounding/lb_bounds_repair.cu
- cpp/src/mip_heuristics/problem/presolve_data.cuh
- cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu
- cpp/src/utilities/cuda_helpers.cuh
- cpp/src/mip_heuristics/solver_context.cuh
- cpp/src/mip_heuristics/local_search/line_segment_search/line_segment_search.cu
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/presolve/probing_cache.cu
- cpp/tests/mip/determinism_test.cu
- cpp/src/mip_heuristics/local_search/rounding/bounds_repair.cu
- cpp/tests/mip/presolve_test.cu
// TODO: CHANGE
double work = static_cast<double>(n_vars_from_other) / 1e8;
Don't drop recombiner work on LP-based early exits.
After get_relaxed_lp_solution(...) runs, these branches return 0.0 even though the infeasibility-detection path already spent work. Since cpp/src/mip_heuristics/diversity/diversity_manager.cu:1146-1150 forwards this value into recombine_stats.set_recombiner_work(work), the current code undercounts exactly the expensive failure cases. At minimum, return the accumulated work here; ideally replace the // TODO: CHANGE placeholder with a cost that also covers the LP pass.
Suggested minimal fix
- // TODO: CHANGE
double work = static_cast<double>(n_vars_from_other) / 1e8;
@@
if (lp_response.get_termination_status() == pdlp_termination_status_t::PrimalInfeasible ||
lp_response.get_termination_status() == pdlp_termination_status_t::DualInfeasible ||
lp_response.get_termination_status() == pdlp_termination_status_t::TimeLimit) {
CUOPT_LOG_DEBUG("FP recombiner failed because LP found infeasible!");
- return std::make_tuple(offspring, false, 0.0);
+ return std::make_tuple(offspring, false, work);
  }
Also applies to: 100-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/diversity/recombiners/fp_recombiner.cuh` around lines
68 - 69, The current TODO placeholder sets work to a small static value and then
returns 0.0 on LP-based early exits, losing the work already spent by
get_relaxed_lp_solution; update the code in fp_recombiner.cuh so the work
variable (currently computed from n_vars_from_other) accumulates the LP pass
cost and is returned on all early-exit paths. Concretely: in the scope around
get_relaxed_lp_solution(...) ensure you add the LP work contribution to the
existing work variable (reference n_vars_from_other and work) and return that
accumulated work instead of 0.0; apply the same change to the other early-exit
sites mentioned (around the lines corresponding to 100-105) so
recombine_stats.set_recombiner_work(work) receives the correct nonzero value.
  const auto* h = handle_override ? handle_override : problem.handle_ptr;
  cuopt_assert(assignment.size() == papilo_reduced_to_original_map.size(),
               "Papilo uncrush assignment size mismatch");
- auto h_assignment = cuopt::host_copy(assignment, problem.handle_ptr->get_stream());
+ problem.handle_ptr->sync_stream();
+ auto h_assignment = cuopt::host_copy(assignment, h->get_stream());
  std::vector<f_t> full_assignment;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect all papilo_uncrush_assignment call sites and passed handle argument.
rg -nP -C3 --type=cu --type=cpp --type=cuh '\bpapilo_uncrush_assignment\s*\('
# 2) Inspect the exact sync/copy sequence in the implementation under review.
rg -nP -C8 'handle_override|sync_stream|host_copy\(assignment|raft::copy\(assignment\.data' cpp/src/mip_heuristics/problem/presolve_data.cu
# 3) Inspect host_copy helper implementation to confirm synchronization behavior.
rg -nP -C8 --type=hpp --type=h --type=cu --type=cpp '\bhost_copy\s*\('
Repository: NVIDIA/cuopt
Length of output: 82
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for papilo_uncrush_assignment call sites
echo "=== papilo_uncrush_assignment call sites ==="
rg -nP -C3 'papilo_uncrush_assignment\s*\(' --glob='*.cu' --glob='*.cpp' --glob='*.cuh'
echo ""
echo "=== Sync/copy sequence in presolve_data.cu ==="
rg -nP -C8 'handle_override|sync_stream|host_copy\(assignment|raft::copy\(assignment\.data' cpp/src/mip_heuristics/problem/presolve_data.cu
echo ""
echo "=== host_copy helper signature ==="
rg -nP -C8 'host_copy\s*\(' --glob='*.hpp' --glob='*.h' --glob='*.cu' --glob='*.cpp'
Repository: NVIDIA/cuopt
Length of output: 50369
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the exact lines from the file to understand context
echo "=== Content around lines 230-250 of presolve_data.cu ==="
sed -n '230,250p' cpp/src/mip_heuristics/problem/presolve_data.cu | cat -n
echo ""
echo "=== Full function context (broader view) ==="
# Find the function containing line 236 and show more context
awk 'NR==200,NR==260' cpp/src/mip_heuristics/problem/presolve_data.cu | cat -n
Repository: NVIDIA/cuopt
Length of output: 3700
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find where handle_override is defined and used in this function
echo "=== Function containing the code at line 236 ==="
sed -n '180,280p' cpp/src/mip_heuristics/problem/presolve_data.cu | cat -n
Repository: NVIDIA/cuopt
Length of output: 3522
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find host_copy implementation
echo "=== Searching for host_copy definition/implementation ==="
find . -type f \( -name "*.hpp" -o -name "*.h" -o -name "*.cu" -o -name "*.cpp" \) -exec grep -l "host_copy" {} \; | head -20
# Look for the template definition
echo ""
echo "=== host_copy template definition ==="
rg -A 15 'template.*host_copy\s*\(' --glob='*.hpp' --glob='*.h' | head -80
Repository: NVIDIA/cuopt
Length of output: 2270
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check what host_copy is - look in header files for implementation
echo "=== Looking for host_copy in utilities ==="
find . -path ./build -prune -o -type f -name "*.hpp" -o -name "*.h" | xargs grep -l 'host_copy' | head -10
# Find it with context
echo ""
echo "=== host_copy usage pattern analysis ==="
rg 'host_copy\(' --glob='*.cu' -B2 -A2 | head -100
Repository: NVIDIA/cuopt
Length of output: 7907
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find and display host_copy implementation
echo "=== host_copy implementation in copy_helpers.hpp ==="
cat cpp/src/utilities/copy_helpers.hpp | head -200
Repository: NVIDIA/cuopt
Length of output: 4676
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# See the rest of host_copy implementation
echo "=== Continuation of copy_helpers.hpp ==="
sed -n '160,220p' cpp/src/utilities/copy_helpers.hpp
Repository: NVIDIA/cuopt
Length of output: 1859
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check where assignment data comes from in the call sites
echo "=== Call site 1: solution_callbacks.cuh:102 context ==="
sed -n '95,110p' cpp/src/mip_heuristics/solution_callbacks.cuh
echo ""
echo "=== Call site 2: solve.cu:113 context ==="
sed -n '108,120p' cpp/src/mip_heuristics/solve.cu
# Also check if temp_sol is using a different stream
echo ""
echo "=== solution_t structure and stream handling ==="
rg -A 5 'struct solution_t|class solution_t' --glob='*.cuh' --glob='*.hpp' cpp/src/mip_heuristics/solution/ | head -60
Repository: NVIDIA/cuopt
Length of output: 2489
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Now let's understand the rmm::device_uvector and stream associations
echo "=== Check how assignment is created/initialized ==="
rg -B5 -A5 'rmm::device_uvector<f_t>& assignment' cpp/src/mip_heuristics/problem/presolve_data.cu | head -40
# Check what stream assignment is associated with
echo ""
echo "=== Looking for assignment initialization in solution_t ==="
rg -B3 -A3 'assignment.*resize|assignment.*rmm' cpp/src/mip_heuristics/solution/solution.cu | head -50
# Check if there are any operations on assignment before papilo_uncrush_assignment is called
echo ""
echo "=== In solution_callbacks.cuh, what happens to assignment before papilo_uncrush_assignment ==="
sed -n '85,105p' cpp/src/mip_heuristics/solution_callbacks.cuh
Repository: NVIDIA/cuopt
Length of output: 4351
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let's understand stream lifetime in solution_t copy constructor
echo "=== solution_t copy constructor ==="
rg -A 20 'solution_t<i_t, f_t>::solution_t\(const solution_t' cpp/src/mip_heuristics/solution/solution.cu
echo ""
echo "=== solution_t constructor from problem ==="
rg -A 15 'solution_t<i_t, f_t>::solution_t\(problem_t' cpp/src/mip_heuristics/solution/solution.cu | head -40
# Let me check if assignment retains its stream affinity across copies
echo ""
echo "=== rmm::device_uvector stream handling in copy ==="
rg 'assignment = other.assignment|assignment\(' cpp/src/mip_heuristics/solution/solution.cu | head -20
Repository: NVIDIA/cuopt
Length of output: 3059
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let me trace through the actual execution in solution_callbacks.cuh more carefully
echo "=== Check solution_t constructor with handle_override ==="
sed -n '90,110p' cpp/src/mip_heuristics/solution_callbacks.cuh
echo ""
echo "=== Now let's look at the actual fix - does syncing problem.handle_ptr help or hurt? ==="
# The key question: when handle_override differs from problem.handle_ptr, what stream is assignment on?
echo "When papilo_uncrush_assignment is called with handle_override:"
echo "- h = handle_override (if not nullptr)"
echo "- h = problem.handle_ptr (if nullptr)"
echo ""
echo "The assignment device_uvector - which handle does it use?"
echo "Looking at line 240-246 operations:"
rg -B2 -A10 'const auto\* h = handle_override' cpp/src/mip_heuristics/problem/presolve_data.cu
echo ""
echo "=== Check if there are similar mixed-handle sync patterns elsewhere ==="
rg 'problem\.handle_ptr->sync|h->sync|handle_override' cpp/src/mip_heuristics/problem/presolve_data.cu
Repository: NVIDIA/cuopt
Length of output: 2706
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for similar patterns: sync on one handle, use another handle's stream
echo "=== Looking for mixed-handle sync/stream patterns ==="
rg -B3 'sync_stream\(\)' cpp/src/mip_heuristics/problem/ --glob='*.cu' | grep -A3 -B3 'handle_override\|h->get_stream\|h->sync'
echo ""
echo "=== Confirm the issue: in solution_callbacks.cuh, sol.handle_ptr vs problem_ptr->handle_ptr ==="
sed -n '88,105p' cpp/src/mip_heuristics/solution_callbacks.cuh | cat -n
Repository: NVIDIA/cuopt
Length of output: 2727
Synchronize the correct stream before GPU operations
At line 240, the code syncs problem.handle_ptr->sync_stream(), but the subsequent GPU operations on lines 241 and 244 use h->get_stream(), which may differ when handle_override is provided (as in solution_callbacks.cuh:102). This mixed-handle synchronization is incorrect; the initial sync should target the stream actually used for GPU work.
Suggested fix
- problem.handle_ptr->sync_stream();
+ h->sync_stream();
  auto h_assignment = cuopt::host_copy(assignment, h->get_stream());
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const auto* h = handle_override ? handle_override : problem.handle_ptr;
cuopt_assert(assignment.size() == papilo_reduced_to_original_map.size(),
             "Papilo uncrush assignment size mismatch");
h->sync_stream();
auto h_assignment = cuopt::host_copy(assignment, h->get_stream());
std::vector<f_t> full_assignment;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/problem/presolve_data.cu` around lines 236 - 241, The
code synchronizes the wrong handle's stream: replace the call to
problem.handle_ptr->sync_stream() with a sync on the actual handle used for
subsequent GPU work (the local pointer h), e.g. call h->sync_stream() or
otherwise synchronize h->get_stream() before invoking cuopt::host_copy and other
GPU operations; update any related comments to reflect that synchronization
targets the chosen handle (h) to avoid mismatched streams when handle_override
is provided.
int bb_device_id = context.handle_ptr->get_device();
branch_and_bound_status_future =
  std::async(std::launch::async, [&branch_and_bound, &branch_and_bound_solution, bb_device_id] {
    RAFT_CUDA_TRY(cudaSetDevice(bb_device_id));
    return branch_and_bound->solve(branch_and_bound_solution);
  });
Add CUDA error checking after cudaSetDevice.
The cudaSetDevice call inside the async B&B thread should have error checking to catch device initialization failures early.
💡 Suggested fix
branch_and_bound_status_future =
std::async(std::launch::async, [&branch_and_bound, &branch_and_bound_solution, bb_device_id] {
- RAFT_CUDA_TRY(cudaSetDevice(bb_device_id));
+ cudaError_t err = cudaSetDevice(bb_device_id);
+ if (err != cudaSuccess) {
+ CUOPT_LOG_ERROR("Failed to set CUDA device %d: %s", bb_device_id, cudaGetErrorString(err));
+ return dual_simplex::mip_status_t::NUMERICAL_ISSUES;
+ }
return branch_and_bound->solve(branch_and_bound_solution);
  });
As per coding guidelines, "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification."
📝 Committable suggestion
int bb_device_id = context.handle_ptr->get_device();
branch_and_bound_status_future =
  std::async(std::launch::async, [&branch_and_bound, &branch_and_bound_solution, bb_device_id] {
    cudaError_t err = cudaSetDevice(bb_device_id);
    if (err != cudaSuccess) {
      CUOPT_LOG_ERROR("Failed to set CUDA device %d: %s", bb_device_id, cudaGetErrorString(err));
      return dual_simplex::mip_status_t::NUMERICAL_ISSUES;
    }
    return branch_and_bound->solve(branch_and_bound_solution);
  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/solver.cu` around lines 491 - 496, The async lambda
that calls RAFT_CUDA_TRY(cudaSetDevice(bb_device_id)) must verify the result and
fail fast on error: replace the bare cudaSetDevice call with the project’s CUDA
error-checking helper (e.g., RAFT_CUDA_TRY or CUDA_CHECK) or explicitly check
cudaError_t and handle failures before calling branch_and_bound->solve; update
the lambda that captures bb_device_id and calls
branch_and_bound->solve(branch_and_bound_solution) so it logs/propagates the
cudaSetDevice error (using the same error handling pattern used elsewhere) and
returns/throws an error instead of proceeding to solve when device
initialization fails.
  auto pop = diversity_manager.get_population_pointer();
  for (const auto& sol : pop->population_to_vector()) {
    hashes.push_back(sol.get_hash());
  }

  uint32_t final_hash = detail::compute_hash(hashes);
  printf("%s: final hash: 0x%x, pop size %d\n",
         path.c_str(),
         final_hash,
         (int)pop->population_to_vector().size());
  return final_hash;
}
Remove unreachable dead code.
Lines 267-279 are unreachable because line 266 returns early. This appears to be leftover code from an earlier iteration.
💡 Suggested fix
return detail::compute_hash(hashes);
-
- auto pop = diversity_manager.get_population_pointer();
- for (const auto& sol : pop->population_to_vector()) {
- hashes.push_back(sol.get_hash());
- }
-
- uint32_t final_hash = detail::compute_hash(hashes);
- printf("%s: final hash: 0x%x, pop size %d\n",
- path.c_str(),
- final_hash,
- (int)pop->population_to_vector().size());
- return final_hash;
 }
📝 Committable suggestion
  return detail::compute_hash(hashes);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/mip/diversity_test.cu` around lines 267 - 279, The block that
collects hashes and computes final_hash (using
diversity_manager.get_population_pointer(), pop->population_to_vector(),
detail::compute_hash(hashes), and the printf that references path and
final_hash) is dead/unreachable because an earlier return happens; remove this
unreachable code or move its logic before the earlier return so only reachable
code remains—specifically delete the loop that pushes sol.get_hash(), the
compute_hash call, the printf, and the trailing return final_hash; ensure no
references to final_hash or hashes remain after the earlier return.
Actionable comments posted: 1
🧹 Nitpick comments (6)
cpp/tests/mip/diversity_test.cu (4)
69-69: Empty function should be removed or documented.
setup_device_symbols is a no-op function. If it's a placeholder for future implementation, add a TODO comment; otherwise, remove it to avoid confusion.
💡 Suggested fix
Either remove the function and its call sites, or document the intent:
-static void setup_device_symbols(rmm::cuda_stream_view stream_view) { (void)stream_view; }
+// TODO: Implement device symbol initialization if needed for determinism tests
+// static void setup_device_symbols(rmm::cuda_stream_view stream_view) { (void)stream_view; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` at line 69, The function setup_device_symbols(rmm::cuda_stream_view stream_view) is a no-op; either remove this unused function (and clean up any call sites that only exist to invoke it) or make its purpose explicit by adding a TODO/comment indicating it's a deliberate placeholder for future device-symbol setup; update references to setup_device_symbols in tests (e.g., any calls in diversity_test.cu) to reflect the removal or to clarify that the call is intentionally a no-op.
71-73: Unused seed parameter in test runner functions.
The seed parameter in test_full_run_determinism (line 72) and test_initial_solution_determinism (line 121) has a default value but is never used within the function body. The global seed is set externally via cuopt::seed_generator::set_seed(seed) in the calling test. Either remove the parameter or use it to seed something locally.
💡 Suggested fix
If the seed is intended to be set locally within these functions:
 static uint32_t test_full_run_determinism(std::string path,
-                                          unsigned long seed = std::random_device{}(),
+                                          unsigned long seed,
                                           float work_limit = 10.0f) {
+  cuopt::seed_generator::set_seed(seed);
   const raft::handle_t handle_{};
Or remove the parameter if external seeding is intentional.
Also applies to: 120-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 71 - 73, The two test runner functions test_full_run_determinism and test_initial_solution_determinism declare an unused seed parameter; either remove the seed parameter from both function signatures (and update any callers) if external seeding is intended, or make the functions actually use the value by calling cuopt::seed_generator::set_seed(seed) (or equivalent local seeding) at the start of each function to apply the provided seed; pick one approach and apply it consistently to both functions.
382-394: Test instantiation includes duplicate entries.
Lines 387-388 instantiate mip/neos5.mps and mip/gen-ip054.mps, which appear both in active entries and as commented-out entries (lines 385-386). The commented-out duplicates should be removed to avoid confusion.
💡 Suggested cleanup
mip/neos5.mpsandmip/gen-ip054.mps, which appear both in active entries and as commented-out entries (lines 385-386). The commented-out duplicates should be removed to avoid confusion.💡 Suggested cleanup
INSTANTIATE_TEST_SUITE_P(DiversityTest, DiversityTestParams, testing::Values( - // std::make_tuple("mip/gen-ip054.mps", 5.0f), - // std::make_tuple("mip/pk1.mps", 5.0f), std::make_tuple("mip/neos5.mps", 5.0f), std::make_tuple("mip/gen-ip054.mps", 5.0f), std::make_tuple("mip/pk1.mps", 5.0f), - // std::make_tuple("uccase9.mps"), - // std::make_tuple("mip/neos5.mps", 5.0f), std::make_tuple("mip/50v-10.mps", 5.0f) - // std::make_tuple("mip/rmatr200-p5.mps", 5.0f) ));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 382 - 394, The INSTANTIATE_TEST_SUITE_P block for DiversityTest (using DiversityTestParams) contains commented-out entries that duplicate active tuples (e.g., "mip/neos5.mps" and "mip/gen-ip054.mps"); to clean up, remove the commented duplicate lines so only one declaration remains per test case in the INSTANTIATE_TEST_SUITE_P(DiversityTest, DiversityTestParams, testing::Values(...)) invocation and leave the active std::make_tuple entries (such as "mip/neos5.mps", "mip/gen-ip054.mps", "mip/pk1.mps", "mip/50v-10.mps") intact.
283-314: Test structure is sound but consider removing debug logging in CI.

The test correctly validates recombiner determinism by comparing hashes across runs. However, the debug-level logging setup (lines 286-288) may produce excessive output in CI. Consider either:
- Removing the logger configuration, or
- Guarding it behind an environment variable check
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/diversity_test.cu` around lines 283 - 314, The test recombiners_deterministic currently forces debug logging by calling cuopt::default_logger().set_pattern, set_level and flush_on at the start of the test; remove these three logger configuration calls or wrap them in a runtime guard (e.g., check an env var like CUOPT_DEBUG_LOG) so CI runs don't emit excessive debug output; locate the calls in the TEST_P(DiversityTestParams, recombiners_deterministic) body and either delete the cuopt::default_logger().set_pattern(...), cuopt::default_logger().set_level(...), and cuopt::default_logger().flush_on(...) lines or condition them on std::getenv("CUOPT_DEBUG_LOG") before applying them.

cpp/tests/mip/determinism_test.cu (1)
103-136: Consider removing unused helper function.
count_callbacks_with_origin (lines 115-121) is defined but not used anywhere in this file. If it's intended for future use, consider adding a comment; otherwise, it could be removed to avoid dead code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/determinism_test.cu` around lines 103 - 136, The helper function count_callbacks_with_origin is currently unused in this file; either remove its definition to eliminate dead code or, if you plan to use it later, add a brief explanatory comment above count_callbacks_with_origin indicating its intended future use (or mark it with a clear TODO), so reviewers know it’s intentional; locate the function by name and delete it or add the comment accordingly while leaving the other helpers (is_gpu_callback_origin, count_gpu_callbacks, count_branch_and_bound_callbacks) unchanged.

cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu (1)
82-84: Model produces unreliable estimates for constraint-heavy problems.

The negative coefficient for n_constraints (-400) causes the model to underestimate work for problems with many constraints relative to variables. For such problems, estim_ms clamps to 0, and the loop may overshoot the iteration estimate significantly.

Since this is marked as a temporary placeholder (per the TODO), consider adding a tracking issue or ensuring the predictor replacement is prioritized.
Do you want me to open a tracking issue for replacing this heuristic model?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` around lines 82 - 84, The current heuristic for estim_ms uses a negative weight on op_problem.n_constraints which causes underestimation and clamping to 0 for constraint-heavy problems; update the formula in relaxed_lp.cu (the estim_ms computation using op_problem.n_variables, op_problem.n_constraints, op_problem.coefficients.size(), and estim_iters) to remove or make the n_constraints coefficient non-negative (e.g., replace -400 with a small positive or zero weight) and ensure a reasonable lower bound (e.g., min_ms > 0) instead of hard clamping to 0; also add a TODO comment and open a tracking issue to prioritize replacing this temporary predictor as referenced by the existing TODO so the heuristic is not relied on long-term.
📒 Files selected for processing (5)

- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
- cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu
- cpp/tests/mip/determinism_test.cu
- cpp/tests/mip/diversity_test.cu
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu
if (!std::isinf(work_limit)) {
  do {
    // TODO: use an actual predictor model here
    double estim_ms = 313 + 200 * op_problem.n_variables - 400 * op_problem.n_constraints +
                      600 * op_problem.coefficients.size() + 7100 * estim_iters;
    estim_ms = std::max(0.0, estim_ms);
    if (estim_ms > work_limit * 1000) { break; }
    estim_iters += 100;
  } while (true);
} else {
  estim_iters = std::numeric_limits<int>::max();
}
Potential unbounded loop and signed integer overflow.
When work_limit is finite but very large (e.g., work_limit = 1e6 seconds), this loop may iterate until estim_iters overflows. Since estim_iters is a signed int, overflow is undefined behavior.
Consider adding bounds:
🛡️ Proposed fix to bound iterations
+ constexpr int max_estim_iters = std::numeric_limits<int>::max() - 100;
if (!std::isinf(work_limit)) {
do {
// TODO: use an actual predictor model here
double estim_ms = 313 + 200 * op_problem.n_variables - 400 * op_problem.n_constraints +
600 * op_problem.coefficients.size() + 7100 * estim_iters;
estim_ms = std::max(0.0, estim_ms);
if (estim_ms > work_limit * 1000) { break; }
+ if (estim_iters >= max_estim_iters) { break; }
estim_iters += 100;
    } while (true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/relaxed_lp/relaxed_lp.cu` around lines 79 - 90, The
loop that increases estim_iters can overflow because estim_iters is an int and
may wrap if work_limit is huge; update the logic in the block that computes
estim_ms and increments estim_iters so it cannot run unbounded: either change
estim_iters to a wider integer type (e.g., int64_t) and/or add an explicit upper
bound check (compare against a safe MAX_ITERS constant or
std::numeric_limits<int>::max()/a guarded cap) and break when reached, and
ensure the estim_ms > work_limit*1000 check is preserved; locate and modify the
variables/loop referencing estim_iters, estim_ms, and work_limit inside the
do/while to add the guard and avoid signed overflow.
This PR integrates GPU heuristics into the deterministic codepath of the solver. This enables running full end-to-end solves with presolve, parallel B&B, and GPU heuristics, with full run-to-run determinism and heterogeneous (CPU+GPU) execution.
For this purpose, the GPU heuristics were modified to eliminate sources of nondeterminism such as unordered floating-point reductions, atomic operations, and scheduling-dependent operations. Extensive testing has been added to the CI runs to ensure individual components (such as probing, FJ, recombiners, FP...) are bitwise deterministic, and that their overall orchestration maintains this invariant.
Furthermore, to reduce changes to the codebase and splits in the codepath, wall-clock timers on the heuristics side were replaced with unified termination-checking objects that check against wall time (opportunistic mode) or total work (deterministic mode).
These timers are hierarchical, and thus allow checking against multiple termination sources: for example, a local algorithm's work budget, the global wall-clock time limit, or, in future PRs, user-controlled termination signals like Ctrl-C or callback handlers.
B&B acts as the authority for publishing incumbents to the user. At every horizon sync, the queue of GPU heuristic solutions is checked; any solutions whose timestamp is <= the current B&B work total are drained, replayed, and incorporated into the tree exploration.
A new get_solution_callback_ext callback has been introduced to prevent ABI breaks and pass more metadata with each published solution. For now, only the solution origin and its work-unit timestamp are exposed; additional fields may later be added without breaking the ABI.
The benchmark runner now relies on these callbacks to record incumbents and their timestamps; it is no longer necessary to parse the logs to compute the primal integral.
A few other bugfixes have been included for the existing B&B deterministic exploration regarding node queues. Other fixes address uninitialized memory accesses and allow running compute-sanitizer initcheck without false positives. Determinism logs, disabled by default, have also been added throughout the code.
Some features remain unsupported in deterministic mode, such as Papilo's Probing presolver, concurrent root solve, RINS, or the sub-MIP recombiner. These will be incorporated in later PRs.
Furthermore, the ML estimators have been taken out of this PR to reduce code review workload. They will be reintroduced in later PRs, along with improved tuning.
No intentional API or ABI breaking changes introduced.
Opportunistic mode performance remains unchanged:
Primal gap 12.5%, 226 feasible, primal integral 19.1%.
Deterministic mode benchmarks pending.
Closes #144
Closes #882