Unified pseudocost object for the regular and deterministic mode#1020
Unified pseudocost object for the regular and deterministic mode#1020nguidotti wants to merge 10 commits intoNVIDIA:mainfrom
Conversation
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds BnB mode/constants and atomic helpers; parameterizes pseudo-costs and diving heuristics by mode; refactors strong-branching data flow and PDLP warm-cache usage; moves worker-pool logic to a new header; inlines node inactivity helper and removes Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
835-874:⚠️ Potential issue | 🟡 MinorInitialize
averageseven whenreliable_threshold != 0.
averagesis only populated in thereliable_threshold == 0branch, but the final score recomputation can still fall back to it if one side stayed uninitialized after trial branching. That leaves this path reading indeterminate values.🩹 Minimal fix
- pseudo_cost_averages_t<i_t, f_t> averages; + pseudo_cost_averages_t<i_t, f_t> averages = compute_averages(); @@ - if (reliable_threshold == 0) { - averages = compute_averages(); + if (reliable_threshold == 0) { log.printf("PC: num initialized down %d up %d avg down %e up %e\n", averages.num_init_down, averages.num_init_up, averages.down_avg, averages.up_avg);Also applies to: 1056-1056
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 835 - 874, The variable averages (pseudo_cost_averages_t<i_t, f_t> averages) is only assigned inside the reliable_threshold == 0 branch but later code can use it when one side's pseudocosts remain uninitialized; call compute_averages() unconditionally (e.g., assign averages = compute_averages() right after declaring averages or before the if(reliable_threshold==0) block) so averages is always valid, and keep the existing log.printf call inside the reliable_threshold == 0 branch so logging behavior is unchanged; update references to averages accordingly (symbols: averages, compute_averages(), reliable_threshold).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1471-1474: The abs_gap is incorrectly computed as lower_bound -
lower_bound making it zero; update both inner search loop locations (the block
that sets lower_bound, upper_bound, rel_gap, abs_gap and the similar block
around lines 1597-1600) to compute abs_gap as the difference between upper and
lower bounds (e.g., abs_gap = upper_bound - lower_bound or fabs(upper_bound -
lower_bound)) while keeping rel_gap = user_relative_gap(original_lp_,
upper_bound, lower_bound); adjust the two occurrences that reference variables
lower_bound, upper_bound, rel_gap, abs_gap so abs_gap reflects the true gap.
- Around line 2491-2493: The absolute-gap check uses the stale pre-cut
root_relax_objective, so abs_gap is computed incorrectly and can miss early
termination; change the abs_gap computation to use the cut-improved
root_objective_ (i.e., compute abs_gap = upper_bound_ - root_objective_) so the
check that compares abs_gap against settings_.absolute_mip_gap_tol (and the
related rel_gap computed via user_relative_gap(original_lp_,
upper_bound_.load(), root_objective_)) reflects the tightened root bound; update
the abs_gap declaration near where rel_gap, original_lp_, upper_bound_.load(),
root_relax_objective and root_objective_ are used in branch_and_bound.cpp
accordingly.
- Around line 391-408: The root_reduced_cost_fixing function currently can run
concurrently with the async root LP solve and reads/mutates original_lp_ and
uses root_objective_/root_relax_soln_.z before they are finalized; guard this by
checking a root-relaxation-complete flag (e.g., root_relaxation_done_ or
similar) or otherwise ensure the root solve is finished before proceeding, and
only then snapshot original_lp_.lower/upper under mutex_original_lp_ and call
reduced_cost_fixing; in short, add a precondition that verifies root_objective_
and root_relax_soln_.z are valid (or wait for completion) and only then
lock/copy original_lp_ and invoke reduced_cost_fixing from
root_reduced_cost_fixing to eliminate the race on original_lp_ and uninitialized
root data.
In `@cpp/src/branch_and_bound/diving_heuristics.cpp`:
- Around line 154-160: The guided diving loop reads incumbent[j] without
checking whether an incumbent snapshot exists; if incumbent is empty this can
read past bounds and crash. Fix by guarding access to incumbent in the loop over
fractional (e.g., check incumbent.empty() or incumbent.size() <= j) inside the
function that contains fractional, solution, and rounding_direction_t logic, and
when no incumbent is available compute dir using a safe fallback (for example
decide by f_down vs f_up or treat distances as infinite so rounding uses DOWN/UP
deterministically); ensure the same guarded logic is used wherever guided diving
relies on incumbent to avoid out-of-range reads.
In `@cpp/src/branch_and_bound/diving_heuristics.hpp`:
- Around line 25-33: The header must directly include constants.hpp before any
use of branch_and_bound_mode_t to avoid include-order dependencies: add an
`#include` "constants.hpp" at the top of
cpp/src/branch_and_bound/diving_heuristics.hpp (before the template declarations
for pseudocost_diving and guided_diving that use branch_and_bound_mode_t) and
apply the same change to cpp/src/branch_and_bound/pseudo_costs.hpp so
rounding_direction_t and branch_and_bound_mode_t are defined regardless of
include order.
In `@cpp/src/branch_and_bound/mip_node.hpp`:
- Around line 153-164: The get_variable_bounds flow currently overwrites
start_lower/start_upper with branch bounds and can undo previously tightened
bounds; change the logic so you intersect (lower[i] = max(lower[i],
branch_var_lower), upper[i] = min(upper[i], branch_var_upper)) instead of
assigning, update bounds_changed accordingly, and only then test feasibility
(upper >= lower) and return false if infeasible; apply the same
intersection/fetch-and-test fix where the same pattern appears (the other block
around update_branched_variable_bounds usage, e.g., lines referenced in the
comment), and ensure functions update_branched_variable_bounds /
get_variable_bounds use the lower/upper vectors (and bounds_changed flags) as
inputs to merge rather than skipping when bounds_changed[branch_var] is already
true.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 574-576: The vectors strong_branch_down and strong_branch_up are
being initialized to 0 which treats unvisited candidates as zero-cost
observations; instead initialize them to NaN (e.g.
std::numeric_limits<f_t>::quiet_NaN()) wherever they are created (including the
other occurrences referenced) so untouched entries are marked missing, and
ensure the update/aggregation code that currently checks only for NaN continues
to skip those entries (and that any counters like num_strong_branches_completed
only reflect actual updates). This preserves the existing NaN-skip logic and
prevents zero-initialized pseudocosts from skewing reliability branching and
objective estimates.
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around line 174-205: The CHECK_CHANGE_IN_REDUCED_COST debug block calls
phase2::compute_reduced_cost_update with outdated parameters (it references lp,
basic_list, nonbasic_list) and the helper was moved/renamed; update the debug
call to use the new helper signature (the standalone compute_delta_z or
new_compute_reduced_cost_update function) and pass the current variables used by
the non-debug path (e.g., delta_y_dense, leaving_index, direction,
delta_z_mark_check, delta_z_indices_check, delta_z_check, work_estimate) instead
of lp/basic_list/nonbasic_list, or adapt the helper call to
compute_delta_z(delta_y_dense, leaving_index, direction, delta_z_mark_check,
delta_z_indices_check, delta_z_check, work_estimate) so the debug path compiles
and validates the same sparse update as the main code.
---
Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 835-874: The variable averages (pseudo_cost_averages_t<i_t, f_t>
averages) is only assigned inside the reliable_threshold == 0 branch but later
code can use it when one side's pseudocosts remain uninitialized; call
compute_averages() unconditionally (e.g., assign averages = compute_averages()
right after declaring averages or before the if(reliable_threshold==0) block) so
averages is always valid, and keep the existing log.printf call inside the
reliable_threshold == 0 branch so logging behavior is unchanged; update
references to averages accordingly (symbols: averages, compute_averages(),
reliable_threshold).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5683742e-2ed8-4385-ba08-a3f28add35cf
📒 Files selected for processing (24)
cpp/src/branch_and_bound/CMakeLists.txtcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/branch_and_bound_worker.hppcpp/src/branch_and_bound/constants.hppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/diving_heuristics.cppcpp/src/branch_and_bound/diving_heuristics.hppcpp/src/branch_and_bound/mip_node.cppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/reduced_cost_fixing.hppcpp/src/branch_and_bound/worker.hppcpp/src/branch_and_bound/worker_pool.hppcpp/src/dual_simplex/phase2.cppcpp/src/dual_simplex/phase2.hppcpp/src/dual_simplex/simplex_solver_settings.hppcpp/src/math_optimization/solver_settings.cucpp/src/mip_heuristics/diversity/lns/rins.cucpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuhcpp/src/mip_heuristics/solver.cucpp/src/utilities/omp_helpers.hppcpp/tests/mip/determinism_test.cu
💤 Files with no reviewable changes (3)
- cpp/src/branch_and_bound/mip_node.cpp
- cpp/src/branch_and_bound/CMakeLists.txt
- cpp/src/branch_and_bound/branch_and_bound_worker.hpp
…nd deterministic mode. Signed-off-by: Nicolas Guidotti <224634272+nguidotti@users.noreply.github.com>
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
6ec1fe8 to
a517f13
Compare
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
|
/ok to test f31599c |
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
Signed-off-by: Nicolas Guidotti <nguidotti@nvidia.com>
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
|
/ok to test 202738f |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
cpp/src/branch_and_bound/diving_heuristics.cpp (1)
154-160:⚠️ Potential issue | 🔴 CriticalGuard guided diving when no incumbent snapshot exists.
incumbent[j]is still unguarded here.cpp/src/branch_and_bound/branch_and_bound.cppbroadcasts an emptyincumbent_snapshotbefore the first feasible solution in deterministic mode, so the first guided dive can read past the end of this vector and crash.As per coding guidelines, "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".🐛 Minimal hardening
- f_t down_dist = std::abs(incumbent[j] - std::floor(solution[j])); - f_t up_dist = std::abs(std::ceil(solution[j]) - incumbent[j]); - rounding_direction_t dir = - down_dist < up_dist + eps ? rounding_direction_t::DOWN : rounding_direction_t::UP; + rounding_direction_t dir; + if (j < static_cast<i_t>(incumbent.size())) { + f_t down_dist = std::abs(incumbent[j] - std::floor(solution[j])); + f_t up_dist = std::abs(std::ceil(solution[j]) - incumbent[j]); + dir = down_dist < up_dist + eps ? rounding_direction_t::DOWN : rounding_direction_t::UP; + } else { + dir = f_down < f_up + eps ? rounding_direction_t::DOWN : rounding_direction_t::UP; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/diving_heuristics.cpp` around lines 154 - 160, The loop in diving_heuristics.cpp assumes incumbent has elements and reads incumbent[j] without checking, causing out-of-bounds when no incumbent snapshot exists; modify the loop that iterates over fractional to first check that incumbent.size() > j (or !incumbent.empty()) before using incumbent[j], and if no incumbent is available fall back to an unguided behavior (e.g., choose rounding_direction_t based solely on fractional part f_down/f_up or default to DOWN/UP) so guided diving is skipped safely; update references in this code block where rounding_direction_t dir is assigned to use the guarded logic involving incumbent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1247-1252: The pseudocost sums are stored in integer type i_t
(sum_up, sum_down) causing integer truncation and integer division; change the
local sum_up and sum_down to f_t (or cast them to f_t before division) so that
pc_up and pc_down compute floating-point averages correctly; update the
variables around pseudo_cost_sum_up, pseudo_cost_sum_down, pseudo_cost_num_up,
pseudo_cost_num_down and ensure pc_up/pc_down use floating division (e.g., f_t
sum_up = pseudo_cost_sum_up[j]; pc_up = num_up>0 ? sum_up / (f_t)num_up :
averages.up_avg) to prevent loss of precision.
- Line 1357: The variable `averages` (type pseudo_cost_averages_t<i_t, f_t>) may
remain default-constructed when `reliable_threshold != 0`, causing undefined
data to be used in fallback rescoring paths; initialize or reset `averages` to a
known safe state before any early exit or before rescoring (the rescoring code
paths around the fallback at the areas referenced near the rescoring calls) by
explicitly constructing or calling its reset/initialize routine (or set its
member counts/averages to zero) immediately after declaration and also whenever
a time-limited/failed trial branch occurs; ensure the code handling
`reliable_threshold` checks and the functions that perform rescoring (the
sections that read from `averages` at re-score points) see the initialized
state.
- Around line 1284-1303: The loop that computes averages increments
num_initialized_down/up even when pseudo_cost_sum_down/up[j] is non-finite,
biasing averages; change the logic in the loop so you only increment
num_initialized_down when pseudo_cost_sum_down[j] is finite (same for
num_initialized_up with pseudo_cost_sum_up[j]) before adding sum/num to
pseudo_cost_down_avg/pseudo_cost_up_avg; alternatively clamp or skip non-finite
sums before counting; ensure the final pseudo_cost_averages_t<i_t,f_t>
construction uses these corrected counts so down_avg/up_avg divide only by the
number of finite entries.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 238-272: The snapshot assignment functions in
pseudo_cost_snapshot_t (the two operator= overloads and the
pseudo_cost_snapshot_t(const Base&)) only copy the four pseudo_cost arrays
leaving inherited members (pseudo_cost_mutex_*, AT,
reliability_branching_settings, strong_branching_lp_iter, etc.) at constructor
defaults which breaks reliable_variable_selection(); fix by calling
Base::resize(n) (or Base::resize(other.size()) as appropriate) before copying
the arrays and then copy all inherited state that reliable_variable_selection()
reads (mutex vectors, AT/transpose state, reliability_branching_settings,
strong_branching_lp_iter and any other Base members accessed by
selection/ranking). Ensure both operator=(const pseudo_costs_t<...>&) and
operator=(const Base&) perform the resize and full-member copy to produce a
deterministic snapshot.
In `@cpp/src/branch_and_bound/worker_pool.hpp`:
- Around line 58-64: The is_active flag on branch_and_bound_worker_t is accessed
concurrently by return_worker_to_pool(), get_lower_bound(), and
broadcast_root_bounds_change() causing UB; update the declaration of is_active
in cpp/src/branch_and_bound/worker.hpp to std::atomic<bool> (or alternatively
ensure all reads/writes occur under the same mutex), then adjust
return_worker_to_pool() to store(false) atomically (or keep the existing
assignment but performed while holding the mutex_), and update any reads in
get_lower_bound() and broadcast_root_bounds_change() to use atomic load() (or be
performed under the same lock) so all accesses are synchronized; apply the same
atomic/locking change consistently for any other places that read/write
is_active in this component.
---
Duplicate comments:
In `@cpp/src/branch_and_bound/diving_heuristics.cpp`:
- Around line 154-160: The loop in diving_heuristics.cpp assumes incumbent has
elements and reads incumbent[j] without checking, causing out-of-bounds when no
incumbent snapshot exists; modify the loop that iterates over fractional to
first check that incumbent.size() > j (or !incumbent.empty()) before using
incumbent[j], and if no incumbent is available fall back to an unguided behavior
(e.g., choose rounding_direction_t based solely on fractional part f_down/f_up
or default to DOWN/UP) so guided diving is skipped safely; update references in
this code block where rounding_direction_t dir is assigned to use the guarded
logic involving incumbent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 05d6cf49-3b1c-4948-b2b7-7746ad398df9
📒 Files selected for processing (14)
cpp/src/branch_and_bound/CMakeLists.txtcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/constants.hppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/diving_heuristics.cppcpp/src/branch_and_bound/diving_heuristics.hppcpp/src/branch_and_bound/mip_node.cppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/worker.hppcpp/src/branch_and_bound/worker_pool.hppcpp/src/utilities/omp_helpers.hpp
💤 Files with no reviewable changes (2)
- cpp/src/branch_and_bound/CMakeLists.txt
- cpp/src/branch_and_bound/mip_node.cpp
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
1247-1256:⚠️ Potential issue | 🟠 MajorPseudocost sums are still truncated to integer type.
The local variables
sum_upandsum_downare declared asi_t(integer), butpseudo_cost_sum_up[j]andpseudo_cost_sum_down[j]are floating-point accumulators. This truncates the sums before division, producing incorrect average pseudocosts.🛠️ Proposed fix
i_t num_up = pseudo_cost_num_up[j]; - i_t sum_up = pseudo_cost_sum_up[j]; + f_t sum_up = pseudo_cost_sum_up[j]; i_t num_down = pseudo_cost_num_down[j]; - i_t sum_down = pseudo_cost_sum_down[j]; + f_t sum_down = pseudo_cost_sum_down[j]; f_t pc_up = num_up > 0 ? sum_up / num_up : averages.up_avg; f_t pc_down = num_down > 0 ? sum_down / num_down : averages.down_avg;As per coding guidelines: "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1247 - 1256, The bug is that sum_up and sum_down are declared as integer i_t and thus truncate floating-point accumulators pseudo_cost_sum_up[j] and pseudo_cost_sum_down[j]; change the types of sum_up and sum_down to the floating-point type f_t (matching pseudo_cost_sum_up/ pseudo_cost_sum_down) so pc_up and pc_down compute correct floating-point averages (ensure divisions use f_t to avoid integer division and keep using averages.up_avg/averages.down_avg and eps as before).cpp/src/branch_and_bound/diving_heuristics.cpp (1)
154-160:⚠️ Potential issue | 🔴 CriticalAdd bounds validation before accessing incumbent array in guided_diving().
The code accesses
incumbent[j]at lines 157-158 wherejcomes fromfractional, with only an.empty()check at the caller site. This is insufficient—incumbentmay be undersized if populated from a transformed problem context. Add a defensive size check similar to line 100 inpseudocost_diving():if (j >= incumbent.size()) { // fallback or error handling }Alternatively, validate that
incumbent.size() == solution.size()before the loop.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/diving_heuristics.cpp` around lines 154 - 160, guided_diving() currently indexes incumbent[j] for j from fractional without verifying incumbent size; add a defensive check before the loop or at loop entry to ensure incumbent.size() == solution.size() (or that each j < incumbent.size()), and if the check fails handle gracefully (e.g., skip rounding logic, fallback to default rounding_direction_t, or log/return an error) similar to the validation used in pseudocost_diving() around line 100; update the usage of fractional, solution, and incumbent in guided_diving() so all accesses to incumbent[j] are guarded by the size check.
🧹 Nitpick comments (3)
cpp/src/branch_and_bound/branch_and_bound.cpp (1)
1190-1191: Prefer a single owner for the recompute flags.
update_tree_impl()now writesrecompute_basis/recompute_bounds, and both callers immediately overwrite the same state fromnode_statusright afterupdate_tree(). Keeping that transition in one place would make the branch/dive state machine easier to reason about.Also applies to: 1252-1253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 1190 - 1191, The recompute flags are being written in multiple places; make update_tree_impl() the sole owner by moving the assignments for worker->recompute_basis and worker->recompute_bounds into update_tree_impl() and remove the duplicate writes from its callers (the places that currently overwrite them from node_status after calling update_tree()). In update_tree_impl(), set the flags based on the node_status value passed in (or the same source the callers used) so callers no longer touch recompute_basis/recompute_bounds and the branch/dive state is only updated in update_tree_impl().cpp/src/branch_and_bound/pseudo_costs.hpp (1)
238-273: Snapshot assignment copies only core arrays, not full state.The assignment operators copy only
pseudo_cost_{sum,num}_{down,up}arrays. Inherited members likeAT,reliability_branching_settings, andstrong_branching_lp_iterremain at constructor defaults, and mutex vectors stay at size 1.For current usage (diving heuristics calling
compute_averages()andcalculate_pseudocost_score()), this is sufficient. However, callingreliable_variable_selection()on a snapshot would fail due to undersized mutex vectors and uninitializedAT.Consider adding a comment or static_assert to document that snapshots are only intended for diving heuristics:
// Note: Snapshots only copy core pseudo-cost arrays. Do NOT call // reliable_variable_selection() or other methods requiring AT/mutex state.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 238 - 273, The pseudo_cost_snapshot_t copy operations (pseudo_cost_snapshot_t::operator= overloads and constructors) only copy pseudo_cost_{sum,num}_{down,up} arrays and leave inherited state like AT, reliability_branching_settings, strong_branching_lp_iter and mutex vectors at defaults, which can break callers such as reliable_variable_selection(); update the pseudo_cost_snapshot_t declaration/constructors to include a clear comment (or a static_assert) stating that snapshots are intended only for diving heuristics and only safe to call compute_averages() and calculate_pseudocost_score(), and explicitly warn that methods requiring AT or full mutex state (e.g., reliable_variable_selection()) must not be called on snapshots so reviewers and future maintainers see the constraint.cpp/src/branch_and_bound/worker_pool.hpp (1)
104-116: Minor: Unconventional template parameter order.The template parameters are
<f_t, i_t>instead of the conventional<i_t, f_t>used elsewhere in the codebase. While functional, this inconsistency could cause confusion.♻️ Suggested fix
-template <typename f_t, typename i_t> +template <typename i_t, typename f_t> std::vector<search_strategy_t> get_search_strategies( diving_heuristics_settings_t<i_t, f_t> settings)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/worker_pool.hpp` around lines 104 - 116, The template parameter order for get_search_strategies is inconsistent with the rest of the codebase (currently template <typename f_t, typename i_t>); change it to the conventional order template <typename i_t, typename f_t> and update the function signature and any internal uses to match, ensuring diving_heuristics_settings_t<i_t, f_t> still receives the same types; this keeps get_search_strategies' template parameter order consistent with other functions and types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1116-1142: The default case in the switch over
this->worker.diving_type currently logs an error and returns an invalid pair
{-1, rounding_direction_t::NONE}, which can lead update_tree_impl() to use an
out-of-range index; replace that return with a fail-fast error (throw
std::runtime_error or std::logic_error with a clear message) or, if preferred,
call a safe fallback selector (e.g. invoke pseudocost_diving with
this->worker.pc_snapshot and the same args) so the function never returns an
invalid (branch_var, dir); update the default branch in branch_and_bound.cpp
(the switch handling diving types) accordingly and ensure callers like
update_tree_impl() now receive a valid pair or the exception is propagated.
In `@cpp/src/branch_and_bound/deterministic_workers.hpp`:
- Around line 61-62: The worker-side snapshot uses a narrower integer type
causing truncation: change deterministic_worker_base_t::total_lp_iters_snapshot
from i_t to int64_t (matching the struct's total_lp_iters) and update any
related member declarations, constructors, and assignments that reference
total_lp_iters_snapshot so they use int64_t; also audit uses inside
deterministic_dive() (and any places computing local iteration limits from the
snapshot) to ensure all arithmetic and comparisons are done with int64_t to
prevent truncation.
---
Duplicate comments:
In `@cpp/src/branch_and_bound/diving_heuristics.cpp`:
- Around line 154-160: guided_diving() currently indexes incumbent[j] for j from
fractional without verifying incumbent size; add a defensive check before the
loop or at loop entry to ensure incumbent.size() == solution.size() (or that
each j < incumbent.size()), and if the check fails handle gracefully (e.g., skip
rounding logic, fallback to default rounding_direction_t, or log/return an
error) similar to the validation used in pseudocost_diving() around line 100;
update the usage of fractional, solution, and incumbent in guided_diving() so
all accesses to incumbent[j] are guarded by the size check.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1247-1256: The bug is that sum_up and sum_down are declared as
integer i_t and thus truncate floating-point accumulators pseudo_cost_sum_up[j]
and pseudo_cost_sum_down[j]; change the types of sum_up and sum_down to the
floating-point type f_t (matching pseudo_cost_sum_up/ pseudo_cost_sum_down) so
pc_up and pc_down compute correct floating-point averages (ensure divisions use
f_t to avoid integer division and keep using averages.up_avg/averages.down_avg
and eps as before).
---
Nitpick comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1190-1191: The recompute flags are being written in multiple
places; make update_tree_impl() the sole owner by moving the assignments for
worker->recompute_basis and worker->recompute_bounds into update_tree_impl() and
remove the duplicate writes from its callers (the places that currently
overwrite them from node_status after calling update_tree()). In
update_tree_impl(), set the flags based on the node_status value passed in (or
the same source the callers used) so callers no longer touch
recompute_basis/recompute_bounds and the branch/dive state is only updated in
update_tree_impl().
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 238-273: The pseudo_cost_snapshot_t copy operations
(pseudo_cost_snapshot_t::operator= overloads and constructors) only copy
pseudo_cost_{sum,num}_{down,up} arrays and leave inherited state like AT,
reliability_branching_settings, strong_branching_lp_iter and mutex vectors at
defaults, which can break callers such as reliable_variable_selection(); update
the pseudo_cost_snapshot_t declaration/constructors to include a clear comment
(or a static_assert) stating that snapshots are intended only for diving
heuristics and only safe to call compute_averages() and
calculate_pseudocost_score(), and explicitly warn that methods requiring AT or
full mutex state (e.g., reliable_variable_selection()) must not be called on
snapshots so reviewers and future maintainers see the constraint.
In `@cpp/src/branch_and_bound/worker_pool.hpp`:
- Around line 104-116: The template parameter order for get_search_strategies is
inconsistent with the rest of the codebase (currently template <typename f_t,
typename i_t>); change it to the conventional order template <typename i_t,
typename f_t> and update the function signature and any internal uses to match,
ensuring diving_heuristics_settings_t<i_t, f_t> still receives the same types;
this keeps get_search_strategies' template parameter order consistent with other
functions and types.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 23581b53-dbd3-46eb-b680-de9db83d0eb2
📒 Files selected for processing (14)
cpp/src/branch_and_bound/CMakeLists.txtcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/constants.hppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/diving_heuristics.cppcpp/src/branch_and_bound/diving_heuristics.hppcpp/src/branch_and_bound/mip_node.cppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/worker.hppcpp/src/branch_and_bound/worker_pool.hppcpp/src/utilities/omp_helpers.hpp
💤 Files with no reviewable changes (2)
- cpp/src/branch_and_bound/CMakeLists.txt
- cpp/src/branch_and_bound/mip_node.cpp
There was a problem hiding this comment.
♻️ Duplicate comments (3)
cpp/src/branch_and_bound/worker_pool.hpp (1)
58-89:⚠️ Potential issue | 🔴 CriticalSynchronize
is_activeacross pool handoff and readers.
return_worker_to_pool()writesworker->is_activebefore taking the pool mutex, whileget_lower_bound()andbroadcast_root_bounds_change()read it lock-free. Ifbranch_and_bound_worker_t::is_activeis still a plainbool, the pool can miss activeBEST_FIRSTworkers and compute a too-optimistic lower bound.Run this to confirm the declaration and access pattern:
#!/bin/bash set -euo pipefail fd -p 'worker.hpp' cpp/src/branch_and_bound | while read -r file; do echo "== $file ==" sed -n '1,220p' "$file" done echo echo "== is_active accesses ==" rg -n -C2 '\bis_active\b' cpp/src/branch_and_boundExpected:
is_activeis atomic, or every read/write goes through the same mutex.As per coding guidelines, "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/worker_pool.hpp` around lines 58 - 89, return_worker_to_pool sets branch_and_bound_worker_t::is_active without holding the pool mutex while get_lower_bound and broadcast_root_bounds_change read is_active lock-free, causing a data race; fix by making branch_and_bound_worker_t::is_active an std::atomic<bool> (or ensure all reads/writes go through the same mutex) and update uses in return_worker_to_pool, get_lower_bound, and broadcast_root_bounds_change to use atomic store/load (or guarded accesses) so active-state visibility is synchronized across threads.cpp/src/branch_and_bound/branch_and_bound.cpp (1)
1116-1142:⚠️ Potential issue | 🔴 CriticalDon't return an invalid branch tuple from the default case.
This only stays safe while the
assert(branch_var >= 0)/assert(dir != rounding_direction_t::NONE)checks remain enabled. In release builds, an unexpected enum still flows intoleaf_solution.x[branch_var]andsearch_tree.branch(...).🐛 Minimal fallback
- default: CUOPT_LOG_ERROR("Invalid diving method!"); return {-1, rounding_direction_t::NONE}; + default: + CUOPT_LOG_ERROR("Invalid diving method; falling back to pseudocost diving."); + return pseudocost_diving( + this->worker.pc_snapshot, fractional, x, *this->worker.root_solution, log);As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 1116 - 1142, The default switch branch currently returns an invalid tuple ({-1, rounding_direction_t::NONE}) when worker.diving_type is unrecognized; instead log the unexpected diving_type and perform a safe fallback (e.g. call pseudocost_diving with this->worker.pc_snapshot, fractional, x, *this->worker.root_solution, log) or throw a runtime error so we never propagate an invalid branch_var/rounding_direction_t into later code; update the default case to either call pseudocost_diving(...) as a safe fallback or raise an exception (and remove the {-1, NONE} return), keeping the CUOPT_LOG_ERROR call for context.cpp/src/branch_and_bound/pseudo_costs.hpp (1)
238-272:⚠️ Potential issue | 🟠 MajorCopy the full inherited state into snapshots.
These constructors/assignments still copy only the four sum/count arrays. The deterministic snapshot broadcast then carries default
AT, reliability settings,strong_branching_lp_iter,pdlp_warm_cache, and 1-entry mutex vectors fromBase(1)instead of the livepc_state.🛠️ Minimal fix
pseudo_cost_snapshot_t& operator=( const pseudo_costs_t<i_t, f_t, branch_and_bound_mode_t::PARALLEL>& other) { i_t n = other.pseudo_cost_num_down.size(); - Base::pseudo_cost_num_down.resize(n); - Base::pseudo_cost_num_up.resize(n); - Base::pseudo_cost_sum_down.resize(n); - Base::pseudo_cost_sum_up.resize(n); + Base::resize(n); + Base::AT = other.AT; + Base::reliability_branching_settings = other.reliability_branching_settings; + Base::strong_branching_lp_iter = other.strong_branching_lp_iter.get_no_atomic(); + Base::pdlp_warm_cache = other.pdlp_warm_cache; for (i_t i = 0; i < n; ++i) { Base::pseudo_cost_num_down[i] = other.pseudo_cost_num_down[i].get_no_atomic(); Base::pseudo_cost_num_up[i] = other.pseudo_cost_num_up[i].get_no_atomic(); Base::pseudo_cost_sum_down[i] = other.pseudo_cost_sum_down[i].get_no_atomic(); Base::pseudo_cost_sum_up[i] = other.pseudo_cost_sum_up[i].get_no_atomic(); } return *this; } pseudo_cost_snapshot_t& operator=(const Base& other) { if (this != &other) { + Base::resize(other.pseudo_cost_num_down.size()); + Base::AT = other.AT; + Base::reliability_branching_settings = other.reliability_branching_settings; + Base::strong_branching_lp_iter = other.strong_branching_lp_iter; + Base::pdlp_warm_cache = other.pdlp_warm_cache; Base::pseudo_cost_num_down = other.pseudo_cost_num_down; Base::pseudo_cost_num_up = other.pseudo_cost_num_up; Base::pseudo_cost_sum_down = other.pseudo_cost_sum_down; Base::pseudo_cost_sum_up = other.pseudo_cost_sum_up; }As per coding guidelines, "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 238 - 272, The snapshot constructors/assignments only copy the four arrays and leave other Base state at Base(1); fix by delegating to Base's copy semantics so the full inherited state (AT, reliability settings, strong_branching_lp_iter, pdlp_warm_cache, mutex vectors, etc.) is copied. Concretely, change the constructors to initialize the base with the source (e.g., pseudo_cost_snapshot_t(const pseudo_costs_t<...>& other) : Base(other) { ... } and pseudo_cost_snapshot_t(const Base& other) : Base(other) { ... }) or call Base::operator=(other) at the start of the assignment, and in operator=(const pseudo_costs_t<...>& other) invoke Base::operator=(other) before copying the arrays so all Base members are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 1116-1142: The default switch branch currently returns an invalid
tuple ({-1, rounding_direction_t::NONE}) when worker.diving_type is
unrecognized; instead log the unexpected diving_type and perform a safe fallback
(e.g. call pseudocost_diving with this->worker.pc_snapshot, fractional, x,
*this->worker.root_solution, log) or throw a runtime error so we never propagate
an invalid branch_var/rounding_direction_t into later code; update the default
case to either call pseudocost_diving(...) as a safe fallback or raise an
exception (and remove the {-1, NONE} return), keeping the CUOPT_LOG_ERROR call
for context.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 238-272: The snapshot constructors/assignments only copy the four
arrays and leave other Base state at Base(1); fix by delegating to Base's copy
semantics so the full inherited state (AT, reliability settings,
strong_branching_lp_iter, pdlp_warm_cache, mutex vectors, etc.) is copied.
Concretely, change the constructors to initialize the base with the source
(e.g., pseudo_cost_snapshot_t(const pseudo_costs_t<...>& other) : Base(other) {
... } and pseudo_cost_snapshot_t(const Base& other) : Base(other) { ... }) or
call Base::operator=(other) at the start of the assignment, and in
operator=(const pseudo_costs_t<...>& other) invoke Base::operator=(other) before
copying the arrays so all Base members are preserved.
In `@cpp/src/branch_and_bound/worker_pool.hpp`:
- Around line 58-89: return_worker_to_pool sets
branch_and_bound_worker_t::is_active without holding the pool mutex while
get_lower_bound and broadcast_root_bounds_change read is_active lock-free,
causing a data race; fix by making branch_and_bound_worker_t::is_active an
std::atomic<bool> (or ensure all reads/writes go through the same mutex) and
update uses in return_worker_to_pool, get_lower_bound, and
broadcast_root_bounds_change to use atomic store/load (or guarded accesses) so
active-state visibility is synchronized across threads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d48a9d23-476c-40ff-ae9e-877492c7602a
📒 Files selected for processing (14)
cpp/src/branch_and_bound/CMakeLists.txtcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/constants.hppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/diving_heuristics.cppcpp/src/branch_and_bound/diving_heuristics.hppcpp/src/branch_and_bound/mip_node.cppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/worker.hppcpp/src/branch_and_bound/worker_pool.hppcpp/src/utilities/omp_helpers.hpp
💤 Files with no reviewable changes (2)
- cpp/src/branch_and_bound/CMakeLists.txt
- cpp/src/branch_and_bound/mip_node.cpp
… shared_ptr to avoid unnecessary copy. Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
/ok to test 4aed76c |
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
/ok to test a5c111d |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cpp/src/branch_and_bound/worker_pool.hpp (1)
114-114: Caststd::maxresult toi_tfor type consistency.
strategies.size()returnssize_t, butnum_workersisi_t. The ternary andstd::maxmay produce asize_tresult that gets implicitly narrowed when assigned tobfs_workersof typei_t.🔧 Suggested fix
- i_t bfs_workers = std::max(strategies.size() == 1 ? num_workers : num_workers / 4, 1); + i_t bfs_workers = static_cast<i_t>(std::max<i_t>(strategies.size() == 1 ? num_workers : num_workers / 4, 1));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/worker_pool.hpp` at line 114, The assignment to bfs_workers uses std::max with operands of mixed types (strategies.size() yields size_t while num_workers is i_t) which can lead to implicit narrowing; update the expression so the result is explicitly cast to i_t (or ensure both operands are i_t) before assigning to bfs_workers. Locate the statement initializing bfs_workers and change it to compute the max with matching integral types (e.g., cast the std::max result or cast num_workers/1 to i_t) so bfs_workers (type i_t) receives an i_t value without implicit conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cpp/src/branch_and_bound/worker_pool.hpp`:
- Line 114: The assignment to bfs_workers uses std::max with operands of mixed
types (strategies.size() yields size_t while num_workers is i_t) which can lead
to implicit narrowing; update the expression so the result is explicitly cast to
i_t (or ensure both operands are i_t) before assigning to bfs_workers. Locate
the statement initializing bfs_workers and change it to compute the max with
matching integral types (e.g., cast the std::max result or cast num_workers/1 to
i_t) so bfs_workers (type i_t) receives an i_t value without implicit
conversion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 07432795-ba3c-4a7d-870c-73c82825ade2
📒 Files selected for processing (5)
cpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/worker_pool.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/src/branch_and_bound/deterministic_workers.hpp
- cpp/src/branch_and_bound/pseudo_costs.cpp
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
/ok to test c433e41 |
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
/ok to test 3676432 |
aliceb-nv
left a comment
There was a problem hiding this comment.
LGTM, amazing work Nicolas :) It is good to see a PR with negative net LoC changes.
Let's get some input from Chris, and benchmark to ensure no regressions.
| worker->recompute_basis = true; | ||
| worker->recompute_bounds = true; |
There was a problem hiding this comment.
Maybe this could be turned into a member function of worker (or a policy function)? something like invalidate_basis or something of the sort (probably not an ideal suggestion, but I feel like we could be higher-level here)
| // Get the underlying value without atomics | ||
| T& get_no_atomic() { return val; } | ||
|
|
||
| T get_no_atomic() const { return val; } | ||
|
|
There was a problem hiding this comment.
Maybe "underlying"? Since we're returning a reference to the underlying object in memory
This PR simplify the pseudo cost class in such a way that both regular and deterministic B&B use the same code path as much as possible. It also disable mutexes and atomics when running in deterministic mode as each thread has its own snapshot of the pseudocost. It also move the routines related with the diving heuristics back to the
diving_heuristics.cpp, renamedbranch_and_bound_worker.hpptoworker.hppto match the new file structure and movedworker_pool_tto a dedicated header.Pending: MIPLIB benchmark
Checklist