
Price-reward: strengthen price signal and add negative-price overdrive #27

Merged: rbx merged 2 commits into FairRootGroup:master from enlorenz:reward on Mar 2, 2026

Conversation


@enlorenz enlorenz commented Mar 2, 2026

- keep legacy price reward path for ablation/debugging
- rework active price reward to use saturating price-advantage and job-volume terms
- normalize active price reward against a fixed [0, 1] baseline range
- add a dedicated negative-price overdrive branch for work done at < 0 prices
- decouple overdrive job scaling via NEGATIVE_PRICE_JOB_TAU so negative-price reward ramps early even when PRICE_JOB_TAU is large
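The bullets above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the constant names mirror the description (PRICE_JOB_TAU, NEGATIVE_PRICE_JOB_TAU), while the function name, saturation curves, and values are assumptions.

```python
import math

# Hypothetical values; the real constants live in src/reward_calculation.py.
PRICE_JOB_TAU = 50.0          # jobs needed before the regular reward saturates
NEGATIVE_PRICE_JOB_TAU = 5.0  # much smaller, so overdrive ramps early

def price_reward_sketch(current_price: float, context_avg: float, num_jobs: int) -> float:
    """Saturating price-advantage x job-volume reward, with overdrive at prices < 0."""
    if num_jobs == 0:
        return 0.0
    # Saturating price-advantage term: grows toward 1 as price drops below the context average.
    advantage = max(context_avg - current_price, 0.0) / max(abs(context_avg), 1e-9)
    advantage_term = 1.0 - math.exp(-advantage)
    if current_price < 0.0:
        # Negative-price overdrive branch with its own, decoupled job saturation.
        job_term = 1.0 - math.exp(-num_jobs / NEGATIVE_PRICE_JOB_TAU)
        return 1.0 + job_term  # may exceed the [0, 1] baseline range by design
    job_term = 1.0 - math.exp(-num_jobs / PRICE_JOB_TAU)
    return advantage_term * job_term  # stays within the [0, 1] baseline range
```

Note how the overdrive branch ignores PRICE_JOB_TAU entirely, which is the point of the decoupling: even with a large regular tau, a handful of jobs at negative prices already produces a strong signal.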
@enlorenz enlorenz marked this pull request as draft March 2, 2026 10:49

coderabbitai bot commented Mar 2, 2026

📝 Walkthrough

Price-shift application in src/prices.py is gated behind a new ENABLE_PRICE_SHIFT flag, which is disabled by default. src/reward_calculation.py adds several price-related hyperparameters and a price-context helper, reinstates the legacy price-reward paths, and introduces a new active price-reward implementation (fast saturation plus negative-price overdrive). src/environment.py adapts to the renamed price-reward return value.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Price initialization (src/prices.py) | Added public boolean ENABLE_PRICE_SHIFT and changed the price-shift application guard so price shifting is disabled by default (temporary hard-block comment). |
| Reward calculation, price logic (src/reward_calculation.py) | Added new price hyperparameters/constants, _price_context_average, legacy reward methods (_reward_price_legacy, _reward_price_normalized_legacy), a new active reward method (_reward_price), renamed/refactored normalization paths and bounds, and updated calculate to return price_reward instead of price_reward_norm. |
| Environment adaptation (src/environment.py) | Updated usage to accept and propagate the renamed price_reward return value (replacing prior price_reward_norm references) for metrics and plotting. |
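The gating described for src/prices.py can be sketched minimally as follows; the constructor-injected flag matches the walkthrough, but the class shape, the apply_shift method name, and the shift-to-1 rule are assumptions, not the repository's actual API.

```python
import numpy as np

class Prices:
    """Minimal sketch of ENABLE_PRICE_SHIFT gating; the real class is in src/prices.py."""

    def __init__(self, enable_price_shift: bool = False) -> None:
        self.ENABLE_PRICE_SHIFT = enable_price_shift
        self.price_shift = 0.0

    def apply_shift(self, prices: np.ndarray) -> np.ndarray:
        min_price = float(prices.min())
        # Shift only when explicitly enabled; negative prices pass through by default.
        if min_price < 1 and self.ENABLE_PRICE_SHIFT:
            self.price_shift = 1 - min_price
            return prices + self.price_shift
        return prices
```

With the default flag, negative prices survive untouched, which is what the negative-price overdrive branch in the reward code relies on.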

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • RewardCalculator #6 — Prior RewardCalculator refactor that changed price-reward APIs and bounds; likely directly related to the added/renamed price-reward methods and environment usage.
🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Price-reward: strengthen price signal and add negative-price overdrive' directly summarizes the main changes: improving price reward calculation and introducing negative-price overdrive logic. |
| Description check | ✅ Passed | The description provides detailed points about the changes: legacy path retention, active price reward rework, normalization, negative-price overdrive, and job scaling decoupling, all aligned with the changeset. |
| Docstring coverage | ✅ Passed | Docstring coverage is 81.82%, above the required threshold of 80.00%. |


@enlorenz enlorenz marked this pull request as ready for review March 2, 2026 10:50
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/reward_calculation.py (1)

125-125: Remove the stray no-op string literal in _reward_price.

Line 125 is an extra standalone string literal, which is not used and reads like a second docstring.

🧹 Proposed cleanup
-        """Calculates rewards with pre-computed normalization bounds."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` at line 125, Remove the stray standalone string
literal inside the _reward_price function (it's acting like an accidental second
docstring and does nothing); locate the _reward_price definition and delete that
extra quoted string so the function contains only its intended docstring (if
any) and functional code.
src/prices.py (1)

27-28: Replace the always-false gate with an explicit feature toggle.

Line 28 (min_price < 1 and 0) permanently disables this path in a non-obvious way. Use a named flag so intent is explicit and safe to revert after testing.

♻️ Proposed refactor
-            # hard block this for now during testing to keep things simple, but we could consider allowing it in the future with proper handling.
-            if min_price < 1 and 0:
+            # Temporary test toggle: keep raw prices (including negatives) unless explicitly enabled.
+            APPLY_PRICE_SHIFT = False
+            if min_price < 1 and APPLY_PRICE_SHIFT:
                 self.price_shift = 1 - min_price
                 prices = prices + self.price_shift
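For illustration, the always-false gate can be checked in isolation: `and` returns one of its operands, and with a literal `0` on the right the result is falsy regardless of min_price, so the branch body is unreachable.

```python
min_price = -3.0  # any value; the outcome does not depend on the left operand
gate = min_price < 1 and 0  # `and` yields the right operand (int 0) when the left is truthy
if gate:
    raise AssertionError("unreachable: the branch is permanently disabled")
```

This is why the review suggests a named flag: the intent ("temporarily off") becomes explicit instead of being encoded in a constant-folded expression.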
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/prices.py` around lines 27 - 28, The condition "if min_price < 1 and 0"
is an always-false gate and should be replaced with a clear feature toggle:
introduce a named boolean flag (e.g., ALLOW_MIN_PRICE_BELOW_ONE or
ENABLE_MIN_PRICE_OVERRIDE) and use it in the conditional as "if min_price < 1
and <FLAG>". Update the check around the min_price logic in src/prices.py to
read the flag (module-level constant or config/env-driven) so the path can be
safely enabled/disabled and is obvious to future readers; ensure any tests or
callers referencing this behavior use the new flag.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70b4aa5 and 630fbb5.

📒 Files selected for processing (2)
  • src/prices.py
  • src/reward_calculation.py

coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/reward_calculation.py (1)

65-70: ⚠️ Potential issue | 🟠 Major

Fix legacy price normalization bounds/clamping to avoid out-of-range values.

At Line 65, _min_price_reward_legacy is fixed at 0.0, but Line 117 can produce negative raw legacy reward. This makes Line 125 return values below 0 for valid inputs.

🐛 Proposed fix
-        self._min_price_reward_legacy = 0.0
+        self._min_price_reward_legacy = self._reward_price_legacy(
+            self.prices.MAX_PRICE,
+            self.prices.MIN_PRICE,
+            MAX_NEW_JOBS_PER_HOUR,
+        )
         self._max_price_reward_legacy = self._reward_price_legacy(
             self.prices.MIN_PRICE,
             self.prices.MAX_PRICE,
             MAX_NEW_JOBS_PER_HOUR,
         )
@@
-        return self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        normalized = self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        return float(np.clip(normalized, 0.0, 1.0))

Also applies to: 120-126
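The proposed fix boils down to min-max normalization with an explicit clamp. A generic sketch under that assumption (the helper name is illustrative, not the module's actual _normalize):

```python
import numpy as np

def normalize_clipped(raw: float, lo: float, hi: float) -> float:
    """Min-max normalize raw into [0, 1], clamping out-of-range inputs at the edges."""
    return float(np.clip((raw - lo) / (hi - lo), 0.0, 1.0))
```

With symmetric legacy bounds (lo = -hi), a raw reward of 0 maps to 0.5, and any raw value outside [lo, hi] lands exactly on 0 or 1 instead of escaping the range.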

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` around lines 65 - 70, Compute true legacy reward
bounds by evaluating _reward_price_legacy at the price endpoints and set
_min_price_reward_legacy/_max_price_reward_legacy to the min/max of those
evaluations (instead of hardcoding 0.0), and then ensure the normalization logic
that scales raw legacy rewards (the code using _reward_price_legacy around the
normalization block) clamps the final normalized value into the allowed range
(e.g., clamp to [0.0, 1.0] or to [_min_price_reward_legacy,
_max_price_reward_legacy] as appropriate) so negative or out-of-range normalized
rewards cannot be returned.


📥 Commits

Reviewing files that changed from the base of the PR and between 630fbb5 and 60b1e90.

📒 Files selected for processing (1)
  • src/reward_calculation.py

coderabbitai bot left a comment

🧹 Nitpick comments (2)
src/reward_calculation.py (2)

252-256: Variable naming may cause confusion.

The variable price_reward_norm is no longer truly normalized to [0, 1] since it can exceed 1.0 with negative-price overdrive. Consider renaming to price_reward or price_reward_raw to avoid implying normalization, or update the return tuple documentation accordingly.

♻️ Optional: Rename for clarity
-        # 2. increase reward if jobs were scheduled in this step and the current price is below average, with fast saturation and extra bonus for negative-price overdrive.
-        price_reward_norm = self._reward_price(
+        # 2. increase reward if jobs were scheduled in this step and the current price is below average, with fast saturation and extra bonus for negative-price overdrive.
+        price_reward = self._reward_price(
             current_price, average_future_price, num_processed_jobs
         )
-        price_reward_weighted = weights.price_weight * price_reward_norm
+        price_reward_weighted = weights.price_weight * price_reward

Note: This would require updating the return statement and docstring as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` around lines 252 - 256, The name price_reward_norm
is misleading because _reward_price can return values >1.0; rename
price_reward_norm to price_reward (or price_reward_raw) and update all usages
(e.g., the multiplication into price_reward_weighted and any return values or
docstrings) so the symbol change is consistent, and update the docstring/return
documentation for _reward_price if it currently claims a normalized [0,1] value.

120-125: Consider clipping legacy normalized output for consistency.

The legacy normalization relies on symmetric bounds but doesn't clip the final result. While the bounds should keep values in [0, 1] for typical inputs, edge cases could produce values outside this range. Since this path is for ablation/debugging, you may want to add a clip for consistent comparison with the active path.

♻️ Optional: Add clipping for edge-case safety
     def _reward_price_normalized_legacy(self, current_price: float, average_future_price: float, num_processed_jobs: int) -> float:
         """Legacy normalized price reward [0, 1] in typical operating range."""
         if num_processed_jobs == 0:
             return 0.0
         current_reward = self._reward_price_legacy(current_price, average_future_price, num_processed_jobs)
-        return self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        normalized = self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        return float(np.clip(normalized, 0.0, 1.0))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/reward_calculation.py` around lines 120 - 125, The legacy normalized
reward in _reward_price_normalized_legacy may produce out-of-range values in
edge cases; after computing current_reward and calling
self._normalize(current_reward, self._min_price_reward_legacy,
self._max_price_reward_legacy) clamp the resulting float to the [0.0, 1.0] range
(e.g., using min/max or a clamp helper) before returning to ensure consistency
with the active path and safe comparisons during ablation/debugging.


📥 Commits

Reviewing files that changed from the base of the PR and between 60b1e90 and 05699e4.

📒 Files selected for processing (1)
  • src/reward_calculation.py

@enlorenz enlorenz force-pushed the reward branch 2 times, most recently from f187ff1 to a83a843 Compare March 2, 2026 11:45
…ling

- make the active price signal signed so expensive execution can produce negative reward
- add asymmetric job saturation constants:
  - PRICE_JOB_TAU_POS for low-price reward growth
  - PRICE_JOB_TAU_NEG for faster high-price penalty growth
- keep negative-price overdrive decoupled via NEGATIVE_PRICE_JOB_TAU
- preserve negative-price overdrive floor/bonus behavior
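The asymmetric saturation from this commit message could look like the sketch below; the constants mirror the bullet names, but their values and the function shape are assumptions, and the actual magnitudes in the PR may differ.

```python
import math

# Hypothetical values mirroring the commit message; real constants are in the PR.
PRICE_JOB_TAU_POS = 40.0  # slow saturation for low-price (positive) reward growth
PRICE_JOB_TAU_NEG = 10.0  # faster saturation so high-price penalties bite sooner

def signed_price_reward(price_advantage: float, num_jobs: int) -> float:
    """Signed saturating reward: below-average prices reward, above-average
    prices penalize, with a steeper job ramp on the penalty side."""
    tau = PRICE_JOB_TAU_POS if price_advantage >= 0 else PRICE_JOB_TAU_NEG
    saturation = 1.0 - math.exp(-num_jobs / tau)
    return price_advantage * saturation
```

At equal job volume the penalty side saturates faster (smaller tau), so expensive execution is punished more aggressively than cheap execution is rewarded, matching the "faster high-price penalty growth" bullet.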
coderabbitai bot left a comment

🧹 Nitpick comments (2)
src/environment.py (1)

478-485: Clarify that price_reward is not normalized anymore.

Line 478 still labels these as normalized components, but price_reward is now signed and can exceed 1.0. Updating the label/comment will prevent misreading plot diagnostics.

♻️ Suggested clarification
-        # Store normalized reward components for plotting
+        # Store reward components for plotting.
+        # NOTE: price_reward is signed and not strictly normalized.
         self.metrics.eff_rewards.append(eff_reward_norm * 100)
         self.metrics.price_rewards.append(price_reward * 100)
         self.metrics.job_age_penalties.append(job_age_penalty_norm * 100)
         self.metrics.idle_penalties.append(idle_penalty_norm * 100)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/environment.py` around lines 478 - 485, The comment above the
reward-appending block is misleading because price_reward is no longer
normalized; update the comment from "Store normalized reward components for
plotting" to something like "Store reward components for plotting (price_reward
is signed/unnormalized and may exceed 1.0)" and add a short inline note next to
the self.metrics.price_rewards and self.metrics.episode_price_rewards appends to
clarify that price_reward is unnormalized so downstream plotting/analysis treats
its scale appropriately.
src/prices.py (1)

20-20: Make ENABLE_PRICE_SHIFT configurable at construction if this is meant to be a real flag.

Right now it is always reset to False inside __init__, so callers cannot enable shifting without editing source. Consider wiring it as an init parameter.

♻️ Suggested wiring
-class Prices:
+class Prices:
@@
-    def __init__(self, external_prices: np.ndarray | list[float] | None = None) -> None:
+    def __init__(
+        self,
+        external_prices: np.ndarray | list[float] | None = None,
+        enable_price_shift: bool = False,
+    ) -> None:
@@
-        self.ENABLE_PRICE_SHIFT: bool = False
+        self.ENABLE_PRICE_SHIFT: bool = enable_price_shift

Also applies to: 28-29

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/prices.py` at line 20, ENABLE_PRICE_SHIFT is hardcoded to False inside
__init__; make it configurable by adding an __init__ parameter (e.g.,
enable_price_shift: bool = False) and assign it to self.ENABLE_PRICE_SHIFT
(self.ENABLE_PRICE_SHIFT = enable_price_shift). Do the same pattern for the
other boolean flags initialized nearby in __init__ (the flags set around lines
28-29) so callers can enable them via constructor parameters and defaults remain
unchanged.


📥 Commits

Reviewing files that changed from the base of the PR and between 05699e4 and 2bc1a33.

📒 Files selected for processing (3)
  • src/environment.py
  • src/prices.py
  • src/reward_calculation.py

@rbx rbx merged commit 22e32ef into FairRootGroup:master Mar 2, 2026
4 checks passed
@enlorenz enlorenz deleted the reward branch March 3, 2026 09:23