Price-reward: strengthen price signal and add negative-price overdrive #27

rbx merged 2 commits into FairRootGroup:master from
Conversation
enlorenz
commented
Mar 2, 2026
- keep legacy price reward path for ablation/debugging
- rework active price reward to use saturating price-advantage and job-volume terms
- normalize active price reward against a fixed [0, 1] baseline range
- add a dedicated negative-price overdrive branch for work done at < 0 prices
- decouple overdrive job scaling via NEGATIVE_PRICE_JOB_TAU so negative-price reward ramps early even when PRICE_JOB_TAU is large
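As a rough illustration of the shape described above, a saturating price-advantage reward with a dedicated negative-price overdrive branch could look like the following sketch. All names and constant values here are hypothetical stand-ins, not the repository's actual code:

```python
import math

# Hypothetical constants mirroring the PR description; real values live in the repo.
PRICE_JOB_TAU = 8.0            # jobs needed before the low-price reward saturates
NEGATIVE_PRICE_JOB_TAU = 2.0   # smaller tau => overdrive ramps up after only a few jobs

def price_reward(current_price: float, avg_future_price: float, num_jobs: int) -> float:
    """Sketch: saturating price-advantage reward with negative-price overdrive."""
    if num_jobs == 0:
        return 0.0
    if current_price < 0.0:
        # Overdrive branch: work done at negative prices ramps early,
        # because NEGATIVE_PRICE_JOB_TAU is decoupled from PRICE_JOB_TAU.
        job_term = 1.0 - math.exp(-num_jobs / NEGATIVE_PRICE_JOB_TAU)
        return 1.0 + job_term  # can exceed the [0, 1] baseline range
    # Saturating price-advantage term, scaled by a saturating job-volume term.
    advantage = (avg_future_price - current_price) / max(abs(avg_future_price), 1e-9)
    job_term = 1.0 - math.exp(-num_jobs / PRICE_JOB_TAU)
    return math.tanh(advantage) * job_term
```

Decoupling the overdrive tau is what lets the negative-price bonus saturate after a handful of jobs even when `PRICE_JOB_TAU` is tuned large for the ordinary low-price path.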
📝 Walkthrough

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/reward_calculation.py (1)
125-125: Remove the stray no-op string literal in `_reward_price`.

Line 125 is an extra standalone string literal, which is not used and reads like a second docstring.

🧹 Proposed cleanup

```diff
-        """Calculates rewards with pre-computed normalization bounds."""
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/reward_calculation.py` at line 125: remove the stray standalone string literal inside the `_reward_price` function (it acts like an accidental second docstring and does nothing); locate the `_reward_price` definition and delete that extra quoted string so the function contains only its intended docstring (if any) and functional code.

src/prices.py (1)
27-28: Replace the always-false gate with an explicit feature toggle.

Line 28 (`min_price < 1 and 0`) permanently disables this path in a non-obvious way. Use a named flag so intent is explicit and safe to revert after testing.

♻️ Proposed refactor

```diff
-        # hard block this for now during testing to keep things simple, but we could consider allowing it in the future with proper handling.
-        if min_price < 1 and 0:
+        # Temporary test toggle: keep raw prices (including negatives) unless explicitly enabled.
+        APPLY_PRICE_SHIFT = False
+        if min_price < 1 and APPLY_PRICE_SHIFT:
             self.price_shift = 1 - min_price
             prices = prices + self.price_shift
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/prices.py` around lines 27-28: the condition `if min_price < 1 and 0` is an always-false gate and should be replaced with a clear feature toggle. Introduce a named boolean flag (e.g. `ALLOW_MIN_PRICE_BELOW_ONE` or `ENABLE_MIN_PRICE_OVERRIDE`) and use it in the conditional as `if min_price < 1 and <FLAG>`. Read the flag from a module-level constant or config/env so the path can be safely enabled/disabled and is obvious to future readers; ensure any tests or callers referencing this behavior use the new flag.
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `src/reward_calculation.py`:
- Around lines 55-60: The legacy normalization uses a fixed 0.0 minimum, which is wrong because `_reward_price_legacy` can be negative (`context_avg - current_price`). Update initialization so that `_min_price_reward_legacy = -_max_price_reward_legacy` (compute `_max_price_reward_legacy` via `_reward_price_legacy(self.prices.MIN_PRICE, self.prices.MAX_PRICE, ...)`) so the min/max reflect symmetric extremes, and change `_reward_price_normalized_legacy` to normalize using `(raw - _min_price_reward_legacy) / (_max_price_reward_legacy - _min_price_reward_legacy)` and clamp the result to [0, 1]. Reference symbols: `_min_price_reward_legacy`, `_max_price_reward_legacy`, `_reward_price_legacy`, `_reward_price_normalized_legacy`, `context_avg`, `current_price`.
♻️ Duplicate comments (1)
src/reward_calculation.py (1)
65-70: ⚠️ Potential issue | 🟠 Major

Fix legacy price normalization bounds/clamping to avoid out-of-range values.

At line 65, `_min_price_reward_legacy` is fixed at `0.0`, but line 117 can produce a negative raw legacy reward. This makes line 125 return values below 0 for valid inputs.

🐛 Proposed fix

```diff
-        self._min_price_reward_legacy = 0.0
+        self._min_price_reward_legacy = self._reward_price_legacy(
+            self.prices.MAX_PRICE,
+            self.prices.MIN_PRICE,
+            MAX_NEW_JOBS_PER_HOUR,
+        )
         self._max_price_reward_legacy = self._reward_price_legacy(
             self.prices.MIN_PRICE,
             self.prices.MAX_PRICE,
             MAX_NEW_JOBS_PER_HOUR,
         )
@@
-        return self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        normalized = self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        return float(np.clip(normalized, 0.0, 1.0))
```

Also applies to: 120-126

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/reward_calculation.py` around lines 65-70: compute true legacy reward bounds by evaluating `_reward_price_legacy` at the price endpoints and set `_min_price_reward_legacy`/`_max_price_reward_legacy` to the min/max of those evaluations (instead of hardcoding 0.0), then ensure the normalization logic that scales raw legacy rewards clamps the final normalized value into the allowed range (e.g., [0.0, 1.0]) so negative or out-of-range normalized rewards cannot be returned.
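A self-contained sketch of the symmetric-bounds normalization the comment asks for, using toy stand-ins for the class internals (none of these names or formulas are the repository's actual implementation):

```python
# Toy constants standing in for the Prices attributes referenced in the review.
MIN_PRICE, MAX_PRICE, MAX_NEW_JOBS_PER_HOUR = 0.0, 100.0, 10

def reward_price_legacy(current_price: float, context_avg: float, num_jobs: int) -> float:
    # Toy legacy reward: positive below the context average, negative above it.
    return (context_avg - current_price) * min(num_jobs, MAX_NEW_JOBS_PER_HOUR)

# Evaluate at the price extremes instead of hardcoding the minimum to 0.0,
# so the bounds are symmetric around zero.
_max_reward = reward_price_legacy(MIN_PRICE, MAX_PRICE, MAX_NEW_JOBS_PER_HOUR)
_min_reward = -_max_reward

def reward_price_normalized_legacy(current_price: float, context_avg: float, num_jobs: int) -> float:
    raw = reward_price_legacy(current_price, context_avg, num_jobs)
    norm = (raw - _min_reward) / (_max_reward - _min_reward)
    return max(0.0, min(1.0, norm))  # clamp guards edge cases
```

With symmetric bounds, a zero raw reward maps to 0.5 and the worst-case negative raw reward maps to 0.0 instead of escaping below the range.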
🧹 Nitpick comments (2)
src/reward_calculation.py (2)
252-256: Variable naming may cause confusion.

The variable `price_reward_norm` is no longer truly normalized to [0, 1], since it can exceed 1.0 with negative-price overdrive. Consider renaming it to `price_reward` or `price_reward_raw` to avoid implying normalization, or update the return tuple documentation accordingly.

♻️ Optional: Rename for clarity

```diff
-        # 2. increase reward if jobs were scheduled in this step and the current price is below average, with fast saturation and extra bonus for negative-price overdrive.
-        price_reward_norm = self._reward_price(
+        # 2. increase reward if jobs were scheduled in this step and the current price is below average, with fast saturation and extra bonus for negative-price overdrive.
+        price_reward = self._reward_price(
             current_price, average_future_price, num_processed_jobs
         )
-        price_reward_weighted = weights.price_weight * price_reward_norm
+        price_reward_weighted = weights.price_weight * price_reward
```

Note: This would require updating the return statement and docstring as well.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/reward_calculation.py` around lines 252-256: the name `price_reward_norm` is misleading because `_reward_price` can return values > 1.0; rename `price_reward_norm` to `price_reward` (or `price_reward_raw`) and update all usages (e.g., the multiplication into `price_reward_weighted` and any return values or docstrings) so the symbol change is consistent, and update the docstring/return documentation for `_reward_price` if it currently claims a normalized [0, 1] value.
120-125: Consider clipping legacy normalized output for consistency.

The legacy normalization relies on symmetric bounds but doesn't clip the final result. While the bounds should keep values in [0, 1] for typical inputs, edge cases could produce values outside this range. Since this path is for ablation/debugging, you may want to add a clip for consistent comparison with the active path.

♻️ Optional: Add clipping for edge-case safety

```diff
     def _reward_price_normalized_legacy(self, current_price: float, average_future_price: float, num_processed_jobs: int) -> float:
         """Legacy normalized price reward [0, 1] in typical operating range."""
         if num_processed_jobs == 0:
             return 0.0
         current_reward = self._reward_price_legacy(current_price, average_future_price, num_processed_jobs)
-        return self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        normalized = self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)
+        return float(np.clip(normalized, 0.0, 1.0))
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/reward_calculation.py` around lines 120-125: the legacy normalized reward in `_reward_price_normalized_legacy` may produce out-of-range values in edge cases; after computing `current_reward` and calling `self._normalize(current_reward, self._min_price_reward_legacy, self._max_price_reward_legacy)`, clamp the resulting float to [0.0, 1.0] (e.g., using min/max or a clamp helper) before returning, to ensure consistency with the active path and safe comparisons during ablation/debugging.
Force-pushed from f187ff1 to a83a843
…ling

- make the active price signal signed so expensive execution can produce negative reward
- add asymmetric job saturation constants:
  - PRICE_JOB_TAU_POS for low-price reward growth
  - PRICE_JOB_TAU_NEG for faster high-price penalty growth
- keep negative-price overdrive decoupled via NEGATIVE_PRICE_JOB_TAU
- preserve negative-price overdrive floor/bonus behavior
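Illustratively, a signed signal with asymmetric job saturation like the commit message describes could be sketched as follows (constants and names are hypothetical, not the repository's code):

```python
import math

# Hypothetical constants named after the commit message; actual values live in the repo.
PRICE_JOB_TAU_POS = 8.0  # slow saturation for low-price reward growth
PRICE_JOB_TAU_NEG = 3.0  # faster saturation for high-price penalty growth

def signed_price_signal(price_advantage: float, num_jobs: int) -> float:
    """Signed price signal: positive below the average price, negative above it,
    with asymmetric job saturation so penalties ramp faster than rewards."""
    if num_jobs == 0:
        return 0.0
    tau = PRICE_JOB_TAU_POS if price_advantage >= 0 else PRICE_JOB_TAU_NEG
    job_term = 1.0 - math.exp(-num_jobs / tau)
    return math.tanh(price_advantage) * job_term
```

Because the negative branch uses the smaller tau, the same number of jobs produces a larger-magnitude penalty for expensive execution than reward for cheap execution.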
🧹 Nitpick comments (2)
src/environment.py (1)
478-485: Clarify that `price_reward` is not normalized anymore.

Line 478 still labels these as normalized components, but `price_reward` is now signed and can exceed 1.0. Updating the label/comment will prevent misreading plot diagnostics.

♻️ Suggested clarification
```diff
-        # Store normalized reward components for plotting
+        # Store reward components for plotting.
+        # NOTE: price_reward is signed and not strictly normalized.
         self.metrics.eff_rewards.append(eff_reward_norm * 100)
         self.metrics.price_rewards.append(price_reward * 100)
         self.metrics.job_age_penalties.append(job_age_penalty_norm * 100)
         self.metrics.idle_penalties.append(idle_penalty_norm * 100)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `src/environment.py` around lines 478-485: the comment above the reward-appending block is misleading because `price_reward` is no longer normalized; update it from "Store normalized reward components for plotting" to something like "Store reward components for plotting (price_reward is signed/unnormalized and may exceed 1.0)", and add a short inline note next to the `self.metrics.price_rewards` and `self.metrics.episode_price_rewards` appends to clarify that `price_reward` is unnormalized so downstream plotting/analysis treats its scale appropriately.

src/prices.py (1)
20-20: Make `ENABLE_PRICE_SHIFT` configurable at construction if this is meant to be a real flag.

Right now it is always reset to `False` inside `__init__`, so callers cannot enable shifting without editing source. Consider wiring it as an init parameter.
-class Prices: +class Prices: @@ - def __init__(self, external_prices: np.ndarray | list[float] | None = None) -> None: + def __init__( + self, + external_prices: np.ndarray | list[float] | None = None, + enable_price_shift: bool = False, + ) -> None: @@ - self.ENABLE_PRICE_SHIFT: bool = False + self.ENABLE_PRICE_SHIFT: bool = enable_price_shiftAlso applies to: 28-29
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/prices.py` at line 20, ENABLE_PRICE_SHIFT is hardcoded to False inside __init__; make it configurable by adding an __init__ parameter (e.g., enable_price_shift: bool = False) and assign it to self.ENABLE_PRICE_SHIFT (self.ENABLE_PRICE_SHIFT = enable_price_shift). Do the same pattern for the other boolean flags initialized nearby in __init__ (the flags set around lines 28-29) so callers can enable them via constructor parameters and defaults remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/environment.py`:
- Around line 478-485: The comment above the reward-appending block is
misleading because price_reward is no longer normalized; update the comment from
"Store normalized reward components for plotting" to something like "Store
reward components for plotting (price_reward is signed/unnormalized and may
exceed 1.0)" and add a short inline note next to the self.metrics.price_rewards
and self.metrics.episode_price_rewards appends to clarify that price_reward is
unnormalized so downstream plotting/analysis treats its scale appropriately.
In `@src/prices.py`:
- Line 20: ENABLE_PRICE_SHIFT is hardcoded to False inside __init__; make it
configurable by adding an __init__ parameter (e.g., enable_price_shift: bool =
False) and assign it to self.ENABLE_PRICE_SHIFT (self.ENABLE_PRICE_SHIFT =
enable_price_shift). Do the same pattern for the other boolean flags initialized
nearby in __init__ (the flags set around lines 28-29) so callers can enable them
via constructor parameters and defaults remain unchanged.