Skip to content

feat: Run verifier at close() for orphaned rollouts#10

Merged
dzorlu merged 1 commit into
deniz/fleet_clientfrom
feat/close-verifier
Mar 12, 2026
Merged

feat: Run verifier at close() for orphaned rollouts#10
dzorlu merged 1 commit into
deniz/fleet_clientfrom
feat/close-verifier

Conversation

@dzorlu

@dzorlu dzorlu commented Mar 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • When SkyRL ends trajectories early (context overflow, its own max_turns), OpenEnv never ran the verifier — the model got 0 reward even if the task was completed
  • close_async() / close() now run _compute_reward() when step_async() never computed a reward
  • Result stored in self.final_reward for SkyRL to read after closing

Changes

  • Added self.final_reward and self._reward_computed fields
  • close_async(): awaits _compute_reward() if reward wasn't already computed
  • close(): same via asyncio.run(), with fallback if already in an event loop

Test plan

  • Existing OpenEnv tests pass (18/18 — 9 pre-existing failures unrelated)
  • Normal flow: verifier still runs in step_async() when env says done, final_reward stays None
  • Context overflow flow: verifier runs at close, final_reward has actual score

🤖 Generated with Claude Code

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Silent reward loss when close() called from event loop
    • close() now detects a running event loop and logs a warning with guidance to use close_async() instead of silently swallowing the failure path.

Create PR

Or push these changes by commenting:

@cursor push 02e65c87c4
Preview (02e65c87c4)
diff --git a/src/envs/fleet_env/task_env.py b/src/envs/fleet_env/task_env.py
--- a/src/envs/fleet_env/task_env.py
+++ b/src/envs/fleet_env/task_env.py
@@ -452,11 +452,19 @@
         try:
             if not self._reward_computed and self._orch:
                 try:
+                    running_loop = asyncio.get_running_loop()
+                except RuntimeError:
+                    running_loop = None
+
+                if running_loop and running_loop.is_running():
+                    logger.warning(
+                        "Task %s: close() called from a running event loop; "
+                        "cannot compute final reward synchronously. Use await close_async().",
+                        self.task_key,
+                    )
+                else:
                     self.final_reward = asyncio.run(self._compute_reward())
                     self._reward_computed = True
-                except RuntimeError:
-                    # Already inside a running event loop — caller should use close_async()
-                    pass
         finally:
             if self._orch:
                 try:

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread src/envs/fleet_env/task_env.py Outdated
SkyRL can end trajectories early (context overflow, its own max_turns)
without OpenEnv knowing. Previously the model got 0 reward even if the
task was completed. Now close_async()/close() run the verifier when
step_async() never computed a reward, storing the result in
self.final_reward for SkyRL to read.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dzorlu dzorlu force-pushed the feat/close-verifier branch from 991e372 to 06dcdd4 Compare March 12, 2026 22:28

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

)
self._rollout_completed_emitted = True
self.final_reward = await self._compute_reward()
self._reward_computed = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhandled exception in close_async skips orchestrator cleanup

Medium Severity

In close_async(), the await self._compute_reward() call has no try/except wrapper. If _compute_reward() raises (e.g., the fleet_info() call at the end of that method fails, or fleet_exception() inside the except block raises), execution jumps straight to finally, skipping self._orch.close_async(). This leaks the provisioned Fleet cloud instance. The old code only called fleet_info() here (very unlikely to fail); the new code does HTTP-based verifier calls, making failures more likely. In close(), a similar gap exists — only RuntimeError is caught, so other exceptions also skip self._orch.close().

Additional Locations (1)
Fix in Cursor Fix in Web

@dzorlu dzorlu merged commit baa192a into deniz/fleet_client Mar 12, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant