feat: Run verifier at close() for orphaned rollouts by dzorlu · Pull Request #10 · fleet-ai/OpenEnv

dzorlu · 2026-03-12T22:01:58Z

Summary

When SkyRL ends trajectories early (context overflow, its own max_turns), OpenEnv never ran the verifier — the model got 0 reward even if the task was completed
close_async() / close() now run _compute_reward() when step_async() never computed a reward
Result stored in self.final_reward for SkyRL to read after closing

Changes

Added self.final_reward and self._reward_computed fields
close_async(): awaits _compute_reward() if reward wasn't already computed
close(): same via asyncio.run(), with fallback if already in an event loop

Test plan

Existing OpenEnv tests pass (18/18 — 9 pre-existing failures unrelated)
Normal flow: verifier still runs in step_async() when env says done, final_reward stays None
Context overflow flow: verifier runs at close, final_reward has actual score

🤖 Generated with Claude Code

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Silent reward loss when close() called from event loop
- close() now detects a running event loop and logs a warning with guidance to use close_async() instead of silently swallowing the failure path.

Or push these changes by commenting:

@cursor push 02e65c87c4

Preview (02e65c87c4)

diff --git a/src/envs/fleet_env/task_env.py b/src/envs/fleet_env/task_env.py
--- a/src/envs/fleet_env/task_env.py
+++ b/src/envs/fleet_env/task_env.py
@@ -452,11 +452,19 @@
         try:
             if not self._reward_computed and self._orch:
                 try:
+                    running_loop = asyncio.get_running_loop()
+                except RuntimeError:
+                    running_loop = None
+
+                if running_loop and running_loop.is_running():
+                    logger.warning(
+                        "Task %s: close() called from a running event loop; "
+                        "cannot compute final reward synchronously. Use await close_async().",
+                        self.task_key,
+                    )
+                else:
                     self.final_reward = asyncio.run(self._compute_reward())
                     self._reward_computed = True
-                except RuntimeError:
-                    # Already inside a running event loop — caller should use close_async()
-                    pass
         finally:
             if self._orch:
                 try:

_{This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.}

SkyRL can end trajectories early (context overflow, its own max_turns) without OpenEnv knowing. Previously the model got 0 reward even if the task was completed. Now close_async()/close() run the verifier when step_async() never computed a reward, storing the result in self.final_reward for SkyRL to read. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-12T22:33:20Z

-                )
-                self._rollout_completed_emitted = True
+                self.final_reward = await self._compute_reward()
+                self._reward_computed = True


Unhandled exception in close_async skips orchestrator cleanup

Medium Severity

In close_async(), the await self._compute_reward() call has no try/except wrapper. If _compute_reward() raises (e.g., the fleet_info() call at the end of that method fails, or fleet_exception() inside the except block raises), execution jumps straight to finally, skipping self._orch.close_async(). This leaks the provisioned Fleet cloud instance. The old code only called fleet_info() here (very unlikely to fail); the new code does HTTP-based verifier calls, making failures more likely. In close(), a similar gap exists — only RuntimeError is caught, so other exceptions also skip self._orch.close().

Additional Locations (1)

src/envs/fleet_env/task_env.py#L678-L695

cursor Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread src/envs/fleet_env/task_env.py Outdated

dzorlu force-pushed the feat/close-verifier branch from 991e372 to 06dcdd4 Compare March 12, 2026 22:28

cursor Bot reviewed Mar 12, 2026

View reviewed changes

dzorlu merged commit baa192a into deniz/fleet_client Mar 12, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Run verifier at close() for orphaned rollouts#10

feat: Run verifier at close() for orphaned rollouts#10
dzorlu merged 1 commit into
deniz/fleet_clientfrom
feat/close-verifier

dzorlu commented Mar 12, 2026

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Mar 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dzorlu commented Mar 12, 2026

Summary

Changes

Test plan

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Mar 12, 2026

Choose a reason for hiding this comment

Unhandled exception in close_async skips orchestrator cleanup

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor Bot left a comment •

edited

Loading