feat: Run verifier at close() for orphaned rollouts#10
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Silent reward loss when close() called from event loop
close()now detects a running event loop and logs a warning with guidance to useclose_async()instead of silently swallowing the failure path.
Or push these changes by commenting:
@cursor push 02e65c87c4
Preview (02e65c87c4)
diff --git a/src/envs/fleet_env/task_env.py b/src/envs/fleet_env/task_env.py
--- a/src/envs/fleet_env/task_env.py
+++ b/src/envs/fleet_env/task_env.py
@@ -452,11 +452,19 @@
try:
if not self._reward_computed and self._orch:
try:
+ running_loop = asyncio.get_running_loop()
+ except RuntimeError:
+ running_loop = None
+
+ if running_loop and running_loop.is_running():
+ logger.warning(
+ "Task %s: close() called from a running event loop; "
+ "cannot compute final reward synchronously. Use await close_async().",
+ self.task_key,
+ )
+ else:
self.final_reward = asyncio.run(self._compute_reward())
self._reward_computed = True
- except RuntimeError:
- # Already inside a running event loop — caller should use close_async()
- pass
finally:
if self._orch:
try:This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
SkyRL can end trajectories early (context overflow, its own max_turns) without OpenEnv knowing. Previously the model got 0 reward even if the task was completed. Now close_async()/close() run the verifier when step_async() never computed a reward, storing the result in self.final_reward for SkyRL to read. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
991e372 to
06dcdd4
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| ) | ||
| self._rollout_completed_emitted = True | ||
| self.final_reward = await self._compute_reward() | ||
| self._reward_computed = True |
There was a problem hiding this comment.
Unhandled exception in close_async skips orchestrator cleanup
Medium Severity
In close_async(), the await self._compute_reward() call has no try/except wrapper. If _compute_reward() raises (e.g., the fleet_info() call at the end of that method fails, or fleet_exception() inside the except block raises), execution jumps straight to finally, skipping self._orch.close_async(). This leaks the provisioned Fleet cloud instance. The old code only called fleet_info() here (very unlikely to fail); the new code does HTTP-based verifier calls, making failures more likely. In close(), a similar gap exists — only RuntimeError is caught, so other exceptions also skip self._orch.close().



Summary
close_async()/close()now run_compute_reward()whenstep_async()never computed a rewardself.final_rewardfor SkyRL to read after closingChanges
self.final_rewardandself._reward_computedfieldsclose_async(): awaits_compute_reward()if reward wasn't already computedclose(): same viaasyncio.run(), with fallback if already in an event loopTest plan
step_async()when env says done,final_rewardstaysNonefinal_rewardhas actual score🤖 Generated with Claude Code