
Add BaseHTTPClient to talk to Envs via JSON over RPC. #4

Merged
pankit-eng merged 1 commit into main from env_code on Oct 7, 2025

@pankit-eng

Contributor

This change primarily adds a base HTTP client for talking to a container over JSON RPC. The container bootstrap code is not included yet; it will follow in the next PR.
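For readers without the diff at hand, here is a minimal sketch of what a base HTTP client of this kind could look like. It is not the code merged in this PR: the class name `SketchHTTPClient`, the `build_request` and `call` methods, the endpoint names, and the payload shape are all hypothetical, and it uses only the standard library.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


class SketchHTTPClient:
    """Hypothetical base client: POSTs a JSON body and decodes a JSON reply."""

    def __init__(self, base_url: str, timeout_s: float = 15.0) -> None:
        # Normalize the base URL so endpoint joining never doubles the slash.
        self.base_url = base_url.rstrip("/")
        self.timeout_s = timeout_s

    def build_request(
        self, endpoint: str, payload: Optional[Dict[str, Any]] = None
    ) -> urllib.request.Request:
        # Serialize the payload (an empty object when none is given).
        body = json.dumps(payload or {}).encode("utf-8")
        return urllib.request.Request(
            f"{self.base_url}/{endpoint.lstrip('/')}",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def call(
        self, endpoint: str, payload: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        # Send the request and parse the JSON response body.
        request = self.build_request(endpoint, payload)
        with urllib.request.urlopen(request, timeout=self.timeout_s) as response:
            return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, an environment-specific client would subclass this and map its own action and observation types onto `call`, for example `client.call("step", {"action": ...})` against a server running inside the container.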

meta-cla bot added the CLA Signed label on Oct 6, 2025
@pankit-eng pankit-eng merged commit 95af407 into main Oct 7, 2025
1 check passed
pankit-eng pushed a commit that referenced this pull request Nov 3, 2025
burtenshaw added a commit that referenced this pull request Nov 3, 2025
rycerzes pushed a commit to rycerzes/OpenEnv that referenced this pull request Nov 19, 2025
Add BaseHTTPClient to talk to Envs via JSON over RPC.
rycerzes pushed a commit to rycerzes/OpenEnv that referenced this pull request Nov 19, 2025
EchoRaven pushed a commit to BillChan226/openenv-gen that referenced this pull request Jan 5, 2026
feat: webarena torchforge grpo integration
lilyzhng added a commit to lilyzhng/OpenEnv that referenced this pull request Mar 8, 2026
Old: every no-progress turn counted as wasted (25/29 in Run 10)
New: each no-progress streak gets 2-turn grace period (exploration),
only turns beyond grace count as wasted (stuck in a loop)

This aligns with principle huggingface#4: agent MUST explore (ls, cat, grep)
to discover context. Exploration turns shouldn't be penalized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
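
The grace-period rule in the commit message above can be sketched as follows. This is an illustration, not the code from that commit: the function name `count_wasted_turns` and the list-of-booleans input are assumptions, and only the 2-turn grace value comes from the message.

```python
from typing import Iterable


def count_wasted_turns(made_progress: Iterable[bool], grace: int = 2) -> int:
    """Count turns that fall beyond the grace period of each no-progress streak."""
    wasted = 0
    streak = 0  # length of the current run of no-progress turns
    for progressed in made_progress:
        if progressed:
            streak = 0  # progress ends the streak and resets the grace period
        else:
            streak += 1
            if streak > grace:
                wasted += 1  # only turns past the grace period are penalized
    return wasted


# Hypothetical episode: one streak of 3 no-progress turns, then a streak of 1.
turns = [True, False, False, False, True, False]
new_rule = count_wasted_turns(turns)           # grace of 2 turns per streak
old_rule = count_wasted_turns(turns, grace=0)  # every no-progress turn counts
```

Setting `grace=0` reproduces the old behavior described in the message, so the two rules can be compared on the same episode.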
akashkathole7 added a commit to akashkathole7/OpenEnv that referenced this pull request Apr 20, 2026
Per Section F done-gate huggingface#4: measured the heuristic scaffold's composite
reward on hero seeds 9500/9501/9502 (Qwen2.5-3B-Instruct loaded on Colab
T4; heuristic eval identical to the dry-run's evaluate_heuristic path).

Measured totals:
  9500 (planned tier_a):  0.3287  R3=0.01
  9501 (planned tier_b):  0.4623  R3=0.99
  9502 (planned tier_c):  0.1892  R3=0.01

Measured ordering 9501 > 9500 > 9502 does NOT match the planned
easy > medium > hard (9500 > 9501 > 9502). The 9500↔9501 swap fails
the monotonicity gate, so labels stay tier_a / tier_b / tier_c.

Mechanism: 9500 has only 2 mismatches but the heuristic over-claims on
one of them and trips Rule 36(4) (R3=0.01). 9501's 5 mismatches spread
across more suppliers; no single supplier's claim exceeds 2B cap so
R3=0.99 dominates. A real trained policy may reverse this — re-measure
after real Colab training before the pitch demo.

Artifact committed: data/hero_baseline.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
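
The ordering check described in the commit message above can be sketched like this. The measured totals are copied from the message; the function name `passes_monotonicity_gate` and the strict-decrease definition of the gate are assumptions, since the message does not show how the gate is implemented.

```python
from typing import Dict, List


def passes_monotonicity_gate(scores: Dict[int, float], planned_order: List[int]) -> bool:
    """Return True if scores strictly decrease along the planned easy-to-hard order."""
    planned_scores = [scores[seed] for seed in planned_order]
    return all(a > b for a, b in zip(planned_scores, planned_scores[1:]))


# Measured composite rewards per hero seed, as reported in the commit message.
measured = {9500: 0.3287, 9501: 0.4623, 9502: 0.1892}
planned = [9500, 9501, 9502]  # planned easy > medium > hard

gate_ok = passes_monotonicity_gate(measured, planned)
measured_order = sorted(measured, key=measured.get, reverse=True)
```

With these numbers the gate fails because 9501 outscores 9500, which matches the message's decision to keep the neutral tier_a / tier_b / tier_c labels.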

Labels

CLA Signed
