Adding chess environment by luccabb · Pull Request #324 · huggingface/OpenEnv

luccabb · 2026-01-23T23:54:32Z

Summary

Type of Change

Alignment Checklist

Before submitting, verify:

I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
I have checked .claude/docs/INVARIANTS.md and no invariants are violated
I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

Not required (bug fix, docs, minor refactoring)
RFC exists: #___
RFC needed (will create before merge)

Test Plan

$ bash envs/chess_env/scripts/verify.sh

Claude Code Review

greptile-apps · 2026-01-24T01:31:23Z

Greptile Overview

Greptile Summary

This PR adds a new chess environment to OpenEnv using the moonfish chess engine for position evaluation and opponent play. The implementation follows standard OpenEnv patterns and includes several advanced features:

Key Changes:

Added ChessEnvironment with configurable opponents (moonfish engine, random, or self-play mode)
Implemented temporal discounting feature for credit assignment using gamma parameter (γ^(T-1-t) formula)
Full chess rules support via python-chess library with FEN position handling
WebSocket-based client following the standard OpenEnv pattern
Comprehensive test coverage including temporal discounting tests
Documentation and CI integration

Architecture:

Follows OpenEnv principles: rewards calculated inside environment, agent cannot reset
Proper client-server separation via models.py
Uses standard Gymnasium-style API (reset, step, state)
WebSocket communication for orchestration

Issues Found:

Minor style issue in client.py with duplicate done/reward field assignments (inherited from base class)

The implementation is well-structured and thoroughly tested. The temporal discounting feature is a nice addition for improving credit assignment in long chess games.

Confidence Score: 4/5

Safe to merge with minor style improvements recommended
The chess environment implementation is solid with comprehensive tests and follows OpenEnv patterns correctly. One minor style issue with duplicate field assignments doesn't affect functionality. The temporal discounting feature is well-implemented and tested. No security, logic, or alignment issues found.
envs/chess_env/client.py needs minor cleanup to remove duplicate field assignments

Important Files Changed

| Filename | Overview |
| --- | --- |
| envs/chess_env/server/chess_environment.py | Core chess environment implementation with moonfish engine integration, temporal discounting, and configurable opponents |
| envs/chess_env/client.py | WebSocket client for chess environment with duplicate done/reward fields in observation parsing |
| envs/chess_env/models.py | Pydantic models for chess actions, observations, and state - clean implementation |
| tests/envs/test_chess_environment.py | Comprehensive test coverage for chess environment including temporal discounting and opponent modes |

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Client as ChessEnv Client
    participant WS as WebSocket
    participant Server as FastAPI Server
    participant Env as ChessEnvironment
    participant Moonfish as Moonfish Engine

    Agent->>Client: reset()
    Client->>WS: Connect to ws://server
    WS->>Server: WebSocket connection
    Client->>WS: Send reset request
    WS->>Server: Reset message
    Server->>Env: reset(fen=None)
    Env->>Env: Initialize chess.Board
    Env->>Env: Determine agent color
    alt Agent plays Black & opponent configured
        Env->>Moonfish: search_move(board, depth)
        Moonfish-->>Env: Opponent move
        Env->>Env: Execute opponent move
    end
    Env->>Moonfish: board_evaluation(board)
    Moonfish-->>Env: Position evaluation
    Env-->>Server: ChessObservation(fen, legal_moves, etc)
    Server-->>WS: Observation payload
    WS-->>Client: Response
    Client-->>Agent: StepResult[ChessObservation]

    Agent->>Client: step(ChessAction(move="e2e4"))
    Client->>WS: Send step request
    WS->>Server: Step message
    Server->>Env: step(action)
    Env->>Env: Parse & validate move
    Env->>Env: Execute agent move
    Env->>Env: Increment agent_move_count
    Env->>Env: _calculate_reward_and_done()
    alt Game not over & opponent configured
        Env->>Moonfish: search_move(board, depth)
        Moonfish-->>Env: Opponent move
        Env->>Env: Execute opponent move
        Env->>Env: _calculate_reward_and_done()
    end
    alt Game over
        Env->>Env: _compute_discounted_rewards(terminal_reward)
        Note over Env: Apply γ^(T-1-t) discounting<br/>to all agent moves
    end
    Env->>Moonfish: board_evaluation(board)
    Moonfish-->>Env: Position evaluation
    Env-->>Server: ChessObservation with reward & done
    Server-->>WS: Observation payload
    WS-->>Client: Response
    Client-->>Agent: StepResult[ChessObservation]

    Agent->>Client: state
    Client->>WS: Request state
    WS->>Server: State request
    Server->>Env: state property
    Env-->>Server: ChessState
    Server-->>WS: State payload
    WS-->>Client: Response
    Client-->>Agent: ChessState

greptile-apps

14 files reviewed, 6 comments

A chess reinforcement learning environment for OpenEnv using the moonfish chess engine for opponent play and position evaluation.

Features:
- Full chess rules via python-chess library
- Configurable opponent: moonfish engine, random moves, or self-play (None)
- Position evaluation using moonfish's PSQT-based evaluation
- Configurable agent color (white/black/alternate each episode)
- Custom starting positions via FEN notation
- Terminal state detection on reset for custom positions

Rewards: +1.0 win, -1.0 loss, 0.0 draw, -0.1 illegal move
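The reward scheme listed above can be pictured as a small lookup table. This is only an illustrative sketch: `Outcome`, `REWARDS`, and `reward_for` are hypothetical names and do not come from the PR, and the sketch says nothing about whether an illegal move ends the episode.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcome labels, seen from the agent's side."""

    WIN = "win"
    LOSS = "loss"
    DRAW = "draw"
    ILLEGAL_MOVE = "illegal_move"


# Reward values copied from the commit message above.
REWARDS = {
    Outcome.WIN: 1.0,
    Outcome.LOSS: -1.0,
    Outcome.DRAW: 0.0,
    Outcome.ILLEGAL_MOVE: -0.1,
}


def reward_for(outcome: Outcome) -> float:
    """Look up the scalar reward for one outcome."""
    return REWARDS[outcome]
```

Keeping the values in one table makes it easy to see that only the illegal-move penalty is non-terminal in scale: it is an order of magnitude smaller than a win or loss.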

luccabb · 2026-01-24T02:14:49Z

@greptile

greptile-apps · 2026-01-24T02:17:20Z

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

Darktex

Left you some comments, marking this as "request changes" just so you see them :D

I'd love to integrate this with the new reward proposal, but I also don't want to block you so happy to land this and then work with you on an updated version once I land the new rubrics.

Curious on your feedback on these rubrics too btw!

Darktex · 2026-01-29T00:49:28Z

+        )
+
+    def _calculate_reward_and_done(self):
+        """Calculate reward and check if episode is done."""


Don't we need some discounting?

Darktex · 2026-01-29T00:50:42Z

+        if self._board.is_insufficient_material():
+            return 0.0, True
+
+        if self._board.is_fifty_moves():


Is this not the same as step_count exceeding 50?

no this is for:

The fifty-move rule in chess states that a player can claim a draw if no capture has been made and no pawn has been moved in the last fifty moves (where a "move" consists of a player completing a turn followed by the opponent completing a turn). The purpose of this rule is to prevent a player with no chance of winning from obstinately continuing to play indefinitely or seeking to win by tiring the opponent.

see https://en.wikipedia.org/wiki/Fifty-move_rule
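To make the difference from a plain step counter concrete, here is a minimal sketch of the bookkeeping behind the rule. It assumes a halfmove clock that resets on every capture or pawn move, which is the counter FEN records; `HalfmoveClock` and its methods are hypothetical names and are not part of python-chess or this PR.

```python
from dataclasses import dataclass


@dataclass
class HalfmoveClock:
    """Counts plies (single-player turns) since the last capture or pawn move."""

    plies: int = 0

    def record(self, is_capture: bool = False, is_pawn_move: bool = False) -> None:
        # A capture or pawn move resets the clock; any other move advances it.
        if is_capture or is_pawn_move:
            self.plies = 0
        else:
            self.plies += 1

    def draw_claimable(self) -> bool:
        # Fifty full moves by each side = 100 plies without a reset.
        return self.plies >= 100


clock = HalfmoveClock()
step_count = 0  # a plain episode step counter, for comparison

# Play 120 plies in total, with a single capture on the 60th ply.
for ply in range(120):
    clock.record(is_capture=(ply == 59))
    step_count += 1

print(step_count, clock.plies, clock.draw_claimable())  # 120 60 False
```

The step counter has passed 100, but the rule does not apply because the capture reset the clock partway through.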

Darktex · 2026-01-29T00:51:39Z

+        if self._board.is_checkmate():
+            winner = not self._board.turn
+            if winner == self._agent_color:
+                return 1.0, True


Here, I think you actually wanna go back, and provide each move with some reward. Otherwise you would only be scoring the very last move, which would make your rewards too sparse.

I'm proposing an easier way to do this: see #337

added some discounting, let me know if this is what you had in mind

Darktex · 2026-01-29T00:53:10Z

+            result = self._get_result_string()
+
+        metadata = {
+            "evaluation": board_evaluation(self._board),


This is super interesting :o How does this work? @luccabb

it's using PeSTO's Evaluation Function underneath: https://www.chessprogramming.org/PeSTO%27s_Evaluation_Function
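For readers unfamiliar with PeSTO, its core idea is a tapered evaluation: each position gets a middlegame score and an endgame score, and the two are blended by a game-phase value derived from the pieces still on the board. The sketch below shows only that blending step on material values and leaves out the piece-square tables entirely. It is not moonfish's code: the function name and input format are hypothetical, the score is taken from White's perspective as a simplification, and the numeric constants are the ones commonly listed for PeSTO, so check them against the linked page before relying on them.

```python
# Material values per piece type as (middlegame, endgame) centipawns.
# Assumed to match the PeSTO write-up; verify against the linked page.
PIECE_VALUES = {
    "P": (82, 94),
    "N": (337, 281),
    "B": (365, 297),
    "R": (477, 512),
    "Q": (1025, 936),
    "K": (0, 0),
}

# Contribution of each piece type to the game phase (24 with all pieces on board).
PHASE_WEIGHTS = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4, "K": 0}
MAX_PHASE = 24


def tapered_material(white: str, black: str) -> float:
    """Blend middlegame and endgame material scores from White's perspective.

    `white` and `black` are strings of piece letters, e.g. "KQRRBBNNPPPPPPPP".
    """
    mg = eg = phase = 0
    for pieces, sign in ((white, 1), (black, -1)):
        for piece in pieces:
            mg_value, eg_value = PIECE_VALUES[piece]
            mg += sign * mg_value
            eg += sign * eg_value
            phase += PHASE_WEIGHTS[piece]
    # Promotions can push the raw phase above 24, so clamp it.
    phase = min(phase, MAX_PHASE)
    # phase == 24 means pure middlegame weights; phase == 0 means pure endgame.
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE
```

With equal material the result is 0.0 regardless of phase. In a king-and-rook versus king ending the phase is 2, so the rook is scored almost entirely by its endgame value.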

- Add gamma parameter (default 0.99) for configurable discounting
- Compute discounted rewards at episode end: r_t = γ^(T-1-t) × R_final
- Return discounted_rewards in terminal observation metadata
- Add tests for discounting formula and behavior
- Document the feature in README
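The formula in this commit message can be written out as a short sketch. It illustrates r_t = γ^(T-1-t) × R_final only and is not the PR's `_compute_discounted_rewards`; the function name and signature here are assumptions.

```python
def discounted_rewards(
    final_reward: float, num_agent_moves: int, gamma: float = 0.99
) -> list[float]:
    """Return r_t = gamma ** (T - 1 - t) * final_reward for t = 0 .. T - 1.

    T is the number of moves the agent made in the episode, so the last
    move receives the full terminal reward and earlier moves receive less.
    """
    total = num_agent_moves
    return [gamma ** (total - 1 - t) * final_reward for t in range(total)]


# Three agent moves, a win (+1.0), and an exaggerated gamma for readability.
print(discounted_rewards(1.0, 3, gamma=0.5))  # [0.25, 0.5, 1.0]
```

With the default gamma of 0.99 the decay is gentle, so even the first move of a long game still receives a noticeable share of the terminal reward.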

luccabb · 2026-01-29T23:33:20Z

@greptile

greptile-apps

4 files reviewed, 3 comments

greptile-apps · 2026-01-29T23:37:11Z

Additional Comments (3)

tests/envs/test_chess_environment.py
Line: 1095:1121

Testing logic doesn't match agent color configuration. Agent is configured as white (`agent_color="white"`) but test plays moves for both white and black, treating all 4 moves as agent moves. With `opponent=None`, both players are the agent, so this test actually validates self-play behavior rather than single-agent behavior. Either configure `opponent="random"` or update the test description to clarify it's testing self-play mode.

envs/chess_env/server/chess_environment.py
Line: 706:706

Board turn logic inverted - `self._board.turn` returns `True` for white's turn, but this checks `if not self._board.turn` which evaluates to black's turn, yet assigns `"white"`. Should be `"white" if self._board.turn else "black"`

```suggestion
            current_player="white" if self._board.turn else "black",
```

envs/chess_env/server/chess_environment.py
Line: 820:820

Incorrect result string - when `self._board.turn` is `False` (black's turn), black is checkmated so white wins ("1-0"), but this returns "0-1". Logic should be inverted.

```suggestion
        return "1-0" if not self._board.turn else "0-1"
```
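Both turn-related comments hinge on one convention: in python-chess, `board.turn` is `True` when White is to move and `False` when Black is to move, and in a checkmate position the side to move is the side that has been mated. The sketch below spells that out with plain booleans so it runs without the library; `current_player` and `checkmate_result` are hypothetical helper names, not functions from the PR.

```python
# Mirrors chess.WHITE / chess.BLACK in python-chess, which are plain booleans.
WHITE, BLACK = True, False


def current_player(turn: bool) -> str:
    """Name the side to move for a given turn flag."""
    return "white" if turn == WHITE else "black"


def checkmate_result(turn: bool) -> str:
    """Result string for a checkmate position.

    The side to move has been mated, so the other side wins.
    """
    return "0-1" if turn == WHITE else "1-0"


print(current_player(WHITE), checkmate_result(WHITE))  # white 0-1
print(current_player(BLACK), checkmate_result(BLACK))  # black 1-0
```

The second line of output matches the suggested fix: when it is Black's turn in a mated position, the result is "1-0".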

luccabb · 2026-01-29T23:42:12Z

@greptile

greptile-apps

4 files reviewed, 1 comment

Darktex

Looks great!

meta-cla Bot added the CLA Signed label Jan 23, 2026

luccabb force-pushed the add-chess-env branch 2 times, most recently from d888ffc to bcc52a0 January 24, 2026 01:23

luccabb marked this pull request as ready for review January 24, 2026 01:25

greptile-apps Bot reviewed Jan 24, 2026

luccabb force-pushed the add-chess-env branch from 92c83c4 to 827ab74 January 24, 2026 02:13

Darktex requested changes Jan 29, 2026

greptile-apps Bot reviewed Jan 29, 2026

Clarify self-play mode in discounting test

d883779

greptile-apps Bot reviewed Jan 29, 2026

Comment thread envs/chess_env/client.py

Darktex approved these changes Jan 30, 2026

Darktex merged commit 90beb89 into huggingface:main Jan 30, 2026
5 checks passed

luccabb deleted the add-chess-env branch January 30, 2026 00:57

Darktex mentioned this pull request Jan 30, 2026

Find a way to run tests depending on external libraries meta-pytorch/OpenEnvZoo#1

Open
