Skip to content

Adding chess environment#324

Merged
Darktex merged 3 commits into
huggingface:mainfrom
luccabb:add-chess-env
Jan 30, 2026
Merged

Adding chess environment#324
Darktex merged 3 commits into
huggingface:mainfrom
luccabb:add-chess-env

Conversation

@luccabb

@luccabb luccabb commented Jan 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • New environment
  • Refactoring

Alignment Checklist

Before submitting, verify:

  • I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
  • I have checked .claude/docs/INVARIANTS.md and no invariants are violated
  • I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

  • Not required (bug fix, docs, minor refactoring)
  • RFC exists: #___
  • RFC needed (will create before merge)

Test Plan

$ bash envs/chess_env/scripts/verify.sh

Claude Code Review

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 23, 2026
@luccabb luccabb force-pushed the add-chess-env branch 2 times, most recently from d888ffc to bcc52a0 Compare January 24, 2026 01:23
@luccabb luccabb marked this pull request as ready for review January 24, 2026 01:25
@greptile-apps

greptile-apps Bot commented Jan 24, 2026

Copy link
Copy Markdown
Contributor

Greptile Overview

Greptile Summary

This PR adds a new chess environment to OpenEnv using the moonfish chess engine for position evaluation and opponent play. The implementation follows standard OpenEnv patterns and includes several advanced features:

Key Changes:

  • Added ChessEnvironment with configurable opponents (moonfish engine, random, or self-play mode)
  • Implemented temporal discounting feature for credit assignment using gamma parameter (γ^(T-1-t) formula)
  • Full chess rules support via python-chess library with FEN position handling
  • WebSocket-based client following the standard OpenEnv pattern
  • Comprehensive test coverage including temporal discounting tests
  • Documentation and CI integration

Architecture:

  • Follows OpenEnv principles: rewards calculated inside environment, agent cannot reset
  • Proper client-server separation via models.py
  • Uses standard Gymnasium-style API (reset, step, state)
  • WebSocket communication for orchestration

Issues Found:

  • Minor style issue in client.py with duplicate done/reward field assignments (inherited from base class)

The implementation is well-structured and thoroughly tested. The temporal discounting feature is a nice addition for improving credit assignment in long chess games.

Confidence Score: 4/5

  • Safe to merge with minor style improvements recommended
  • The chess environment implementation is solid with comprehensive tests and follows OpenEnv patterns correctly. One minor style issue with duplicate field assignments doesn't affect functionality. The temporal discounting feature is well-implemented and tested. No security, logic, or alignment issues found.
  • envs/chess_env/client.py needs minor cleanup to remove duplicate field assignments

Important Files Changed

Filename Overview
envs/chess_env/server/chess_environment.py Core chess environment implementation with moonfish engine integration, temporal discounting, and configurable opponents
envs/chess_env/client.py WebSocket client for chess environment with duplicate done/reward fields in observation parsing
envs/chess_env/models.py Pydantic models for chess actions, observations, and state - clean implementation
tests/envs/test_chess_environment.py Comprehensive test coverage for chess environment including temporal discounting and opponent modes

Sequence Diagram

sequenceDiagram
    participant Agent
    participant Client as ChessEnv Client
    participant WS as WebSocket
    participant Server as FastAPI Server
    participant Env as ChessEnvironment
    participant Moonfish as Moonfish Engine

    Agent->>Client: reset()
    Client->>WS: Connect to ws://server
    WS->>Server: WebSocket connection
    Client->>WS: Send reset request
    WS->>Server: Reset message
    Server->>Env: reset(fen=None)
    Env->>Env: Initialize chess.Board
    Env->>Env: Determine agent color
    alt Agent plays Black & opponent configured
        Env->>Moonfish: search_move(board, depth)
        Moonfish-->>Env: Opponent move
        Env->>Env: Execute opponent move
    end
    Env->>Moonfish: board_evaluation(board)
    Moonfish-->>Env: Position evaluation
    Env-->>Server: ChessObservation(fen, legal_moves, etc)
    Server-->>WS: Observation payload
    WS-->>Client: Response
    Client-->>Agent: StepResult[ChessObservation]

    Agent->>Client: step(ChessAction(move="e2e4"))
    Client->>WS: Send step request
    WS->>Server: Step message
    Server->>Env: step(action)
    Env->>Env: Parse & validate move
    Env->>Env: Execute agent move
    Env->>Env: Increment agent_move_count
    Env->>Env: _calculate_reward_and_done()
    alt Game not over & opponent configured
        Env->>Moonfish: search_move(board, depth)
        Moonfish-->>Env: Opponent move
        Env->>Env: Execute opponent move
        Env->>Env: _calculate_reward_and_done()
    end
    alt Game over
        Env->>Env: _compute_discounted_rewards(terminal_reward)
        Note over Env: Apply γ^(T-1-t) discounting<br/>to all agent moves
    end
    Env->>Moonfish: board_evaluation(board)
    Moonfish-->>Env: Position evaluation
    Env-->>Server: ChessObservation with reward & done
    Server-->>WS: Observation payload
    WS-->>Client: Response
    Client-->>Agent: StepResult[ChessObservation]

    Agent->>Client: state
    Client->>WS: Request state
    WS->>Server: State request
    Server->>Env: state property
    Env-->>Server: ChessState
    Server-->>WS: State payload
    WS-->>Client: Response
    Client-->>Agent: ChessState
Loading

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

Comment thread envs/chess_env/server/Dockerfile Outdated
Comment thread envs/chess_env/client.py Outdated
Comment thread envs/chess_env/models.py
Comment thread tests/envs/test_chess_environment.py Outdated
Comment thread envs/chess_env/server/chess_environment.py Outdated
Comment thread envs/chess_env/server/chess_environment.py Outdated
A chess reinforcement learning environment for OpenEnv using the moonfish
chess engine for opponent play and position evaluation.

Features:
- Full chess rules via python-chess library
- Configurable opponent: moonfish engine, random moves, or self-play (None)
- Position evaluation using moonfish's PSQT-based evaluation
- Configurable agent color (white/black/alternate each episode)
- Custom starting positions via FEN notation
- Terminal state detection on reset for custom positions

Rewards: +1.0 win, -1.0 loss, 0.0 draw, -0.1 illegal move
@luccabb

luccabb commented Jan 24, 2026

Copy link
Copy Markdown
Contributor Author

@greptile

@greptile-apps

greptile-apps Bot commented Jan 24, 2026

Copy link
Copy Markdown
Contributor

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left you some comments, marking this as "request changes" just so you see them :D

I'd love to integrate this with the new reward proposal, but I also don't want to block you so happy to land this and then work with you on an updated version once I land the new rubrics.

Curious on your feedback on these rubrics too btw!

)

def _calculate_reward_and_done(self):
"""Calculate reward and check if episode is done."""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need some discounting?

if self._board.is_insufficient_material():
return 0.0, True

if self._board.is_fifty_moves():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not the same as step_count exceeding 50?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this is for:

The fifty-move rule in chess states that a player can claim a draw if no capture has been made and no pawn has been moved in the last fifty moves (where a "move" consists of a player completing a turn followed by the opponent completing a turn). The purpose of this rule is to prevent a player with no chance of winning from obstinately continuing to play indefinitely or seeking to win by tiring the opponent.

see https://en.wikipedia.org/wiki/Fifty-move_rule

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooohhhhh

if self._board.is_checkmate():
winner = not self._board.turn
if winner == self._agent_color:
return 1.0, True

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think you actually wanna go back, and provide each move with some reward. Otherwise you would only be scoring the very last move, which would make your rewards too sparse.

I'm proposing an easier way to do this: see #337

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some discounting, let me know if this is what you had in mind

result = self._get_result_string()

metadata = {
"evaluation": board_evaluation(self._board),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super interesting :o How does this work? @luccabb

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's using PeSTO's Evaluation Function underneath: https://www.chessprogramming.org/PeSTO%27s_Evaluation_Function

- Add gamma parameter (default 0.99) for configurable discounting
- Compute discounted rewards at episode end: r_t = γ^(T-1-t) × R_final
- Return discounted_rewards in terminal observation metadata
- Add tests for discounting formula and behavior
- Document the feature in README
@luccabb

luccabb commented Jan 29, 2026

Copy link
Copy Markdown
Contributor Author

@greptile

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps

greptile-apps Bot commented Jan 29, 2026

Copy link
Copy Markdown
Contributor
Additional Comments (3)

tests/envs/test_chess_environment.py
Testing logic doesn't match agent color configuration. Agent is configured as white (agent_color="white") but test plays moves for both white and black, treating all 4 moves as agent moves. With opponent=None, both players are the agent, so this test actually validates self-play behavior rather than single-agent behavior. Either configure opponent="random" or update the test description to clarify it's testing self-play mode.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/envs/test_chess_environment.py
Line: 1095:1121

Comment:
Testing logic doesn't match agent color configuration. Agent is configured as white (`agent_color="white"`) but test plays moves for both white and black, treating all 4 moves as agent moves. With `opponent=None`, both players are the agent, so this test actually validates self-play behavior rather than single-agent behavior. Either configure `opponent="random"` or update the test description to clarify it's testing self-play mode.

How can I resolve this? If you propose a fix, please make it concise.

envs/chess_env/server/chess_environment.py
Board turn logic inverted - self._board.turn returns True for white's turn, but this checks if not self._board.turn which evaluates to black's turn, yet assigns "white". Should be "white" if self._board.turn else "black"

            current_player="white" if self._board.turn else "black",
Prompt To Fix With AI
This is a comment left during a code review.
Path: envs/chess_env/server/chess_environment.py
Line: 706:706

Comment:
Board turn logic inverted - `self._board.turn` returns `True` for white's turn, but this checks `if not self._board.turn` which evaluates to black's turn, yet assigns `"white"`. Should be `"white" if self._board.turn else "black"`

```suggestion
            current_player="white" if self._board.turn else "black",
```

How can I resolve this? If you propose a fix, please make it concise.

envs/chess_env/server/chess_environment.py
Incorrect result string - when self._board.turn is False (black's turn), black is checkmated so white wins ("1-0"), but this returns "0-1". Logic should be inverted.

        return "1-0" if not self._board.turn else "0-1"
Prompt To Fix With AI
This is a comment left during a code review.
Path: envs/chess_env/server/chess_environment.py
Line: 820:820

Comment:
Incorrect result string - when `self._board.turn` is `False` (black's turn), black is checkmated so white wins ("1-0"), but this returns "0-1". Logic should be inverted.

```suggestion
        return "1-0" if not self._board.turn else "0-1"
```

How can I resolve this? If you propose a fix, please make it concise.

@luccabb

luccabb commented Jan 29, 2026

Copy link
Copy Markdown
Contributor Author

@greptile

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread envs/chess_env/client.py

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@Darktex Darktex merged commit 90beb89 into huggingface:main Jan 30, 2026
5 checks passed
@luccabb luccabb deleted the add-chess-env branch January 30, 2026 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants