Adding chess environment#324
Conversation
d888ffc to
bcc52a0
Compare
Greptile OverviewGreptile SummaryThis PR adds a new chess environment to OpenEnv using the moonfish chess engine for position evaluation and opponent play. The implementation follows standard OpenEnv patterns and includes several advanced features: Key Changes:
Architecture:
Issues Found:
The implementation is well-structured and thoroughly tested. The temporal discounting feature is a nice addition for improving credit assignment in long chess games. Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent
participant Client as ChessEnv Client
participant WS as WebSocket
participant Server as FastAPI Server
participant Env as ChessEnvironment
participant Moonfish as Moonfish Engine
Agent->>Client: reset()
Client->>WS: Connect to ws://server
WS->>Server: WebSocket connection
Client->>WS: Send reset request
WS->>Server: Reset message
Server->>Env: reset(fen=None)
Env->>Env: Initialize chess.Board
Env->>Env: Determine agent color
alt Agent plays Black & opponent configured
Env->>Moonfish: search_move(board, depth)
Moonfish-->>Env: Opponent move
Env->>Env: Execute opponent move
end
Env->>Moonfish: board_evaluation(board)
Moonfish-->>Env: Position evaluation
Env-->>Server: ChessObservation(fen, legal_moves, etc)
Server-->>WS: Observation payload
WS-->>Client: Response
Client-->>Agent: StepResult[ChessObservation]
Agent->>Client: step(ChessAction(move="e2e4"))
Client->>WS: Send step request
WS->>Server: Step message
Server->>Env: step(action)
Env->>Env: Parse & validate move
Env->>Env: Execute agent move
Env->>Env: Increment agent_move_count
Env->>Env: _calculate_reward_and_done()
alt Game not over & opponent configured
Env->>Moonfish: search_move(board, depth)
Moonfish-->>Env: Opponent move
Env->>Env: Execute opponent move
Env->>Env: _calculate_reward_and_done()
end
alt Game over
Env->>Env: _compute_discounted_rewards(terminal_reward)
Note over Env: Apply γ^(T-1-t) discounting<br/>to all agent moves
end
Env->>Moonfish: board_evaluation(board)
Moonfish-->>Env: Position evaluation
Env-->>Server: ChessObservation with reward & done
Server-->>WS: Observation payload
WS-->>Client: Response
Client-->>Agent: StepResult[ChessObservation]
Agent->>Client: state
Client->>WS: Request state
WS->>Server: State request
Server->>Env: state property
Env-->>Server: ChessState
Server-->>WS: State payload
WS-->>Client: Response
Client-->>Agent: ChessState
|
A chess reinforcement learning environment for OpenEnv using the moonfish chess engine for opponent play and position evaluation. Features: - Full chess rules via python-chess library - Configurable opponent: moonfish engine, random moves, or self-play (None) - Position evaluation using moonfish's PSQT-based evaluation - Configurable agent color (white/black/alternate each episode) - Custom starting positions via FEN notation - Terminal state detection on reset for custom positions Rewards: +1.0 win, -1.0 loss, 0.0 draw, -0.1 illegal move
92c83c4 to
827ab74
Compare
|
@greptile |
Greptile found no issues!From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR". |
Darktex
left a comment
There was a problem hiding this comment.
Left you some comments, marking this as "request changes" just so you see them :D
I'd love to integrate this with the new reward proposal, but I also don't want to block you so happy to land this and then work with you on an updated version once I land the new rubrics.
Curious on your feedback on these rubrics too btw!
| ) | ||
|
|
||
| def _calculate_reward_and_done(self): | ||
| """Calculate reward and check if episode is done.""" |
There was a problem hiding this comment.
Don't we need some discounting?
| if self._board.is_insufficient_material(): | ||
| return 0.0, True | ||
|
|
||
| if self._board.is_fifty_moves(): |
There was a problem hiding this comment.
Is this not the same as step_count exceeding 50?
There was a problem hiding this comment.
no this is for:
The fifty-move rule in chess states that a player can claim a draw if no capture has been made and no pawn has been moved in the last fifty moves (where a "move" consists of a player completing a turn followed by the opponent completing a turn). The purpose of this rule is to prevent a player with no chance of winning from obstinately continuing to play indefinitely or seeking to win by tiring the opponent.
| if self._board.is_checkmate(): | ||
| winner = not self._board.turn | ||
| if winner == self._agent_color: | ||
| return 1.0, True |
There was a problem hiding this comment.
Here, I think you actually wanna go back, and provide each move with some reward. Otherwise you would only be scoring the very last move, which would make your rewards too sparse.
I'm proposing an easier way to do this: see #337
There was a problem hiding this comment.
added some discounting, let me know if this is what you had in mind
| result = self._get_result_string() | ||
|
|
||
| metadata = { | ||
| "evaluation": board_evaluation(self._board), |
There was a problem hiding this comment.
This is super interesting :o How does this work? @luccabb
There was a problem hiding this comment.
it's using PeSTO's Evaluation Function underneath: https://www.chessprogramming.org/PeSTO%27s_Evaluation_Function
- Add gamma parameter (default 0.99) for configurable discounting - Compute discounted rewards at episode end: r_t = γ^(T-1-t) × R_final - Return discounted_rewards in terminal observation metadata - Add tests for discounting formula and behavior - Document the feature in README
|
@greptile |
Additional Comments (3)
Prompt To Fix With AIThis is a comment left during a code review.
Path: tests/envs/test_chess_environment.py
Line: 1095:1121
Comment:
Testing logic doesn't match agent color configuration. Agent is configured as white (`agent_color="white"`) but test plays moves for both white and black, treating all 4 moves as agent moves. With `opponent=None`, both players are the agent, so this test actually validates self-play behavior rather than single-agent behavior. Either configure `opponent="random"` or update the test description to clarify it's testing self-play mode.
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix With AIThis is a comment left during a code review.
Path: envs/chess_env/server/chess_environment.py
Line: 706:706
Comment:
Board turn logic inverted - `self._board.turn` returns `True` for white's turn, but this checks `if not self._board.turn` which evaluates to black's turn, yet assigns `"white"`. Should be `"white" if self._board.turn else "black"`
```suggestion
current_player="white" if self._board.turn else "black",
```
How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix With AIThis is a comment left during a code review.
Path: envs/chess_env/server/chess_environment.py
Line: 820:820
Comment:
Incorrect result string - when `self._board.turn` is `False` (black's turn), black is checkmated so white wins ("1-0"), but this returns "0-1". Logic should be inverted.
```suggestion
return "1-0" if not self._board.turn else "0-1"
```
How can I resolve this? If you propose a fix, please make it concise. |
|
@greptile |
Summary
Type of Change
Alignment Checklist
Before submitting, verify:
.claude/docs/PRINCIPLES.mdand this PR aligns with our principles.claude/docs/INVARIANTS.mdand no invariants are violated/pre-submit-pr(orbash .claude/hooks/lint.shand tests) and addressed all issuesRFC Status
Test Plan
Claude Code Review