Skip to content

Add Env for playing Pokémon using poke-env#131

Closed
init27 wants to merge 43 commits into
mainfrom
project-pikachu
Closed

Add Env for playing Pokémon using poke-env#131
init27 wants to merge 43 commits into
mainfrom
project-pikachu

Conversation

@init27

@init27 init27 commented Nov 2, 2025

Copy link
Copy Markdown
Collaborator

This idea came super thanks to @cpich3g from our Hackathon.
We are continuing here but thanks again Justin for leading and inspiring!

Adds a complete Pokemon battle environment integration using poke-env and Pokemon Showdown, following OpenEnv's HTTP-based framework patterns.

Design Decisions

Event Loop Management

  • poke-env runs on dedicated POKE_LOOP background thread
  • FastAPI runs on main uvicorn event loop
  • Proper async synchronization via asyncio.run_coroutine_threadsafe()
  • Lazy initialization of asyncio.Event objects on correct loop

Memory Management

  • Automatic battle cleanup every 10 episodes (prevents accumulation)
  • Previous battle task cancellation on reset
  • Stable memory usage over 100+ episodes

Concurrency

  • Single lock for both reset() and step() operations
  • Prevents race conditions in battle state access
  • Thread-safe action queuing

Error Handling

  • Illegal move validation with fallback to random legal action
  • Battle state validation before step execution
  • Comprehensive None checks for Pokemon attributes
  • Graceful handling of finished battles

File Structure

src/envs/pokemon_env/
├── init.py # Package exports
├── models.py # Action, Observation, State dataclasses
├── client.py # HTTPEnvClient implementation
├── README.md # Environment documentation
└── server/
├── init.py
├── app.py # FastAPI application
├── pokemon_environment.py # Core environment logic
├── Dockerfile # Container with Showdown + OpenEnv
└── requirements.txt # Dependencies

examples/project-pikachu/
├── test_local_pokemon.py # Direct environment tests (6 tests)
├── test_http_pokemon.py # HTTP client tests (6 tests)
└── [documentation files] # Testing guides and architecture docs

Reward Modes

Sparse (default):

  • +1.0 for win, -1.0 for loss, 0.0 otherwise

Dense:

  • +0.2 per opponent Pokemon fainted
  • -0.2 per own Pokemon fainted
  • +0.05 per HP% damage dealt
  • +0.5 bonus for win, -0.5 penalty for loss

Credits

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 2, 2025
@init27 init27 requested a review from zkwentz November 2, 2025 03:57
@github-actions

github-actions Bot commented Nov 2, 2025

Copy link
Copy Markdown

⚠️ Deployment failed for pokemon_env

  • Space repo:
  • Live URL:

Please resolve your environment.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

@init27 init27 requested a review from Darktex November 2, 2025 03:57
Added observations on popular choices and practices in Pokemon-RL projects.
@github-actions

github-actions Bot commented Nov 2, 2025

Copy link
Copy Markdown

✅ Deployment succeeded for pokemon_env

Nice work! Wait for a code review and we're ready to go.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

Add observations section to Readme
Updated the checklist for setting up the Pokémon RL training environment, including system requirements, hardware recommendations, setup instructions, and known issues.
Removed unnecessary comments and cleaned up formatting.
Added instructions for forking the Pokemon-Showdown repository and noted compatibility issues with random battle generation.
@github-actions

github-actions Bot commented Nov 5, 2025

Copy link
Copy Markdown

✅ Deployment succeeded for pokemon_env

Nice work! Wait for a code review and we're ready to go.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

1 similar comment
@github-actions

github-actions Bot commented Nov 5, 2025

Copy link
Copy Markdown

✅ Deployment succeeded for pokemon_env

Nice work! Wait for a code review and we're ready to go.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

Created a  Basic Setup and Installation.md file
@github-actions

github-actions Bot commented Nov 5, 2025

Copy link
Copy Markdown

✅ Deployment succeeded for pokemon_env

Nice work! Wait for a code review and we're ready to go.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

@github-actions

github-actions Bot commented Nov 5, 2025

Copy link
Copy Markdown

✅ Deployment succeeded for pokemon_env

Nice work! Wait for a code review and we're ready to go.

You can iterate locally or validate fixes by running scripts/deploy_to_hf.sh --env "pokemon_env".

@Darktex Darktex requested a review from Copilot November 7, 2025 18:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a comprehensive Pokemon battle environment integration to OpenEnv using poke-env and Pokemon Showdown. The implementation provides HTTP-based access to Pokemon battles for reinforcement learning and AI training.

  • Complete Pokemon environment integration with poke-env library
  • Docker containerization with Pokemon Showdown server and OpenEnv API server
  • Client library for HTTP-based environment interaction
  • Comprehensive test suite and documentation

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 24 comments.

Show a summary per file
File Description
src/envs/pokemon_env/server/pokemon_environment.py Core environment implementation with async/event loop synchronization
src/envs/pokemon_env/client.py HTTP client for Pokemon environment
src/envs/pokemon_env/models.py Data models for actions, observations, and state
src/envs/pokemon_env/server/app.py FastAPI server application
src/envs/pokemon_env/server/Dockerfile Multi-service Docker image configuration
src/envs/pokemon_env/test_enhancements.py Comprehensive test suite
examples/project-pikachu/test_*.py Integration and HTTP client tests
examples/project-pikachu/TESTING.md Testing guide and troubleshooting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if obs.available_switches and random.random() < 0.2 and active_hp < 0.3:
switch_idx = random.choice(range(len(obs.available_switches)))
action = PokemonAction(action_type="switch", action_index=switch_idx)
action_desc = f"Switch to {obs.team[switch_idx + 1].species.title()}"

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array index mismatch: switch_idx is selected from available_switches (line 498), but used to index obs.team[switch_idx + 1] (line 500). The available_switches list contains indices into the team, not raw team indices. This should be obs.team[obs.available_switches[switch_idx]] to correctly map from the chosen switch index to the actual team member.

Suggested change
action_desc = f"Switch to {obs.team[switch_idx + 1].species.title()}"
action_desc = f"Switch to {obs.team[obs.available_switches[switch_idx]].species.title()}"

Copilot uses AI. Check for mistakes.
dynamax: Whether to dynamax this turn (if applicable)
terastallize: Whether to terastallize this turn (if applicable)
"""
action_type: Literal["move", "switch"] = "move"

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action_type field doesn't include 'forfeit' or 'default' as valid options, but the pokemon_environment.py implementation handles these action types (lines 142, 193). Consider adding these to the Literal type for consistency: Literal[\"move\", \"switch\", \"forfeit\", \"default\"].

Suggested change
action_type: Literal["move", "switch"] = "move"
action_type: Literal["move", "switch", "forfeit", "default"] = "move"

Copilot uses AI. Check for mistakes.
except ImportError as e:
raise ImportError(
"poke-env is not installed. "
"Please install it with: pip install poke-env"

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected command formatting. The installation command should use backticks for code formatting consistency: 'Please install it with: pip install poke-env'

Suggested change
"Please install it with: pip install poke-env"
"Please install it with: `pip install poke-env`"

Copilot uses AI. Check for mistakes.
# Otherwise, use a move
elif obs.available_moves:
move_idx = random.choice(range(len(obs.available_moves)))
move_name = obs.available_moves[move_idx]

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obs.available_moves is a list of integers (move indices, not move names) based on the PokemonObservation model definition. Using it directly as move_name will result in displaying an integer instead of the actual move name. To get the move name, use: obs.active_pokemon.moves[obs.available_moves[move_idx]]['id'] if available.

Suggested change
move_name = obs.available_moves[move_idx]
move_index = obs.available_moves[move_idx]
move_name = obs.active_pokemon.moves[move_index]['id'] if obs.active_pokemon and obs.active_pokemon.moves and move_index < len(obs.active_pokemon.moves) else str(move_index)

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +36
mega_evolve: bool = False
dynamax: bool = False
terastallize: bool = False

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the z_move field. While mega_evolve, dynamax, and terastallize are documented in the docstring, z_move is referenced in pokemon_environment.py but not included in the PokemonAction dataclass or its docstring. Either add the field or remove references to it in the implementation.

Copilot uses AI. Check for mistakes.
"""

import sys
import os

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
import sys
import os
import argparse
import time

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'time' is not used.

Suggested change
import time

Copilot uses AI. Check for mistakes.
"""

import sys
import os

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.

import sys
import os
import time

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'time' is not used.

Suggested change
import time

Copilot uses AI. Check for mistakes.
sys.path.insert(0, str(project_root / "src"))

# Import models and environment
from envs.pokemon_env.models import PokemonAction, PokemonObservation, PokemonState

Copilot AI Nov 7, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'PokemonObservation' is not used.
Import of 'PokemonState' is not used.

Suggested change
from envs.pokemon_env.models import PokemonAction, PokemonObservation, PokemonState
from envs.pokemon_env.models import PokemonAction

Copilot uses AI. Check for mistakes.

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving my manual review here and claude review coming next...

action_index: int = 0
move_id: Optional[str] = None
switch_pokemon: Optional[str] = None
mega_evolve: bool = False

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be inside Action type then?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or like, these should be "can_mega_evolve", no?

ability: Optional[str]
item: Optional[str]

attack: int

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't get to see the opponent pokemon's stats ever though

active_pokemon: Optional[PokemonData] = None
opponent_active_pokemon: Optional[PokemonData] = None
team: List[PokemonData] = field(default_factory=list)
opponent_team: List[PokemonData] = field(default_factory=list)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a placeholder for opponent pokemon that you have not yet seen. eg at the very start of the battle you see that the other guy has 6 pokemon, then they launch the first and you see that he has 5 more but you don't know which ones

turn: int = 0
forced_switch: bool = False

can_mega_evolve: bool = False

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are already there in the PokemonData of the active pokemon

def _compute_reward(self, battle, done: bool) -> float:
"""Compute reward based on reward_mode."""

if self.reward_mode == "sparse":

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do magic numbers here, these should be configurable

# Final outcome bonus
if done:
if battle.won:
reward += 0.5

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(You probably want something more like 10 for this)

self._last_player_fainted = player_fainted

# Small reward for opponent HP damage
if battle.opponent_active_pokemon and hasattr(battle.opponent_active_pokemon, 'current_hp_fraction'):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should likely add a reward for statuses. Paralyzed, Toxin, Spikes, Leech Seed etc.

There should be another reward for REMOVING statuses. What made Blissey super strong was that she could learn Aromatherapy which removes status debuffs on your entire team.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should also add a reward for buffs which are a big deal in this game (we banned using Curse on Snorlax back in the day because he became ridiculously strong). Marowak is one Swords Dance away from 999 Attack which one-shots most pokemon in the game. Buffs are heavily quantized, eg Dragon Dance is +1 Atk +1 Spd while Swords Dance is +2 Atk and Belly Drum is +4 Atk (but you lose 50% hp!). Each +1 is +50%.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, there should be a reward for cleaning buffs. Skarmory was played for this reason: he could Whirlwind out your pokemon (buffs do not survive a switch unlike statuses)

# Small reward for opponent HP damage
if battle.opponent_active_pokemon and hasattr(battle.opponent_active_pokemon, 'current_hp_fraction'):
current_hp = battle.opponent_active_pokemon.current_hp_fraction
if current_hp is not None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also HP are often recovered. Moves like Recover, etc. There should be another reward for recovery of HP

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR adds a comprehensive Pokemon battle environment integration using poke-env and Pokemon Showdown. The implementation demonstrates excellent async synchronization, proper memory management, and strong test coverage. The architecture follows OpenEnv patterns well.

Highlights ✅

  • Excellent async event loop bridging between FastAPI and poke-env's POKE_LOOP
  • Memory leak prevention with automatic battle cleanup
  • Comprehensive error handling with fallback to legal actions
  • Dual reward modes (sparse/dense) for different RL approaches
  • Strong test coverage (needs relocation)
  • Thread-safe with proper locking

Critical Issues 🔴

Two blocking issues must be fixed before merge:

  1. Missing Copyright Headers: All Python files in src/envs/pokemon_env/ need Meta copyright headers
  2. Tests in Wrong Location: Tests should be in tests/envs/pokemon_env/, not examples/

Important Issues 🟡

  1. Test Suite Integration: Tests need proper pytest structure and CI/CD integration
  2. Dockerfile Organization: Multiple Dockerfiles need clarification (see proposal below)
  3. Hard-coded Port: Dockerfile should respect PORT environment variable
  4. Error Messages: Add context to observation metadata for better debugging
  5. Documentation Cleanup: Remove development artifacts from examples (see proposal below)

Minor Issues 🟢

Issues #8-12 are accepted as minor improvements for future consideration.


Proposals

Proposal for Issue #4: Dockerfile Organization

Current situation: Three Dockerfiles exist:

  • Dockerfile - Combines both Showdown + OpenEnv with supervisor ✅
  • Dockerfile.showdown - Standalone Showdown
  • Dockerfile.pokemonenv - Standalone OpenEnv

Recommendation: Keep the main Dockerfile (it's well-designed) and remove Dockerfile.showdown and Dockerfile.pokemonenv because:

  1. The main Dockerfile already handles everything with multi-stage build
  2. Having multiple Dockerfiles creates confusion about which to use
  3. The README should document the single, consolidated approach
  4. Separate containers are rarely needed for this use case (Showdown must run alongside the env)

Alternative (if separate builds are truly needed): Keep all three but:

  • Rename them clearly: Dockerfile, Dockerfile.showdown-standalone, Dockerfile.env-only
  • Add clear documentation in README explaining when to use each
  • Add a decision tree: "Use main Dockerfile unless you need X"

My recommendation: Remove the extra Dockerfiles. The all-in-one approach is simpler and matches the architecture (two services that must communicate).


Proposal for Issue #7: Documentation Cleanup

Current state: examples/project-pikachu/ contains:

  • test_local_pokemon.py (move to tests) ✓
  • test_http_pokemon.py (move to tests) ✓
  • BasicSetupCheckList.md - Development artifact ❌
  • IMPLEMENTATION_SUMMARY.md - Development artifact ❌
  • Poke-Env.MD - Library documentation (external) ❌
  • Readme.MD - Duplicate of main README ❌
  • TESTING.md - Internal testing guide ❌

Recommendation: Remove entire examples/project-pikachu/ directory and instead:

  1. Create examples/pokemon_env_example.py with a simple, documented example:
#!/usr/bin/env python3
"""
Simple Pokemon Battle Environment Example.

This example shows how to use the Pokemon environment with OpenEnv.

Prerequisites:
    docker run -p 8000:8000 -p 9980:9980 pokemon-env:latest

Usage:
    python examples/pokemon_env_example.py
"""

from envs.pokemon_env import PokemonEnv, PokemonAction
import random

def main():
    # Connect to server
    client = PokemonEnv(base_url="http://localhost:9980")
    
    # Reset environment
    result = client.reset()
    print(f"Battle started! Active Pokemon: {result.observation.active_pokemon.species}")
    
    # Battle loop
    step = 0
    while not result.done and step < 100:
        # Choose random legal action
        if result.observation.available_moves:
            action = PokemonAction(
                action_type="move",
                action_index=random.choice(result.observation.available_moves)
            )
        elif result.observation.available_switches:
            action = PokemonAction(
                action_type="switch",
                action_index=random.choice(result.observation.available_switches)
            )
        else:
            break
        
        result = client.step(action)
        step += 1
        
        if step % 10 == 0:
            print(f"Turn {step}: Reward={result.reward:.2f}")
    
    print(f"\nBattle finished in {step} turns")
    print(f"Final reward: {result.reward}")
    state = client.state()
    print(f"Winner: {state.battle_winner}")

if __name__ == "__main__":
    main()
  1. Merge useful content into main README:

    • Architecture details from IMPLEMENTATION_SUMMARY.md
    • Any unique troubleshooting from TESTING.md
  2. Benefits:

    • Single source of truth (README.md)
    • Clear, runnable example for users
    • No confusion about what's documentation vs development artifacts
    • Cleaner repository structure

Overall: This is a high-quality implementation. Once copyright headers are added and tests are relocated, this will be an excellent addition to OpenEnv! 🎉

@@ -0,0 +1,127 @@
"""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: Missing Copyright Header

All Python files in the OpenEnv repository must include the Meta copyright header. This file and all other Python files in src/envs/pokemon_env/ are missing this required header.

Required header:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

Files needing headers:

  • models.py
  • client.py
  • __init__.py
  • server/app.py
  • server/pokemon_environment.py
  • server/__init__.py

Please add this header to the top of each file before the module docstring.

@@ -0,0 +1,310 @@
#!/usr/bin/env python3

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: Tests in Wrong Location

Tests should be located in tests/envs/pokemon_env/, not in examples/. The examples directory is for user-facing demonstration code, not test suites.

Required changes:

  1. Create tests/envs/pokemon_env/ directory
  2. Move test_local_pokemon.pytests/envs/pokemon_env/test_environment.py
  3. Move test_http_pokemon.pytests/envs/pokemon_env/test_client.py
  4. Remove manual sys.path manipulation (tests should use proper imports)
  5. Follow pytest conventions and integrate with existing test infrastructure

Reference: See tests/envs/atari_env/ or tests/envs/echo_env/ for proper structure.


# Validate battle state
if self._current_battle.finished:
logger.warning("Step called on finished battle, returning final state")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 IMPORTANT: Incomplete Error Messages

When step() is called on a finished battle, the warning is logged but not exposed to the user in the observation.

Improvement:

logger.warning("Step called on finished battle, returning final state")
obs = self._battle_to_observation(self._current_battle, reward=None, done=True)
obs.metadata["warning"] = "Step called on already finished battle"
return obs

This helps users debug issues when they accidentally call step after the battle ends.

\n\
[program:openenv]\n\
command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980\n\
directory=/app\n\

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 IMPORTANT: Hard-coded Port

The supervisor config hard-codes port 9980. This should respect the PORT environment variable for deployment flexibility.

Fix:

[program:openenv]
command=bash -c "uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port ${PORT:-9980}"
directory=/app
environment=PYTHONPATH="/app/src",PORT="${PORT:-9980}"

This allows deployments to override the port via environment variable while keeping 9980 as the default.

self._episodes_completed = 0
self._cleanup_interval = 10 # Clean up every 10 episodes

def _pokemon_to_data(self, pokemon) -> Optional[PokemonData]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR: Type Annotation

The pokemon parameter could have a type hint for better documentation:

from typing import Any

def _pokemon_to_data(self, pokemon: Any) -> Optional[PokemonData]:

Since this comes from the external poke-env library, Any is appropriate here.


# Battle history cleanup interval
self._episodes_completed = 0
self._cleanup_interval = 10 # Clean up every 10 episodes

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR: Magic Number

The cleanup interval is hard-coded. Consider making it configurable:

def __init__(
    self,
    battle_format: str = "gen8randombattle",
    player_username: Optional[str] = None,
    opponent: Optional[Player] = None,
    reward_mode: str = "sparse",
    max_turns: int = 1000,
    cleanup_interval: int = 10,  # Add this parameter
):
    ...
    self._cleanup_interval = cleanup_interval

This allows fine-tuning in production environments if needed.


```bash
# Build both images (run from project root directory)
docker build -t pokemon-showdown:latest -f src/envs/pokemon_env/server/Dockerfile.showdown .

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 IMPORTANT: Dockerfile Documentation

The README mentions building Dockerfile.showdown and Dockerfile.pokemonenv separately, but the main Dockerfile appears to handle everything with supervisor.

Please clarify:

  1. Which Dockerfiles are actually needed?
  2. When to use each one?

See the proposal in the main review for a recommendation to simplify this.

Darktex
Darktex previously approved these changes Jan 13, 2026

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.



Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@Darktex Darktex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.


Alignment Review Report

Overview

Reviewed PR #131: "Add Env for playing Pokémon using poke-env"

This PR adds a complete Pokemon battle environment integration using poke-env and Pokemon Showdown. The implementation follows OpenEnv patterns but has several critical issues that need addressing.


Tier 1: Fixes Required

Critical Issues (Build Blockers)

  • src/envs/pokemon_env/client.py:13 - Import error: from core.http_env_client import HTTPEnvClient should be from openenv.core.env_client import EnvClient. The HTTPEnvClient class does not exist in the codebase.

  • src/envs/pokemon_env/client.py:11 - Import error: from core.client_types import StepResult should be from openenv.core.client_types import StepResult

  • src/envs/pokemon_env/models.py:13 - Import error: from core.env_server import Action, Observation, State should be from openenv.core.env_server.types import Action, Observation, State

  • src/envs/pokemon_env/server/app.py:26 - Import error: from core.env_server import create_app should be from openenv.core.env_server import create_app

  • src/envs/pokemon_env/server/pokemon_environment.py:22 - Import error: from core.env_server import Action, Environment, Observation should be from openenv.core.env_server.interfaces import Environment and from openenv.core.env_server.types import Action, Observation

  • src/envs/pokemon_env/client.py:21 - Class inheritance error: Should inherit from EnvClient[PokemonAction, PokemonObservation, State] not HTTPEnvClient. The current codebase uses WebSocket-based EnvClient, not HTTP.

Code Organization Issues

  • src/envs/pokemon_env/test_enhancements.py - Test file should not be in src/envs/pokemon_env/. Move to tests/envs/ or examples/project-pikachu/

  • src/envs/pokemon_env/test_pokemon_docker.sh - Test script should not be in src/envs/pokemon_env/. Move to examples/project-pikachu/

  • src/envs/pokemon_env/test_notebook/ - Notebook should not be in src/envs/pokemon_env/. Move to examples/project-pikachu/

Missing Type Information

  • src/envs/pokemon_env/client.py - Missing State type parameter. Should define a proper PokemonState class in models.py and use it in the generic type (currently using bare State)

Documentation

  • src/envs/pokemon_env/models.py:169 - Contains password: Optional[str] = None in PokemonConfig. While this is for poke-env authentication (not exposed), add a comment clarifying this is for Pokemon Showdown server auth, not production secrets.

Tier 2: Alignment Discussion Points

ALIGNMENT FLAG: API Protocol Mismatch

  • Principle at stake: "Minimize lifecycle deltas" (PRINCIPLES.md) and "WebSocket for all environment communication" (INVARIANTS.md note)
  • The concern: This PR uses HTTPEnvClient (HTTP-based) when the codebase has moved to WebSocket-based EnvClient. According to INVARIANTS.md: "We are in the process of deprecating HTTP (see PR #252) in favor of WebSocket-only, but we are still transitioning and both protocols are currently available." However, HTTPEnvClient doesn't exist in the codebase - suggesting this PR may be based on old code or the transition is complete.
  • Impact: New environments should follow current patterns. Using deprecated/non-existent APIs creates technical debt.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: Test Files in Source Tree

  • Principle at stake: "Be hands-on" (PRINCIPLES.md) and code organization best practices
  • The concern: Test files (test_enhancements.py, test_pokemon_docker.sh, test_notebook/) are placed in src/envs/pokemon_env/ instead of appropriate test directories. This violates typical Python packaging conventions where src/ contains only distributable code.
  • Impact: These files would be included in package distributions, increasing package size and exposing test-only code.
  • Suggested approach: Move to examples/project-pikachu/ (for integration examples) or tests/envs/ (for unit tests)
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG: Client-Server Separation Compliance

  • Principle at stake: "Client-server separation" (INVARIANTS.md)
  • The concern: Need to verify that client.py never imports from server/ directory. Initial review shows proper separation (client.py only imports from models.py and openenv.core), but should be explicitly validated.
  • Status: ✅ Appears compliant - client only imports from models.py (shared) and openenv.core
  • Suggested reviewer: @Darktex (final confirmation)

ALIGNMENT FLAG: Reward Computation Location

  • Principle at stake: "Rewards in environment" (INVARIANTS.md)
  • The concern: Reward computation is correctly placed inside PokemonEnvironment._compute_reward() on the server side, supporting both "sparse" and "dense" modes. This is aligned with the principle.
  • Status: ✅ Compliant - rewards are computed server-side
  • No action needed - documenting for completeness

ALIGNMENT FLAG: Agent Cannot Reset

  • Principle at stake: "Agents cannot reset" (INVARIANTS.md)
  • The concern: The environment correctly does NOT expose reset/step/state via MCP tools. The WebSocket/HTTP interface is for orchestration only. No MCP server is defined in this PR.
  • Status: ✅ Compliant - simulation controls not exposed to agents
  • Future consideration: When MCP tools are added for Pokemon battles (e.g., "choose_move"), ensure they don't expose reset()
  • Suggested reviewer: @Darktex (for future MCP integration)

Automated Checks

Debug Code Check

  • Status: ⚠️ FOUND (but not in PR files)
  • Details: Existing codebase has debug print statements in unrelated files. No debug code found in the Pokemon environment PR files themselves.

Lint Check

  • Status: ❌ UNABLE TO RUN
  • Details: uv not installed in environment. Recommend running bash .claude/hooks/lint.sh locally before merge.

Summary

Critical Path to Merge

  1. Fix all import paths (change core.* to openenv.core.*)
  2. Replace HTTPEnvClient with EnvClient and ensure WebSocket compatibility
  3. Move test files out of src/ directory
  4. Run lint check with bash .claude/hooks/lint.sh
  5. Test the environment to ensure WebSocket client works

Counts

  • 7 critical import/type errors (build blockers)
  • 3 code organization issues (test files in wrong location)
  • 3 alignment discussion points (API protocol, test organization, architectural compliance)
  • 3 alignment confirmations (client-server separation ✅, rewards ✅, agent isolation ✅)

Recommendation

REQUEST CHANGES - The import errors are critical and will cause immediate runtime failures. Once these are fixed and tests are reorganized, the architectural design appears sound and aligned with OpenEnv principles.

The core implementation (event loop management, memory cleanup, battle state serialization) is well-designed and shows good understanding of the poke-env async patterns.


Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@Darktex Darktex dismissed their stale review January 13, 2026 05:51

Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.

@zkwentz

zkwentz commented Jan 21, 2026

Copy link
Copy Markdown
Collaborator

@greptile

@greptile-apps

greptile-apps Bot commented Jan 21, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds comprehensive Pokemon battle environment using poke-env and Pokemon Showdown, implementing proper async/threading synchronization between FastAPI and poke-env's POKE_LOOP with memory leak prevention through periodic battle cleanup.

Key Changes:

  • Implements PokemonEnvironment with single-lock concurrency control for reset() and step()
  • Bridges two event loops: FastAPI's main loop and poke-env's POKE_LOOP using asyncio.run_coroutine_threadsafe()
  • Provides HTTP client (PokemonEnv) following OpenEnv patterns with proper serialization
  • Supports sparse and dense reward modes with configurable battle formats
  • Includes comprehensive test suites for both local and HTTP usage
  • Provides three Docker deployment options: monolithic, separated containers, and development setup

Critical Issues Found:

  • Infinite recursion bug in _compute_reward() when reward_mode is invalid (line 460)
  • Missing RandomPlayer import in pokemon_environment.py (line 30)
  • Incomplete action_type literal in models.py missing "default" and "forfeit" values
  • Port inconsistency: entrypoint.sh and supervisord.conf use port 9000 while all other configs use 9980

Architecture Strengths:

  • Proper event loop isolation prevents deadlocks
  • Single lock design eliminates race conditions between reset/step
  • Automatic cleanup every 10 episodes prevents memory growth
  • Comprehensive error handling with illegal move fallback to random actions

Confidence Score: 2/5

  • Not safe to merge - contains critical bugs including infinite recursion and port mismatches that will cause runtime failures
  • Score reflects presence of critical bugs: infinite recursion in reward computation, missing import, incomplete type definitions, and deployment-breaking port inconsistencies. While the architecture and async design are solid, these issues must be fixed before deployment
  • Pay close attention to src/envs/pokemon_env/server/pokemon_environment.py (infinite recursion + missing import), src/envs/pokemon_env/models.py (incomplete types), src/envs/pokemon_env/server/entrypoint.sh and src/envs/pokemon_env/server/supervisord.conf (port mismatches)

Important Files Changed

Filename Overview
src/envs/pokemon_env/server/pokemon_environment.py Core environment logic with infinite recursion bug and missing import; otherwise solid async/threading design
src/envs/pokemon_env/models.py Data models with incomplete action_type literal definition missing "default" and "forfeit" values
src/envs/pokemon_env/server/entrypoint.sh Port mismatch - uses 9000 instead of 9980, will cause deployment failures
src/envs/pokemon_env/server/supervisord.conf Port mismatch - uses 9000 instead of 9980, inconsistent with other configuration
src/envs/pokemon_env/client.py HTTP client implementation is clean with proper parsing and type handling
src/envs/pokemon_env/server/app.py FastAPI app setup is straightforward and follows OpenEnv patterns correctly

Sequence Diagram

sequenceDiagram
    participant Client as HTTP Client
    participant FastAPI as FastAPI Server<br/>(Main Loop)
    participant PokemonEnv as PokemonEnvironment<br/>(with Lock)
    participant Player as OpenEnvPokemonPlayer
    participant PokeLoop as POKE_LOOP<br/>(Background Thread)
    participant Showdown as Pokemon Showdown<br/>(WebSocket)

    Note over Client,Showdown: Reset Flow
    Client->>FastAPI: POST /reset
    FastAPI->>PokemonEnv: reset()
    PokemonEnv->>PokemonEnv: Acquire _env_lock
    PokemonEnv->>PokemonEnv: Cancel previous battle task
    PokemonEnv->>PokemonEnv: Cleanup old battles (every 10 episodes)
    PokemonEnv->>PokeLoop: asyncio.run_coroutine_threadsafe(start_battle())
    PokeLoop->>Player: battle_against(opponent)
    Player->>Showdown: WebSocket connect & battle init
    Showdown-->>Player: Battle started
    Player-->>PokeLoop: Battle task created
    PokeLoop-->>PokemonEnv: Battle reference
    PokemonEnv->>PokemonEnv: Release _env_lock
    PokemonEnv-->>FastAPI: Initial PokemonObservation
    FastAPI-->>Client: 200 OK with observation

    Note over Client,Showdown: Step Flow
    Client->>FastAPI: POST /step with PokemonAction
    FastAPI->>PokemonEnv: step(action)
    PokemonEnv->>PokemonEnv: Acquire _env_lock
    PokemonEnv->>PokemonEnv: Validate action type
    PokemonEnv->>Player: set_next_action(action)
    Player->>PokeLoop: Schedule _set_action() on POKE_LOOP
    PokeLoop->>Player: Set _next_action, signal _action_event
    PokemonEnv->>PokeLoop: asyncio.run_coroutine_threadsafe(wait_turn())
    Note over Player,Showdown: choose_move() called by poke-env
    Player->>Player: wait_for(_action_event)
    Player->>Player: _action_to_order(action)
    Player->>Showdown: Send move via WebSocket
    Showdown-->>Player: Battle state update
    Player->>Player: Signal _turn_complete_event
    PokeLoop-->>PokemonEnv: Turn complete
    PokemonEnv->>PokemonEnv: _battle_to_observation()
    PokemonEnv->>PokemonEnv: _compute_reward()
    PokemonEnv->>PokemonEnv: Release _env_lock
    PokemonEnv-->>FastAPI: Updated PokemonObservation
    FastAPI-->>Client: 200 OK with observation
Loading

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile


else:
# Unknown mode, use sparse
return self._compute_reward(battle, done) if done else 0.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: recursive call to _compute_reward will cause infinite recursion if reward_mode is not "sparse" or "dense"

Suggested change
return self._compute_reward(battle, done) if done else 0.0
return 0.0 # Unknown mode, default to sparse behavior
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/pokemon_env/server/pokemon_environment.py
Line: 460:460

Comment:
**logic:** recursive call to `_compute_reward` will cause infinite recursion if `reward_mode` is not `"sparse"` or `"dense"`

```suggestion
            return 0.0  # Unknown mode, default to sparse behavior
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +29 to +32
echo "Starting Pokemon OpenEnv server on port 9000..."
cd /app
export PYTHONPATH=/app/src
exec uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: port mismatch - entrypoint.sh uses port 9000 but Dockerfile, supervisord.conf, and other files use port 9980

Suggested change
echo "Starting Pokemon OpenEnv server on port 9000..."
cd /app
export PYTHONPATH=/app/src
exec uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9000
echo "Starting Pokemon OpenEnv server on port 9980..."
cd /app
export PYTHONPATH=/app/src
exec uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/pokemon_env/server/entrypoint.sh
Line: 29:32

Comment:
**logic:** port mismatch - entrypoint.sh uses port 9000 but Dockerfile, supervisord.conf, and other files use port 9980

```suggestion
echo "Starting Pokemon OpenEnv server on port 9980..."
cd /app
export PYTHONPATH=/app/src
exec uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980
```

How can I resolve this? If you propose a fix, please make it concise.

priority=10

[program:openenv]
command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: port mismatch - supervisord.conf uses port 9000 but Dockerfile.pokemonenv, Dockerfile, and README.md use port 9980

Suggested change
command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9000
command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/pokemon_env/server/supervisord.conf
Line: 19:19

Comment:
**logic:** port mismatch - supervisord.conf uses port 9000 but Dockerfile.pokemonenv, Dockerfile, and README.md use port 9980

```suggestion
command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980
```

How can I resolve this? If you propose a fix, please make it concise.

try:
# Import from top-level poke_env module
from poke_env import Player, RandomPlayer, AccountConfiguration, ServerConfiguration
# Import battle orders from player submodule

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: missing import for RandomPlayer

Suggested change
# Import battle orders from player submodule
from poke_env import Player, AccountConfiguration, ServerConfiguration
from poke_env.player import RandomPlayer
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/pokemon_env/server/pokemon_environment.py
Line: 30:30

Comment:
**syntax:** missing import for `RandomPlayer`

```suggestion
    from poke_env import Player, AccountConfiguration, ServerConfiguration
    from poke_env.player import RandomPlayer
```

How can I resolve this? If you propose a fix, please make it concise.

dynamax: Whether to dynamax this turn (if applicable)
terastallize: Whether to terastallize this turn (if applicable)
"""
action_type: Literal["move", "switch"] = "move"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: missing "default" and "forfeit" action types from literal - these are used in pokemon_environment.py lines 142, 193, and 197

Suggested change
action_type: Literal["move", "switch"] = "move"
action_type: Literal["move", "switch", "default", "forfeit"] = "move"
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/envs/pokemon_env/models.py
Line: 30:30

Comment:
**syntax:** missing `"default"` and `"forfeit"` action types from literal - these are used in `pokemon_environment.py` lines 142, 193, and 197

```suggestion
    action_type: Literal["move", "switch", "default", "forfeit"] = "move"
```

How can I resolve this? If you propose a fix, please make it concise.

@burtenshaw

Copy link
Copy Markdown
Collaborator

@init27 It looks like this PR is no longer compatible with the code base. I'd suggest closing and opening a new PR with the env in envs/, just for simplicity.

@burtenshaw

Copy link
Copy Markdown
Collaborator

@init27 @Darktex as discussed, I'll close because they are out of sync with the current project. Hope to see them reopened.

@burtenshaw burtenshaw closed this Feb 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. New Environment

Projects

None yet

Development

Successfully merging this pull request may close these issues.