Kuan Zhang*, Dongchen Liu*, Qiyue Zhao, Jinkun Hou, Xinran Zhang, Qinlei Xie, Miao Liu†, Yiming Li†
GameVerse is a comprehensive benchmark framework designed to evaluate the capabilities of game-playing agents and Vision-Language Models (VLMs) across a diverse set of complex games.
```bash
conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt
```

Create API key files (example for OpenAI):

```bash
mkdir -p src/agent_servers/keys/openai-key
echo "YOUR_OPENAI_KEY" > src/agent_servers/keys/openai-key/key.env
```

Run one evaluation:

```bash
python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml
```

Or use the leaderboard launcher scripts (recommended for reproducible batch runs):

```bash
# Linux / macOS
bash scripts/leaderboard/{game}/{game}.sh
```

```powershell
# Windows PowerShell
powershell -ExecutionPolicy Bypass -File scripts/leaderboard/{game}/{game}.ps1
```

Before running a game, complete its game-specific setup:
- docs/setup_angry_birds.md
- docs/setup_baba_is_you.md
- docs/setup_civilization.md
- docs/setup_forza_horizon5.md
- docs/setup_genshin.md
- docs/setup_maze.md
- docs/setup_metro.md
- docs/setup_pvz.md
- docs/setup_red_dead_redemption2.md
- docs/setup_slay_the_spire.md
- docs/setup_snake.md
- docs/setup_tic_tac_toe.md
- docs/setup_twenty_fourty_eight.md
- docs/setup_ace_attorney.md
From the repository root:

```bash
conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt
```

Optional editable install:

```bash
pip install -e .
```

Store key files in `src/agent_servers/keys` (one provider per folder):

```text
src/agent_servers/keys/
  openai-key/key.env
  google-key/key.env
  qwen-key/key.env
  seed-key/key.env
```
Each `key.env` file supports either format:

- Plain key text only
- `KEY_NAME=your_key`
You can also set environment variables directly (e.g., OPENAI_API_KEY, GOOGLE_API_KEY, DASHSCOPE_API_KEY, QWEN_API_KEY, ARK_API_KEY).
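The two `key.env` formats plus the environment-variable fallback can be handled by a small loader. The sketch below is illustrative only: `read_key_file`, `load_api_key`, and `ENV_FALLBACKS` are hypothetical names, and both the pairing of provider folders to variables (e.g. `DASHSCOPE_API_KEY` for `qwen-key`, `ARK_API_KEY` for `seed-key`) and the "file first, then environment" precedence are assumptions, not documented behavior of the repository.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional

# ASSUMPTION: provider folder -> environment variables to try, in order.
# The variable names come from the README; this pairing is a guess.
ENV_FALLBACKS: Dict[str, List[str]] = {
    "openai-key": ["OPENAI_API_KEY"],
    "google-key": ["GOOGLE_API_KEY"],
    "qwen-key": ["DASHSCOPE_API_KEY", "QWEN_API_KEY"],
    "seed-key": ["ARK_API_KEY"],
}


def read_key_file(path: Path) -> Optional[str]:
    """Read a key.env file written in either supported format."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip().isidentifier():
            # KEY_NAME=your_key format: keep the text after the first "="
            value = value.strip().strip("\"'")
        else:
            # plain key text format (also covers keys containing "=" without a name)
            value = line
        return value or None
    return None


def load_api_key(keys_dir: Path, provider_folder: str) -> Optional[str]:
    """Prefer the key file; fall back to environment variables (assumed precedence)."""
    key = read_key_file(keys_dir / provider_folder / "key.env")
    if key:
        return key
    for var in ENV_FALLBACKS.get(provider_folder, []):
        value = os.environ.get(var)
        if value:
            return value
    return None
```

For example, `load_api_key(Path("src/agent_servers/keys"), "openai-key")` returns the key whichever of the two formats the file uses.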
Primary script: scripts/play_game.py
Configuration reference: docs/configuration.md
For standardized benchmark runs, prefer the launcher scripts in `scripts/leaderboard/{game}/` (`.sh` for Linux/macOS, `.ps1` for Windows); they wrap the common configuration for each game.
```bash
python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml
```

```bash
python scripts/play_game.py \
  --config src/agent_client/configs/snake/config.yaml \
  agent.llm_name=gpt-4o-mini \
  agent.agent_type=zeroshot_agent \
  env.action_mode=semantic \
  runner.max_steps=100
```

- `--config`: base YAML config file, usually `src/agent_client/configs/{game}/config.yaml`
- `agent.llm_name`: model name, e.g. `gpt-4o`, `gpt-4o-mini`, `gemini-2.5-flash`, `qwen3-vl-32b-instruct`
- `agent.agent_type`: e.g. `zeroshot_agent`, `memory_agent`
- `env.action_mode`: `semantic` or `gui`
- `runner.max_steps`: maximum steps per run
Single-run entry script per game:

```bash
bash scripts/leaderboard/snake/snake.sh
```

```powershell
powershell -ExecutionPolicy Bypass -File scripts/leaderboard/snake/snake.ps1
```

Batch scripts are also available in each game folder (e.g. `snake_batch.sh`, `snake_batch.ps1`, `snake_batch_vl.sh`, `snake_batch_vl.ps1`).
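The README does not describe what the batch scripts run internally. As a rough sketch under that caveat, a batch over several models can be expressed as one `play_game.py` command per model using only the documented overrides; `build_commands` is a hypothetical helper, and its default agent type, action mode, and step budget are arbitrary example values.

```python
import shlex
from typing import List


def build_commands(game: str, models: List[str], agent_type: str = "zeroshot_agent",
                   action_mode: str = "semantic", max_steps: int = 100) -> List[List[str]]:
    """Build one play_game.py argv per model, using the documented overrides."""
    config = f"src/agent_client/configs/{game}/config.yaml"
    commands = []
    for model in models:
        commands.append([
            "python", "scripts/play_game.py",
            "--config", config,
            f"agent.llm_name={model}",
            f"agent.agent_type={agent_type}",
            f"env.action_mode={action_mode}",
            f"runner.max_steps={max_steps}",
        ])
    return commands


if __name__ == "__main__":
    for argv in build_commands("snake", ["gpt-4o-mini", "gemini-2.5-flash"]):
        # Printed for inspection; use subprocess.run(argv, check=True) to execute instead.
        print(shlex.join(argv))
```

Keeping argv as lists (rather than shell strings) avoids quoting problems when passing them to `subprocess.run`.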
Reflection and milestone generation scripts (run with `--help` to list their options):

```bash
python scripts/generate_reflection.py --help
python scripts/generate_milestone.py --help
```

Current built-in game configs:

- `angry_birds`
- `baba_is_you`
- `civilization`
- `forza_horizon5`
- `genshin`
- `maze`
- `metro`
- `pvz`
- `pwaat` (Ace Attorney)
- `red_dead_redemption2`
- `scene_investigator_demo`
- `slay_the_spire`
- `snake`
- `tic_tac_toe`
- `twenty_fourty_eight`

All configs live under `src/agent_client/configs`.
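Because each game keeps its config at `src/agent_client/configs/{game}/config.yaml`, the list above can be regenerated from the directory layout. A minimal sketch, assuming that one-folder-per-game layout holds for every entry; `list_game_configs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List


def list_game_configs(configs_root: Path) -> List[str]:
    """Return sorted game names that have a config.yaml directly under configs_root/<game>/."""
    return sorted(
        path.parent.name
        for path in configs_root.glob("*/config.yaml")
        if path.is_file()
    )
```

Run from the repository root, `list_game_configs(Path("src/agent_client/configs"))` would return the game folder names, which is a quick check that a newly added game is picked up.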
To add a new game `my_game`, follow this minimal path:

- Create the game server implementation under `src/game_servers/my_game/`.
- Add agent server prompts/logic under `src/agent_servers/my_game/`.
- Add a config file at `src/agent_client/configs/my_game/config.yaml`.
- Ensure `EnvCreator(config).create()` can resolve `env_name: my_game` to your env class.
- Run the evaluation with:

```bash
python scripts/play_game.py --config src/agent_client/configs/my_game/config.yaml
```

Recommendation: copy a structurally similar existing game folder and modify it incrementally.
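The "resolve `env_name: my_game` to your env class" step is a name-to-class lookup. The sketch below shows a decorator-based registry as one common way to do this; it is a hypothetical illustration, not the repository's code. `ENV_REGISTRY`, `register_env`, and `MyGameEnv` are invented names, the `EnvCreator` here is only a stand-in, and reading `env_name` from an `env` section of the config is an assumption.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry; the real EnvCreator may resolve env_name differently.
ENV_REGISTRY: Dict[str, Type[Any]] = {}


def register_env(name: str) -> Callable[[Type[Any]], Type[Any]]:
    """Class decorator that maps an env_name string to an env class."""
    def decorator(cls: Type[Any]) -> Type[Any]:
        if name in ENV_REGISTRY:
            raise KeyError(f"env_name {name!r} is already registered")
        ENV_REGISTRY[name] = cls
        return cls
    return decorator


class EnvCreator:
    """Illustrative stand-in for the repository's EnvCreator."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def create(self) -> Any:
        env_cfg = self.config.get("env", {})  # ASSUMPTION: env_name lives under "env"
        name = env_cfg.get("env_name")
        try:
            env_cls = ENV_REGISTRY[name]
        except KeyError:
            known = ", ".join(sorted(ENV_REGISTRY)) or "<none>"
            raise ValueError(f"unknown env_name {name!r}; registered: {known}") from None
        return env_cls(env_cfg)


@register_env("my_game")
class MyGameEnv:
    """Placeholder env class for the hypothetical my_game."""

    def __init__(self, env_cfg: Dict[str, Any]) -> None:
        self.action_mode = env_cfg.get("action_mode", "semantic")
```

A registry keeps the lookup in one place, and an unknown `env_name` fails with a message listing the registered games, which makes a typo in a new `config.yaml` easy to spot.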
A huge thanks to the following projects that made this work possible:
🎮 Gaming Loop Framework: Inspired by Orak & LMGame-Bench.
🖱️ GUI Action Space: Action settings are based on FlashAdventure & UI-TARS.
We are grateful to these authors for their pioneering contributions to the field of GUI agents and gaming benchmarks.
```bibtex
@article{gameverse2026,
  title={GameVerse: Can Vision-Language Models Learn from Video-based Reflection?},
  author={Zhang, Kuan and Liu, Dongchen and Zhao, Qiyue and Hou, Jinkun and Zhang, Xinran and Xie, Qinlei and Liu, Miao and Li, Yiming},
  journal={arXiv},
  year={2026},
  url={https://arxiv.org/abs/2603.06656}
}
```