GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

Kuan Zhang^*, Dongchen Liu^*, Qiyue Zhao, Jinkun Hou, Xinran Zhang, Qinlei Xie, Miao Liu^†, Yiming Li^†

GameVerse is a benchmark framework for evaluating game-playing agents and Vision-Language Models (VLMs) across a diverse set of games.

Quick Start

conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt

Create API key files (example for OpenAI):

mkdir -p src/agent_servers/keys/openai-key
echo "YOUR_OPENAI_KEY" > src/agent_servers/keys/openai-key/key.env

Run one evaluation:

python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml

Or use leaderboard launcher scripts (recommended for reproducible batch runs):

# Linux / macOS
bash scripts/leaderboard/{game}/{game}.sh

# Windows PowerShell
powershell -ExecutionPolicy Bypass -File scripts/leaderboard/{game}/{game}.ps1

Installation

1) Game setup

Please complete game-specific setup first:

2) Conda environment

From repository root:

conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt

Optional editable install:

pip install -e .

3) API key setup

Store key files in src/agent_servers/keys (one provider per folder):

src/agent_servers/keys/
	openai-key/key.env
	google-key/key.env
	qwen-key/key.env
	seed-key/key.env

key.env accepts either of two formats:

Plain key text only (the file contains just the key)
KEY_NAME=your_key

You can also set environment variables directly (e.g., OPENAI_API_KEY, GOOGLE_API_KEY, DASHSCOPE_API_KEY, QWEN_API_KEY, ARK_API_KEY).
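
To make the two key.env formats concrete, here is a minimal sketch of a reader that accepts both. It is not the repository's loader: parse_key_text and load_key are hypothetical helper names, and skipping blank lines and "#" comment lines is an assumption made for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the KEY_NAME=your_key form (an identifier, "=", then the value).
_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=(.*)$")


def parse_key_text(text: str) -> Optional[str]:
    """Return the key from key.env content written in either format."""
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and "#" comments are ignored.
        if not line or line.startswith("#"):
            continue
        match = _ASSIGNMENT.match(line)
        if match:
            # KEY_NAME=your_key form: keep the value, dropping optional quotes.
            return match.group(1).strip().strip("\"'")
        # Plain form: the line itself is the key.
        return line
    return None


def load_key(provider_dir: str, keys_root: str = "src/agent_servers/keys") -> Optional[str]:
    """Read <keys_root>/<provider_dir>/key.env, or return None if it is absent."""
    path = Path(keys_root) / provider_dir / "key.env"
    if not path.is_file():
        return None
    return parse_key_text(path.read_text(encoding="utf-8"))
```

For example, load_key("openai-key") would read src/agent_servers/keys/openai-key/key.env relative to the current directory. A plain key that happens to look like NAME=value would be read as an assignment, so this sketch suits only keys that contain characters such as "-".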

Evaluation

Primary script: scripts/play_game.py
Configuration reference: docs/configuration.md

For standardized benchmark runs, prefer the launcher scripts in scripts/leaderboard/{game}/ (.sh for Linux/macOS, .ps1 for Windows). They wrap common configurations for each game.

Run one evaluation

python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml

Override config from CLI

python scripts/play_game.py \
	--config src/agent_client/configs/snake/config.yaml \
	agent.llm_name=gpt-4o-mini \
	agent.agent_type=zeroshot_agent \
	env.action_mode=semantic \
	runner.max_steps=100

Common parameters

--config: base YAML config file, usually src/agent_client/configs/{game}/config.yaml
agent.llm_name: model name, e.g. gpt-4o, gpt-4o-mini, gemini-2.5-flash, qwen3-vl-32b-instruct
agent.agent_type: e.g. zeroshot_agent, memory_agent
env.action_mode: semantic or gui
runner.max_steps: maximum steps per run

Leaderboard scripts (recommended)

Single-run entry script per game:

bash scripts/leaderboard/snake/snake.sh

powershell -ExecutionPolicy Bypass -File scripts/leaderboard/snake/snake.ps1

Batch scripts are also available in each game folder (e.g. snake_batch.sh, snake_batch.ps1, snake_batch_vl.sh, snake_batch_vl.ps1).
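
If you prefer to drive a batch from Python rather than the provided shell scripts, a sweep can be built from the same CLI shown earlier. This is a sketch, not a replacement for the leaderboard scripts: build_commands is a hypothetical helper, and it only prints the commands instead of running them.

```python
import itertools
import shlex
from typing import List


def build_commands(game: str, models: List[str], agent_types: List[str]) -> List[List[str]]:
    """Return one play_game.py command per (model, agent type) pair."""
    config = f"src/agent_client/configs/{game}/config.yaml"
    return [
        [
            "python", "scripts/play_game.py",
            "--config", config,
            f"agent.llm_name={model}",
            f"agent.agent_type={agent_type}",
        ]
        for model, agent_type in itertools.product(models, agent_types)
    ]


commands = build_commands(
    "snake", ["gpt-4o", "gpt-4o-mini"], ["zeroshot_agent", "memory_agent"]
)
for cmd in commands:
    # Print a copy-pasteable shell line for each run.
    print(shlex.join(cmd))
```

To execute the runs from the repository root, each list can be passed to subprocess.run(cmd, check=True).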

Reflection / milestone scripts

python scripts/generate_reflection.py --help
python scripts/generate_milestone.py --help

Supported Games

Current built-in game configs:

angry_birds
baba_is_you
civilization
forza_horizon5
genshin
maze
metro
pvz
pwaat (Ace Attorney)
red_dead_redemption2
scene_investigator_demo
slay_the_spire
snake
tic_tac_toe
twenty_fourty_eight

All config entries are in src/agent_client/configs.

Extend to New Games

To add a new game my_game, follow this minimal path:

1. Create game server implementation under src/game_servers/my_game/.
2. Add agent server prompts/logic under src/agent_servers/my_game/.
3. Add config file src/agent_client/configs/my_game/config.yaml.
4. Ensure EnvCreator(config).create() can resolve env_name: my_game to your env class.
5. Run evaluation with:

python scripts/play_game.py --config src/agent_client/configs/my_game/config.yaml

Recommendation: copy a structurally similar existing game folder and modify incrementally.
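
The env_name resolution step above is the part most likely to be missed. One common way to implement that kind of lookup is a name-to-class registry, sketched below. Everything here is illustrative: the EnvCreator class is a stand-in with the same call shape, not the repository's implementation, and register_env, MyGameEnv, and a top-level env_name key are assumptions. Check the existing games in the repository for the actual mechanism.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping env_name strings to environment classes.
_ENV_REGISTRY: Dict[str, Type] = {}


def register_env(name: str) -> Callable[[Type], Type]:
    """Class decorator that records an env class under the given name."""
    def decorator(cls: Type) -> Type:
        if name in _ENV_REGISTRY:
            raise ValueError(f"env_name {name!r} is already registered")
        _ENV_REGISTRY[name] = cls
        return cls
    return decorator


@register_env("my_game")
class MyGameEnv:
    """Placeholder environment for the new game."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


class EnvCreator:
    """Stand-in creator: looks up config['env_name'] in the registry."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def create(self) -> Any:
        name = self.config["env_name"]
        if name not in _ENV_REGISTRY:
            known = ", ".join(sorted(_ENV_REGISTRY))
            raise KeyError(f"unknown env_name {name!r}; registered: {known}")
        return _ENV_REGISTRY[name](self.config)


env = EnvCreator({"env_name": "my_game"}).create()
```

With a registry like this, adding a game means defining and registering one class; a misspelled env_name fails early with a message listing the known names.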

Acknowledgement

A huge thanks to the following projects that made this work possible:

🎮 Gaming Loop Framework: Inspired by Orak & LMGame-Bench.

🖱️ GUI Action Space: Action settings are based on FlashAdventure & UI-TARS.

We are grateful to these authors for their pioneering contributions to the field of GUI agents and gaming benchmarks.

Citation

@article{gameverse2026,
	title={GameVerse: Can Vision-Language Models Learn from Video-based Reflection?},
	author={Zhang, Kuan and Liu, Dongchen and Zhao, Qiyue and Hou, Jinkun and Zhang, Xinran and Xie, Qinlei and Liu, Miao and Li, Yiming},
	journal={arXiv},
	year={2026},
	url={https://arxiv.org/abs/2603.06656}
}
