THUSI-Lab/GameVerse


GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

Kuan Zhang*, Dongchen Liu*, Qiyue Zhao, Jinkun Hou, Xinran Zhang, Qinlei Xie, Miao Liu, Yiming Li


[Figure: GeneralGameBench framework overview]

GameVerse is a comprehensive benchmark framework designed to evaluate the capabilities of game-playing agents and Vision-Language Models (VLMs) across a diverse set of complex games.


Quick Start

conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt

Create API key files (example for OpenAI):

mkdir -p src/agent_servers/keys/openai-key
echo "YOUR_OPENAI_KEY" > src/agent_servers/keys/openai-key/key.env

Run one evaluation:

python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml

Or use leaderboard launcher scripts (recommended for reproducible batch runs):

# Linux / macOS
bash scripts/leaderboard/{game}/{game}.sh
# Windows PowerShell
powershell -ExecutionPolicy Bypass -File scripts/leaderboard/{game}/{game}.ps1

Installation

1) Game setup

Please complete game-specific setup first:

2) Conda environment

From repository root:

conda create -n generalgamebench python=3.10 -y
conda activate generalgamebench
pip install -r requirements.txt

Optional editable install:

pip install -e .

3) API key setup

Store key files in src/agent_servers/keys (one provider per folder):

src/agent_servers/keys/
	openai-key/key.env
	google-key/key.env
	qwen-key/key.env
	seed-key/key.env

Each key.env file may use either format:

  • Plain key text only
  • KEY_NAME=your_key

You can also set environment variables directly (e.g., OPENAI_API_KEY, GOOGLE_API_KEY, DASHSCOPE_API_KEY, QWEN_API_KEY, ARK_API_KEY).
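The two key.env formats and the environment-variable fallback could be handled by a loader along these lines. This is a hypothetical sketch, not the repository's loader: the names `parse_key_env` and `load_api_key`, the lookup order (file first, then environment variables), and the treatment of comments and quotes are all assumptions.

```python
import os
import re
from pathlib import Path
from typing import Optional

# KEY_NAME=value, where KEY_NAME looks like an environment variable name.
_ASSIGNMENT = re.compile(r"^([A-Z_][A-Z0-9_]*)\s*=\s*(.*)$")


def parse_key_env(text: str) -> Optional[str]:
    """Extract the key from key.env contents in either supported format."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (comment support is assumed)
        match = _ASSIGNMENT.match(line)
        if match:
            # KEY_NAME=your_key form; drop optional surrounding quotes.
            return match.group(2).strip().strip("\"'")
        return line  # plain key text form
    return None


def load_api_key(provider_dir: Path, env_vars: list[str]) -> Optional[str]:
    """Read <provider_dir>/key.env, falling back to the given environment variables."""
    key_file = provider_dir / "key.env"
    if key_file.is_file():
        key = parse_key_env(key_file.read_text(encoding="utf-8"))
        if key:
            return key
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            return value
    return None
```

For example, `load_api_key(Path("src/agent_servers/keys/openai-key"), ["OPENAI_API_KEY"])` would return the file's key if present and otherwise the value of OPENAI_API_KEY.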

Evaluation

Primary script: scripts/play_game.py
Configuration reference: docs/configuration.md

For standardized benchmark runs, prefer the launcher scripts in scripts/leaderboard/{game}/ (.sh for Linux/macOS, .ps1 for Windows), which wrap the common configurations for each game.

Run one evaluation

python scripts/play_game.py --config src/agent_client/configs/snake/config.yaml

Override config from CLI

python scripts/play_game.py \
	--config src/agent_client/configs/snake/config.yaml \
	agent.llm_name=gpt-4o-mini \
	agent.agent_type=zeroshot_agent \
	env.action_mode=semantic \
	runner.max_steps=100

Common parameters

  • --config: base YAML config file, usually src/agent_client/configs/{game}/config.yaml
  • agent.llm_name: model name, e.g. gpt-4o, gpt-4o-mini, gemini-2.5-flash, qwen3-vl-32b-instruct
  • agent.agent_type: e.g. zeroshot_agent, memory_agent
  • env.action_mode: semantic or gui
  • runner.max_steps: maximum steps per run
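The key.path=value overrides above behave like a merge of dotted paths into the nested YAML config. The sketch below shows one way such a merge can work using only the standard library; `apply_overrides`, `parse_value`, and the literal-parsing rule (booleans, ints, floats, otherwise strings) are assumptions, and play_game.py may well rely on a config library instead.

```python
from typing import Any


def parse_value(text: str) -> Any:
    """Convert a CLI string to bool, int, or float, otherwise keep it as a string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # e.g. "gpt-4o-mini" or "semantic"


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        path, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key.path=value: {item!r}")
        *parents, leaf = path.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections on the fly
        node[leaf] = parse_value(raw)
    return config
```

Coercing runner.max_steps=100 to an integer matters because a step limit compared against a string would silently misbehave; model names such as gemini-2.5-flash fail both numeric casts and stay strings.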

Leaderboard scripts (recommended)

Single-run entry script per game:

bash scripts/leaderboard/snake/snake.sh
powershell -ExecutionPolicy Bypass -File scripts/leaderboard/snake/snake.ps1

Batch scripts are also available in each game folder (e.g. snake_batch.sh, snake_batch.ps1, snake_batch_vl.sh, snake_batch_vl.ps1).
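To drive a sweep from Python instead of the shell wrappers, a loop can build the same play_game.py invocations shown earlier. This is a hypothetical sketch: `build_commands`, `run_all`, and the chosen models are illustrative, and it does not reproduce what snake_batch.sh or the other batch scripts actually run.

```python
import subprocess
import sys
from itertools import product


def build_commands(game: str, models: list[str], agent_types: list[str],
                   max_steps: int) -> list[list[str]]:
    """Build one play_game.py command per (model, agent_type) pair."""
    config = f"src/agent_client/configs/{game}/config.yaml"
    commands = []
    for model, agent_type in product(models, agent_types):
        commands.append([
            sys.executable, "scripts/play_game.py",
            "--config", config,
            f"agent.llm_name={model}",
            f"agent.agent_type={agent_type}",
            f"runner.max_steps={max_steps}",
        ])
    return commands


def run_all(commands: list[list[str]]) -> list[int]:
    """Run commands one after another (from the repo root); return their exit codes."""
    return [subprocess.run(cmd).returncode for cmd in commands]
```

For example, `run_all(build_commands("snake", ["gpt-4o-mini", "gemini-2.5-flash"], ["zeroshot_agent", "memory_agent"], 100))` would launch four runs sequentially. Running them one at a time avoids several agents competing for the same game window in gui mode.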

Reflection / milestone scripts

python scripts/generate_reflection.py --help
python scripts/generate_milestone.py --help

Supported Games

Current built-in game configs:

  • angry_birds
  • baba_is_you
  • civilization
  • forza_horizon5
  • genshin
  • maze
  • metro
  • pvz
  • pwaat (Ace Attorney)
  • red_dead_redemption2
  • scene_investigator_demo
  • slay_the_spire
  • snake
  • tic_tac_toe
  • twenty_fourty_eight

All config entries are in src/agent_client/configs.
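Because each game keeps its config at src/agent_client/configs/{game}/config.yaml, the available games can be discovered from the folder layout. A minimal sketch, assuming every game folder holds its config.yaml directly (the helper name `list_game_configs` is hypothetical):

```python
from pathlib import Path


def list_game_configs(config_root: Path) -> dict[str, Path]:
    """Map each game folder name to its config.yaml, skipping folders without one."""
    return {
        child.name: child / "config.yaml"
        for child in sorted(config_root.iterdir())
        if child.is_dir() and (child / "config.yaml").is_file()
    }
```

Calling `list_game_configs(Path("src/agent_client/configs"))` from the repository root should list the games above in alphabetical order.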

Extend to New Games

To add a new game (called my_game here), follow these minimal steps:

  1. Create game server implementation under src/game_servers/my_game/.
  2. Add agent server prompts/logic under src/agent_servers/my_game/.
  3. Add config file src/agent_client/configs/my_game/config.yaml.
  4. Ensure EnvCreator(config).create() can resolve env_name: my_game to your env class.
  5. Run evaluation with:
python scripts/play_game.py --config src/agent_client/configs/my_game/config.yaml

Recommendation: copy a structurally similar existing game folder and modify incrementally.
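Step 4 depends on how EnvCreator maps env_name to a class, which this README does not describe. A common way to do that is a name-to-class registry; the sketch below illustrates only that general pattern. `ENV_REGISTRY`, `register_env`, `MyGameEnv`, `ToyEnvCreator`, and the assumption that env_name lives under the env section of the config are all hypothetical and are not the repository's API.

```python
from typing import Callable

# Hypothetical registry: env_name string -> environment class.
ENV_REGISTRY: dict[str, type] = {}


def register_env(name: str) -> Callable[[type], type]:
    """Class decorator that records an environment class under `name`."""
    def decorator(cls: type) -> type:
        if name in ENV_REGISTRY:
            raise ValueError(f"env_name {name!r} is already registered")
        ENV_REGISTRY[name] = cls
        return cls
    return decorator


@register_env("my_game")
class MyGameEnv:
    """Placeholder env; a real one would talk to the server in src/game_servers/my_game/."""

    def __init__(self, config: dict):
        self.config = config


class ToyEnvCreator:
    """Toy stand-in for EnvCreator(config).create(); the real interface may differ."""

    def __init__(self, config: dict):
        self.config = config

    def create(self):
        name = self.config["env"]["env_name"]  # assumed location of env_name
        try:
            env_cls = ENV_REGISTRY[name]
        except KeyError:
            known = ", ".join(sorted(ENV_REGISTRY)) or "none"
            raise ValueError(f"unknown env_name {name!r}; registered: {known}") from None
        return env_cls(self.config)
```

Failing with the list of registered names makes a typo in config.yaml easy to spot, which is the usual stumbling block when wiring up a copied game folder.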

Acknowledgement

A huge thanks to the following projects that made this work possible:

🎮 Gaming Loop Framework: Inspired by Orak & LMGame-Bench.

🖱️ GUI Action Space: Action settings are based on FlashAdventure & UI-TARS.

We are grateful to these authors for their pioneering contributions to the field of GUI agents and gaming benchmarks.

Citation

@article{gameverse2026,
	title={GameVerse: Can Vision-Language Models Learn from Video-based Reflection?},
	author={Zhang, Kuan and Liu, Dongchen and Zhao, Qiyue and Hou, Jinkun and Zhang, Xinran and Xie, Qinlei and Liu, Miao and Li, Yiming},
	journal={arXiv},
	year={2026},
	url={https://arxiv.org/abs/2603.06656}
}
