RepoSearch-R1 is a cold-start agentic reinforcement learning framework that enables large language models to efficiently navigate and extract information from complex code repositories through multi-turn tool interactions. Built on the verl framework, it integrates Monte Carlo Tree Search (MCTS) into the Group Relative Policy Optimization (GRPO) pipeline to generate diverse, high-quality reasoning trajectories via self-training without requiring model distillation or external supervision.
This implementation is based on the research paper: "RepoSearch-R1: A RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search"
RepoSearch-R1 addresses the limitations of existing approaches through several key innovations:
- 🎯 Cold-Start Training: Eliminates the need for costly distillation from larger LLMs, addressing data-compliance concerns in enterprise environments
- 🌳 MCTS-Guided Exploration: Integrates Monte Carlo Tree Search into the GRPO pipeline for systematic exploration of diverse reasoning trajectories
- 🛠️ Specialized Tool Suite: Five repository exploration tools designed for semantic understanding and efficient code navigation
Comprehensive evaluation on repository question-answering tasks demonstrates significant improvements:
- 16.0% enhancement over no-retrieval methods
- 10.24% improvement over iterative retrieval methods
- 33% increase in training efficiency compared to general agentic RL approaches
- Maintains robust exploration diversity and answer completeness across repository-level reasoning tasks
The RepoSearch-R1 framework consists of three main stages:

**Stage 1: MCTS-Guided Rollout**
- Selection: Navigate from the root to a leaf using the exploration-decay UCT formula
- Expansion: Generate child nodes with a self-critique mechanism
- Simulation: Complete the rollout with the current policy until a terminal state is reached
- Backpropagation: Update node values with reward calculations

**Stage 2: Trajectory Evaluation**
- Multiple rollout trajectories containing thought-action-observation cycles
- LLM-as-a-judge answer quality assessment combined with intermediate process rewards
- Selection of the most promising exploration paths for training

**Stage 3: GRPO Policy Optimization**
- Group-based advantage estimation with relative quality evaluation (see the sketch after this list)
- KL-free Group Relative Policy Optimization for policy updates
- Self-training without external supervision or distilled data
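As a concrete illustration of the group-based advantage estimation in Stage 3, the following is a minimal sketch of standard GRPO-style advantage computation: each trajectory's reward is normalized against the other rollouts sampled for the same question, so no learned value function is needed. The function name and tensor shapes are assumptions for illustration, not the repository's exact implementation.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Illustrative GRPO-style advantage estimation.

    rewards: (num_groups, group_size) tensor of scalar trajectory rewards,
    where each row holds the rollouts sampled for the same question.
    Each trajectory's advantage is its reward normalized by the mean and
    standard deviation of its own group.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 questions, 4 MCTS-selected trajectories each
rewards = torch.tensor([[0.2, 0.8, 0.5, 0.9],
                        [0.1, 0.1, 0.7, 0.3]])
print(group_relative_advantages(rewards))
```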
The framework includes five specialized tools for repository exploration:
| Tool Name | Parameters | Description |
|---|---|---|
| `review_file` | `file_path`, `start_lineno`, `end_lineno` | Review code in a specific file from start to end line |
| `search_keyword_in_folder` | `keyword`, `folder_path` | Search for a keyword in all files within a folder |
| `list_files_in_folder` | `folder_path` | List all files and subdirectories in a folder |
| `search_symbol_in_file` | `symbol`, `file_path` | Search for code symbols (functions, variables) in a file |
| `search_file_in_folder` | `file_name`, `folder_path` | Search for specific files in subdirectories |
- Python 3.8+
- PyTorch 1.12+
- CUDA-compatible GPU (recommended)
- Access to a language model (Qwen, Llama, etc.)
```bash
# Clone the repository
git clone https://github.com/your-repo/mcts-rollout-verl-json_tool.git
cd mcts-rollout-verl-json_tool

# Install dependencies
pip install -r requirements.txt

# Install verl framework
pip install verl
```

- Prepare your repository dataset following the CoReQA format
- Configure MCTS parameters in your training config:

```yaml
actor_rollout_ref:
  rollout:
    mcts_agent:
      num_rollouts: 40
      exploration_weight: 2.0
      max_depth_allowed: 10
      max_children_num: 2
      enable_reflection: true
```

- Run RepoSearch-R1 training:
```bash
# Example training command
bash examples/repoqa/run_qwen3-8b_repoqa_bash_tool_agent_swanlab.sh
```

Key implementation files:

- `verl/experimental/agent_loop/mcts_tool_agent_loop.py`: Main MCTS implementation with UCT selection and self-critique mechanisms
- `verl/tools/repoqa_tool.py`: Repository exploration tools with LLM-as-a-judge reward calculation (an illustrative reward sketch follows below)
- `examples/repoqa/`: Training scripts and configuration files for repository QA tasks
Key parameters for tuning the MCTS behavior:
| Parameter | Default | Description |
|---|---|---|
| `num_rollouts` | 40 | Number of MCTS rollout iterations |
| `exploration_weight` | 2.0 | Initial UCT exploration weight (w₀) |
| `max_depth_allowed` | 10 | Maximum tree depth for exploration |
| `max_children_num` | 2 | Number of children generated per node |
| `enable_reflection` | true | Enable the self-critique mechanism |
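To show how these parameters could drive selection, here is a minimal sketch of a UCT score whose exploration weight decays from `exploration_weight` (w₀) as a child accumulates visits. The decay schedule, node fields, and function names are assumptions for illustration and may differ from the formula used in the paper and in `mcts_tool_agent_loop.py`.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

def uct_score(parent: Node, child: Node, w0: float = 2.0, decay: float = 0.95) -> float:
    """Illustrative exploration-decay UCT: the exploration weight shrinks as the
    child is visited more, shifting the search from exploration to exploitation."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child.value_sum / child.visits
    w = w0 * (decay ** child.visits)  # assumed decay schedule
    explore = w * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore

def select_child(parent: Node) -> Node:
    # Selection step: pick the child with the highest UCT score
    return max(parent.children, key=lambda c: uct_score(parent, c))
```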