Bridging the gap between medical reasoning quality and inference efficiency through DAG-structured parallel execution.
- [04/15/2026] The MedVerse code was released!
- [02/10/2026] MedVerse paper released on arXiv.
Medical reasoning is inherently multi-faceted: answering a clinical question requires simultaneously reasoning over differential diagnoses, laboratory findings, drug interactions, and treatment guidelines. Standard autoregressive LLMs collapse these parallel cognitive tasks into a single sequential chain-of-thought — forcing steps that are logically independent to wait on one another.
MedVerse reformulates medical inference as a parallelizable directed acyclic graph (DAG) grounded in Petri net theory. The framework has three components:
| Component | What it does |
|---|---|
| MedVerse Curator | Automated pipeline that synthesizes knowledge-grounded medical reasoning and converts it into Petri net DAG structures for training |
| Topology-Aware Attention | Training-time attention mechanism with adaptive position indices that enables the model to reason across parallel branches while maintaining logical coherence |
| MedVerse Inference Engine | Customized SGLang-based server that executes the DAG at inference time — forking independent steps into concurrent GPU requests and joining their outputs |
Together, these yield a model that matches specialized medical models while improving general-purpose LLMs by up to 8.9%, and an inference engine that reduces latency by 1.3× and increases generation throughput by 1.7× through parallel decoding.
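The exact mask construction lives in the training code; purely as a sketch of the idea (the layout, function names, and shapes below are illustrative assumptions, not the paper's implementation), a topology-aware mask lets each branch token attend to the shared prefix and to earlier tokens of its own branch, but never to sibling branches, while adaptive position indices restart every branch's positions immediately after the prefix:

```python
import numpy as np

def topology_aware_mask(prefix_len, branch_lens):
    """Build a boolean attention mask (True = may attend).

    Hypothetical token layout: [prefix][branch 0][branch 1]...
    Branch tokens see the shared prefix and their own branch
    causally, but not sibling branches.
    """
    total = prefix_len + sum(branch_lens)
    mask = np.zeros((total, total), dtype=bool)
    # Shared prefix: ordinary causal attention.
    for i in range(prefix_len):
        mask[i, : i + 1] = True
    start = prefix_len
    for blen in branch_lens:
        for i in range(start, start + blen):
            mask[i, :prefix_len] = True    # attend to shared prefix
            mask[i, start : i + 1] = True  # causal within own branch
        start += blen
    return mask

def adaptive_positions(prefix_len, branch_lens):
    """Each branch's position indices continue from the prefix,
    so sibling branches reuse the same position range."""
    pos = list(range(prefix_len))
    for blen in branch_lens:
        pos.extend(range(prefix_len, prefix_len + blen))
    return pos

mask = topology_aware_mask(prefix_len=3, branch_lens=[2, 2])
print(adaptive_positions(3, [2, 2]))  # [0, 1, 2, 3, 4, 3, 4]
```

Sibling branches reusing the same position range is what keeps each branch positionally anchored to the shared prefix rather than to the other branches.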
- DAG-structured parallel inference: MedVerse models emit a `<Plan>` block encoding explicit step dependencies as a Petri net. The inference engine parses this into a DAG, identifies independent reasoning paths, and dispatches them as concurrent GPU requests — no client changes required.
- Topology-aware attention with adaptive position indices: The fine-tuned model learns to produce coherent parallel reasoning branches through a topology-aware attention mask applied during training. Adaptive position indices ensure each branch maintains positional context relative to the shared prefix, not to each other.
- Knowledge-grounded medical reasoning (MedVerse Curator): Training data is synthesized by the MedVerse Curator: a pipeline that decomposes medical questions into multi-step reasoning graphs, grounded in clinical knowledge sources, then converts them into the Petri net format used at inference time.
- Radix-cache prefix sharing: All parallel child requests in Phase II share the same Phase I KV-cache prefix via SGLang's radix attention tree, making Phase II prefill cost near-zero regardless of the number of parallel branches.
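The real `<Plan>` grammar and fork/join logic live in MedVerse-Engine; as a minimal sketch of the scheduling idea (the plan syntax and step names below are invented for illustration, not the model's actual output format), independent steps can be grouped into waves by repeatedly peeling off steps whose dependencies are all satisfied:

```python
# Hypothetical plan: step -> list of steps it depends on.
# Differential diagnosis and lab interpretation are independent;
# the final answer joins both branches.
plan = {
    "differential": [],
    "labs": [],
    "drug_interactions": ["differential"],
    "answer": ["labs", "drug_interactions"],
}

def schedule_waves(plan):
    """Group DAG steps into waves; each wave's steps have no
    unmet dependencies and could be decoded concurrently."""
    remaining = dict(plan)
    done, waves = set(), []
    while remaining:
        ready = [s for s, deps in remaining.items()
                 if all(d in done for d in deps)]
        if not ready:
            raise ValueError("cycle in plan")
        waves.append(sorted(ready))
        done.update(ready)
        for s in ready:
            del remaining[s]
    return waves

print(schedule_waves(plan))
# [['differential', 'labs'], ['drug_interactions'], ['answer']]
```

In the engine, each wave would map to a batch of concurrent GPU requests that all share the Phase I prefix via the radix cache.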
```bash
git clone https://github.com/aiming-lab/MedVerse.git
```

Training (`medverse`):

```bash
cd MedVerse/train
conda create -n medverse python=3.10 -y
conda activate medverse
pip install -r requirements.txt
```

Inference Engine (`medverse-engine`):

```bash
cd MedVerse/MedVerse-Engine
conda create -n medverse-engine python=3.11 -y
conda activate medverse-engine
bash install.sh
```

MedVerse is fine-tuned from Qwen2.5-7B-Instruct and LLaMA-3.1-8B-Instruct using topology-aware attention on the MedVerse14k dataset — 13,904 medical questions annotated with knowledge-grounded DAG reasoning paths generated by the MedVerse Curator. The training dataset is available on 🤗 HuggingFace.
The scripts below automatically download MedVerse14k from HuggingFace and convert it into the training format for each model. Run the one matching your target model:
```bash
cd data
# Qwen2.5 — converts to ChatML format, saves to data/datasets/MedVerse14k
python preparation/prepare_train.py
# LLaMA-3 — converts to LLaMA chat format, saves to data/datasets/MedVerse14k-LLaMA
python preparation/prepare_train_llama.py
cd ..
```

Or generate the dataset from scratch using the MedVerse Curator pipeline — see data/README.md for the full data generation guide, including input format and step-by-step instructions.
```bash
bash train/scripts/launch_train_qwen.sh   # Qwen2.5-7B-Instruct
bash train/scripts/launch_train_llama.sh  # LLaMA-3.1-8B-Instruct
```

See train/README.md for configuration details.
```bash
python -m sglang.srt.entrypoints.medverse_server \
    --model-path /path/to/MedVerse-Qwen2.5-7B \
    --tp-size 1 \
    --port 30000 \
    --trust-remote-code \
    --mem-fraction-static 0.85
```

Wait for `Server is ready` in the logs.
```bash
cd MedVerse/MedVerse-Engine/example
python example.py \
    --server_url http://localhost:30000 \
    --prompts_dir ./prompt
```

See MedVerse-Engine/README.md for full installation and configuration details.
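Beyond the bundled example script, you can also query the server directly. The sketch below assumes an OpenAI-style `/v1/chat/completions` route and field names, which SGLang-based servers commonly expose; check MedVerse-Engine/README.md for the engine's actual API before relying on this:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:30000"  # must match --port above

def build_request(question):
    """Build an OpenAI-style chat payload (field names assumed;
    the model name here is a placeholder)."""
    return {
        "model": "MedVerse-Qwen2.5-7B",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,
    }

def ask(question):
    """POST the question to the running MedVerse server."""
    payload = build_request(question)
    req = urllib.request.Request(
        SERVER_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `ask(...)` requires the server from the previous step to be up; the `<Plan>`-driven forking and joining happens server-side, so the client sees an ordinary single response.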
If you find our work helpful, please consider citing:
```bibtex
@article{chen2026medverse,
  title   = {MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution},
  author  = {Jianwen Chen and Xinyu Yang and Peng Xia and Arian Azarang and Yueh Z Lee and Gang Li and Hongtu Zhu and Yun Li and Beidi Chen and Huaxiu Yao},
  journal = {arXiv preprint arXiv:2602.07529},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.07529},
}
```

We would like to express our gratitude to the open-source community and the following projects for making this work possible: SGLang, Multiverse Engine, Qwen, MedReason, and others.
