Thecommonirin/TITA

TITA: Token-Level Inference-Time Alignment for Vision-Language Models

TITA Overview

Quick Start

conda create -n tita python=3.10 -y
conda activate tita
pip install torch==2.0.1 torchvision==0.15.2
pip install -e .

Benchmarking

tita-bench sample --out sample.jsonl
tita-bench run --api-key "$KEY" --model deepseek-vl2 \
  --questions sample.jsonl --images ./api_benchmark_results \
  --out results.jsonl
tita-bench eval --dir ./api_benchmark_results

Dataset Layout

data/
├── textvqa/
│   └── train_images/
└── ocrvqa/
    └── images/
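
A quick sanity check before training could confirm that the image directories in the layout above exist. The sketch below is not part of the repository: `find_missing` is a hypothetical helper name, and the subdirectory names are passed in by the caller rather than hard-coded.

```python
from pathlib import Path


def find_missing(root, subdirs):
    """Return the entries of `subdirs` that are not directories under `root`."""
    base = Path(root)
    # Keep the caller's order so the report matches the layout listing.
    return [sub for sub in subdirs if not (base / sub).is_dir()]
```

Calling it with `data` as the root and the two image directories from the tree above (written relative to `data/`) returns an empty list when the layout is complete.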

Training

bash run_dpo.sh

For example:

deepspeed --include localhost:0,1,2,3 src/train_dpo_ours.py \
  --deepspeed src/configs/deepspeed/zero3_offload.json \
  --model_name_or_path bczhou/tiny-llava-v1-hf \
  ...
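
The script names (`run_dpo.sh`, `train_dpo_ours.py`) suggest training with Direct Preference Optimization (DPO). As a rough illustration only, here is the standard DPO loss for a single preference pair in plain Python. It assumes each argument is the summed log-probability of a full response, and `beta=0.1` is a hypothetical default. The "ours" in the script name hints that the repository's objective may differ, so this is not a description of what `train_dpo_ours.py` computes.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) equals softplus(-beta * margin);
    # the form below avoids overflow for large |x|.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and reference agree, the margin is zero and the loss is log 2. The loss falls below that as the policy favors the chosen response more strongly than the reference does.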

Inference

python src/inference.py
python src/llava/serve/test_message.py --controller-address your-controller-address \
  --model-name your-model-name --message "Describe the image in detail."

About

Token-level Inference-Time Alignment for Vision-Language Models (ACL'2026)
