
Vision-centric Token Compression in Large Language Model

This repository contains the code for our NeurIPS 2025 spotlight paper Vision-centric Token Compression in Large Language Model. In this work, we propose VIST (Vision-centric Token Compression), a slow-fast token compression framework that mirrors human skimming.


👁 About VIST

VIST first converts loosely relevant long context into images, which are processed by a frozen vision encoder and a trainable Resampler to produce semantically compact visual tokens. These compressed tokens and the main input tokens are then consumed by the LLM. In this slow-fast setup, the vision encoder acts like the human eye—selectively attending to salient information—while the LLM functions as the brain, concentrating on the most informative content for deeper reasoning.

(Figure: overview of the VIST slow-fast token compression framework.)
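For intuition, here is a minimal sketch of what a Perceiver-style Resampler can look like: a fixed set of learnable query tokens cross-attends to the frozen vision encoder's patch features and returns a much shorter token sequence at the LLM's hidden width. All module and dimension names below are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn

class Resampler(nn.Module):
    """Perceiver-style resampler sketch: compresses many patch features
    into a small, fixed number of visual tokens via cross-attention."""

    def __init__(self, num_queries=64, vis_dim=1024, llm_dim=4096, num_heads=8):
        super().__init__()
        # Learnable queries fix the length of the compressed token sequence.
        self.queries = nn.Parameter(torch.randn(num_queries, llm_dim) * 0.02)
        self.proj = nn.Linear(vis_dim, llm_dim)   # vision width -> LLM width
        self.attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(llm_dim)

    def forward(self, patch_feats):               # (batch, num_patches, vis_dim)
        kv = self.proj(patch_feats)
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)             # queries read the patch features
        return self.norm(out)                     # (batch, num_queries, llm_dim)
```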


Quick Links

Setup · Data · Training · Evaluation · Citation

Setup

You can install the requirements with:

pip install -r requirements.txt

Data

For dataset download and preprocessing, please follow the guidelines described in the CEPE project. Our data structure and preparation steps are consistent with that repository.
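Since VIST consumes long context as images, the preprocessing conceptually includes a text-to-image rendering step. The sketch below shows one plausible way to rasterize a text chunk with PIL; the function name, canvas size, font, and wrapping are illustrative choices, not the repository's actual pipeline.

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_to_image(text, width=448, height=448, line_height=14):
    """Rasterize a text chunk onto a white canvas for the vision encoder."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()            # swap in a real TTF in practice
    y = 2
    for line in textwrap.wrap(text, width=64): # crude character-based wrapping
        if y + line_height > height:
            break                              # drop overflow; a real pipeline may paginate
        draw.text((4, y), line, fill="black", font=font)
        y += line_height
    return img
```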

Training

During training, VIST activates the Resampler and enables trainable cross-attention layers within the LLM. You can simply start pretraining with:

bash pretrain.sh
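Conceptually, this trainable-parameter selection amounts to freezing everything and then re-enabling gradients for the Resampler and the LLM's cross-attention layers. The attribute names below (resampler, llm, the "cross_attn" substring) are assumptions for illustration, not the repository's actual module names.

```python
def configure_trainable(model):
    """Freeze everything, then re-enable the trainable pieces."""
    for p in model.parameters():
        p.requires_grad = False      # vision encoder and base LLM stay frozen
    for p in model.resampler.parameters():
        p.requires_grad = True       # the Resampler trains
    for name, p in model.llm.named_parameters():
        if "cross_attn" in name:     # inserted cross-attention layers train
            p.requires_grad = True
```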

Evaluation

To evaluate VIST, you can run:

# ICL tasks
bash scripts/run_icl_ddp.sh

# Open-domain QA
bash scripts/run_qa.sh

Citation

Please cite our paper if you use VIST in your work:

@article{xing2025vision,
  title={Vision-centric Token Compression in Large Language Model},
  author={Xing, Ling and Wang, Alex Jinpeng and Yan, Rui and Shu, Xiangbo and Tang, Jinhui},
  journal={arXiv preprint arXiv:2502.00791},
  year={2025}
}

Acknowledgement

This project builds upon and is inspired by open-source works such as CEPE. We sincerely thank the authors for their excellent contributions to the community!
