CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook
Target modalities are partially aligned with bridging modalities via codebooks, resulting in a shared space. Unique features from both bridging and target modalities are preserved in modality-specific spaces. Compositional VQ reconstructs a complete embedding by combining multiple low-dimensional codevectors.
- [April 16, 2026] Initial Release
Multimodal representation alignment is crucial for large language models and robotics. Traditional methods often struggle with cross-modal information discrepancies and data scarcity, resulting in suboptimal alignment spaces that neglect modality-unique features.
We introduce CodeBind, a novel framework that optimizes multimodal representation spaces using a modality-shared and modality-specific codebook design.
Unlike conventional hard alignment approaches, CodeBind decomposes features into:
- Shared Components: Ensuring semantic consistency across modalities.
- Specific Components: Preserving modality-unique details.
This approach employs a compositional vector quantization scheme, where a shared codebook bridges modality gaps, and modality-specific codebooks mitigate representation bias by preventing dominant modalities from overshadowing others. Validated across nine modalities (text, image, video, audio, depth, thermal, tactile, 3D point cloud, EEG), CodeBind achieves state-of-the-art performance in multimodal classification and retrieval tasks.
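To make the design concrete, the sketch below shows one way such a shared-specific compositional quantizer could be wired up in PyTorch. It is an illustrative toy, not the released implementation: the class names, feature dimensions, three-modality setup, and the straight-through trick are all assumptions.

```python
import torch
import torch.nn as nn

class CompositionalVQ(nn.Module):
    """Toy compositional vector quantizer: an embedding is split into several
    low-dimensional chunks, each chunk is snapped to its nearest codevector in a
    small per-chunk codebook, and the quantized chunks are concatenated back
    into a complete embedding."""
    def __init__(self, dim=512, num_chunks=8, codebook_size=256):
        super().__init__()
        assert dim % num_chunks == 0
        self.num_chunks, self.chunk_dim = num_chunks, dim // num_chunks
        self.codebooks = nn.Parameter(torch.randn(num_chunks, codebook_size, self.chunk_dim))

    def forward(self, z):                                    # z: (batch, dim)
        chunks = z.view(z.size(0), self.num_chunks, self.chunk_dim)
        dists = torch.cdist(chunks.transpose(0, 1), self.codebooks)        # (chunks, batch, K)
        idx = dists.argmin(dim=-1)                                          # (chunks, batch)
        quant = torch.stack([self.codebooks[c][idx[c]]
                             for c in range(self.num_chunks)], dim=1)       # (batch, chunks, d)
        quant = chunks + (quant - chunks).detach()                          # straight-through gradients
        return quant.reshape(z.size(0), -1), idx


class SharedSpecificQuantizer(nn.Module):
    """Toy shared-specific decomposition: one compositional codebook is shared by
    all modalities (bridging them in a common space), while each modality keeps
    its own codebook for modality-unique details."""
    def __init__(self, dim=512, modalities=("image", "text", "audio")):
        super().__init__()
        self.to_shared, self.to_specific = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.shared_vq = CompositionalVQ(dim)
        self.specific_vq = nn.ModuleDict({m: CompositionalVQ(dim) for m in modalities})

    def forward(self, feat, modality):
        shared, _ = self.shared_vq(self.to_shared(feat))                    # semantics common to all modalities
        specific, _ = self.specific_vq[modality](self.to_specific(feat))    # modality-unique details
        return shared, specific


quantizer = SharedSpecificQuantizer()
image_feat = torch.randn(4, 512)                 # e.g. features from a frozen image encoder
shared, specific = quantizer(image_feat, "image")
print(shared.shape, specific.shape)              # torch.Size([4, 512]) twice
```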
- Release the training code
- Release CodeBind-IB checkpoints
- Release applications code
First, clone the repository and install the required packages.
```bash
git clone https://github.com/Visual-AI/codebind.git
cd codebind
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```

You can use CodeBind to extract and compare features across modalities. An example snippet is provided below:

```python
# TBD
```
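Until the official snippet lands, the sketch below illustrates the intended workflow. Every name in it (the `codebind` package layout, `load_codebind`, `ModalityType`, `load_and_transform`, and the checkpoint path) is a hypothetical placeholder rather than the released API.

```python
# Hypothetical sketch only -- the imports, loader, and helpers below are
# placeholders, not the released CodeBind API.
import torch
from codebind import load_codebind, ModalityType, load_and_transform  # hypothetical

device = "cuda" if torch.cuda.is_available() else "cpu"
model = load_codebind(checkpoint="checkpoints/codebind_ib.pth").to(device).eval()

# Preprocess one sample per modality (file paths are placeholders).
inputs = {
    ModalityType.TEXT: load_and_transform(["a dog playing fetch"], ModalityType.TEXT, device),
    ModalityType.IMAGE: load_and_transform(["assets/dog.jpg"], ModalityType.IMAGE, device),
    ModalityType.AUDIO: load_and_transform(["assets/bark.wav"], ModalityType.AUDIO, device),
}

with torch.no_grad():
    embeddings = model(inputs)  # modality -> (batch, dim) embeddings in the shared space

# Compare modalities by cosine similarity in the shared space.
text = torch.nn.functional.normalize(embeddings[ModalityType.TEXT], dim=-1)
image = torch.nn.functional.normalize(embeddings[ModalityType.IMAGE], dim=-1)
print("text-image similarity:", text @ image.T)
```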
Please refer to Doc/DATASETS.md for dataset preparation.
Please refer to Doc/MODEL_ZOO.md for details on available CodeBind checkpoints.
Please refer to Doc/TRAINING.md for details on CodeBind training scripts for different modalities.
This repository builds upon the invaluable contributions of the open-source community. We extend our sincere appreciation to the following projects for their foundational work:
If you find this repository useful, please consider giving a star ⭐ and citation:
```bibtex
@article{chen2026codebind,
  title={CodeBind: Decoupled Representation Learning for Multimodal Alignment with Unified Compositional Codebook},
  author={Zeyu Chen and Jie Li and Kai Han},
  journal={arXiv preprint arXiv:},
  year={2026}
}
```