Hi 👋
We’ve been working on a small project called CompactFusion (repo), built directly on top of xDiT.
Our motivation came from a simple observation: diffusion models show strong temporal redundancy. During parallel inference, we repeatedly transmit large amounts of near-duplicate activations between GPUs.
Our insight is that removing this redundant information dramatically reduces communication volume without hurting generation quality. CompactFusion achieves this by transmitting only compressed residuals (the actual changes between steps), together with a lightweight error-feedback mechanism that carries the compression error forward so it can be corrected at the next step.
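To make the idea concrete, here is a minimal sketch of step-wise residual transmission with error feedback. All names here (`ResidualCompressor`, `compress_ratio`, `step`) are hypothetical, and top-k sparsification is just a stand-in compressor, not necessarily what CompactFusion uses:

```python
import torch

class ResidualCompressor:
    """Step-wise residual transmission with error feedback (sketch)."""

    def __init__(self, compress_ratio: float = 0.1):
        self.compress_ratio = compress_ratio  # fraction of values kept
        self.recon = None   # receiver-side reconstruction of the activation
        self.error = None   # values dropped by the compressor last step

    def step(self, activation: torch.Tensor) -> torch.Tensor:
        if self.recon is None:
            # First denoising step: send the full activation once.
            self.recon = activation.clone()
            self.error = torch.zeros_like(activation)
            return activation

        # Residual = change since the last step, plus the fed-back error.
        residual = activation - self.recon + self.error

        # Top-k sparsification as a stand-in compressor (assumption).
        flat = residual.flatten()
        k = max(1, int(flat.numel() * self.compress_ratio))
        _, idx = flat.abs().topk(k)
        payload = torch.zeros_like(flat)
        payload[idx] = flat[idx]
        payload = payload.view_as(residual)

        # Error feedback: remember what was dropped for the next step.
        self.error = residual - payload

        # Sender and receiver update their reconstructions identically.
        self.recon = self.recon + payload
        return payload  # only k values (plus indices) cross the interconnect
```

Because the dropped values are fed back into the next step's residual, the compression error is corrected over time rather than accumulating across denoising steps.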
This simple approach cuts communication volume by 8–16× and delivers significant speedups on PCIe- or Ethernet-connected setups, all while maintaining fidelity. In many settings it also yields better generation quality than prior overlap-based methods, since it avoids computing with stale activations during inference.
The work has just been accepted to NeurIPS 2025 (paper). If the team is open to it, we’d love to see it integrated into xDiT as an optional module.
Thanks again for making xDiT such an outstanding open-source foundation.