
Proposal: Integrating a compression-based acceleration method into xDiT #573

@Cobalt-27

Description

Hi👋

We’ve been working on a small project called CompactFusion (repo), built directly on top of xDiT.

Our motivation came from a simple observation: diffusion models exhibit strong temporal redundancy across denoising steps. During parallel inference, we repeatedly transmit large amounts of near-duplicate activations between GPUs.

Our insight is that removing this redundant information can dramatically reduce communication volume without hurting generation quality. CompactFusion achieves this by transmitting only compressed residuals (the actual changes in activations between steps), together with a lightweight error-feedback mechanism that carries the compression error forward so it is corrected in later steps; a minimal sketch follows.
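
For intuition, here is a minimal, hypothetical PyTorch sketch of residual compression with error feedback. The class name, the `step` method, and the top-k compressor with its `k_ratio` parameter are illustrative assumptions, not CompactFusion's actual API; the real system applies this idea to the activations exchanged between GPUs during parallel inference.

```python
import torch


class ResidualCompressor:
    """Toy residual compression with error feedback (illustrative only).

    Instead of sending a full activation tensor each step, send only a
    compressed residual: the change since the last step, plus whatever
    error earlier compressions left behind.
    """

    def __init__(self, k_ratio: float = 0.1):
        self.k_ratio = k_ratio  # fraction of entries kept by the toy top-k compressor
        self.prev = None        # mirror of the receiver's current reconstruction
        self.error = None       # error-feedback buffer (mass dropped so far)

    def step(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev is None:
            self.prev = torch.zeros_like(x)
            self.error = torch.zeros_like(x)
        # Residual relative to what the receiver already has, with the
        # accumulated compression error re-injected (error feedback).
        residual = x - self.prev + self.error
        # Toy compressor: keep only the largest-magnitude entries.
        k = max(1, int(residual.numel() * self.k_ratio))
        flat = residual.flatten()
        keep = flat.abs().topk(k).indices
        compressed = torch.zeros_like(flat)
        compressed[keep] = flat[keep]
        compressed = compressed.view_as(residual)
        # Remember what was dropped; it is added back on the next step.
        self.error = residual - compressed
        # Mirror the receiver, which applies the residual additively.
        self.prev = self.prev + compressed
        return compressed  # only this sparse tensor crosses the interconnect
```

On the receiving GPU, reconstruction is simply an additive update (`recon += compressed`) each step; because the dropped mass is fed back rather than discarded, the compression error does not accumulate across steps.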

This simple approach cuts communication by 8–16× and delivers significant speedups on PCIe- or Ethernet-connected setups, all while maintaining fidelity. In many settings, this compression-based approach yields better generation quality than prior overlap-based methods, since it avoids computing with stale activations during inference.

The work has just been accepted to NeurIPS 2025 (paper). If the team is open to it, we’d love to have this work integrated into xDiT as an optional module.

Thanks again for making xDiT such an outstanding open-source foundation.
