
Proposal: Integrating a compression-based acceleration method into xDiT #573

@Cobalt-27

Description

Hi👋

We’ve been working on a small project called CompactFusion (repo), built directly on top of xDiT.

Our motivation came from a simple observation: diffusion models exhibit strong temporal redundancy across denoising steps. During parallel inference, we repeatedly transmit large amounts of near-duplicate activations between GPUs.

Our insight is that removing this redundant information can dramatically reduce communication volume without hurting generation quality. CompactFusion achieves this by transmitting only compressed residuals (the actual changes in activations between steps), together with a lightweight error-feedback mechanism that carries the compression error forward so it is corrected in later steps; a minimal sketch follows.
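
For intuition, here is a minimal, hypothetical PyTorch sketch of residual compression with error feedback. The class name, the `step` method, and the top-k compressor with its `k_ratio` parameter are illustrative assumptions, not CompactFusion's actual API; the real system applies this idea to the activations exchanged between GPUs during parallel inference.

```python
import torch


class ResidualCompressor:
    """Toy residual compression with error feedback (illustrative only).

    Instead of sending a full activation tensor each step, send only a
    compressed residual: the change since the last step, plus whatever
    error earlier compressions left behind.
    """

    def __init__(self, k_ratio: float = 0.1):
        self.k_ratio = k_ratio  # fraction of entries kept by the toy top-k compressor
        self.prev = None        # mirror of the receiver's current reconstruction
        self.error = None       # error-feedback buffer (mass dropped so far)

    def step(self, x: torch.Tensor) -> torch.Tensor:
        if self.prev is None:
            self.prev = torch.zeros_like(x)
            self.error = torch.zeros_like(x)
        # Residual relative to what the receiver already has, with the
        # accumulated compression error re-injected (error feedback).
        residual = x - self.prev + self.error
        # Toy compressor: keep only the largest-magnitude entries.
        k = max(1, int(residual.numel() * self.k_ratio))
        flat = residual.flatten()
        keep = flat.abs().topk(k).indices
        compressed = torch.zeros_like(flat)
        compressed[keep] = flat[keep]
        compressed = compressed.view_as(residual)
        # Remember what was dropped; it is added back on the next step.
        self.error = residual - compressed
        # Mirror the receiver, which applies the residual additively.
        self.prev = self.prev + compressed
        return compressed  # only this sparse tensor crosses the interconnect
```

On the receiving GPU, reconstruction is simply an additive update (`recon += compressed`) each step; because the dropped mass is fed back rather than discarded, the compression error does not accumulate across steps.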

This simple approach cuts communication by 8–16× and delivers significant speedups on PCIe- or Ethernet-connected setups, all while maintaining fidelity. In many settings, this compression-based approach yields better generation quality than prior overlap-based methods, since it avoids computing with stale activations during inference.

The work has just been accepted to NeurIPS 2025 (paper). If the team is open to it, we’d love to have this work integrated into xDiT as an optional module.

Thanks again for making xDiT such an outstanding open-source foundation.
