Releases: vipshop/cache-dit
v1.1.7
v1.1.6
v1.1.5 🔥HunyuanVideo-1.5/Ovis-Image
What's Changed
- Add profiler for flux tp and cp example by @BBuf in #501
- chore: Update README.md by @DefTruth in #502
- feat: support FnB0 for z-image w/ cp by @DefTruth in #503
- feat: support _sdpa_cudnn backend for cp by @DefTruth in #504
- feat: support async ulysses cp for z-image by @DefTruth in #505
- feat: add all_to_all_single v2 by @DefTruth in #507
- feat: support async ulysses cp for qwen-image by @DefTruth in #508
- feat: support all2all qkv per token fp8 by @triple-Mu in #509
- chore: improve flux2 and qwen image examples by @BBuf in #512
- fix: workaround for uaa-fp8 .view compile error by @triple-Mu in #514
- feat: relaxed transformer strict assert by @DefTruth in #515
- feat: all2all qkv fp8 for ulysses by @DefTruth in #516
- feat: support pre-defined step masks by @DefTruth in #517
- chore: separate chrono-edit and wan cp plan by @DefTruth in #519
- fix example utils.py uaa fp8 flag typo by @DefTruth in #521
- feat: extend predefined step masks for 4/6 steps by @DefTruth in #523
- misc: add z-image-turbo predefined step masks by @DefTruth in #525
- feat: support per_token_quant_fp8 triton kernel by @triple-Mu in #524
- feat: unified async ulysses fp8 by @DefTruth in #526
- feat: support serving for cache-dit by @BBuf in #522
- Fix get_model_info api 404 when serving with tp/cp by @BBuf in #529
- feat: support cache for hunyuanvideo-1.5 by @DefTruth in #528
- feat: support cache for ovis-image by @DefTruth in #530
New Contributors
- @triple-Mu made their first contribution in #509
Full Changelog: v1.1.4...v1.1.5
v1.1.4 🔥FLUX.2/Z-Image
What's Changed
- feat: support torch profiler in cache-dit by @BBuf in #491
- feat: support 🔥z-image tensor parallel by @gameofdimension in #494
- feat: support lumina2 tensor parallel by @gameofdimension in #495
- feat: support cache for 🔥z-image by @DefTruth in #496
- feat: support context parallel for 🔥z-image by @DefTruth in #497
- fix: temp FnB(n>0) workaround for z-image cache w/ cp by @DefTruth in #499
Full Changelog: v1.1.3...v1.1.4
v1.1.3 🔥FLUX.2
What's Changed
- chore: Add wan 2.2 i2v context parallel example by @DefTruth in #476
- chore: optimize wan examples, compile & offload by @BBuf in #477
- feat: support async ulysses cp for flux by @DefTruth in #480
- chore: update support matrix by @DefTruth in #484
- chore: update async ulysses cp docs by @DefTruth in #486
- chore: update async ulysses cp refs by @DefTruth in #487
- feat: support FLUX.2-dev Tensor Parallelism by @gameofdimension in #485
- feat: support Hybrid cache + TP for 🔥FLUX.2 by @DefTruth in #489
- feat: Add seq offload for 🔥FLUX.2 w/o parallel by @DefTruth in #490
- feat: support 🔥FLUX.2 context parallel by @DefTruth in #492
Full Changelog: v1.1.2...v1.1.3
v1.1.2 UAA & SkyReelsV2 TP/CP
What's Changed
- chore: Update README.md by @DefTruth in #455
- fix load options drop kwargs by @DefTruth in #456
- chore: add maybe pad prompt utils by @DefTruth in #458
- fix: move .to(device) to reduce tp mem by @BBuf in #459
- example: support more overridden args and memory tracker by @BBuf in #461
- Add missing model-path args in example by @BBuf in #463
- UAA: ulysses anything attn w/ zero overhead by @DefTruth in #462
- fix qwen-image multi-gpu mismatch by @BBuf in #464
- Fix more models multi gpu mismatch by @BBuf in #466
- feat: support unshard anything for UAA by @DefTruth in #465
- chore: update qwen-image example for UAA by @DefTruth in #468
- chore: Update README.md by @DefTruth in #470
- chore: Update README.md by @DefTruth in #471
- support skyreels cp and tp ulysses by @BBuf in #469
- always use vae tiling if vram <= 48 GiB for qwen-image by @DefTruth in #472
- chore: Add SkyReelsV2 tp/cp to support-matrix by @BBuf in #473
- fix: correct string literal syntax errors in examples by @BBuf in #475
- feat: allow UAA in compiled graph by @DefTruth in #474
Full Changelog: v1.1.1...v1.1.2
v1.1.1
What's Changed
- chore: Update README.md by @DefTruth in #442
- feat: support step compute mask by @DefTruth in #444
- bugfix: fix bench distill cfg mismatch by @DefTruth in #445
- chore: update step mask docs by @DefTruth in #446
- chore: Update User_Guide.md by @DefTruth in #447
- chore: update README by @DefTruth in #448
- chore: update step mask example by @DefTruth in #449
- chore: highlight SCM - step computation mask by @DefTruth in #450
- chore: highlight SCM - step computation mask by @DefTruth in #451
- chore: highlight SCM - step computation mask by @DefTruth in #452
- misc: support quantize and attn backend for flux example by @DefTruth in #453
- misc: add quant and attn backend -> step mask example by @DefTruth in #454
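Several entries above concern SCM (the step computation mask introduced in #444 and the pre-defined masks in #517/#523). As a rough illustration of the idea only (hypothetical helper names, not cache-dit's actual API), a step mask is a per-step boolean plan saying which denoising steps run the full transformer and which reuse cached results:

```python
# Hypothetical sketch of a step computation mask (SCM).
# Names are illustrative; this is not cache-dit's real API.
def plan_steps(compute_mask):
    """Map a per-step boolean mask to compute/cached actions."""
    return ["compute" if flag else "cached" for flag in compute_mask]

# An 8-step plan where only every other step runs the full transformer:
mask = [True, False, True, False, True, False, True, False]
print(plan_steps(mask))
```

A pre-defined mask like this trades a small quality loss for skipping roughly half of the transformer forward passes.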
Full Changelog: v1.1.0...v1.1.1
v1.1.0 🎉Context/Tensor Parallelism
🔥Highlight
We are excited to announce that 🎉v1.1.0 of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a Unified and Flexible Inference Engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, Context Parallelism, Tensor Parallelism, and 🎉SOTA performance.
⚙️Installation
You can install the stable release of cache-dit from PyPI:
pip3 install -U cache-dit  # or, pip3 install -U "cache-dit[all]" for all features

Or you can install the latest develop version from GitHub:

pip3 install git+https://github.com/vipshop/cache-dit.git

Please also install the latest main branch of diffusers for context parallelism:

pip3 install git+https://github.com/huggingface/diffusers.git

🔥Supported DiTs
Tip
One Model Series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit. ✅: known to work and officially supported now; ✖️: not officially supported now, but may be supported in the future; Q: 4-bit models w/ nunchaku + SVDQ W4A4.
| 📚Model | Cache | CP | TP | 📚Model | Cache | CP | TP |
|---|---|---|---|---|---|---|---|
| 🎉FLUX.1 | ✅ | ✅ | ✅ | 🎉FLUX.1 Q | ✅ | ✅ | ✖️ |
| 🎉FLUX.1-Fill | ✅ | ✅ | ✅ | 🎉FLUX.1-Fill Q | ✅ | ✅ | ✖️ |
| 🎉Qwen-Image | ✅ | ✅ | ✅ | 🎉Qwen-Image Q | ✅ | ✅ | ✖️ |
| 🎉Qwen...Edit | ✅ | ✅ | ✅ | 🎉Qwen...Edit Q | ✅ | ✅ | ✖️ |
| 🎉Qwen...Lightning | ✅ | ✅ | ✅ | 🎉Qwen...Light Q | ✅ | ✅ | ✖️ |
| 🎉Qwen...Control.. | ✅ | ✅ | ✅ | 🎉Qwen...E...Light Q | ✅ | ✅ | ✖️ |
| 🎉Wan 2.1 I2V/T2V | ✅ | ✅ | ✅ | 🎉Mochi | ✅ | ✖️ | ✅ |
| 🎉Wan 2.1 VACE | ✅ | ✅ | ✅ | 🎉HiDream | ✅ | ✖️ | ✖️ |
| 🎉Wan 2.2 I2V/T2V | ✅ | ✅ | ✅ | 🎉HunyuanDiT | ✅ | ✖️ | ✅ |
| 🎉HunyuanVideo | ✅ | ✅ | ✅ | 🎉Sana | ✅ | ✖️ | ✖️ |
| 🎉ChronoEdit | ✅ | ✅ | ✅ | 🎉Bria | ✅ | ✖️ | ✖️ |
| 🎉CogVideoX | ✅ | ✅ | ✅ | 🎉SkyReelsV2 | ✅ | ✖️ | ✖️ |
| 🎉CogVideoX 1.5 | ✅ | ✅ | ✅ | 🎉Lumina 1/2 | ✅ | ✖️ | ✖️ |
| 🎉CogView4 | ✅ | ✅ | ✅ | 🎉DiT-XL | ✅ | ✅ | ✖️ |
| 🎉CogView3Plus | ✅ | ✅ | ✅ | 🎉Allegro | ✅ | ✖️ | ✖️ |
| 🎉PixArt Sigma | ✅ | ✅ | ✅ | 🎉Cosmos | ✅ | ✖️ | ✖️ |
| 🎉PixArt Alpha | ✅ | ✅ | ✅ | 🎉OmniGen | ✅ | ✖️ | ✖️ |
| 🎉Chroma-HD | ✅ | ✅ | ✅ | 🎉EasyAnimate | ✅ | ✖️ | ✖️ |
| 🎉VisualCloze | ✅ | ✅ | ✅ | 🎉StableDiffusion3 | ✅ | ✖️ | ✖️ |
| 🎉HunyuanImage | ✅ | ✅ | ✅ | 🎉PRX T2I | ✅ | ✖️ | ✖️ |
| 🎉Kandinsky5 | ✅ | ✅ | ✅ | 🎉Amused | ✅ | ✖️ | ✖️ |
| 🎉LTXVideo | ✅ | ✅ | ✅ | 🎉AuraFlow | ✅ | ✖️ | ✖️ |
| 🎉ConsisID | ✅ | ✅ | ✅ | 🎉LongCatVideo | ✅ | ✖️ | ✖️ |
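The Cache column above refers to cache-dit's DBCache-style step caching. As a toy illustration of the general idea behind such schemes (pure Python, hypothetical names; not cache-dit internals), a cache can decide per denoising step whether to reuse the previous step's result based on how much an early block's output changed:

```python
# Toy sketch of a residual-diff cache decision, as used by step-caching
# schemes in general. Hypothetical names; not cache-dit internals.
import random

def mean_abs(xs):
    return sum(abs(x) for x in xs) / len(xs)

def should_reuse(prev_out, curr_out, threshold=0.08):
    """Reuse the cached result when the relative L1 change is small."""
    diff = mean_abs([c - p for c, p in zip(curr_out, prev_out)])
    return diff / (mean_abs(prev_out) + 1e-8) < threshold

random.seed(0)
prev = [random.gauss(0, 1) for _ in range(64)]
similar = [p + 0.01 * random.gauss(0, 1) for p in prev]  # small drift
changed = [p + random.gauss(0, 1) for p in prev]         # large change

print(should_reuse(prev, similar))  # small drift -> reuse cache
print(should_reuse(prev, changed))  # large change -> recompute
```

When the early-block output barely moves between adjacent steps, the remaining blocks are skipped and the cached residual is reused; otherwise the full forward runs and refreshes the cache.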
⚡️Hybrid Context Parallelism
cache-dit is compatible with context parallelism. Currently, we support a Hybrid Cache + Context Parallelism scheme (via the NATIVE_DIFFUSER parallelism backend) in cache-dit. Users can apply Context Parallelism to further accelerate inference! For more details, please refer to 📚examples/parallelism. Currently, cache-dit supports context parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, LTXVideo, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, CogVideoX 1.0, CogVideoX 1.5, CogView 3/4, VisualCloze, etc. cache-dit will support more models in the future.
# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig
cache_dit.enable_cache(
pipe_or_adapter,
cache_config=DBCacheConfig(...),
# Set ulysses_size > 1 to enable ulysses style context parallelism.
parallelism_config=ParallelismConfig(ulysses_size=2),
)
# torchrun --nproc_per_node=2 parallel_cache.py

⚡️Hybrid Tensor Parallelism
cache-dit is also compatible with tensor parallelism. Currently, we support a Hybrid Cache + Tensor Parallelism scheme (via the NATIVE_PYTORCH parallelism backend) in cache-dit. Users can apply Tensor Parallelism to further accelerate inference and reduce VRAM usage per GPU! For more details, please refer to 📚examples/parallelism. Now, cache-dit supports tensor parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, VisualCloze, etc. cache-dit will support more models in the future.
# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig
cache_dit.enable_cache(
pipe_or_adapter,
cache_config=DBCacheConfig(...),
# Set tp_size > 1 to enable tensor parallelism.
parallelism_config=ParallelismConfig(tp_size=2),
)
# torchrun --nproc_per_node=2 parallel_cache.py

Important
Please note that, in the short term, we have no plans to support Hybrid Parallelism (combining Context and Tensor Parallelism). Please choose either Context Parallelism or Tensor Parallelism based on your actual scenario.
v1.0.16
What's Changed
- feat: support cogview3/4 cogvideox Tensor Parallelism by @gameofdimension in #419
- chore: remove un-needed pytest.ini by @DefTruth in #421
- feat: support pixart models Tensor Parallelism by @gameofdimension in #422
- feat: support chrono-edit context parallel by @DefTruth in #424
- chore: Update README.md by @DefTruth in #425
- feat: support Kandinsky5 context parallel by @DefTruth in #426
- feat: support LTX-Video Tensor Parallelism by @gameofdimension in #428
- chore: Update README.md by @DefTruth in #430
- feat: support ConsisID-preview Tensor Parallelism by @gameofdimension in #431
- bugfix: fix chrono-edit context parallel by @DefTruth in #432
- bugfix: fix chrono-edit context parallel by @DefTruth in #433
- chore: add speedup image by @DefTruth in #434
- chore: update speedup image by @DefTruth in #435
- chore: update speedup image by @DefTruth in #436
- chore: update clip-score bench by @DefTruth in #437
Full Changelog: v1.0.15...v1.0.16
v1.0.15
What's Changed
- feat: support cache & tp for wan vace by @DefTruth in #406
- feat: support mochi-1-preview Tensor Parallelism by @gameofdimension in #408
- chore: Update README.md by @DefTruth in #409
- feat: support HunyuanDiT Tensor Parallelism by @gameofdimension in #411
- bugfix: fix summary stats from dict by @DefTruth in #412
- bugfix: fix strify error while no-cache by @DefTruth in #414
- feat: support wan vace context parallel by @DefTruth in #415
- chore: Update README.md by @DefTruth in #416
- feat: support Wan2.1-VACE Tensor Parallelism by @gameofdimension in #417
- misc: use dummy blocks for flux by default by @DefTruth in #418
Full Changelog: v1.0.14...v1.0.15
