
Releases: vipshop/cache-dit

v1.1.7

06 Dec 07:59
7a81c1e

Hotfix for diffusers 0.35.2 compatibility.

What's Changed

Full Changelog: v1.1.6...v1.1.7

v1.1.6

05 Dec 10:47
6d9f074

Hotfix for diffusers 0.35.2 compatibility.

What's Changed

  • Add heartbeat to avoid nccl timeout when the service hangs. by @BBuf in #531
  • chore: remove un-needed cp imports by @DefTruth in #532

Full Changelog: v1.1.5...v1.1.6

v1.1.5 🔥HunyuanVideo-1.5/Ovis-Image

05 Dec 08:18
6c21efc

What's Changed

New Contributors

Full Changelog: v1.1.4...v1.1.5

v1.1.4 🔥FLUX.2/Z-Image

28 Nov 11:46
73fbd44

What's Changed

Full Changelog: v1.1.3...v1.1.4

v1.1.3 🔥FLUX.2

28 Nov 02:15
b9fd68e

What's Changed

Full Changelog: v1.1.2...v1.1.3

v1.1.2 UAA & SkyReelsV2 TP/CP

24 Nov 09:55
9936b5c

What's Changed

  • chore: Update README.md by @DefTruth in #455
  • fix load options drop kwargs by @DefTruth in #456
  • chore: add maybe pad prompt utils by @DefTruth in #458
  • fix: move .to(device) to reduce tp mem by @BBuf in #459
  • example: support more overrided args and memory tracker by @BBuf in #461
  • Add missing model-path args in example by @BBuf in #463
  • UAA: ulysses anything attn w/ zero overhead by @DefTruth in #462
  • fix qwen-image multi-gpu mismatch by @BBuf in #464
  • Fix more models multi gpu mismatch by @BBuf in #466
  • feat: support unshard anything for UAA by @DefTruth in #465
  • chore: update qwen-image example for UAA by @DefTruth in #468
  • chore: Update README.md by @DefTruth in #470
  • chore: Update README.md by @DefTruth in #471
  • support skyreels cp and tp ulysses by @BBuf in #469
  • always use vae tiling if vram <= 48 GiB for qwen-image by @DefTruth in #472
  • chore: Add SkyReelsV2 tp/cp to support-matrix by @BBuf in #473
  • fix: correct string literal syntax errors in examples by @BBuf in #475
  • feat: allow UAA in compiled graph by @DefTruth in #474

New Contributors

  • @BBuf made their first contribution in #459

Full Changelog: v1.1.1...v1.1.2

v1.1.1

19 Nov 10:50
446edfd

What's Changed

Full Changelog: v1.1.0...v1.1.1

v1.1.0 🎉Context/Tensor Parallelism

18 Nov 03:47
97366d6

🔥Highlight

We are excited to announce that 🎉v1.1.0 of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a unified and flexible inference engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, Context Parallelism, Tensor Parallelism, and 🎉SOTA performance.

⚙️Installation

You can install the stable release of cache-dit from PyPI:

pip3 install -U cache-dit # or, pip3 install -U "cache-dit[all]" for all features

Or you can install the latest development version from GitHub:

pip3 install git+https://github.com/vipshop/cache-dit.git

Please also install the latest main branch of diffusers for context parallelism:

pip3 install git+https://github.com/huggingface/diffusers.git

🔥Supported DiTs

Tip

One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit. ✅: known to work and officially supported; ✖️: not officially supported yet, but may be supported in the future; Q: 4-bit models w/ nunchaku + SVDQ W4A4.

| 📚Model | Cache | CP | TP | 📚Model | Cache | CP | TP |
|---|---|---|---|---|---|---|---|
| 🎉FLUX.1 | | | | 🎉FLUX.1 | Q | ✖️ | |
| 🎉FLUX.1-Fill | | | | 🎉FLUX.1-Fill | Q | ✖️ | |
| 🎉Qwen-Image | | | | 🎉Qwen-Image | Q | ✖️ | |
| 🎉Qwen...Edit | | | | 🎉Qwen...Edit | Q | ✖️ | |
| 🎉Qwen...Lightning | | | | 🎉Qwen...Light | Q | ✖️ | |
| 🎉Qwen...Control.. | | | | 🎉Qwen...E...Light | Q | ✖️ | |
| 🎉Wan 2.1 I2V/T2V | | | | 🎉Mochi | | ✖️ | |
| 🎉Wan 2.1 VACE | | | | 🎉HiDream | | ✖️ | ✖️ |
| 🎉Wan 2.2 I2V/T2V | | | | 🎉HunyuanDiT | | ✖️ | |
| 🎉HunyuanVideo | | | | 🎉Sana | | ✖️ | ✖️ |
| 🎉ChronoEdit | | | | 🎉Bria | | ✖️ | ✖️ |
| 🎉CogVideoX | | | | 🎉SkyReelsV2 | | ✖️ | ✖️ |
| 🎉CogVideoX 1.5 | | | | 🎉Lumina 1/2 | | ✖️ | ✖️ |
| 🎉CogView4 | | | | 🎉DiT-XL | | ✖️ | |
| 🎉CogView3Plus | | | | 🎉Allegro | | ✖️ | ✖️ |
| 🎉PixArt Sigma | | | | 🎉Cosmos | | ✖️ | ✖️ |
| 🎉PixArt Alpha | | | | 🎉OmniGen | | ✖️ | ✖️ |
| 🎉Chroma-HD | ✅ | | | 🎉EasyAnimate | | ✖️ | ✖️ |
| 🎉VisualCloze | | | | 🎉StableDiffusion3 | | ✖️ | ✖️ |
| 🎉HunyuanImage | | | | 🎉PRX T2I | | ✖️ | ✖️ |
| 🎉Kandinsky5 | ✅ | ✅ | | 🎉Amused | | ✖️ | ✖️ |
| 🎉LTXVideo | | | | 🎉AuraFlow | | ✖️ | ✖️ |
| 🎉ConsisID | | | | 🎉LongCatVideo | | ✖️ | ✖️ |

⚡️Hybrid Context Parallelism

cache-dit is compatible with context parallelism. Currently, we support a Hybrid Cache + Context Parallelism scheme (via the NATIVE_DIFFUSER parallelism backend), so users can apply Context Parallelism to further accelerate inference. For more details, please refer to 📚examples/parallelism. cache-dit currently supports context parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, LTXVideo, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, CogVideoX 1.0, CogVideoX 1.5, CogView 3/4, VisualCloze, etc., and will support more models in the future.

```python
# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

cache_dit.enable_cache(
    pipe_or_adapter,
    cache_config=DBCacheConfig(...),
    # Set ulysses_size > 1 to enable ulysses-style context parallelism.
    parallelism_config=ParallelismConfig(ulysses_size=2),
)
# Launch with: torchrun --nproc_per_node=2 parallel_cache.py
```
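
Ulysses-style context parallelism works by sharding the sequence (token) dimension across ranks, so each GPU attends over a slice of the tokens. The following is a minimal, dependency-free sketch of that sharding arithmetic; the helper names are hypothetical and only illustrate the idea (the real implementation lives inside diffusers/cache-dit):

```python
# Toy sketch of ulysses-style sequence sharding (hypothetical helpers,
# plain Python lists standing in for token tensors).
def shard_sequence(tokens, rank, world_size):
    """Split a token sequence evenly across ranks; each rank keeps one shard."""
    assert len(tokens) % world_size == 0, "sequence must divide evenly"
    chunk = len(tokens) // world_size
    return tokens[rank * chunk:(rank + 1) * chunk]

def gather_sequence(shards):
    """Inverse of shard_sequence: concatenate per-rank shards back together."""
    return [t for shard in shards for t in shard]

tokens = list(range(8))  # a toy "sequence" of 8 tokens
shards = [shard_sequence(tokens, r, world_size=2) for r in range(2)]
assert shards == [[0, 1, 2, 3], [4, 5, 6, 7]]
assert gather_sequence(shards) == tokens
```

With ulysses_size=2, each rank processes half the sequence through attention, which is why inference speeds up roughly with the number of GPUs for long sequences.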

⚡️Hybrid Tensor Parallelism

cache-dit is also compatible with tensor parallelism. Currently, we support a Hybrid Cache + Tensor Parallelism scheme (via the NATIVE_PYTORCH parallelism backend), so users can apply Tensor Parallelism to further accelerate inference and reduce per-GPU VRAM usage. For more details, please refer to 📚examples/parallelism. cache-dit currently supports tensor parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, VisualCloze, etc., and will support more models in the future.

```python
# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

cache_dit.enable_cache(
    pipe_or_adapter,
    cache_config=DBCacheConfig(...),
    # Set tp_size > 1 to enable tensor parallelism.
    parallelism_config=ParallelismConfig(tp_size=2),
)
# Launch with: torchrun --nproc_per_node=2 parallel_cache.py
```
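
The per-GPU VRAM reduction comes from each rank storing only its slice of the weight matrices. A back-of-the-envelope sketch of the parameter accounting for a column-parallel linear layer (the function name and shapes are illustrative, not part of the cache-dit API):

```python
# Hypothetical sketch: why tp_size=2 roughly halves per-GPU weight memory.
# A column-parallel Linear(in_features, out_features) stores only its slice
# of the output columns on each rank.
def column_shard_params(out_features, in_features, tp_size):
    """Per-rank parameter count for a column-parallel linear layer."""
    assert out_features % tp_size == 0, "out_features must divide by tp_size"
    return (out_features // tp_size) * in_features

full = column_shard_params(4096, 4096, tp_size=1)
sharded = column_shard_params(4096, 4096, tp_size=2)
assert sharded * 2 == full  # two ranks together hold the full weight matrix
assert sharded == full // 2
```

Activations are not sharded the same way, so the end-to-end VRAM saving is somewhat less than 1/tp_size, but weight memory scales down almost linearly.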

Important

Please note that, in the short term, we have no plans to support hybrid parallelism (Context Parallelism combined with Tensor Parallelism). Please choose either Context Parallelism or Tensor Parallelism based on your actual scenario.
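
Since the two modes are mutually exclusive, a configuration should set exactly one of ulysses_size or tp_size. The rule of thumb above (speed vs. per-GPU memory) can be sketched as a small helper; the function name and the decision inputs are purely illustrative, not part of cache-dit:

```python
# Hypothetical helper mirroring the guidance above: prefer tensor parallelism
# when per-GPU VRAM is the bottleneck, otherwise context parallelism for speed.
def choose_parallelism(num_gpus, vram_limited):
    """Return kwargs for ParallelismConfig: exactly one of ulysses_size/tp_size."""
    if num_gpus <= 1:
        return {}  # single GPU: no parallelism config needed
    if vram_limited:
        return {"tp_size": num_gpus}        # shard weights to cut VRAM per GPU
    return {"ulysses_size": num_gpus}       # shard the sequence for speed

assert choose_parallelism(2, vram_limited=True) == {"tp_size": 2}
assert choose_parallelism(2, vram_limited=False) == {"ulysses_size": 2}
```

The returned dict could then be splatted into ParallelismConfig(**kwargs) in the examples above.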

v1.0.16

17 Nov 04:46
7e37b4c

What's Changed

Full Changelog: v1.0.15...v1.0.16

v1.0.15

13 Nov 03:27
7df4c89

What's Changed

Full Changelog: v1.0.14...v1.0.15