
Commit 97366d6

Bump up to v1.1.0 (#441)
* Bump up to v1.1.0
1 parent 3878a8e commit 97366d6

126 files changed: +953 -2187 lines changed


.github/workflows/issue.yml

Lines changed: 2 additions & 2 deletions

@@ -15,8 +15,8 @@ jobs:
           days-before-issue-stale: 360
           days-before-issue-close: 360
           stale-issue-label: "stale"
-          stale-issue-message: "This issue is stale because it has been open for 30 days with no activity."
-          close-issue-message: "This issue was closed because it has been inactive for 7 days since being marked as stale."
+          stale-issue-message: "This issue is stale because it has been open for 360 days with no activity."
+          close-issue-message: "This issue was closed because it has been inactive for 360 days since being marked as stale."
           days-before-pr-stale: -1
           days-before-pr-close: -1
           repo-token: ${{ secrets.GITHUB_TOKEN }}

.gitignore

Lines changed: 0 additions & 1 deletion

@@ -181,5 +181,4 @@ init_env.sh
 *.hdf5

 uv.lock
-
 CLAUDE.local.md

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

@@ -28,4 +28,4 @@ repos:
     hooks:
       - id: black-jupyter
         args:
-          - --line-length=80
+          - --line-length=100

LICENSE

Lines changed: 1 addition & 1 deletion

@@ -194,7 +194,7 @@ Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

-http://www.apache.org/licenses/LICENSE-2.0
+http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,

MANIFEST.in

Lines changed: 1 addition & 2 deletions

@@ -1,5 +1,4 @@
 include MANIFEST.in
 include LICENSE
-include requirements.txt
 prune */__pycache__
-global-exclude *.o *.so *.dylib *.a .git *.pyc *.swp *.mp4 *.png *.jpg assets docs examples tests .github tmp debug
+global-exclude *.o *.so *.dylib *.a .git *.pyc *.swp *.mp4 *.png *.jpg *.jpeg assets docs examples tests .github tmp debug

README.md

Lines changed: 56 additions & 17 deletions

@@ -4,30 +4,21 @@
 <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-logo.png height="90" align="left">
 A Unified and Flexible Inference Engine with 🤗🎉<br>Hybrid Cache Acceleration and Parallelism for DiTs<br>
 <a href="https://pepy.tech/projects/cache-dit"><img src=https://static.pepy.tech/personalized-badge/cache-dit?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BLUE&left_text=downloads></a>
-<img src=https://img.shields.io/badge/Release-v1.0.*-blue.svg >
+<img src=https://img.shields.io/badge/Release-v1.1.*-blue.svg >
 <a href="https://huggingface.co/docs/diffusers/main/en/optimization/cache_dit"><img src=https://img.shields.io/badge/🤗Diffusers-ecosystem-yellow.svg ></a>
 <a href="https://hellogithub.com/repository/vipshop/cache-dit" target="_blank"><img src="https://api.hellogithub.com/v1/widgets/recommend.svg?rid=b8b03b3b32a449ea84cfc2b96cd384f3&claim_uid=ofSCbzTmdeQk3FD&theme=small" alt="Featured|HelloGitHub" /></a>
 <img src=https://img.shields.io/badge/Models-30+-orange.svg >
 </h2>
 </p>
-<img src=./assets/speedup_v4.png>
+<img src=https://github.com/vipshop/cache-dit/raw/main/assets/speedup_v4.png>
 </div>

-<!--
-<img src=https://img.shields.io/github/release/vipshop/cache-dit.svg >
-<img src=https://img.shields.io/github/license/vipshop/cache-dit.svg?color=blue >
-<a href="https://pepy.tech/projects/cache-dit"><img src=https://static.pepy.tech/personalized-badge/cache-dit?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=GREEN&left_text=downloads></a>
-<a href="https://pypi.org/project/cache-dit/"><img src=https://img.shields.io/pypi/dm/cache-dit.svg ></a>
-<img src=https://img.shields.io/github/stars/vipshop/cache-dit.svg?style=dark >
--->
-
 ## 🔥Highlight

-We are excited to announce that the **first API-stable version (v1.0.0)** of cache-dit has finally been released!
-**[cache-dit](https://github.com/vipshop/cache-dit)** is a **Unified** and **Flexible** Inference Engine for 🤗Diffusers, enabling acceleration with just ♥️**one line**♥️ of code. Key features: **Unified Cache APIs**, **Forward Pattern Matching**, **Automatic Block Adapter**, **DBCache**, **DBPrune**, **Hybrid TaylorSeer Calibrator**, **Hybrid Cache CFG**, **Context Parallelism**, **Tensor Parallelism**, **Torch Compile Compatible** and **🎉SOTA** performance.
+We are excited to announce that the 🎉[**v1.1.0**](https://github.com/vipshop/cache-dit/releases/tag/v1.1.0) version of cache-dit has finally been released! It brings **[🔥Context Parallelism](./docs/User_Guide.md/#️hybrid-context-parallelism)** and **[🔥Tensor Parallelism](./docs/User_Guide.md#️hybrid-tensor-parallelism)** to cache-dit, making it a Unified and Flexible Inference Engine for 🤗DiTs. Key features: **Unified Cache APIs**, **Forward Pattern Matching**, **Block Adapter**, **DBCache**, **DBPrune**, **Cache CFG**, **TaylorSeer**, **Context Parallelism**, **Tensor Parallelism** and **🎉SOTA** performance.

 ```bash
-pip3 install -U cache-dit # pip3 install git+https://github.com/vipshop/cache-dit.git
+pip3 install -U cache-dit # Also, pip3 install git+https://github.com/huggingface/diffusers.git (latest)
 ```
 You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub. Then try ♥️ Cache Acceleration with just **one line** of code ~ ♥️
 ```python
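The body of that Python snippet is elided at the hunk boundary above. As a hedged illustration of the "one line" API the README references, here is a minimal sketch assuming the documented `cache_dit.enable_cache` entry point and a standard Diffusers pipeline (the model ID is only an example):

```python
# Minimal sketch, not part of this commit's diff. Assumes the public
# cache_dit.enable_cache() API and a Diffusers pipeline; the model ID
# is illustrative only.
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")

# The "one line" of code: enable hybrid cache acceleration (DBCache by default).
cache_dit.enable_cache(pipe)

# Inference runs unchanged; transformer block outputs are cached and
# reused across denoising steps.
image = pipe("a photo of a cat").images[0]
```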
@@ -51,7 +42,54 @@ You can install the stable release of cache-dit from PyPI, or the latest develop
 - **[🎉Hybrid Cache Acceleration](./docs/User_Guide.md#taylorseer-calibrator)**: Now supports hybrid **Block-wise Cache + Calibrator** schemes (e.g., DBCache or DBPrune + TaylorSeerCalibrator). DBCache or DBPrune acts as the **Indicator** to decide *when* to cache, while the Calibrator decides *how* to cache. More mainstream cache acceleration algorithms (e.g., FoCa) will be supported in the future, along with additional benchmarks—stay tuned for updates!
 - **[🤗Diffusers Ecosystem Integration](https://huggingface.co/docs/diffusers/main/en/optimization/cache_dit)**: 🔥**cache-dit** has joined the Diffusers community ecosystem as the **first** DiT-specific cache acceleration framework! Check out the documentation here: <a href="https://huggingface.co/docs/diffusers/main/en/optimization/cache_dit"><img src=https://img.shields.io/badge/🤗Diffusers-ecosystem-yellow.svg ></a>

-![](./assets/clip-score-bench-v2.png)
+![](https://github.com/vipshop/cache-dit/raw/main/assets/clip-score-bench-v2.png)
+
+The comparison between **cache-dit** and other algorithms shows that, at speedup ratios (measured in TFLOPs) below 🎉**4x**, cache-dit achieves **SOTA** performance. Please refer to [📚Benchmarks](https://github.com/vipshop/cache-dit/tree/main/bench/) for more details.
+
+<div align="center">
+
+| Method | TFLOPs(↓) | SpeedUp(↑) | ImageReward(↑) | Clip Score(↑) |
+| --- | --- | --- | --- | --- |
+| [**FLUX.1**-dev]: 50 steps | 3726.87 | 1.00× | 0.9898 | 32.404 |
+| [**FLUX.1**-dev]: 60% steps | 2231.70 | 1.67× | 0.9663 | 32.312 |
+| Δ-DiT(N=2) | 2480.01 | 1.50× | 0.9444 | 32.273 |
+| Δ-DiT(N=3) | 1686.76 | 2.21× | 0.8721 | 32.102 |
+| [**FLUX.1**-dev]: 34% steps | 1264.63 | 3.13× | 0.9453 | 32.114 |
+| Chipmunk | 1505.87 | 2.47× | 0.9936 | 32.776 |
+| FORA(N=3) | 1320.07 | 2.82× | 0.9776 | 32.266 |
+| **[DBCache(S)](https://github.com/vipshop/cache-dit)** | 1400.08 | **2.66×** | **1.0065** | 32.838 |
+| DuCa(N=5) | 978.76 | 3.80× | 0.9955 | 32.241 |
+| TaylorSeer(N=4,O=2) | 1042.27 | 3.57× | 0.9857 | 32.413 |
+| **[DBCache(S)+TS](https://github.com/vipshop/cache-dit)** | 1153.05 | **3.23×** | **1.0221** | 32.819 |
+| **[DBCache(M)](https://github.com/vipshop/cache-dit)** | 944.75 | **3.94×** | 0.9997 | 32.849 |
+| **[DBCache(M)+TS](https://github.com/vipshop/cache-dit)** | 944.75 | **3.94×** | **1.0107** | 32.865 |
+| **[FoCa(N=5): arxiv.2508.16211](https://arxiv.org/pdf/2508.16211)** | 893.54 | **4.16×** | 1.0029 | **32.948** |
+| [**FLUX.1**-dev]: 22% steps | 818.29 | 4.55× | 0.8183 | 31.772 |
+| FORA(N=7) | 670.14 | 5.55× | 0.7418 | 31.519 |
+| ToCa(N=12) | 644.70 | 5.77× | 0.7155 | 31.808 |
+| DuCa(N=10) | 606.91 | 6.13× | 0.8382 | 31.759 |
+| TeaCache(l=1.2) | 669.27 | 5.56× | 0.7394 | 31.704 |
+| TaylorSeer(N=7,O=2) | 670.44 | 5.54× | 0.9128 | 32.128 |
+| **[DBCache(F)](https://github.com/vipshop/cache-dit)** | 651.90 | **5.72×** | 0.9271 | 32.552 |
+| **[FoCa(N=8): arxiv.2508.16211](https://arxiv.org/pdf/2508.16211)** | 596.07 | 6.24× | 0.9502 | 32.706 |
+| **[DBCache(F)+TS](https://github.com/vipshop/cache-dit)** | 651.90 | **5.72×** | **0.9526** | 32.568 |
+| **[DBCache(U)+TS](https://github.com/vipshop/cache-dit)** | 505.47 | **7.37×** | 0.8645 | **32.719** |
+
+</div>
+
+🎉Surprisingly, **cache-dit** still works on **extremely few-step** distilled models such as **Qwen-Image-Lightning**: with the F16B16 config, the PSNR is 34.8 and the ImageReward is 1.26, maintaining relatively high precision.
+<div align="center">
+
+| Config | PSNR(↑) | Clip Score(↑) | ImageReward(↑) | TFLOPs(↓) | SpeedUp(↑) |
+|----------------------------|-----------|------------|--------------|----------|------------|
+| [**Lightning**]: 4 steps | INF | 35.5797 | 1.2630 | 274.33 | 1.00× |
+| F24B24_W2MC1_R0.8 | 36.3242 | 35.6224 | 1.2630 | 264.74 | 1.04× |
+| F16B16_W2MC1_R0.8 | 34.8163 | 35.6109 | 1.2614 | 244.25 | 1.12× |
+| F12B12_W2MC1_R0.8 | 33.8953 | 35.6535 | 1.2549 | 234.63 | 1.17× |
+| F8B8_W2MC1_R0.8 | 33.1374 | 35.7284 | 1.2517 | 224.29 | 1.22× |
+| F1B0_W2MC1_R0.8 | 31.8317 | 35.6651 | 1.2397 | 206.90 | 1.33× |
+
+</div>

 ## 🔥Supported DiTs
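For orientation, the SpeedUp column is the baseline's TFLOPs divided by the method's TFLOPs; for example, DBCache(M) gives 3726.87 / 944.75 ≈ 3.94×. The config tags in the second table plausibly decode as DBCache hyperparameters: F16B16_W2MC1_R0.8 would read as 16 first (Fn) and 16 last (Bn) compute blocks, 2 warmup steps, at most 1 continuous cached step, and a residual-diff threshold of 0.8. Below is a hedged sketch of such a hybrid DBCache + TaylorSeer setup; the config class names and keyword arguments are assumptions inferred from this naming scheme and from the bench flags in this commit, not quoted from the diff:

```python
# Hedged sketch of a hybrid "Block-wise Cache + Calibrator" setup.
# DBCacheConfig / TaylorSeerCalibratorConfig and their kwargs are assumed
# names based on the F16B16_W2MC1_R0.8 tag; consult the User_Guide for
# the authoritative API.
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev")

cache_dit.enable_cache(
    pipe,
    # DBCache is the Indicator: it decides *when* a step can be cached.
    cache_config=cache_dit.DBCacheConfig(
        Fn_compute_blocks=16,           # F16: always compute the first 16 blocks
        Bn_compute_blocks=16,           # B16: always compute the last 16 blocks
        max_warmup_steps=2,             # W2: never cache during the first 2 steps
        max_continuous_cached_steps=1,  # MC1: at most 1 cached step in a row
        residual_diff_threshold=0.8,    # R0.8: cache only when the residual diff is below this
    ),
    # TaylorSeer is the Calibrator: it decides *how* to cache, extrapolating
    # cached residuals instead of reusing them verbatim.
    calibrator_config=cache_dit.TaylorSeerCalibratorConfig(taylorseer_order=2),
)
```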

@@ -217,14 +255,15 @@ For more advanced features such as **Unified Cache APIs**, **Forward Pattern Mat
 - [⚡️Hybrid Context Parallelism](./docs/User_Guide.md#context-parallelism)
 - [⚡️Hybrid Tensor Parallelism](./docs/User_Guide.md#tensor-parallelism)
 - [🤖Low-bits Quantization](./docs/User_Guide.md#quantization)
+- [🤖How to use FP8 Attention](./docs/User_Guide.md#fp8-attention)
 - [🛠Metrics Command Line](./docs/User_Guide.md#metrics-cli)
 - [⚙️Torch Compile](./docs/User_Guide.md#️torch-compile)
 - [📚API Documents](./docs/User_Guide.md#api-documentation)

 ## 👋Contribute
 <div id="contribute"></div>

-How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](https://github.com/vipshop/cache-dit/raw/main/CONTRIBUTE.md).
+How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](https://github.com/vipshop/cache-dit/raw/main/docs/CONTRIBUTE.md).

 <div align='center'>
 <a href="https://star-history.com/#vipshop/cache-dit&Date">

@@ -243,15 +282,15 @@ Here is a curated list of open-source projects integrating **CacheDiT**, includi

 ## ©️Acknowledgements

-Special thanks to vipshop's Computer Vision AI Team for supporting document, testing and production-level deployment of this project.
+Special thanks to vipshop's Computer Vision AI Team for supporting documentation, testing and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: [🤗diffusers](https://huggingface.co/docs/diffusers), [ParaAttention](https://github.com/chengzeyi/ParaAttention), [xDiT](https://github.com/xdit-project/xDiT) and [TaylorSeer](https://github.com/Shenyi-Z/TaylorSeer).

 ## ©️Citations

 <div id="citations"></div>

 ```BibTeX
 @misc{cache-dit@2025,
-  title={cache-dit: A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for Diffusers.},
+  title={cache-dit: A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
   url={https://github.com/vipshop/cache-dit.git},
   note={Open-source software available at https://github.com/vipshop/cache-dit.git},
   author={DefTruth, vipshop.com},

assets/speedup.png

-2.88 MB
Binary file not shown.

assets/speedup_v2.png

-3.53 MB
Binary file not shown.

assets/speedup_v3.png

-8.6 MB
Binary file not shown.

bench/bench.py

Lines changed: 8 additions & 24 deletions

@@ -113,9 +113,7 @@ def init_flux_pipe(args: argparse.Namespace) -> FluxPipeline:
         cache_dit.set_compile_configs()
     else:
         torch._dynamo.config.recompile_limit = 96  # default is 8
-        torch._dynamo.config.accumulated_recompile_limit = (
-            2048  # default is 256
-        )
+        torch._dynamo.config.accumulated_recompile_limit = 2048  # default is 256
     if not args.compile_all:
         logger.warning(
             "Only compile transformer blocks not the whole model "

@@ -134,9 +132,7 @@ def init_flux_pipe(args: argparse.Namespace) -> FluxPipeline:
     return pipe


-def gen_flux_image(
-    args: argparse.Namespace, pipe: FluxPipeline, prompt: str = None
-) -> Image.Image:
+def gen_flux_image(args: argparse.Namespace, pipe: FluxPipeline, prompt: str = None) -> Image.Image:
     assert prompt is not None
     image = pipe(
         prompt,

@@ -163,30 +159,20 @@ def get_args() -> argparse.ArgumentParser:
     parser.add_argument("--max-warmup-steps", "--w", type=int, default=8)
     parser.add_argument("--warmup-interval", type=int, default=1)
     parser.add_argument("--max-cached-steps", "--mc", type=int, default=-1)
-    parser.add_argument(
-        "--max-continuous-cached-steps", "--mcc", type=int, default=-1
-    )
-    parser.add_argument(
-        "--disable-block-adapter", action="store_true", default=False
-    )
+    parser.add_argument("--max-continuous-cached-steps", "--mcc", type=int, default=-1)
+    parser.add_argument("--disable-block-adapter", action="store_true", default=False)
     # Compile & FP8
     parser.add_argument("--compile", action="store_true", default=False)
     parser.add_argument("--inductor-flags", action="store_true", default=False)
     parser.add_argument("--compile-all", action="store_true", default=False)
     parser.add_argument("--quantize", "--q", action="store_true", default=False)
     # Test data
-    parser.add_argument(
-        "--save-dir", type=str, default="./tmp/DrawBench200_Default"
-    )
-    parser.add_argument(
-        "--prompt-file", type=str, default="./prompts/DrawBench200.txt"
-    )
+    parser.add_argument("--save-dir", type=str, default="./tmp/DrawBench200_Default")
+    parser.add_argument("--prompt-file", type=str, default="./prompts/DrawBench200.txt")
     parser.add_argument("--width", type=int, default=1024, help="Image width")
     parser.add_argument("--height", type=int, default=1024, help="Image height")
     parser.add_argument("--test-num", type=int, default=None)
-    parser.add_argument(
-        "--cal-flops", "--flops", action="store_true", default=False
-    )
+    parser.add_argument("--cal-flops", "--flops", action="store_true", default=False)
     return parser.parse_args()

@@ -208,9 +194,7 @@ def main():
     logger.info(f"Loaded {len(prompts)} prompts from: {args.prompt_file}")

     all_times = []
-    perf_tag = (
-        f"C{int(args.compile)}_Q{int(args.quantize)}_{cache_dit.strify(pipe)}"
-    )
+    perf_tag = f"C{int(args.compile)}_Q{int(args.quantize)}_{cache_dit.strify(pipe)}"
     save_dir = os.path.join(args.save_dir, perf_tag)
     os.makedirs(save_dir, exist_ok=True)