| 1 | +# Multi-Model Performance Benchmark |
| 2 | + |
| 3 | +This directory contains a script that benchmarks image generation pipelines under different optimization configurations. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- Benchmark different models: |
| 8 | + - FLUX |
| 9 | + - Qwen-Image |
| 10 | +- Benchmark different optimization modes: |
| 11 | + - Basic mode (default) |
| 12 | + - FP8 linear optimization |
| 13 | + - Torch compile optimization |
| 14 | + - FP8 + compile combination |
| 15 | + - CPU offloading |
| 16 | +- Generate detailed CUDA timeline traces using `torch.profiler` |
| 17 | +- Compare performance across different configurations |
| 18 | +- Save sample images from each benchmark run |
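
The comparison across configurations comes down to timing repeated runs of the same pipeline call. The sketch below shows one common way to do that with warm-up runs excluded from the measurement. It is not taken from `model_perf_benchmark.py`: `time_runs`, `summarize`, and `fake_pipeline` are hypothetical names, and the sleep stands in for a real image generation call.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], num_runs: int = 5, warmup: int = 1) -> List[float]:
    """Return the wall-clock seconds taken by each of `num_runs` calls to `fn`."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (they absorb compilation and cache effects)
    timings = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        # Assumption: for GPU work, a device synchronization would be needed
        # at this point before reading the clock.
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Reduce per-run timings to mean, min, and max in seconds."""
    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def fake_pipeline() -> None:
    """Hypothetical stand-in for an image generation call."""
    time.sleep(0.01)


if __name__ == "__main__":
    # One entry per configuration; only a single placeholder mode is timed here.
    results = {"basic": summarize(time_runs(fake_pipeline, num_runs=3))}
    for mode, stats in results.items():
        print(f"{mode}: mean={stats['mean_s']:.4f}s min={stats['min_s']:.4f}s")
```

Excluding warm-up runs matters most for the compile mode, where the first call typically includes compilation time.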
| 19 | + |
| 20 | +## Usage |
| 21 | + |
| 22 | +```bash |
| 23 | +# Basic FLUX benchmark |
| 24 | +python model_perf_benchmark.py --model flux --mode basic |
| 25 | + |
| 26 | +# Qwen-Image with FP8 optimization |
| 27 | +python model_perf_benchmark.py --model qwen_image --mode fp8 |
| 28 | + |
| 29 | +# FLUX with Torch compile optimization |
| 30 | +python model_perf_benchmark.py --model flux --mode compile |
| 31 | + |
| 32 | +# Qwen-Image with all optimizations and profiling |
| 33 | +python model_perf_benchmark.py --model qwen_image --mode all --trace-file |
| 34 | + |
| 35 | +# FLUX with FP8 and profiling; the trace filename is auto-generated (includes config and GPU info) |
| 36 | +python model_perf_benchmark.py --model flux --mode fp8 --trace-file |
| 37 | + |
| 38 | +# Qwen-Image with custom prompt and profiling |
| 39 | +python model_perf_benchmark.py --model qwen_image --mode fp8 --prompt "a cyberpunk cityscape" --trace-file |
| 40 | + |
| 41 | +# Benchmark with a specific number of runs |
| 42 | +python model_perf_benchmark.py --model flux --mode compile --num-runs 10 |
| 43 | +``` |
| 44 | + |
| 45 | +For Qwen-Image models, you may need to specify additional paths: |
| 46 | +```bash |
| 47 | +# Qwen-Image with custom model paths |
| 48 | +python model_perf_benchmark.py --model qwen_image --model-path /path/to/model --encoder-path /path/to/encoder --vae-path /path/to/vae --mode basic |
| 49 | +``` |
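
The commands above pass `--trace-file` without a value and rely on an auto-generated filename. One way such a flag can be declared with `argparse` is sketched below. This is an illustration only, not the script's actual parser: the defaults, the `"auto"` sentinel, the naming scheme in `resolve_trace_path`, and support for an explicit path are all assumptions, and only the mode values that appear in the usage examples are listed.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the flags used in this README."""
    parser = argparse.ArgumentParser(description="Benchmark CLI sketch")
    # Defaults below are assumptions made for this sketch.
    parser.add_argument("--model", choices=["flux", "qwen_image"], default="flux")
    parser.add_argument("--mode", choices=["basic", "fp8", "compile", "all"], default="basic")
    parser.add_argument("--num-runs", type=int, default=5)
    # nargs="?" lets the flag appear bare (value becomes `const`) or with a path.
    parser.add_argument("--trace-file", nargs="?", const="auto", default=None)
    return parser


def resolve_trace_path(args: argparse.Namespace) -> Optional[str]:
    """Map the parsed flag to an output path, or None when profiling is off."""
    if args.trace_file is None:
        return None  # flag absent: no profiling requested
    if args.trace_file == "auto":
        # Hypothetical naming scheme; the real script also includes GPU info.
        return f"trace_{args.model}_{args.mode}.json"
    return args.trace_file


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "flux", "--mode", "fp8", "--trace-file"])
    print(resolve_trace_path(args))
```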
| 50 | + |
| 51 | +## Output |
| 52 | + |
| 53 | +The script will generate: |
| 54 | +- Performance timing results |
| 55 | +- Sample images from each run |
| 56 | +- Chrome trace files for detailed profiling (if `--trace-file` is specified) |
| 57 | + |
| 58 | +You can view the trace files at https://ui.perfetto.dev/. |
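
Chrome trace files are plain JSON, so they can also be summarized without a viewer. The sketch below sums the duration of complete events (`"ph": "X"`) per event name, following the Chrome Trace Event Format, where `dur` is in microseconds. The function name and the synthetic sample are illustrative; a real trace would be loaded with `json.load` from the generated file.

```python
import json
from collections import defaultdict
from typing import Dict


def total_duration_by_name(trace: dict) -> Dict[str, float]:
    """Sum the duration in microseconds of complete events per event name."""
    totals: Dict[str, float] = defaultdict(float)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


# Tiny synthetic trace standing in for a real exported file.
SAMPLE_TRACE = json.loads("""
{
  "traceEvents": [
    {"ph": "X", "name": "aten::mm", "ts": 0, "dur": 100},
    {"ph": "X", "name": "aten::mm", "ts": 150, "dur": 250},
    {"ph": "X", "name": "aten::add", "ts": 420, "dur": 40},
    {"ph": "M", "name": "process_name", "args": {"name": "python"}}
  ]
}
""")

if __name__ == "__main__":
    totals = total_duration_by_name(SAMPLE_TRACE)
    # Print the most expensive event names first.
    for name, dur in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {dur:.0f} us")
```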