This repository contains a set of experimental scripts that could damage your training data. Keep backups!
This project's implementations are kept intentionally simple. If anything is overly complicated, it is likely a bug.
This code is a shared academic exercise. Please feel free to contribute improvements or open issues.
The popular trainers available have complicated code that seems intentionally difficult to understand.
Alternatively, I may simply be someone who needs things written a bit more simply (and in English)!
The functionality of this script is shared between SD 2.1 and SDXL as much as possible, with room for improvement:
- Aspect bucketing is shared
- Latent caching is currently only done for SDXL
- Prompt embed caching is also only done for SDXL
- Multi-GPU support has been enhanced and fixed
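Aspect bucketing groups images of similar shape so that every batch shares a single resolution. The sketch below only illustrates the general idea: the bucket list, the `assign_bucket` and `group_by_bucket` helpers, and the nearest-ratio rule are hypothetical and are not taken from this repository's implementation.

```python
from collections import defaultdict

# Hypothetical bucket resolutions (width, height); the real trainer may use others.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]


def assign_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to the image's aspect ratio."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))


def group_by_bucket(images: list[tuple[str, int, int]]) -> dict[tuple[int, int], list[str]]:
    """Group (path, width, height) records by bucket so each batch has a single shape."""
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for path, width, height in images:
        groups[assign_bucket(width, height)].append(path)
    return dict(groups)
```

Each group would then be split into batches, with images resized or cropped to their bucket's resolution; that step is omitted here.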
With this script, at 1024x1024 and batch size 10, we can nearly saturate a single 80GB A100!
At 1024x1024 and batch size 4, we can use a single 48GB A6000, which reduces the cost of multi-GPU training!
Stable Diffusion 2.1 is notoriously difficult to fine-tune. Many of the default scripts do not make the smartest choices, resulting in poor-quality outputs.
Some of the problems I've encountered in other tools:
- Training OpenCLIP concurrently with the U-Net. They must be trained in sequence, with the text encoder tuned first.
- Not enforcing zero SNR on the terminal timestep and using offset noise instead. This results in noisier images.
- Training only on square 768x768 images, which causes the model to lose the ability (or at the very least, fail to improve its ability) to super-resolve its output into other aspect ratios.
- Overfitting the U-Net on textures, which results in "burning". So far, I haven't worked around this much beyond mixing and matching text encoder and U-Net checkpoints.
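For reference, the zero-terminal-SNR fix described in "Common Diffusion Noise Schedules and Sample Steps are Flawed" (Lin et al.) rescales the noise schedule so that the final timestep is pure noise. The pure-Python sketch below follows that published rescaling; the scaled-linear schedule constants (0.00085 to 0.012 over 1000 steps) are the commonly used Stable Diffusion defaults and are an assumption here, not values read from this repository's configuration.

```python
import math


def scaled_linear_betas(num_steps: int = 1000, start: float = 0.00085, end: float = 0.012) -> list[float]:
    """Stable Diffusion style "scaled linear" schedule: linear in sqrt(beta). Assumed defaults."""
    s, e = math.sqrt(start), math.sqrt(end)
    return [(s + (e - s) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def rescale_zero_terminal_snr(betas: list[float]) -> list[float]:
    """Shift and scale sqrt(alpha_bar) so the last timestep has zero SNR."""
    # Cumulative product of (1 - beta) gives alpha_bar_t.
    alphas_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_bar.append(running)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes 0, then scale so the first value is unchanged.
    sqrt_ab = [(v - last) * first / (first - last) for v in sqrt_ab]
    alphas_bar = [v * v for v in sqrt_ab]
    # Convert alpha_bar back to per-step alphas, then to betas.
    alphas = [alphas_bar[0]] + [alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))]
    return [1.0 - a for a in alphas]


def snr(betas: list[float]) -> list[float]:
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at each timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running / (1.0 - running) if running < 1.0 else math.inf)
    return out
```

With the original schedule, the last timestep still carries a small amount of signal (SNR above zero); after rescaling, its beta is exactly 1.0 and its SNR is exactly 0.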
Additionally, if something does not provide value to the training process by default, it is simply not included.
- training.sh - Some variables are defined here, but they are not meant to be tuned.
- sdxl-env.sh.example - The SDXL training parameters; copy this to sdxl-env.sh.
- sd21-env.sh.example - The SD 2.1 training parameters; copy this to env.sh.
- interrogate.py - Labels datasets using BLIP. Not very accurate, but good enough for a LARGE dataset used for fine-tuning.
- analyze_laion_data.py - After downloading a lot of LAION's data, use this to throw much of it away.
- analyze_aspect_ratios_json.py - Uses the output from analyze_laion_data.py to remove images that do not fit our aspect goals.
- helpers/broken_images.py - Scans for and removes any images that will not load properly.
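The dimension-based filtering (discarding images with unwanted aspect ratios, plus the 860x860 minimum mentioned at the end of this README) can be pictured as a simple predicate. This is a hypothetical sketch: the record format, the `keep_image` and `filter_records` names, the allowed-ratio list, and reading "under 860x860" as "shorter side under 860" are assumptions, not the scripts' actual inputs or rules.

```python
# Hypothetical defaults; tune these for your own dataset.
MIN_SIDE = 860
ALLOWED_RATIOS = (1.0, 4 / 3, 3 / 4, 16 / 9, 9 / 16)
TOLERANCE = 0.05  # relative tolerance when matching an allowed aspect ratio


def keep_image(width: int, height: int,
               min_side: int = MIN_SIDE,
               allowed: tuple[float, ...] = ALLOWED_RATIOS,
               tolerance: float = TOLERANCE) -> bool:
    """Return True if the image is large enough and close to an allowed aspect ratio."""
    if min(width, height) < min_side:
        return False
    ratio = width / height
    return any(abs(ratio - target) / target <= tolerance for target in allowed)


def filter_records(records: list[dict]) -> list[dict]:
    """Keep records shaped like {"path": ..., "width": ..., "height": ...} that pass keep_image."""
    return [r for r in records if keep_image(r["width"], r["height"])]
```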
A note on interrogate.py: you might want to make sure BLIP knows your most important concepts. If it doesn't, you can try fine-tuning BLIP on a subset of your data with manually created captions. This generally works well.
- inference.py - Generates validation results from the prompt catalogue (prompts.py) using DDIMScheduler.
- inference_ddpm.py - Uses DDPMScheduler to assemble a checkpoint from a base model configuration and runs it through the validation prompts.
- inference_karras.py - Uses the Karras sigmas with DPM 2M Karras. Useful for testing what might happen in Automatic1111.
- tile_shortnames.py - Tiles the outputs from the above scripts into strips.
- inference_snr_test.py - Generates a large number of images across a range of CFG values and catalogues the results for tiling.
- tile_images.py - Generates large image tiles to compare CFG results for zero SNR training / inference tuning.
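When sweeping CFG (classifier-free guidance) values, it helps to remember what the scale does: at each denoising step, the unconditional and text-conditioned noise predictions are combined as uncond + scale * (cond - uncond). The sketch below shows that combination on plain lists; the `cfg_combine` and `cfg_range` helpers and the 1-to-9 scale range are illustrative and not taken from inference_snr_test.py.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: move the prediction away from uncond, towards cond."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def cfg_range(start: float, stop: float, step: float) -> list[float]:
    """Inclusive list of guidance scales, e.g. for a comparison grid."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 6) for i in range(count)]


# Toy noise predictions for two values; real predictions are latent tensors.
uncond_pred = [0.10, -0.20]
cond_pred = [0.30, 0.00]
for scale in cfg_range(1.0, 9.0, 2.0):
    print(scale, cfg_combine(uncond_pred, cond_pred, scale))
```

A scale of 0 reproduces the unconditional prediction and a scale of 1 the conditional one; larger scales extrapolate past the conditional prediction, which is why very high CFG values are worth comparing side by side.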
- Clone the repository and install the dependencies:
git clone https://github.com/bghira/SimpleTuner --branch release
cd SimpleTuner
python -m venv .venv
source .venv/bin/activate
pip3 install -U poetry pip
poetry install
You will need to install some Linux-specific dependencies (Ubuntu is used here):
apt -y install nvidia-cuda-dev nvidia-cuda-toolkit
If you get an error about a missing cuDNN library, you will want to install torch manually (replace 118 with your CUDA version if not using 11.8):
pip3 install xformers torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 --force-reinstall
Alternatively, PyTorch Nightly (Torch 2.1) may be used with Xformers 0.0.21dev (note that this now includes torchtriton):
pip3 install --pre torch torchvision torchaudio torchtriton --extra-index-url https://download.pytorch.org/whl/nightly/cu118 --force-reinstall
pip3 install --pre git+https://github.com/facebookresearch/xformers.git@main\#egg=xformers
If the egg install for Xformers does not work, try adding xformers to the first command, and run only that:
pip3 install --pre xformers torch torchvision torchaudio torchtriton --extra-index-url https://download.pytorch.org/whl/nightly/cu118 --force-reinstall
- For SD 2.1, copy sd21-env.sh.example to env.sh and be sure to fill out the details. Try to change as little as possible.
For SDXL, copy sdxl-env.sh.example to sdxl-env.sh and then fill in the details.
For both training scripts, any values missing from your user config will fall back to the defaults.
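The fallback behaviour can be pictured as laying your env file over a table of defaults. This is a hypothetical Python sketch: the training scripts themselves are shell, the variable names below (LEARNING_RATE, TRAIN_BATCH_SIZE, RESOLUTION) are made up for illustration, and the parser only understands simple `export KEY=value` lines.

```python
import shlex

# Hypothetical defaults; the real variable names live in the *.sh.example files.
DEFAULTS = {
    "LEARNING_RATE": "1e-6",
    "TRAIN_BATCH_SIZE": "4",
    "RESOLUTION": "1024",
}


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple `export KEY=value` / `KEY=value` lines, skipping comments and blanks."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, raw = line.partition("=")
        if not sep:
            continue
        # shlex.split strips surrounding quotes, e.g. "1e-5" -> 1e-5.
        parts = shlex.split(raw)
        if parts:  # in this sketch, empty assignments count as missing
            values[key.strip()] = parts[0]
    return values


def effective_config(user_text: str) -> dict[str, str]:
    """Defaults first, then user values on top; missing keys fall back to defaults."""
    return {**DEFAULTS, **parse_env_file(user_text)}
```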
- If you are using --report_to='wandb' (the default), the following will help you report your statistics:
wandb login
Follow the printed instructions to locate your API key and configure it.
Once that is done, any of your training sessions and validation data will be available on Weights & Biases.
- For SD 2.1, run the training.sh script, probably redirecting the output to a log file:
bash training.sh > /path/to/training-$(date +%s).log 2>&1
For SDXL, run the train_sdxl.sh script, redirecting its output to a log file:
bash train_sdxl.sh > /path/to/training-$(date +%s).log 2>&1
From here, it's really up to you.
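If you prefer to launch runs from Python, the shell redirection above (stdout and stderr both written to training-<unix time>.log) can be reproduced with the standard library. This is an optional, hypothetical helper; `run_with_log` is not part of the repository.

```python
import subprocess
import time
from pathlib import Path


def run_with_log(cmd: list[str], log_dir: str) -> tuple[int, Path]:
    """Run cmd, sending stdout and stderr to <log_dir>/training-<unix time>.log.

    Roughly equivalent to: cmd > /path/to/training-$(date +%s).log 2>&1
    """
    log_path = Path(log_dir) / f"training-{int(time.time())}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log_file:
        # stderr=STDOUT merges both streams into the same file, like 2>&1.
        result = subprocess.run(cmd, stdout=log_file, stderr=subprocess.STDOUT, check=False)
    return result.returncode, log_path


# Example (SDXL):
# code, path = run_with_log(["bash", "train_sdxl.sh"], "/path/to/logs")
```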
- Some problems with uneven training on very poorly distributed aspect buckets are being worked on.
- Some hardcoded values need to be adjusted or removed; for example, images under 860x860 are discarded.
- SDXL latent caching is currently non-deterministic, and will soon be adjusted to use a better hashing method.
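On the latent-caching point, one common way to get stable cache keys is a cryptographic digest of inputs that do not change between runs, such as the image's bytes and its target resolution. Python's built-in hash() is randomized per process for strings (see PYTHONHASHSEED), so it is unsuitable for on-disk keys. This is a sketch under those assumptions: `latent_cache_key`, `latent_cache_path`, and the .pt suffix are hypothetical, and this README does not say which hashing method SimpleTuner will adopt.

```python
import hashlib
from pathlib import Path


def latent_cache_key(image_bytes: bytes, resolution: tuple[int, int]) -> str:
    """Deterministic key from target resolution plus image content (same input -> same key)."""
    digest = hashlib.sha256()
    # Resolution first, terminated by ':', so the prefix cannot blur into the image bytes.
    digest.update(f"{resolution[0]}x{resolution[1]}:".encode("ascii"))
    digest.update(image_bytes)
    return digest.hexdigest()


def latent_cache_path(cache_dir: str, image_path: str, resolution: tuple[int, int]) -> Path:
    """Where a cached latent for image_path would be stored: <cache_dir>/<key>.pt."""
    key = latent_cache_key(Path(image_path).read_bytes(), resolution)
    return Path(cache_dir) / f"{key}.pt"
```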