
Upgrading diffusers and adding support for new models #579

@avjves

Hey @feifeibear!

I decided to open an issue instead of using the GH Discussions, as this seems like a better place to have a possibly longer conversation.

We use xDiT internally and would like to develop it further by pushing PRs that improve the performance of the current models as well as implement support for new ones. Since new models land in diffusers first and we port them from there, it'd be great if we could use the latest diffusers version with xDiT.
The diffusers codebase has changed since the current models were ported, so the current models often have a hard requirement on a specific diffusers version to work. To overcome this, our idea is to update the models bit by bit to support the latest diffusers and then add strict diffusers version checks to the examples themselves, so we don't have to somehow support multiple versions at once.
This would allow us to start supporting the latest diffusers for select models (and new models) while still allowing older diffusers versions to be used for the models that have yet to be updated.

What do you think about this idea?

If we can go ahead with the idea above, our initial plan would be as follows:

  1. Add a section to README.md about the different diffusers version requirements for the models. Currently this information lives in README.md / setup.py for some models, but it is a bit hard to find. I believe we could include the required diffusers version directly in the supported model table.
  2. Update the Flux model definitions to match the newer diffusers version. This includes the Flux pipeline, attention processor, transformer, and both of the examples. We would then include a minimum diffusers version assertion in the examples (see the sketch after this list) and add the version to the model table.
  3. Update the HunyuanVideo model definitions to match the newer diffusers version. This model doesn't use a custom pipeline, so the only change required is to the example itself.
  4. Add Wan 2.X support. This would be a new example that relies heavily on the latest diffusers code and, for now, only adds sequence parallelism (SP) support.
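
For step 2, here is a minimal sketch of what the per-example version assertion could look like; the pinned version below is a placeholder, not the actual requirement:

```python
# Minimal sketch of a per-example diffusers version guard.
# MIN_DIFFUSERS_VERSION is a placeholder; each example would pin its own floor.
from packaging import version

import diffusers

MIN_DIFFUSERS_VERSION = "0.33.0"

if version.parse(diffusers.__version__) < version.parse(MIN_DIFFUSERS_VERSION):
    raise RuntimeError(
        f"This example requires diffusers>={MIN_DIFFUSERS_VERSION}, "
        f"but found {diffusers.__version__}. Please upgrade diffusers."
    )
```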

We already have most of these changes ready internally and would like to upstream them as well. Afterwards, our plans would extend to new emerging models.

A final discussion point is the performance of yunchang. Because yunchang uses torch.distributed operations for its ring attention, it isn't compatible with torch.compile (which the current codebase therefore disables for the yunchang attention path). This naturally causes a performance regression; a toy illustration follows below.
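
As a toy illustration of the limitation (behavior varies by torch version, so treat this as a sketch rather than a definitive repro):

```python
# Toy illustration: c10d collectives like the ones ring attention relies on
# have historically caused dynamo graph breaks, so fullgraph compilation fails.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

def ring_style_exchange(kv: torch.Tensor) -> torch.Tensor:
    # Ring attention passes K/V shards between ranks with collectives like this.
    gathered = [torch.empty_like(kv) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, kv)
    return torch.cat(gathered, dim=0)

try:
    torch.compile(ring_style_exchange, fullgraph=True)(torch.randn(4, 8))
except Exception as exc:
    print(f"fullgraph compile failed on the collective: {type(exc).__name__}")

dist.destroy_process_group()
```
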
How keen are you on keeping yunchang as part of the repository? Some models already use the USP method, which is fully compatible with torch.compile. We have already tried adding joint_tensor support to USP: it produces the same output as the hybrid_seq_parallel_attn / yunchang path (a sanity check follows below) but performs better because it can be compiled. I believe we could replace all such calls with USP.
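
And a single-process sanity check for the "same output" claim. The [B, H, S, D] layout and the Ulysses-style head split are assumptions that mirror what our USP joint_tensor path does; the all-to-all plumbing that produces this layout in USP is elided:

```python
# Sanity check: with a Ulysses-style head split, attending over the full
# joint [image ; text] sequence per head group matches plain joint attention
# exactly, because attention is independent per head.
import torch
import torch.nn.functional as F

B, H, S_IMG, S_TXT, D, RANKS = 1, 8, 64, 16, 32, 4

def qkv(s):  # random Q/K/V projections for a sequence of length s
    return [torch.randn(B, H, s, D) for _ in range(3)]

(q_i, k_i, v_i), (q_t, k_t, v_t) = qkv(S_IMG), qkv(S_TXT)

# Reference: attention over the concatenated image + text (joint) sequence.
q = torch.cat([q_i, q_t], dim=2)
k = torch.cat([k_i, k_t], dim=2)
v = torch.cat([v_i, v_t], dim=2)
ref = F.scaled_dot_product_attention(q, k, v)

# "USP" path: each simulated rank owns H // RANKS heads and attends over the
# full joint sequence for those heads, as it would after a Ulysses all-to-all.
outs = []
for r in range(RANKS):
    h = slice(r * H // RANKS, (r + 1) * H // RANKS)
    outs.append(F.scaled_dot_product_attention(q[:, h], k[:, h], v[:, h]))

torch.testing.assert_close(torch.cat(outs, dim=1), ref)
print("joint-tensor USP head split matches full joint attention")
```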

What do you think, do these changes sound good? Are there points you would prefer us to do differently?
