Add chatterbox support #42413

Open
manmay-nakhashi wants to merge 66 commits into huggingface:main from resemble-ai:add-s3gen-hifinet
@manmay-nakhashi

No description provided.

@Rocketknight1

Member

This PR is very overwhelming, like a lot of code agent PRs. It's not clear to me how novel some of these architectures are, or whether we really need three whole new architectures in the codebase! 😅

Ideally we could cut down the PR size a lot by using modular files and importing, and maybe treating some of the submodels as components or reusing existing architectures for those?

cc @ebezzam @eustlb since it's an audio model/pipeline

@manmay-nakhashi

manmay-nakhashi commented Nov 26, 2025

Author

@Rocketknight1 The T3 model is very similar to the Llama model, but because this is a whole text-to-speech pipeline, it has several components:

  • tokenizer: s3tokenizer (in a different PR), which compresses the audio to 25 tokens/sec; we need this for conditioning
  • encoder: T3, a modified Llama model (the main changes are speech tokens and conditioning)
  • decoder: s3gen, a CFM-based model from CosyVoice2
  • hifinet (from CosyVoice2), which converts mel to wav

I checked, and we don't currently have these models in HF.
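To make the component breakdown above concrete, here is a minimal sketch that lists the four stages as described in this comment and works out the speech-token budget implied by the 25 tokens/sec rate. The names `PIPELINE_STAGES` and `num_speech_tokens` are hypothetical and not part of this PR, and rounding up partial tokens is an assumption.

```python
import math

# Stage names and roles are taken from the comment above; this list is only
# an illustration, not the PR's actual module layout.
PIPELINE_STAGES = [
    ("s3tokenizer", "compresses audio to speech tokens, used for conditioning"),
    ("t3", "Llama-like model with speech tokens and conditioning"),
    ("s3gen", "CFM-based decoder from CosyVoice2"),
    ("hifinet", "converts mel to wav"),
]

# Rate stated in the comment above.
SPEECH_TOKENS_PER_SECOND = 25


def num_speech_tokens(duration_seconds: float,
                      rate: int = SPEECH_TOKENS_PER_SECOND) -> int:
    """Return how many speech tokens cover a clip of the given length.

    Rounding up is an assumption made for this sketch; the real tokenizer
    may pad or truncate differently.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds * rate)


if __name__ == "__main__":
    for name, role in PIPELINE_STAGES:
        print(f"{name}: {role}")
    print(num_speech_tokens(10.0))
```

At this rate, a 10-second reference clip corresponds to 250 speech tokens.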

@manmay-nakhashi

Author

@eustlb

@ebezzam ebezzam left a comment

Contributor

@manmay-nakhashi I've done an initial review of some things that stuck out to me, which should also help you get familiar with the different Transformers conventions.

Most notably, there are several modules that already exist within the Transformers library from other models, and those should be used via modular to create your modeling files.

Moreover, here are some PRs of other TTS models that may also help you see how to prepare the various files:

  • CSM: #36719
  • Dia: #38405
  • VibeVoice (ongoing but reflects more recent conventions): #40546

Comment thread src/transformers/models/s3tokenizer/modeling_s3tokenizer.py Outdated
Comment thread src/transformers/models/s3tokenizer/modeling_s3tokenizer.py Outdated
Comment thread src/transformers/models/s3tokenizer/modeling_s3tokenizer.py Outdated
Comment thread src/transformers/models/s3gen/modeling_s3gen.py
Comment thread src/transformers/models/s3gen/modeling_s3gen.py Outdated
Comment thread src/transformers/models/s3gen/modeling_s3gen.py
Comment thread src/transformers/models/s3gen/modeling_s3gen.py Outdated
Comment thread src/transformers/models/s3gen/modeling_s3gen.py Outdated

Contributor

The modeling tests should contain integration tests to compare the outputs of Transformers version with original model. For example:
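A minimal sketch of the general shape of such a test, not the example the reviewer pointed to: run the ported model on a fixed input and compare a slice of the output against values recorded from the original implementation. Everything here is a hypothetical stand-in (`fake_model_forward`, `EXPECTED_SLICE`, and the tolerances), and plain Python is used so the sketch runs without dependencies; in a real Transformers test, the comparison would typically be done on tensors with `torch.testing.assert_close`.

```python
import math

# Hypothetical reference values. In a real test these would be copied from
# the original implementation's output for a fixed input and seed.
EXPECTED_SLICE = [0.125, -0.5, 0.75]


def fake_model_forward(inputs):
    """Stand-in for the ported model's forward pass (an assumption)."""
    return [x * 0.25 for x in inputs]


def assert_allclose(actual, expected, rtol=1e-4, atol=1e-4):
    """Raise AssertionError if any element differs beyond the tolerances."""
    if len(actual) != len(expected):
        raise AssertionError("length mismatch")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(f"index {i}: got {a}, expected {e}")


def test_matches_original():
    # Fixed input, so the comparison is deterministic.
    outputs = fake_model_forward([0.5, -2.0, 3.0])
    assert_allclose(outputs[:3], EXPECTED_SLICE)


if __name__ == "__main__":
    test_matches_original()
    print("outputs match the recorded reference slice")
```

Comparing only a small slice keeps the hard-coded reference values short while still catching numerical drift between the two implementations.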

Comment thread src/transformers/models/s3tokenizer/modeling_s3tokenizer.py Outdated
@github-actions

github-actions Bot commented Jan 6, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, chatterbox, s3gen, s3tokenizer

4 participants