Add CSM model by eustlb · Pull Request #36719 · huggingface/transformers

eustlb · 2025-03-14T10:50:42Z

What does this PR do?

Adds CSM (audio)-text-to-speech model!
Original code
Hub model weights
Converted weights

github-actions · 2025-03-14T10:50:55Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

HuggingFaceDocBuilderDev · 2025-04-02T16:25:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SeungyounShin · 2025-04-04T05:50:16Z

Do you mind if I contribute to CSMForConditionalGeneration? It looks like it's currently empty. Or are you working on it right now?
cc. @eustlb

eustlb · 2025-04-04T07:56:33Z

Hey @SeungyounShin, currently, conditional generation is handled through the ForCausalLM class. For CSM, it makes no difference whether the model generates from context (text + audio) or from text only. I decided to go with the ForCausalLM naming because of that: usually, the ForConditionalGeneration class is for encoder-decoder-like architectures. This is up for debate with the core maintainers as soon as the PR is ready for review 😊
thanks a lot for offering your help 🤗
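
For readers following along, here is a minimal sketch of text-only generation. It assumes the class name the PR ended up with after the later "causal -> conditional naming" commits (`CsmForConditionalGeneration`), and uses the `output_audio=True` flag and `processor.save_audio` helper that the commit list mentions ("batch inference update + return audio", "add save_audio to processor"). The checkpoint id `sesame/csm-1b`, the `[0]` speaker-tag prompt convention, and the helper names `format_turn` and `synthesize` are assumptions for illustration, not something this thread confirms.

```python
def format_turn(speaker_id: int, text: str) -> str:
    """Prefix text with a speaker tag such as "[0]" (assumed prompt convention)."""
    return f"[{speaker_id}]{text}"


def synthesize(text: str, out_path: str, model_id: str = "sesame/csm-1b") -> None:
    """Generate speech for a single text turn and write it to out_path.

    model_id is an assumed checkpoint name; swap in the converted weights you use.
    """
    # Local imports keep format_turn usable without torch/transformers installed.
    import torch
    from transformers import AutoProcessor, CsmForConditionalGeneration

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(model_id)
    model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)

    # Text-only prompt; no audio context is passed here.
    inputs = processor(format_turn(0, text), add_special_tokens=True).to(device)

    # output_audio=True asks generate to return decoded audio rather than codes.
    audio = model.generate(**inputs, output_audio=True)
    processor.save_audio(audio, out_path)
```

Calling `synthesize("The past is just a story we tell ourselves.", "example.wav")` would download the weights and write a waveform. Generating from a text + audio context should go through the same `generate` call, with the processor building the inputs from the conversation instead of a bare string.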

ArthurZucker

Let's go!

* draft structure * depth decoder with forward pre hook * full model forward draft * draft update * depth decoder update * ConversationalSpeechModelForCausalLM udpates * add generate * max length criteria small fix * udpate * updates * generation update * update in loss compute * conversion script * update for correct input embeddings * handle interleaved rope * update * update * update * support compile * update training * add doc * update doc * correct inits * ConversationalSpeechModel -> Csm * conf update * name update * tests CsmForCausalLMTest * convert use cached_file * conf + modeling updates * generate utils handle third dim shape * integration test * modeling + conf updates * common test handle more than 2 dims * add nested audio list utils * processing handle nested audio list * csm processing draft * mimi util * init updates * modular update * convert modular * processing update * csm tests update * generate tests handle third dim * generate utils handle third dim * propagate _get_initial_cache_position update * tied_weight_keys update + convert correctly * fix inputs_embeds * revert audio nested list * batch inference update + return audio * audio_utils update * processor update * some more integration tests * remove old test * porcessing output labels * improve * fix * update rope values with equivalent ones * conversion update * udpate tests * handle depth decoder generation config * remove default eos_token_id * make style * revert modeling_mimi * add default generation_config * remove sdpa since handled by default * make * fix conflict * fix conflicts * correct naming * correct imports * make * causal -> conditional naming * causal -> conditional naming * auto update * make * make * add doc * test update * fix weight init * audio tokens offsets as buffer * 4d mask in conditional class * make * doc update * fix causal mask * fix causal mask * doc update * doc update * add processor doc * update doc * fix 4d causal mask * update make_list_of_audio * 
do not default to mutable * remove duplicates * remove useless reset_parameters * use GradientCheckpointingLayer * use can_return_tuple * formatting * prepend placeholder in _sample * torch compile fix * some more fixies * convert modular * fix * default max_length in convert * handle depth decoder generation config correctly * clearer formulation * handle output_loading_info * handle softmax warning * add doc * propagate _get_initial_cache_position changes * generation in its own module * add processor tests * fix compile witu cuda graphs * fix compile with cuda graphs * add csm.md * include CSM loss * doc nit * doc nit * doc nit * Update docs/source/en/model_doc/csm.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add save_audio to processor * Update src/transformers/models/csm/modular_csm.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * doc update * simplify audio_codes_mask computation * doc update * simplify loss computation * fix static cache test * fix * remove comment * simplify encoded length computation * use hf-internal-testing * doc update * cast to float before numpy * nit * mem efficient codebook head * nit * cat input values with cutoffs --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

johnwick123f · 2025-05-16T23:43:50Z

Not sure if I should ask here, but this implementation of CSM seems to be considerably better quality than the official implementation. Any reason why?

ArthurZucker · 2025-05-19T13:03:04Z

Not sure how we should answer 😅 happy that you like it 🤗

draft structure

48d2647

github-actions Bot marked this pull request as draft March 14, 2025 10:50

eustlb added New model Audio labels Mar 14, 2025

eustlb added 22 commits March 15, 2025 14:00

depth decoder with forward pre hook

61db81f

full model forward draft

7bfe4b7

draft update

b81db07

depth decoder update

a305fa3

ConversationalSpeechModelForCausalLM udpates

36f7632

add generate

6c4d2d5

max length criteria small fix

3f3625c

udpate

67b6cc4

updates

89b2e95

generation update

f9c19f9

update in loss compute

1fa7465

conversion script

2aabe94

update for correct input embeddings

f0a7212

handle interleaved rope

eb712de

update

bb39449

update

34dc04d

update

602f6d7

support compile

63c160d

update training

936282b

add doc

1e92e69

update doc

abee0f7

correct inits

4269b06

eustlb and others added 6 commits May 2, 2025 18:10

Merge branch 'main' into add-csm

77d9e11

simplify loss computation

297a626

fix static cache test

45fc46e

fix

3204df4

remove comment

8549829

Merge branch 'main' into add-csm

b518012

ArthurZucker approved these changes May 5, 2025

Comment thread src/transformers/models/mimi/modeling_mimi.py

Comment thread src/transformers/models/mimi/modeling_mimi.py Outdated

Comment thread src/transformers/models/mimi/modeling_mimi.py Outdated

eustlb and others added 16 commits May 5, 2025 10:25

Merge branch 'main' into add-csm

4a514d1

simplify encoded length computation

aa00fa6

use hf-internal-testing

050907d

Merge branch 'main' into add-csm

d5a86f6

Merge branch 'main' into add-csm

1a71dfe

doc update

abef348

Merge branch 'add-csm' of github.com:eustlb/transformers into add-csm

9999591

cast to float before numpy

a1fc717

nit

42d88a7

mem efficient codebook head

7bf37b0

Merge branch 'main' into add-csm

7b6a275

nit

0872843

Merge branch 'add-csm' of github.com:eustlb/transformers into add-csm

d627274

Merge branch 'main' into add-csm

0ae894f

cat input values with cutoffs

b4bca5c

Merge branch 'main' into add-csm

3d72be3

eustlb merged commit 798f948 into huggingface:main May 7, 2025

sayakpaul mentioned this pull request May 8, 2025

[tests] fix audioldm2 for transformers main. huggingface/diffusers#11522

Merged

johnwick123f mentioned this pull request May 12, 2025

[New Model]: CSM 1b vllm-project/vllm#18005

Closed

1 task

ebezzam mentioned this pull request Dec 8, 2025

Add chatterbox support #42413

Open

eustlb merged 163 commits into huggingface:main from eustlb:add-csm


9 participants
