Add generate_streaming for Streaming Audio Generation #262
Conversation
```python
start_time = time.time()
...
if current_step_idx - last_yield_step >= chunk_size:
```
Due to the delay pattern, the first chunk is smaller than the other chunks.
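To illustrate the reviewer's point, here is a hypothetical sketch (the delay values and `chunk_size` are invented, not taken from the PR): with a per-codebook delay pattern, a frame is only complete once the most-delayed codebook has produced its token, so the first yield contains fewer complete frames than `chunk_size`.

```python
delays = [0, 1, 2, 3]     # assumed per-codebook delays (illustrative only)
chunk_size = 8            # yield every 8 decoder steps

def complete_frames(step: int) -> int:
    # A frame is only complete once the most-delayed codebook has
    # produced its token for that frame.
    return max(0, step - max(delays))

# First yield, at step == chunk_size, is short by max(delays) frames:
print(complete_frames(chunk_size))                                     # 5, not 8
# Every later yield still advances by a full chunk:
print(complete_frames(2 * chunk_size) - complete_frames(chunk_size))   # 8
```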
```python
# Pad each sequence in the batch into one tensor, then run the vocoder
# over all tokens generated so far.
for i in range(batch_size):
    generated_codes[i, : total_lens[i], :] = all_tokens[i]
lengths_Bx = torch.tensor(total_lens, device=self.device)
audio_chunks = self._generate_output(generated_codes, lengths_Bx)
```
This is inefficient. Only process the newly generated tokens.
I noticed some artifacts when passing only the new tokens to the vocoder.
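A minimal sketch of the compromise the author describes: decode the full sequence so the vocoder keeps its context, but yield only the unplayed tail. The names `vocoder` and `emitted` are hypothetical, not from the PR's code:

```python
import torch

def next_audio_chunk(vocoder, all_tokens: torch.Tensor, emitted: int):
    # Decode everything generated so far, so the vocoder sees full context
    # and chunk boundaries do not introduce artifacts.
    audio = vocoder(all_tokens)      # assumed to return a 1-D waveform tensor
    new = audio[emitted:]            # only the samples not yet yielded
    return new, audio.numel()        # chunk to play, updated emitted count
```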
Okay, can you fix the delay pattern problem?
Thanks a ton!

Thanks, when will this feature be released?

When can we expect this feature?
This PR introduces the `generate_streaming` function, which enables streaming audio generation. As soon as audio tokens are generated by the model, the vocoder is run on the entire sequence (using the full sentence as context for best quality), and only the newly generated audio chunk is yielded. The implementation closely mirrors the existing `generate` function, but is kept as a separate function (without extracting shared logic) to make the review process easier.

Hopefully this adds support for #11, #93, #237, makes the model usable in conversational use cases (#181), and accelerates TTFB (#153).
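For illustration, here is a hypothetical way a caller might consume the streamed chunks. The exact signature of `generate_streaming` (argument names, chunk units), the `model` object, the sample rate, and the playback library are assumptions, not taken from this PR:

```python
import sounddevice as sd  # assumed playback library, not part of the PR

# `model`, `chunk_size`, and the 44.1 kHz sample rate are assumptions here.
for chunk in model.generate_streaming("Hello world.", chunk_size=20):
    sd.play(chunk, samplerate=44100)  # play each chunk as soon as it arrives
    sd.wait()                         # block until this chunk finishes
```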