
Conversation

@balalofernandez

This PR introduces the generate_streaming function, which enables streaming audio generation. As audio tokens are generated by the model, the vocoder is run on the entire sequence so far (using the full sentence as context for best quality), and only the newly generated audio chunk is yielded. The implementation closely mirrors the existing generate function, but is kept as a separate function (without extracting shared logic) to make the review easier.

Hopefully this adds support for #11, #93, and #237, makes the model usable in conversational use cases (#181), and accelerates TTFB (#153).
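Roughly, the streaming loop has this shape (a minimal sketch, not the code from this diff; model.decode_step, vocode, and chunk_size are illustrative stand-ins for the real APIs):

import torch

def generate_streaming_sketch(model, vocode, chunk_size=20):
    # Illustrative only: model.decode_step and vocode are hypothetical
    # stand-ins for the repo's actual decoder loop and vocoder call.
    tokens = []            # audio tokens generated so far
    samples_yielded = 0    # samples already handed to the caller
    last_yield_step = 0
    for step, tok in enumerate(model.decode_step()):
        tokens.append(tok)
        if step - last_yield_step >= chunk_size:
            # Vocode the *entire* sequence so far, so each chunk is
            # decoded with full left context, then yield only the tail
            # the caller has not heard yet.
            audio = vocode(torch.stack(tokens))
            yield audio[samples_yielded:]
            samples_yielded = audio.shape[0]
            last_yield_step = step
    # Flush whatever remains once generation finishes.
    audio = vocode(torch.stack(tokens))
    if audio.shape[0] > samples_yielded:
        yield audio[samples_yielded:]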

start_time = time.time()

# Enough decode steps have accumulated since the last yield; emit a chunk.
if current_step_idx - last_yield_step >= chunk_size:

Collaborator

Due to delay patterns, the first chunk is smaller than the other chunks.
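For context on the delay pattern: the codebook channels are staggered, so a frame is only complete once the most-delayed channel has emitted its token for that position. A tiny sketch with hypothetical delays shows why the first chunk comes up short:

delays = [0, 1, 2, 3]  # hypothetical per-codebook delays

def complete_frames(steps_done):
    # Frames are complete only up to steps_done - max(delays); the first
    # max(delays) steps produce no finished frame at all.
    return max(0, steps_done - max(delays))

# With chunk_size = 20: the first yield has complete_frames(20) = 17
# frames, while every later chunk advances a full 20 frames.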

# Re-run the vocoder over the full sequence generated so far.
for i in range(batch_size):
    generated_codes[i, : total_lens[i], :] = all_tokens[i]
lengths_Bx = torch.tensor(total_lens, device=self.device)
audio_chunks = self._generate_output(generated_codes, lengths_Bx)

Collaborator

This is inefficient. Only process newly generated tokens.

Author

I noticed some artifacts when passing only the new tokens to the vocoder.
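One common compromise (not what this PR does) is to vocode only the new tokens plus a bounded window of left context, then discard the overlapping samples; a hedged sketch, with vocode, CONTEXT_FRAMES, and SAMPLES_PER_FRAME as illustrative names:

import torch

CONTEXT_FRAMES = 32        # illustrative left-context window, in frames
SAMPLES_PER_FRAME = 512    # illustrative vocoder hop size

def vocode_incremental(vocode, all_tokens, start_frame):
    # Decode frames [start_frame:] with some past frames as context,
    # which reduces (but may not eliminate) boundary artifacts compared
    # to decoding the new frames alone. `vocode` is a stand-in for the
    # repo's actual vocoder call.
    ctx_start = max(0, start_frame - CONTEXT_FRAMES)
    audio = vocode(all_tokens[:, ctx_start:, :])
    # Drop samples belonging to the context frames; keep only the audio
    # for the genuinely new frames.
    new_offset = (start_frame - ctx_start) * SAMPLES_PER_FRAME
    return audio[..., new_offset:]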

Collaborator

Okay, can you fix the delay pattern problem?

@Siddharth0207

Thanks a ton!

@wwang1110

Thanks, when will this feature be released?

@nlpkiddo-2001

When can we expect this feature?
