tts : add SNAC decoder architecture support for Orpheus TTS #318
Open
devin-ai-integration[bot] wants to merge 2 commits into master
- Add LLM_ARCH_SNAC_DEC architecture enum and name mapping
- Define 27 SNAC-specific tensor types for decoder and quantizer
- Add tensor name mappings in llama-arch.cpp
- Add SNAC_DEC to gguf constants with tensor enums and mappings
- Implement SnacDecModel class for model conversion
- Add comprehensive SNAC implementation documentation

This provides the foundational architecture support for SNAC audio codec. Remaining work includes model loading, forward pass, and TTS tool integration.

Addresses issue #208

Co-Authored-By: Jake Cosme <jake@cognition.ai>
SNAC decoder doesn't use RoPE (it's an audio codec), so add it to the LLAMA_ROPE_TYPE_NONE case alongside WAVTOKENIZER_DEC.

Co-Authored-By: Jake Cosme <jake@cognition.ai>
Summary
This PR adds foundational architecture support for the SNAC (Multi-Scale Neural Audio Codec) decoder to enable Orpheus TTS models in llama.cpp. It addresses issue #208.
Note: This PR contains only the architecture infrastructure and does not include model loading, forward pass implementation, or TTS tool integration. It cannot run SNAC models yet but provides the foundation for those components.
Changes
Architecture Registration
- Added the LLM_ARCH_SNAC_DEC architecture enum and registered the "snac-dec" name
- Files: src/llama-arch.h and src/llama-arch.cpp
Tensor Definitions (27 new tensor types)
Decoder tensors:
- SNAC_DEC_CONV_IN, SNAC_DEC_CONV_OUT
- SNAC_DEC_ATTN_NORM, SNAC_DEC_ATTN_Q/K/V/OUT
- SNAC_DEC_BLK_CONV_UP, SNAC_DEC_BLK_CONV1/2/3, SNAC_DEC_BLK_SNAKE_ALPHA
Vector quantizer tensors (4 levels):
- SNAC_VQ_IN_PROJ, SNAC_VQ_OUT_PROJ
- SNAC_VQ_CODEBOOK
Encoder tensors (included for completeness, not needed for TTS inference):
- Tensors with the SNAC_ENC_* prefix
Model Conversion
Implemented the SnacDecModel class in convert_hf_to_gguf.py:
- Handles weight-normalized tensors (_g or _v suffixes)
- Hyperparameters: codebook_size, decoder_rates, latent_dim, decoder_dim
Documentation
Added docs/SNAC_IMPLEMENTATION.md, the SNAC implementation documentation.
Review Focus Areas
- The SnacDecModel class is missing a @ModelBase.register() decorator. Without it, the conversion class won't be invoked. The correct Hugging Face architecture name to register still needs to be determined.
Other items to review:
- Whether the handling of the _g and _v suffixes is correct for SNAC's weight norm implementation
- Tensor name mappings in llama-arch.cpp: note the use of %d for block indices vs {bid} in Python
Testing Status
❌ Not tested with actual models yet - this is infrastructure-only
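Even without model weights, the review item about %d versus {bid} placeholders can be checked mechanically. The sketch below is not part of the PR: the template strings are hypothetical stand-ins (the real names live in the gguf constants and in src/llama-arch.cpp and may differ), and it only shows that a printf-style template and a str.format-style template can be expanded side by side and compared per block index.

```python
# Hypothetical template strings for illustration only; the actual SNAC tensor
# names in the gguf constants and src/llama-arch.cpp may differ.
PY_TEMPLATE = "dec.blk.{bid}.conv1"   # Python side: str.format placeholder
CPP_TEMPLATE = "dec.blk.%d.conv1"     # C++ side: printf-style placeholder


def expand_py(template: str, bid: int) -> str:
    """Expand a Python-side template the way str.format would."""
    return template.format(bid=bid)


def expand_cpp(template: str, bid: int) -> str:
    """Expand a C++-side template; Python's % operator handles %d like printf."""
    return template % bid


def templates_agree(py_template: str, cpp_template: str, n_blocks: int) -> bool:
    """Return True if both templates yield the same name for every block index."""
    return all(
        expand_py(py_template, i) == expand_cpp(cpp_template, i)
        for i in range(n_blocks)
    )


if __name__ == "__main__":
    print(templates_agree(PY_TEMPLATE, CPP_TEMPLATE, n_blocks=4))
```

A check like this only catches spelling drift between the two tables; it says nothing about whether the names match what a converted SNAC checkpoint actually contains.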
To test after merging:
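As one possible check for the _g/_v question, and not a step taken from the PR, the sketch below folds a weight-normalized pair back into a plain weight. It assumes SNAC uses PyTorch's torch.nn.utils.weight_norm with its default dim=0, so that w = g * v / ||v|| with the norm taken per slice along the first dimension; if SNAC normalizes along a different dimension, the loop would need to change. The function name and the nested-list layout are illustrative only.

```python
import math


def fold_weight_norm(g, v):
    """Fold weight-norm parameters into a plain weight.

    Assumption: PyTorch-style weight norm with dim=0, i.e. for each index o
    along the first dimension, w[o] = g[o] * v[o] / ||v[o]||.

    g: list of floats, one magnitude per first-dimension slice
    v: nested list with layout [dim0][dim1][kernel]
    """
    w = []
    for g_o, v_o in zip(g, v):
        # L2 norm over every element of this slice (all dims except dim 0)
        norm = math.sqrt(sum(x * x for row in v_o for x in row))
        # Note: a zero-norm slice is not handled in this sketch
        scale = g_o / norm
        w.append([[x * scale for x in row] for row in v_o])
    return w


if __name__ == "__main__":
    # One slice with direction (3, 4) and magnitude 10
    print(fold_weight_norm([10.0], [[[3.0, 4.0]]]))
```

Comparing the folded result against the weight produced by the original PyTorch module for a few layers would indicate whether the conversion treats the two suffixes correctly.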
Next Steps
Remaining work tracked in docs/SNAC_IMPLEMENTATION.md:
- Add the @ModelBase.register() decorator to SnacDecModel
- Model loading in llama-model.cpp
- Forward pass in llama.cpp (convolutions, Snake activation, attention)
References
Link to Devin run: https://app.devin.ai/sessions/f86c58111acb4011894cbaad18a50e62
Requested by: Jake Cosme (jake@cognition.ai) (@jakexcosme)