tts : add SNAC decoder architecture support for Orpheus TTS by devin-ai-integration[bot] · Pull Request #318 · COG-GTM/llama.cpp

devin-ai-integration · 2025-10-22T20:49:55Z


Summary

This PR adds foundational architecture support for SNAC (Multi-Scale Neural Audio Codec) decoder to enable Orpheus TTS models in llama.cpp. This addresses issue #208.

Note: This PR contains only the architecture infrastructure and does not include model loading, forward pass implementation, or TTS tool integration. It cannot run SNAC models yet but provides the foundation for those components.

Changes

Architecture Registration

Added LLM_ARCH_SNAC_DEC architecture enum and registered "snac-dec" name
Added to src/llama-arch.h and src/llama-arch.cpp
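The same registration is mirrored on the Python side in the gguf constants. As a rough illustration of the enum-plus-name-table pattern, here is a standalone sketch. It imitates the shape of `MODEL_ARCH` and `MODEL_ARCH_NAMES` in gguf-py but is not the real file; the enum values are arbitrary and `arch_from_name` is a hypothetical helper added only for the example.

```python
from enum import IntEnum, auto


class MODEL_ARCH(IntEnum):
    # Only two entries are shown; the real enum has many more.
    WAVTOKENIZER_DEC = auto()
    SNAC_DEC = auto()  # new entry for the SNAC decoder


# Architecture name strings; "snac-dec" is the name registered in this PR.
MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.WAVTOKENIZER_DEC: "wavtokenizer-dec",
    MODEL_ARCH.SNAC_DEC: "snac-dec",
}


def arch_from_name(name: str) -> MODEL_ARCH:
    """Hypothetical reverse lookup from a name string to its enum member."""
    for arch, arch_name in MODEL_ARCH_NAMES.items():
        if arch_name == name:
            return arch
    raise KeyError(f"unknown architecture name: {name}")
```

The C++ side needs the matching `LLM_ARCH_SNAC_DEC` entry with the identical string, since the name is what ties a converted file to the loader.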

Tensor Definitions (27 new tensor types)

Decoder tensors:

Input/output convolutions: SNAC_DEC_CONV_IN, SNAC_DEC_CONV_OUT
Optional attention: SNAC_DEC_ATTN_NORM, SNAC_DEC_ATTN_Q/K/V/OUT
Decoder blocks (4 blocks): SNAC_DEC_BLK_CONV_UP, SNAC_DEC_BLK_CONV1/2/3, SNAC_DEC_BLK_SNAKE_ALPHA

Vector quantizer tensors (4 levels):

Projections: SNAC_VQ_IN_PROJ, SNAC_VQ_OUT_PROJ
Codebooks: SNAC_VQ_CODEBOOK

Encoder tensors (included for completeness, not needed for TTS inference):

Similar structure to decoder with SNAC_ENC_* prefix
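To make the per-block naming concrete, here is a small sketch of how block-indexed tensor names expand, and how a Python `{bid}` template can be checked against a C-style `%d` template. The template strings below are hypothetical placeholders; the actual strings are the ones defined in this PR's gguf constants and llama-arch.cpp.

```python
# Hypothetical per-block name templates, for illustration only.
PY_TEMPLATES = {
    "SNAC_DEC_BLK_CONV_UP": "dec.blk.{bid}.conv_up",
    "SNAC_DEC_BLK_SNAKE_ALPHA": "dec.blk.{bid}.snake_alpha",
}


def to_c_template(py_template: str) -> str:
    """Convert a Python '{bid}' template to the C-style '%d' form."""
    return py_template.replace("{bid}", "%d")


def expand(py_template: str, n_blocks: int) -> list[str]:
    """Expand a per-block template into one concrete name per block."""
    return [py_template.format(bid=i) for i in range(n_blocks)]


def templates_agree(py_template: str, c_template: str, n_blocks: int) -> bool:
    """Check that both template styles produce identical names for every block."""
    return all(py_template.format(bid=i) == c_template % i for i in range(n_blocks))


# With the 4 decoder blocks described above:
names = expand(PY_TEMPLATES["SNAC_DEC_BLK_CONV_UP"], 4)
```

A check like `templates_agree` could be run over every Python/C++ template pair to catch a mismatch between the two files before any model is converted.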

Model Conversion

Implemented SnacDecModel class in convert_hf_to_gguf.py:

Skips weight normalization parameters (tensors ending in _g or _v)
Extracts hyperparameters from config: codebook_size, decoder_rates, latent_dim, decoder_dim
Sets vocab to none (audio codec, not text model)
Marks as non-causal attention
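The first two conversion steps can be sketched in isolation. This is not the `SnacDecModel` code from the PR; it is a standalone approximation, and the function names, tensor names, and config values in it are made up for illustration. Only the suffix rule and the four hyperparameter keys come from the description above.

```python
# The four hyperparameters the PR description says are read from the config.
HPARAM_KEYS = ("codebook_size", "decoder_rates", "latent_dim", "decoder_dim")


def is_weight_norm_param(name: str) -> bool:
    """True for tensors the converter skips: names ending in _g or _v."""
    return name.endswith(("_g", "_v"))


def filter_tensor_names(names: list[str]) -> list[str]:
    """Drop weight-normalization parameters, keep everything else."""
    return [n for n in names if not is_weight_norm_param(n)]


def extract_hparams(config: dict) -> dict:
    """Pick the SNAC hyperparameters out of a loaded config dict.

    This sketch raises on a missing key instead of falling back to defaults;
    that is a choice made here, not necessarily what the PR does.
    """
    missing = [k for k in HPARAM_KEYS if k not in config]
    if missing:
        raise KeyError(f"config is missing: {missing}")
    return {k: config[k] for k in HPARAM_KEYS}


# Made-up inputs, not taken from a real checkpoint.
example_names = ["decoder.model.0.weight_g", "decoder.model.0.weight_v", "decoder.model.0.bias"]
example_config = {
    "codebook_size": 16,
    "decoder_rates": [2, 2],
    "latent_dim": 8,
    "decoder_dim": 32,
    "unrelated_key": True,
}
kept = filter_tensor_names(example_names)
hparams = extract_hparams(example_config)
```

Note that with these made-up names the filter keeps only the bias, which is exactly the case the weight normalization review item below is asking about.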

Documentation

Added docs/SNAC_IMPLEMENTATION.md with:

Architecture overview and component descriptions
Tensor naming conventions
Model conversion instructions
TODO list for remaining implementation work
Snake activation implementation notes

Review Focus Areas

⚠️ Critical: The SnacDecModel class is missing a @ModelBase.register() decorator. Without this, the conversion class won't be invoked. Need to determine the correct HuggingFace architecture name to register.
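For reviewers unfamiliar with the pattern, here is a minimal standalone sketch of a decorator-based registry, showing why an undecorated class is never found. `ModelRegistry` and the architecture string are hypothetical stand-ins; this is not the `ModelBase` code from convert_hf_to_gguf.py, and the correct Hugging Face architecture name for SNAC is still undetermined.

```python
class ModelRegistry:
    """Hypothetical stand-in for a converter base class with a register() decorator."""

    _classes: dict[str, type] = {}

    @classmethod
    def register(cls, *names: str):
        def wrapper(model_cls: type) -> type:
            # Map every given architecture name to the decorated class.
            for name in names:
                cls._classes[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_architecture(cls, arch: str) -> type:
        try:
            return cls._classes[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} not supported") from None


@ModelRegistry.register("ExampleRegisteredArch")  # placeholder name
class RegisteredModel:
    pass


class UnregisteredModel:  # no decorator, like SnacDecModel in this PR
    pass
```

Looking up `"ExampleRegisteredArch"` returns `RegisteredModel`, while no lookup can ever return `UnregisteredModel`, because nothing added it to the table.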

Other items to review:

Tensor naming conventions: Check the names against actual SNAC model checkpoints from Hugging Face (not yet verified)
Weight normalization handling: Verify that skipping _g and _v suffixes is correct for SNAC's weight norm implementation
Encoder tensors: Should these be included given they're not needed for TTS inference?
Hyperparameter defaults: Verify defaults match standard SNAC configs (24kHz model uses these values)
Tensor mappings in C++: Review the llama-arch.cpp mappings - note the use of %d for block indices vs {bid} in Python
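On the weight normalization item: if a checkpoint stores a weight-normalized layer as a separate magnitude tensor (`_g`) and direction tensor (`_v`), skipping both would leave that layer without weights, and a converter would normally fuse them instead. Whether SNAC checkpoints are stored this way has not been checked here. The sketch below shows the standard fusion, w = g * v / ||v||, with the norm taken per output channel; it is pure Python on nested lists for clarity, and `fuse_weight_norm` is a hypothetical helper, not code from this PR.

```python
import math


def fuse_weight_norm(g: list[float], v: list[list[float]]) -> list[list[float]]:
    """Fuse weight-norm parameters into plain weights.

    g: one magnitude per output channel.
    v: one flattened direction vector per output channel.
    Returns w with w[i] = g[i] * v[i] / ||v[i]||.
    A zero direction vector would raise ZeroDivisionError; not handled here.
    """
    if len(g) != len(v):
        raise ValueError("g and v must have the same number of output channels")
    fused = []
    for g_i, v_i in zip(g, v):
        norm = math.sqrt(sum(x * x for x in v_i))  # L2 norm over the channel
        fused.append([g_i * x / norm for x in v_i])
    return fused


# One output channel with direction (3, 4), norm 5, and magnitude 2.
w = fuse_weight_norm([2.0], [[3.0, 4.0]])
```

After fusion, each output channel of `w` has L2 norm equal to its entry in `g`, which gives a cheap sanity check for a real converter.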

Testing Status

❌ Not tested with actual models yet - this is infrastructure-only

To test after merging:

# Download SNAC model
git clone https://huggingface.co/hubertsiuzdak/snac_24khz

# Convert to GGUF (will fail until the @ModelBase.register() decorator is added)
python convert_hf_to_gguf.py snac_24khz --outfile snac-24khz-f16.gguf --outtype f16

Next Steps

Remaining work tracked in docs/SNAC_IMPLEMENTATION.md:

Add @ModelBase.register() decorator to SnacDecModel
Implement model loading in llama-model.cpp
Implement forward pass in llama.cpp (convolutions, Snake activation, attention)
Integrate with TTS tool
Test with Orpheus TTS models

References

Issue: tts : add support for Orpheus #208
SNAC Paper: https://arxiv.org/abs/2410.14411
SNAC GitHub: https://github.com/hubertsiuzdak/snac
Orpheus Models: https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2
Similar implementation: WavTokenizer support (PR "tts : add OuteTTS support", ggml-org/llama.cpp#10784)

Link to Devin run: https://app.devin.ai/sessions/f86c58111acb4011894cbaad18a50e62
Requested by: Jake Cosme (jake@cognition.ai) (@jakexcosme)

- Add LLM_ARCH_SNAC_DEC architecture enum and name mapping
- Define 27 SNAC-specific tensor types for decoder and quantizer
- Add tensor name mappings in llama-arch.cpp
- Add SNAC_DEC to gguf constants with tensor enums and mappings
- Implement SnacDecModel class for model conversion
- Add comprehensive SNAC implementation documentation

This provides the foundational architecture support for SNAC audio codec. Remaining work includes model loading, forward pass, and TTS tool integration.

Addresses issue #208

Co-Authored-By: Jake Cosme <jake@cognition.ai>

devin-ai-integration · 2025-10-22T20:50:00Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.



github-actions bot added the documentation and python labels · Oct 22, 2025

Fix build: add SNAC_DEC to rope_type switch statement

b0b6da2

SNAC decoder doesn't use RoPE (it's an audio codec), so add it to the LLAMA_ROPE_TYPE_NONE case alongside WAVTOKENIZER_DEC.

Co-Authored-By: Jake Cosme <jake@cognition.ai>

devin-ai-integration[bot] wants to merge 2 commits into master from devin/1761164007-add-orpheus-tts-snac-v3
