T5Gemma #1955
Sorry. I didn't mean to open this PR here yet. It needs a lot of work before it would be ready to go.
Pull request overview
This PR adds support for the T5Gemma architecture, a family of encoder-decoder models developed by Google that adapts pretrained decoder-only models into an encoder-decoder architecture. T5Gemma combines T5's encoder-decoder design with Gemma 2 improvements including GQA, RoPE, GeGLU activation, RMSNorm, and pre/post layer normalization patterns.
Key changes:
- Introduces pre/post layer norm support for both encoder and decoder transformer layers
- Adds cross-attention pre/post layer norms for decoder layers to support T5Gemma's normalization pattern
- Implements T5GemmaLoader converter to map T5Gemma models from Hugging Face transformers to CTranslate2
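The pre/post layer norm pattern listed above can be sketched in NumPy as a "sandwich" around each sub-layer: a norm on the sub-layer input and another on its output, before the residual add. This is a minimal illustration only. It assumes RMSNorm with a plain multiplicative scale and treats attention and feed-forward as opaque callables. The function names and dictionary keys (`pre_attn`, `post_cross`, etc.) are hypothetical and do not mirror the actual C++ code in `src/layers/transformer.cc`.

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis with a plain multiplicative scale (assumption)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight


def pre_post_norm_residual(x, sublayer, pre_weight, post_weight, eps=1e-6):
    """Residual block with a norm before AND after the sub-layer.

    Classic pre-norm:     x + sublayer(norm(x))
    Pre/post (sandwich):  x + post_norm(sublayer(pre_norm(x)))
    """
    h = sublayer(rms_norm(x, pre_weight, eps))
    return x + rms_norm(h, post_weight, eps)


def encoder_layer(x, self_attn, ffn, norms):
    # Hypothetical encoder layer: self-attention block, then feed-forward block.
    x = pre_post_norm_residual(x, self_attn, norms["pre_attn"], norms["post_attn"])
    return pre_post_norm_residual(x, ffn, norms["pre_ffn"], norms["post_ffn"])


def decoder_layer(x, memory, self_attn, cross_attn, ffn, norms):
    # Hypothetical decoder layer: cross-attention gets its own pre/post norm pair,
    # which corresponds to the extra decoder norms described in this PR.
    x = pre_post_norm_residual(x, self_attn, norms["pre_attn"], norms["post_attn"])
    x = pre_post_norm_residual(
        x, lambda h: cross_attn(h, memory), norms["pre_cross"], norms["post_cross"]
    )
    return pre_post_norm_residual(x, ffn, norms["pre_ffn"], norms["post_ffn"])
```

With zero-output sub-layers, every block reduces to the identity, which is a quick sanity check that the residual path is wired correctly.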
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/layers/transformer.cc | Implements pre/post layer norm logic for encoder and decoder layers, including optional cross-attention layer norms |
| include/ctranslate2/layers/transformer.h | Adds layer norm member variables to support pre/post normalization in encoder and decoder layers |
| python/ctranslate2/specs/transformer_spec.py | Extends transformer specs to support pre_post_layer_norm configuration parameter and creates appropriate layer norm specs |
| python/ctranslate2/converters/transformers.py | Adds T5GemmaLoader to convert T5Gemma models, including weight mapping, vocabulary handling, and RoPE configuration |
| README.md | Updates supported models list to include T5Gemma |
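As a rough illustration of what a `pre_post_layer_norm` switch in the specs implies, the helper below lists the layer norms one layer would carry with and without the flag. The naming scheme is entirely hypothetical and is not the attribute layout used in `python/ctranslate2/specs/transformer_spec.py`; it only shows how the count of norms changes.

```python
from typing import List


def layer_norm_names(pre_post_layer_norm: bool, cross_attention: bool) -> List[str]:
    """Return hypothetical layer norm names for one transformer layer."""
    sublayers = ["self_attention"]
    if cross_attention:
        # Only decoder layers have a cross-attention sub-layer.
        sublayers.append("cross_attention")
    sublayers.append("ffn")

    names: List[str] = []
    for sub in sublayers:
        if pre_post_layer_norm:
            # One norm on the sub-layer input and one on its output.
            names += [f"pre_{sub}_layer_norm", f"post_{sub}_layer_norm"]
        else:
            # Classic pre-norm: a single norm on the sub-layer input.
            names.append(f"{sub}.layer_norm")
    return names
```

Under this scheme an encoder layer goes from 2 norms to 4, and a decoder layer (with cross-attention) goes from 3 norms to 6.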
Comments suppressed due to low confidence (1)
`python/ctranslate2/converters/transformers.py:1347`
- Variable `num_heads_kv_enc` is not used.
  `num_heads_kv_enc = None`
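Since T5Gemma uses GQA, the unused `num_heads_kv_enc` presumably should carry the encoder's key/value head count into the spec. The sketch below shows one way to resolve that value and what GQA means for the attention tensors. The helper names are hypothetical, and the convention that `None` means ordinary multi-head attention is an assumption for illustration, not confirmed CTranslate2 behavior.

```python
from typing import Optional

import numpy as np


def resolve_num_heads_kv(num_heads: int, num_kv_heads: Optional[int]) -> Optional[int]:
    """Return the KV head count to pass on, or None for plain multi-head attention."""
    if num_kv_heads is None or num_kv_heads == num_heads:
        return None
    if num_heads % num_kv_heads != 0:
        raise ValueError(f"{num_heads} heads not divisible by {num_kv_heads} KV heads")
    return num_kv_heads


def expand_kv_heads(kv: np.ndarray, num_heads: int) -> np.ndarray:
    """Repeat KV heads ([batch, kv_heads, time, dim]) so each query head gets one.

    With GQA, each group of num_heads // kv_heads query heads shares one KV head.
    """
    group = num_heads // kv.shape[1]
    return np.repeat(kv, group, axis=1)
```

For example, 8 query heads with 2 KV heads means query heads 0 to 3 read KV head 0 and query heads 4 to 7 read KV head 1.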
CTranslate2 doesn't currently support encoder/decoder-based embedding models, so I'm excited to see this. Have you started working on the Qwen3 embedding models yet, by chance? CTranslate2's C++ and Python code would both have to be modified, because it currently can't return the "intermediate states" (hidden states, or whatever you call them). Seems like the same issue applies to the T5Gemma architecture?
@BBC-Esq Older encoder/decoder models (such as T5) are actually currently supported, but this would add support for one with a more modern architecture. Embedding models can also be used currently. I've used
@jncraton, can you please show me how to do that? Maybe you have a sample script? I can convert the Qwen3 embedding models because the architecture is supported, but I can't figure out how to use them with CTranslate2 yet.
The `forward_batch` / `.last_hidden_state` approach works for encoder-only models (BERT, XLM-RoBERTa, etc.) loaded via `ctranslate2.Encoder`, which returns an `EncoderForwardOutput` object containing `last_hidden_state` and `pooler_output`. That's my understanding, at least, unless you're aware of some kind of workaround? As far as I can tell, we'd need to modify CTranslate2's C++ inference path to optionally return hidden states from decoder models.
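For the encoder-only case described above, a sketch of turning `last_hidden_state` into sentence embeddings might look like this. It assumes the model has already been converted to a local directory (`model_dir` is a placeholder) and that inputs are pre-tokenized. Mean pooling over non-padded tokens is just one common choice; a given embedding model may expect CLS or last-token pooling instead. This does not help with decoder-only models such as Qwen3, for the reason discussed above.

```python
from typing import List, Sequence

import numpy as np


def mean_pool(hidden: np.ndarray, lengths: Sequence[int]) -> np.ndarray:
    """Average each sequence's hidden states over its real (non-padded) tokens."""
    max_len = hidden.shape[1]
    lengths_arr = np.asarray(lengths)
    mask = np.arange(max_len)[None, :] < lengths_arr[:, None]  # [batch, time]
    summed = (hidden * mask[:, :, None]).sum(axis=1)
    return summed / lengths_arr.astype(hidden.dtype)[:, None]


def embed(model_dir: str, batch_tokens: List[List[str]]) -> np.ndarray:
    """Encoder-only embeddings via ctranslate2.Encoder (needs a converted model)."""
    import ctranslate2  # imported lazily so mean_pool works without it installed

    encoder = ctranslate2.Encoder(model_dir, device="cpu")
    output = encoder.forward_batch(batch_tokens)
    hidden = np.array(output.last_hidden_state)  # [batch, time, hidden_size]
    lengths = [len(tokens) for tokens in batch_tokens]
    return mean_pool(hidden, lengths)
```

Passing the true lengths keeps padding positions out of the average, so shorter sentences in a batch are not diluted.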
@BBC-Esq That's my understanding as well. I'm not aware of a way to get access to the embeddings from Qwen3 without modifying CTranslate2. |
Support T5Gemma architecture. Here's the basic idea from the `transformers` T5Gemma documentation:

For reference, here is the T5Gemma PR that merged model support for these architectures into `transformers`.