
T5Gemma #1955

Closed
jncraton wants to merge 1 commit into OpenNMT:master from jncraton:t5gemma


@jncraton

Contributor

Adds support for the T5Gemma architecture. Here's the basic idea, from the transformers T5Gemma documentation:

T5Gemma (aka encoder-decoder Gemma) was proposed in a research paper by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
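To make two of the Gemma 2 components above concrete, here is a minimal NumPy sketch of grouped-query attention (GQA) with an optional local window, plus a helper that alternates local and global layers. It is an illustration only, not the T5Gemma or CTranslate2 implementation: RoPE, attention logit soft-capping, and the model's exact attention scaling are omitted, the window semantics are simplified, and the function names and the "local layer first" ordering are assumptions.

```python
import numpy as np


def grouped_query_attention(q, k, v, window=None):
    """Bidirectional (encoder-style) attention with grouped K/V heads.

    q: (num_q_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim); num_q_heads must be a
    multiple of num_kv_heads.
    window: if set, a position only attends to positions less than
    `window` tokens away (a "local" layer); None means global attention.
    """
    num_q_heads, seq_len, head_dim = q.shape
    num_kv_heads = k.shape[0]
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group = num_q_heads // num_kv_heads

    # GQA: each K/V head is shared by `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    # Plain 1/sqrt(head_dim) scaling (simplification).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)

    if window is not None:
        pos = np.arange(seq_len)
        outside = np.abs(pos[:, None] - pos[None, :]) >= window
        scores = np.where(outside, -np.inf, scores)

    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def layer_windows(num_layers, sliding_window):
    """Interleave local and global layers (local first is an assumption)."""
    return [sliding_window if i % 2 == 0 else None for i in range(num_layers)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((8, 5, 4))  # 8 query heads
    k = rng.standard_normal((2, 5, 4))  # 2 shared K/V heads
    v = rng.standard_normal((2, 5, 4))
    print(grouped_query_attention(q, k, v, window=2).shape)  # (8, 5, 4)
    print(layer_windows(4, 512))  # [512, None, 512, None]
```

The point of GQA is that only `num_kv_heads` K/V projections are stored, which is why converters have to track a separate KV head count alongside the query head count.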

For reference, here is the T5Gemma PR that merged model support for these architectures into transformers.

Copilot AI review requested due to automatic review settings December 20, 2025 14:05
@jncraton

Contributor Author

Sorry. I didn't mean to open this PR here yet. It needs a lot of work before it would be ready to go.

@jncraton jncraton closed this Dec 20, 2025

Copilot AI left a comment


Pull request overview

This PR adds support for the T5Gemma architecture, a family of encoder-decoder models developed by Google that adapts pretrained decoder-only models into an encoder-decoder architecture. T5Gemma combines T5's encoder-decoder design with Gemma 2 improvements including GQA, RoPE, GeGLU activation, RMSNorm, and pre/post layer normalization patterns.

Key changes:

  • Introduces pre/post layer norm support for both encoder and decoder transformer layers
  • Adds cross-attention pre/post layer norms for decoder layers to support T5Gemma's normalization pattern
  • Implements T5GemmaLoader converter to map T5Gemma models from Hugging Face transformers to CTranslate2
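The pre/post layer norm pattern listed above wraps each sublayer in two RMSNorms: one on the sublayer's input and one on its output, before the residual add. The NumPy sketch below shows that pattern around a GeGLU feed-forward sublayer. It illustrates the math only and does not mirror the PR's C++ code or spec attribute names; the function names are hypothetical, and the `(1 + weight)` RMSNorm scaling is assumed to follow the Hugging Face Gemma implementation.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm with Gemma-style (1 + weight) scaling (assumption)."""
    variance = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * (1.0 + weight)


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def geglu_ffn(x, w_gate, w_up, w_down):
    """GeGLU feed-forward: (gelu(x @ w_gate) * (x @ w_up)) @ w_down."""
    return (gelu_tanh(x @ w_gate) * (x @ w_up)) @ w_down


def pre_post_norm_residual(x, sublayer, pre_weight, post_weight):
    """x + post_norm(sublayer(pre_norm(x))): norms on both sides of the sublayer."""
    return x + rms_norm(sublayer(rms_norm(x, pre_weight)), post_weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_ff = 4, 8
    w_gate = rng.standard_normal((d_model, d_ff))
    w_up = rng.standard_normal((d_model, d_ff))
    w_down = rng.standard_normal((d_ff, d_model))
    x = rng.standard_normal((3, d_model))  # 3 positions

    y = pre_post_norm_residual(
        x,
        lambda h: geglu_ffn(h, w_gate, w_up, w_down),
        pre_weight=np.zeros(d_model),
        post_weight=np.zeros(d_model),
    )
    print(y.shape)  # (3, 4)
```

In a decoder layer the same wrapper would be applied around self-attention, cross-attention, and the feed-forward block, which is why cross-attention needs its own pair of norms.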

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

  • src/layers/transformer.cc: Implements pre/post layer norm logic for encoder and decoder layers, including optional cross-attention layer norms
  • include/ctranslate2/layers/transformer.h: Adds layer norm member variables to support pre/post normalization in encoder and decoder layers
  • python/ctranslate2/specs/transformer_spec.py: Extends transformer specs to support a pre_post_layer_norm configuration parameter and creates the appropriate layer norm specs
  • python/ctranslate2/converters/transformers.py: Adds T5GemmaLoader to convert T5Gemma models, including weight mapping, vocabulary handling, and RoPE configuration
  • README.md: Updates the supported models list to include T5Gemma
Comments suppressed due to low confidence (1)

python/ctranslate2/converters/transformers.py:1347

  • Variable num_heads_kv_enc is not used.
            num_heads_kv_enc = None


@BBC-Esq

BBC-Esq commented Dec 20, 2025

Contributor

CTranslate2 doesn't currently support encoder/decoder-based embedding models, so I'm excited to see this. Have you started working on the Qwen3 embedding models yet, by chance? CTranslate2's C++ and Python code would both have to be modified, because it currently can't return the hidden states (the "intermediate states", or whatever you call them). Seems like the same issue applies to the T5Gemma architecture?

@jncraton

Contributor Author

@BBC-Esq Older encoder/decoder models (such as T5) are already supported; this PR would add support for one with a more modern architecture. Embedding models can also be used today: I've used forward_batch and .last_hidden_state to compute document embeddings.
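A minimal sketch of that approach, assuming a model that CTranslate2 loads as `ctranslate2.Encoder` (encoder-only models such as BERT). `Encoder.forward_batch` accepts batches of token IDs and returns an object with a `last_hidden_state`; on CPU it can be converted with `np.array()` (on GPU you would go through something like `torch.as_tensor` instead). The helper name `mean_pooled_embeddings` and the pooling choice (mean over non-padding tokens, then L2 normalization) are assumptions for illustration; use whatever pooling the embedding model was trained with.

```python
import numpy as np


def mean_pooled_embeddings(encoder, token_ids):
    """Return one L2-normalized embedding per document.

    `encoder` is assumed to be a ctranslate2.Encoder (or any object with a
    compatible forward_batch method); `token_ids` is a list of token-ID lists.
    """
    output = encoder.forward_batch(token_ids)
    # On CPU, the returned StorageView converts to a NumPy array.
    hidden = np.array(output.last_hidden_state)  # (batch, max_len, hidden_size)

    # Mask out padding positions beyond each document's true length.
    lengths = np.array([len(ids) for ids in token_ids])
    mask = np.arange(hidden.shape[1])[None, :] < lengths[:, None]
    summed = (hidden * mask[:, :, None]).sum(axis=1)
    pooled = summed / lengths[:, None]

    # Normalize so that dot products between embeddings are cosine similarities.
    return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)
```

With a converted model and a Hugging Face tokenizer, a call might look like `mean_pooled_embeddings(ctranslate2.Encoder("model_ct2", device="cpu"), tokenizer(texts).input_ids)`, where `model_ct2`, `tokenizer`, and `texts` are placeholders.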

@BBC-Esq

BBC-Esq commented Dec 20, 2025

Contributor

@jncraton, can you please show me how to do that? Maybe you have a sample script? I can convert the Qwen3 embedding models, because the architecture is supported, but I can't figure out how to use them with CTranslate2 yet.

@BBC-Esq

BBC-Esq commented Dec 20, 2025

Contributor

The forward_batch / .last_hidden_state approach works for encoder-only models (BERT, XLM-RoBERTa, etc.) loaded via ctranslate2.Encoder, which returns an EncoderForwardOutput object containing last_hidden_state and pooler_output.

However, Qwen3-Embedding is a decoder-only architecture, so it is loaded as a ctranslate2.Generator. The Generator.forward_batch() method returns logits (the post-lm_head output), not the hidden states.

That is my understanding, anyway, unless you're aware of some kind of workaround. As far as I can tell, we'd need to modify CTranslate2's C++ inference path to optionally return hidden states from decoder models.

@jncraton

jncraton commented Dec 20, 2025

Contributor Author

@BBC-Esq That's my understanding as well. I'm not aware of a way to get access to the embeddings from Qwen3 without modifying CTranslate2.


3 participants