
Adding [T5/MT5/UMT5]EncoderForSequenceClassification#40898

Open
cbhyphen wants to merge 11 commits into huggingface:main from cbhyphen:t5-enc-for-seq-clf

@cbhyphen

What does this PR do?

This PR adds an encoder-only sequence classifier for T5, inspired by the paper "Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models". The mean of the final hidden states is used as the sentence representation, which gave the best results in the paper. For t5-small, the encoder-only classifier is nearly half the size of the encoder-decoder classifier and takes nearly a third of the time for a forward pass.
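For readers unfamiliar with the pooling step, here is a minimal sketch of mean pooling followed by a linear classification head. It is written in plain Python so it runs without PyTorch, and it is not the code from this PR: the function names `masked_mean_pool` and `linear_logits` are hypothetical, and skipping padded positions via an attention mask is an assumption on my part rather than something stated above.

```python
from typing import List


def masked_mean_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors of one sequence, skipping padded positions.

    hidden_states: one vector per token (final encoder layer).
    attention_mask: 1 for real tokens, 0 for padding (assumed convention).
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear_logits(pooled: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Apply a linear head: one logit per label (one weight row per label)."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]


# Toy example: three tokens with hidden size 2; the last token is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean_pool(hidden, mask)  # the padded vector is ignored
logits = linear_logits(
    pooled,
    weight=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # three labels
    bias=[0.0, 0.5, -1.0],
)
print(pooled, logits)
```

In a real model the same arithmetic would be done with batched tensors, and the pooled vector would typically pass through dropout before the head; those details are left out here.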

Note that I tried to include this new class in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES` in `modeling_auto.py`, but I could not get around one failing test, `test_load_with_mismatched_shapes` in `test_modeling_common.py`. That test seems to invoke the model as a decoder and fails here in the `T5Stack` class with the following error: `ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds`. Because of this, I did not add the class to `modeling_auto.py`, but if there is a need to do so, please let me know (any advice on how to do it would be appreciated). That said, this PR does include a small test in `test_modeling_t5.py`.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

@ArthurZucker ArthurZucker left a comment

Collaborator

hey! Thanks for the PR. My main question is whether or not there is an actual checkpoint released with this? If not, I don't think it really makes sense, but we can do a feature request and leave it up to the community!

You should be able to leverage `class LlamaForSequenceClassification(GenericForSequenceClassification, LlamaPreTrainedModel):` as well

@cbhyphen

cbhyphen commented Sep 17, 2025

Author

hey @ArthurZucker thanks for reviewing! There is a fine-tuned TF model (sentence-t5) from the paper here. It's a bit different though and outputs a 768-dimensional sentence embedding vector.

The paper was mostly inspiration for a decent sentence representation, and my main motivation for the PR was to provide a leaner classifier for T5 that can still use FLAN weights. I often fine-tune pre-trained models for text-classification where efficiency is important. This seems to fit the bill for that and I thought others might find it useful. If you prefer the feature request route just let me know how to proceed! thanks.

@cbhyphen

cbhyphen commented Oct 9, 2025

Author

Just submitted a feature request here. Looks like I may also need to add this for mt5 and umt5!

@cbhyphen cbhyphen force-pushed the t5-enc-for-seq-clf branch from 2a23c91 to eb9899f Compare October 9, 2025 22:04
@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mt5, t5, umt5

@cbhyphen cbhyphen changed the title Adding T5EncoderForSequenceClassification Adding [T5/MT5/UMT5]EncoderForSequenceClassification Oct 10, 2025
@cbhyphen

Author

hey @ArthurZucker there is a feature request out with community support and CI tests are green. please have a look when you have a chance, thanks!
