Adding [T5/MT5/UMT5]EncoderForSequenceClassification #40898
ArthurZucker left a comment
hey! Thanks for the PR. My main question is whether or not there is an actual checkpoint released with this? If not, I don't think it really makes sense, but we can do a feature request and leave it up to the community!
You should be able to leverage `class LlamaForSequenceClassification(GenericForSequenceClassification, LlamaPreTrainedModel):` as well
hey @ArthurZucker thanks for reviewing! There is a fine-tuned TF model (sentence-t5) from the paper here. It's a bit different though and outputs a 768-dimensional sentence embedding vector. The paper was mostly inspiration for a decent sentence representation, and my main motivation for the PR was to provide a leaner classifier for T5 that can still use FLAN weights. I often fine-tune pre-trained models for text classification where efficiency is important. This seems to fit the bill for that and I thought others might find it useful. If you prefer the feature request route just let me know how to proceed! thanks.

Just submitted a feature request here. Looks like I may also need to add this for mt5 and umt5!
[For maintainers] Suggested jobs to run (before merge)
run-slow: mt5, t5, umt5

hey @ArthurZucker there is a feature request out with community support and CI tests are green. please have a look when you have a chance, thanks!
What does this PR do?
This PR adds an encoder-only sequence classifier for T5. Inspiration for this comes from the following paper: "Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models". The mean of the final hidden states is used as the sentence representation (best results from the paper). For `t5-small`, the encoder-only classifier is nearly half the size and takes nearly a third of the time for a forward pass compared to the encoder-decoder classifier.

Note that I tried to include this new class in `MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES` in `modeling_auto.py` but I could not get around one failing test, `test_load_with_mismatched_shapes` in `test_modeling_common.py`. That test seems to invoke the model as a decoder and fails here in the `T5Stack` class with the following error: `ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds`. Because of this, I did not add it to `modeling_auto.py`, but if there is a need to do so, please let me know (any advice on how to do it would be appreciated). Having noted that, this PR does include a small test in `test_modeling_t5.py`.
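
For readers unfamiliar with the pooling step, here is a minimal sketch of mean pooling over final hidden states followed by a linear head. It is not the code from this PR: it uses plain Python lists instead of tensors to stay dependency-free, the names `masked_mean_pool` and `linear_head` are hypothetical, and skipping padded positions via an attention mask is an assumption (the description above only says the mean of the final hidden states is used).

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    hidden_states: [batch][seq_len][hidden_size] nested lists of floats.
    attention_mask: [batch][seq_len] nested lists of 0/1 (assumed convention).
    """
    pooled = []
    for token_vectors, mask in zip(hidden_states, attention_mask):
        hidden_size = len(token_vectors[0])
        totals = [0.0] * hidden_size
        for vector, keep in zip(token_vectors, mask):
            if keep:
                for i, value in enumerate(vector):
                    totals[i] += value
        count = max(sum(mask), 1)  # guard against an all-padding row
        pooled.append([t / count for t in totals])
    return pooled


def linear_head(pooled, weight, bias):
    """Hypothetical classification head: logits = pooled @ weight^T + bias."""
    return [
        [sum(p * w for p, w in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in pooled
    ]


# Toy batch: 2 sequences, 3 tokens each, hidden size 2.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last token is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

pooled = masked_mean_pool(hidden_states, attention_mask)

# Identity weights and zero bias, so the logits equal the pooled vectors.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
logits = linear_head(pooled, weight, bias)
print(pooled, logits)
```

In a real model the same arithmetic would run on the encoder's output tensor, and the head's weights would be learned during fine-tuning rather than fixed as in this toy example.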
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker