🚨 [Attn] Remove all old mask APIs from modeling#43924
Conversation
|
run-slow: big_bird,blip_2,bridgetower,clap,flava,ibert,instructblip,instructblipvideo,tapas,vilt |
|
This comment contains models: ["models/big_bird", "models/blip_2", "models/bridgetower", "models/clap", "models/flava", "models/ibert", "models/instructblip", "models/instructblipvideo", "models/tapas", "models/vilt"] |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
CI ResultsCommit Info
Model CI Report❌ 31 new failed tests from this PR 😭
|
|
run-slow: align,altclip,big_bird,blip,blip_2,bridgetower,bros,canine,chinese_clip,clap,convbert,flava,ibert,idefics,imagegpt,instructblip,instructblipvideo,layoutlmv3,lightglue,lilt,longformer,longt5,luke,megatron_bert,mpnet,mra,nystromformer,perceiver,pix2struct,pop2piano,rembert,roformer,splinter,squeezebert,superglue,switch_transformers,tapas,tvp,udop,umt5,vilt,visual_bert |
|
This comment contains models: ["models/align", "models/altclip", "models/big_bird", "models/blip", "models/blip_2", "models/bridgetower", "models/bros", "models/canine", "models/chinese_clip", "models/clap", "models/convbert", "models/flava", "models/ibert", "models/idefics", "models/imagegpt", "models/instructblip", "models/instructblipvideo", "models/layoutlmv3", "models/lightglue", "models/lilt", "models/longformer", "models/longt5", "models/luke", "models/megatron_bert", "models/mpnet", "models/mra", "models/nystromformer", "models/perceiver", "models/pix2struct", "models/pop2piano", "models/rembert", "models/roformer", "models/splinter", "models/squeezebert", "models/superglue", "models/switch_transformers", "models/tapas", "models/tvp", "models/udop", "models/umt5", "models/vilt", "models/visual_bert"] |
CI ResultsCommit Info
Model CI Report❌ 39 new failed tests from this PR 😭
|
|
run-slow: align,altclip,big_bird,blip,blip_2,bridgetower,bros,canine,chinese_clip,clap,convbert,flava,ibert,idefics,imagegpt,instructblip,instructblipvideo,layoutlmv3,lightglue,lilt,longformer,longt5,luke,megatron_bert,mpnet,mra,nystromformer,perceiver,pix2struct,pop2piano,rembert,roformer,splinter,squeezebert,superglue,switch_transformers,tapas,tvp,udop,umt5,vilt,visual_bert |
|
This comment contains models: ["models/align", "models/altclip", "models/big_bird", "models/blip", "models/blip_2", "models/bridgetower", "models/bros", "models/canine", "models/chinese_clip", "models/clap", "models/convbert", "models/flava", "models/ibert", "models/idefics", "models/imagegpt", "models/instructblip", "models/instructblipvideo", "models/layoutlmv3", "models/lightglue", "models/lilt", "models/longformer", "models/longt5", "models/luke", "models/megatron_bert", "models/mpnet", "models/mra", "models/nystromformer", "models/perceiver", "models/pix2struct", "models/pop2piano", "models/rembert", "models/roformer", "models/splinter", "models/squeezebert", "models/superglue", "models/switch_transformers", "models/tapas", "models/tvp", "models/udop", "models/umt5", "models/vilt", "models/visual_bert"] |
Attn] More old mask APIsAttn] Remove all old mask APIs from modeling
|
|
||
| extend_text_masks = create_bidirectional_mask( | ||
| config=self.config, | ||
| input_embeds=text_embeds[:, 0:1, :], # weird case where the mask always wants q_len == 1 |
There was a problem hiding this comment.
happens a few times, I guess they just like to broadcast internally instead
There was a problem hiding this comment.
If it's only for broadcasting, IMO we should drop this and use full length as always - resulting mask should not change
There was a problem hiding this comment.
This is tied to the old API
It always broadcasted along q no matter what and it enabled bad practices imo because the downstream emebedding with the correct shape is probably created at a later time. I don't think it's worth to rewrite the model just for that 😅
| # Cuts off back to 2D | ||
| extended_attention_mask = extended_attention_mask[:, 0, 0, :] | ||
| # Bug in the old mask API converted global masks (==2) to max dtype | ||
| extended_attention_mask[attention_mask == 2] = torch.finfo(embedding_output.dtype).max |
There was a problem hiding this comment.
buggy implementation, followed old behavior here
There was a problem hiding this comment.
Humm, I don't see where the behavior was bugged here before based on the diffs?
There was a problem hiding this comment.
You can only see that it is bugged if you actual look into the mask values (either 0,1, or 2) and what get_extended_attention_mask does
transformers/src/transformers/modeling_utils.py
Line 1010 in b75feb2
Based on that line if you have a value of 2, you get -1 * -(max dtype) == max dtype
There was a problem hiding this comment.
Not sure why we would have 2s inside the mask, but trusting you!
Cyrilvallez
left a comment
There was a problem hiding this comment.
God's work 🙏 Thanks a lot for that, will simplify our lives so much in the future 🤗
I answered directly on most of your comments. but recommenting here to be sure:
- The
def _create_attention_maskswith only 2if/elseIMO we should remove and put them directly in the forward to ease readability - For
bridgetower(and others you mention) where we broadcast the q_len, we should be able to not slice and use usual logic without slicing then broadcasting no? Would be easier - Not sure about
longformer-> was it really a bug before? I don't see any issues about 2s in the mask from diffs
| logger.warning_once( | ||
| "Detected the usage of `get_extended_attention_mask`: This function is deprecated and will be removed in v5.12.0. " | ||
| "Please use the new API in `transformers.masking_utils`" | ||
| ) |
There was a problem hiding this comment.
Can't wait to remove them 🙏
|
[For maintainers] Suggested jobs to run (before merge) run-slow: align, big_bird, blip, blip_2, bridgetower, bros, canine, clap, convbert, flava, git, ibert, idefics, imagegpt, instructblip, instructblipvideo |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43924&sha=a0b6c8 |
|
@Cyrilvallez I think I answered everything:
|
|
ALright, thanks for the added explanations! All good to me! |
* first round * round 2 * round 3 * style * post merge fixes * fix blip 2 * fix lightglue * fix idefics --> uses 3D mask, hence manual creation and no mask API * add deprecation message * move out of separate fns * new deprecation cycle for the naming
* first round * round 2 * round 3 * style * post merge fixes * fix blip 2 * fix lightglue * fix idefics --> uses 3D mask, hence manual creation and no mask API * add deprecation message * move out of separate fns * new deprecation cycle for the naming
As per title, removes the last remains of old mask API usage which means that now everything relies on the new mask API (except for a few exceptions that use 3D masks like idefics)
Adding a deprecation cylce to remove it from modeling utils