Fix model parallel issue for altclip model and ChineseClip model#45487
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
I need to allocate more time on this PR, please be patient 🙏
@vasqu Hi, can you help review this PR? There are a lot of file changes here because I needed to resolve a modular conversion check issue.
vasqu
left a comment
Overall looks fine. Just one question about listing the vision attention as a no-split module: it would be nicer if we had a layer (or similar) there instead.
  base_model_prefix = "altclip"
  input_modalities = ("image", "text")
- _no_split_modules = ["AltCLIPTextEmbeddings", "AltCLIPEncoderLayer", "AltCLIPVisionEmbeddings"]
+ _no_split_modules = ["AltRobertaEmbeddings", "AltRobertaLayer", "AltCLIPEncoderLayer", "AltCLIPVisionEmbeddings"]
Oh wow, the text embeddings class doesn't even exist. Probably missed during a refactor.
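A stale entry like this is easy to miss because a name in `_no_split_modules` that matches no class is not necessarily reported. As a minimal sketch (the helper `missing_no_split_modules` and the `ToyModel` stand-in are hypothetical and not part of transformers), one could compare the declared names against the class names of the model's submodules:

```python
# Hypothetical helper, not a transformers API: list the `_no_split_modules`
# entries that do not match the class name of any submodule.
def missing_no_split_modules(model):
    present = {type(m).__name__ for m in model.modules()}
    declared = getattr(model, "_no_split_modules", None) or []
    return [name for name in declared if name not in present]


# Minimal stand-ins so the sketch runs without torch. A real check would
# pass an instantiated torch.nn.Module-based model instead.
class AltRobertaEmbeddings:
    pass


class AltCLIPEncoderLayer:
    pass


class ToyModel:
    # Mirrors the pre-fix situation: the first name matches no class.
    _no_split_modules = ["AltCLIPTextEmbeddings", "AltCLIPEncoderLayer"]

    def modules(self):
        # Same shape as torch.nn.Module.modules(): yields self, then submodules.
        yield self
        yield AltRobertaEmbeddings()
        yield AltCLIPEncoderLayer()


stale = missing_no_split_modules(ToyModel())
print(stale)  # ['AltCLIPTextEmbeddings']
```

The same function should work on a real model as long as it exposes `modules()` and `_no_split_modules`, which is an assumption about the object passed in rather than something this PR defines.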
"ChineseCLIPVisionEmbeddings",
"ChineseCLIPTextEmbeddings",
"ChineseCLIPTextLayer",
"ChineseCLIPVisionAttention",
Why vision attention? Do we maybe have a vision layer instead?
Yes, I have changed it to ChineseCLIPVisionLayer.
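One likely reason to prefer the whole layer over the attention block: a transformer layer typically adds the attention output back onto its input (a residual connection), so if only the attention is kept together, the rest of the layer can still land on another device. The toy simulation below illustrates that idea only. `FakeTensor`, `run_submodule`, and `toy_layer` are hypothetical names, and the assumption that a submodule's output stays on that submodule's device is a simplification, not a description of the actual dispatch code.

```python
class FakeTensor:
    """Toy stand-in for a tensor that only tracks which device it lives on."""

    def __init__(self, device):
        self.device = device

    def __add__(self, other):
        # Mimic the failure mode: adding operands on different devices is an error.
        if self.device != other.device:
            raise RuntimeError(f"device mismatch: {self.device} vs {other.device}")
        return FakeTensor(self.device)


def run_submodule(x, device):
    # Assumption: inputs are moved to the submodule's device and the
    # output stays there.
    return FakeTensor(device)


def toy_layer(hidden, attn_device):
    # Residual add around attention: `hidden` stays where the layer input was.
    return hidden + run_submodule(hidden, attn_device)


# Whole layer on one device: the residual add works.
out = toy_layer(FakeTensor("cuda:0"), attn_device="cuda:0")

# Layer split so that attention sits on another device: the add fails.
try:
    toy_layer(FakeTensor("cuda:0"), attn_device="cuda:1")
    split_ok = True
except RuntimeError:
    split_ok = False

print(out.device, split_ok)  # cuda:0 False
```

Declaring the layer class as the no-split unit keeps both operands of the residual add on the same device, which is what the second case lacks.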
[For maintainers] Suggested jobs to run (before merge) run-slow: altclip, bridgetower, camembert, chinese_clip, clap, data2vec, roberta, roberta_prelayernorm, xlm_roberta, xmod
vasqu
left a comment
LGTM, sanity checking now
run-slow: altclip, bridgetower, camembert, chinese_clip, clap, data2vec, roberta, roberta_prelayernorm, xlm_roberta, xmod
This comment contains models: ["models/altclip", "models/bridgetower", "models/camembert", "models/chinese_clip", "models/clap", "models/data2vec", "models/roberta", "models/roberta_prelayernorm", "models/xlm_roberta", "models/xmod"]
…gingface#45487)
* fix model parallel device mismatch issue for altclip model
* fix model parallel issue for ChineseClip model
* refine code
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
This PR fixes device-mismatch failures in the model parallel test cases for the AltCLIP and ChineseCLIP models.
Please help review, @ydshieh, thanks!