consolidate mbridge distillation: merge distill_hf.py into distill.py (#1220)

j-rausch wants to merge 4 commits into `feature/puzzletron`.
Conversation
Commit: …still.py (Signed-off-by: jrausch <jrausch@nvidia.com>; Signed-off-by: root <root@pool0-00848.cm.cluster>)
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info: Configuration used: `.coderabbit.yaml` · Review profile: CHILL · Plan: Pro. 📒 Files selected for processing (1); ✅ files skipped from review due to trivial changes (1).

📝 Walkthrough: Consolidates Megatron-Bridge distillation into `distill.py`.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/megatron_bridge/distill.py`:
- Line 304: The `shutil.copy` call that copies `config.json` from `args.student_hf_path` will fail for remote HuggingFace model IDs. Update the logic around the `shutil.copy` usage to detect whether `args.student_hf_path` is a remote model ID; in that case use the HuggingFace API (e.g. `hf_hub_download`, or `transformers.AutoConfig.from_pretrained` followed by `AutoConfig.save_pretrained`) to fetch the config and write it to `args.hf_export_path/config.json`. Otherwise keep the local `shutil.copy` behavior, so the code handles both local paths and remote model IDs.
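A minimal sketch of the suggested fix, assuming `transformers` is installed and using a directory check to distinguish local checkpoints from remote model IDs (the helper name `export_student_config` is hypothetical, not the script's actual code):

```python
import os
import shutil


def export_student_config(student_hf_path: str, hf_export_path: str) -> str:
    """Write the student's config.json into hf_export_path.

    Hypothetical helper mirroring args.student_hf_path / args.hf_export_path
    from distill.py; handles both local paths and remote HF model IDs.
    """
    os.makedirs(hf_export_path, exist_ok=True)
    dst = os.path.join(hf_export_path, "config.json")
    if os.path.isdir(student_hf_path):
        # Local checkpoint directory: the original shutil.copy behavior is fine.
        shutil.copy(os.path.join(student_hf_path, "config.json"), dst)
    else:
        # Remote model ID (e.g. "org/model"): resolve the config via the HF API
        # and save_pretrained writes it as config.json in the export directory.
        from transformers import AutoConfig

        AutoConfig.from_pretrained(student_hf_path).save_pretrained(hf_export_path)
    return dst
```

The lazy `transformers` import keeps the local-path branch usable even when the library is absent.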
In `@tests/_test_utils/torch/puzzletron/utils.py`:
- Line 89: The hardcoded `trust_remote_code=True` must be made caller-configurable. Add a boolean parameter `trust_remote_code: bool = False` to the function that loads the HF model (around lines 66-72), and pass it into `AutoConfig.from_pretrained(...)` and any other pretrained loaders (e.g. the call at line ~152) instead of the hardcoded `True`. Ensure the new parameter defaults to `False` and is threaded through to all transformers loading calls (config, tokenizer, and model loading sites) so callers can opt in when necessary.
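A sketch of the threading pattern the comment asks for; the function name and shape are hypothetical, and the `transformers` import is lazy so the signature stays inspectable even without the library installed:

```python
def load_hf_model_config(model_id: str, trust_remote_code: bool = False):
    """Hypothetical version of the test util's loader after the fix:
    trust_remote_code defaults to False and is forwarded to every
    from_pretrained call instead of being hardcoded to True."""
    from transformers import AutoConfig, AutoTokenizer  # lazy import

    config = AutoConfig.from_pretrained(
        model_id, trust_remote_code=trust_remote_code
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=trust_remote_code
    )
    return config, tokenizer
```

Callers that need custom model code then opt in explicitly, e.g. `load_hf_model_config("org/model", trust_remote_code=True)`.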
ℹ️ Review info: Configuration used: `.coderabbit.yaml` · Review profile: CHILL · Plan: Pro · Run ID: 5e8fb78d-7646-49ef-ad4c-8b49d491c83b
📒 Files selected for processing (8):
- examples/megatron_bridge/README.md
- examples/megatron_bridge/distill.py
- examples/puzzletron/README.md
- examples/puzzletron/mbridge_distillation/README.md
- examples/puzzletron/mbridge_distillation/distill_hf.py
- tests/_test_utils/torch/puzzletron/utils.py
- tests/examples/megatron_bridge/test_distill.py
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (3)
- examples/puzzletron/mbridge_distillation/README.md
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
- examples/puzzletron/mbridge_distillation/distill_hf.py
Suggested wording change (old → new):

- Old: For more details, you can refer to the checkpoint conversion scripts in the [Megatron-Bridge README](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/conversion).
- New: For more details, see the [Megatron-Bridge conversion README](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/conversion).

Comment on the line:

> **Known limitation:** HF export does not yet work for Puzzletron AnyModel (heterogeneous) checkpoints -- Megatron-Bridge cannot reload heterogeneous configs from saved checkpoints. Standard models export correctly with both methods.
Did you test this in nemo:26.02.00 or nemo:26.02.01? It was fixed in 26.02.01. Please give that a try again.
> ### Distillation Results
Can you create a results/puzzletron.md file, move the results there, and add a reference to it here? I plan to add minitron distillation results also, so this way we can keep this doc clean.
Alternatively, we can keep it in examples/puzzletron/README.md, since that's where the actual pruning is happening. Either is fine.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed: 4ed9996 to 8c2fa10
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files (coverage diff, `feature/puzzletron` vs #1220):

| | feature/puzzletron | #1220 | +/- |
|---|---:|---:|---:|
| Coverage | 75.34% | 61.40% | -13.94% |
| Files | 466 | 462 | -4 |
| Lines | 48495 | 48171 | -324 |
| Hits | 36539 | 29580 | -6959 |
| Misses | 11956 | 18591 | +6635 |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
/ok to test 8c2fa10
Force-pushed: 2669a26 to 977d60a
Commit: …tion (Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>)
Force-pushed: 3c0fa5a to 0c2a2ee
Summary

- Merges `examples/puzzletron/mbridge_distillation/distill_hf.py` (AnyModel-specific) into `examples/megatron_bridge/distill.py` (general).
- Adds `--hf_export_path` / `--student_hf_model` args for inline HF export after distillation.
- Updates `tests/examples/megatron_bridge/test_distill.py`; uses `vocab_size=128` (instead of the default 102) for TP divisibility, including TP=8.
- Updates `megatron_bridge/README.md`.

Limitation discovered during consolidation:
HF export via `--hf_export_path` currently seems not to work for Puzzletron AnyModel (heterogeneous) checkpoints. Megatron-Bridge's `export_ckpt` cannot reload heterogeneous model configs from saved checkpoints (`heterogeneous_layers_config_encoded_json` is `None` during `__post_init__` in `heterogeneous_config.py`). This affects both the inline `--hf_export_path` flag and the separate `convert_checkpoints.py` export script.

The original `distill_hf.py` README documented this as supported, but I think it might have been broken there too (on the Megatron-Bridge side). The consolidated README now documents this as a known limitation. HF export for standard models works fine via both methods.

Summary by CodeRabbit
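The failure mode described above can be illustrated with a small self-contained sketch; only the field name `heterogeneous_layers_config_encoded_json` comes from the PR text, everything else is assumed rather than taken from Megatron-Bridge:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeterogeneousConfigSketch:
    """Illustrative stand-in for the failing path in heterogeneous_config.py."""

    heterogeneous_layers_config_encoded_json: Optional[str] = None

    def __post_init__(self):
        # When a saved AnyModel checkpoint is reloaded for export, this field
        # comes back as None, so the per-layer config cannot be rebuilt.
        if self.heterogeneous_layers_config_encoded_json is None:
            raise ValueError(
                "heterogeneous_layers_config_encoded_json is None: "
                "cannot rebuild heterogeneous layer configs"
            )
        self.layer_configs = json.loads(
            self.heterogeneous_layers_config_encoded_json
        )
```

Constructing the sketch without the encoded JSON raises immediately, which mirrors why both the inline export flag and the separate export script fail for heterogeneous checkpoints.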
- New Features
- Documentation
- Tests
- Chores