HunYuan opensource #39606
Conversation
Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style
Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring
319d9a1 to
70711e5
cc @ArthurZucker for new text models, but let me know if you want me or someone else to take the initial review!
Thank you for your attention and response. As this is my first submission, I'm not entirely certain which experts I should approach for code review. Would you be able to offer some suggestions or guidance on this matter?
f244201 to
aed87b5
fix usable_length API
* update * fix format * update * revert makefile
ArthurZucker
left a comment
Hey! The main point of guidance is to properly isolate the architectural differences between your model and the other models supported in the library. This way you can use inheritance to write the modeling code with modular transformers: https://huggingface.co/docs/transformers/en/modular_transformers 🤗
Otherwise, very cool to see this contribution! 🤗
ArthurZucker
left a comment
Last nits on the MoE: it's pretty standard now, so it is important to isolate what you are adding and what is new! From what I can see, it might be the token capacity?
If not, then let's just use what we have already standardized, which we know passes compile, potentially TP, etc.!
def topkgating(logits: Tensor, topk: int):
    if topk == 1:
isn't top_k always 1?
    """Implements Top1Gating on logits."""
    # everything is in fp32 in this function
    logits = logits.float()
    gates = F.softmax(logits, dim=1)
Any reason not to use gpt_oss or mixtral gating? It should be absolutely equivalent, and it's now fairly standard!
We also have other implementations with token capacity as well!
Hello! We've adopted a more community-standard MoE implementation by referencing other open-source models. It delivers performance identical to the original implementation, and we hope it resolves this issue.
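For readers following the gating discussion, here is a minimal, framework-free sketch of the Mixtral-style routing pattern the review refers to: softmax over all expert logits, keep the top-k experts, then renormalize the kept weights. It is an illustration only, not the code merged in this PR; `softmax`, `route_token`, and the example logits are hypothetical names and values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def route_token(router_logits, top_k):
    """Return (expert_index, weight) pairs for a single token.

    Hypothetical helper: softmax over every expert, keep the top_k most
    probable experts, then renormalize so the kept weights sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


# Made-up router logits for one token and four experts.
assignments = route_token([2.0, 0.5, 1.0, -1.0], top_k=2)
```

With `top_k=1` the single kept weight renormalizes to 1, which is why a separate top-1 code path is not needed in this formulation.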
chunks = dispatched_input.chunk(self.num_experts, dim=0)
expert_outputs = []
for chunk, expert in zip(chunks, self.experts):
    expert_outputs.append(expert(chunk))

expert_output = torch.cat(expert_outputs, dim=0)
# combined_output = torch.einsum("sec,ecm->sm", combine_weights.type_as(hidden_states), expert_output)
combine_exp = combine_weights.type_as(hidden_states).unsqueeze(3)  # (s, e, c, 1)
expert_exp = expert_output.unsqueeze(0)  # (1, e, c, m)
combined_output = (combine_exp * expert_exp).sum(dim=(1, 2))  # (s, m)
Well, this is rigorously equivalent to the approach we have in llama4, with scattering that uses the dispatch index. I don't think there is a difference with llama4 here, no? Let's standardize, please.
Hello! We've adopted a more community-standard MoE implementation by referencing other open-source models. It delivers performance identical to the original implementation, and we hope it resolves this issue.
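The equivalence claimed in this thread can be illustrated with a toy, pure-Python sketch: a dense combine over an `(s, e, c)` weight tensor, as in the `"sec,ecm->sm"` einsum, gives the same output as an index-based combine that only visits each token's assigned expert slot. This is not the llama4 code or the code in this PR; the sizes, the `routing` table, and both function names are assumptions made for illustration.

```python
# Hypothetical toy sizes: 3 tokens (s), 2 experts (e), capacity 2 (c), hidden size 2 (m).
num_tokens, num_experts, capacity, hidden = 3, 2, 2, 2

# routing[s] = (expert index, slot index, gate weight); values are made up.
routing = [(0, 0, 0.7), (1, 0, 0.6), (0, 1, 0.9)]

# expert_output[e][c] is the hidden vector an expert produced for its slot c.
expert_output = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [0.0, 0.0]],  # expert 1 only filled slot 0
]


def dense_combine(routing, expert_output):
    """Build dense (s, e, c) weights, then reduce over e and c like the einsum."""
    weights = [[[0.0] * capacity for _ in range(num_experts)] for _ in range(num_tokens)]
    for s, (e, c, w) in enumerate(routing):
        weights[s][e][c] = w
    out = [[0.0] * hidden for _ in range(num_tokens)]
    for s in range(num_tokens):
        for e in range(num_experts):
            for c in range(capacity):
                for m in range(hidden):
                    out[s][m] += weights[s][e][c] * expert_output[e][c][m]
    return out


def index_combine(routing, expert_output):
    """Visit only the (expert, slot) each token was dispatched to."""
    out = [[0.0] * hidden for _ in range(num_tokens)]
    for s, (e, c, w) in enumerate(routing):
        for m in range(hidden):
            out[s][m] += w * expert_output[e][c][m]
    return out


dense_result = dense_combine(routing, expert_output)
index_result = index_combine(routing, expert_output)
```

Both paths produce the same values because every entry of the dense weight tensor outside a token's assigned slot is zero; the index-based path simply skips those zero terms.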
run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe
This comment contains run-slow, running the specified jobs: models: ['models/auto', 'models/hunyuan_v1_dense', 'models/hunyuan_v1_moe']
run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe
Do you know why you had to do this? Happy to help fix it.
Since the tokenizer is not intended to be included in the open-source code, we adopted the trust_remote_code approach.
I can help you convert it, no?
Using this will remove the need to rely on remote code!
This is a legacy issue. The script you provided has already been used for our dense series models, but the MoE model was open-sourced before that. As a result, the tiktoken tokenizer file is placed in the model file directory, so users can use the trust_remote_code approach for compatibility.
Sorry, I am not sure I understand, but transformers users (as well as vllm) expect AutoTokenizer to work without having to add trust_remote_code=True once the model is merged into transformers.
OK, I understand what you mean. We will remove this test case until our model is ready for the fast tokenizer.
We have modified the tokenizer in the model card by removing the trust_remote_code dependency and re-added the corresponding tests.
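As a hedged sketch of what the reviewer asks for, the tokenizer should load through `AutoTokenizer` without enabling remote code. The repo id below is the one named in the PR description; whether a given revision of that repo loads this way depends on the tokenizer change described in the comment above, so treat this as an assumption to verify rather than a guaranteed result. `load_hunyuan_tokenizer` is a hypothetical helper name.

```python
def load_hunyuan_tokenizer(repo_id: str = "tencent/Hunyuan-A13B-Instruct"):
    """Load the tokenizer through AutoTokenizer with remote code disabled."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoTokenizer

    # Passing trust_remote_code=False explicitly means that a repo which still
    # depends on custom tokenizer code should raise an error rather than run it.
    return AutoTokenizer.from_pretrained(repo_id, trust_remote_code=False)
```

Calling `load_hunyuan_tokenizer()` requires network access to the Hub, or a local cache of the repo.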
fix moe & gate
* add norm_topk_prob
* fix&skip test
* skip testcase
run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe
This comment contains run-slow, running the specified jobs: models: ['models/auto', 'models/hunyuan_v1_dense', 'models/hunyuan_v1_moe']
ArthurZucker
left a comment
Kudos! My last comment is on norm_topk_prob: if it is always true, let's hardcode it to remove code paths!
bsz, seq_len, hidden_size = hidden_states.shape
self.shared_mlp = HunYuanMoEV1MLP(config, layer_idx=layer_idx, is_shared_mlp=True)

def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
a lot better thanks! 🚀
* hardcode norm_topk_prob * fix testcase
Already fixed~
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

What does this PR do?
Fixes # (issue)
This PR primarily aims to add support for the Hunyuan series of models in inference. We noticed that previous Hunyuan models relied on trust_remote_code for inference, which makes version maintenance difficult and often leads to outdated inference code. To address this, we are integrating the inference code into the Transformers library to support continuous updates for future open-source releases.
The submitted code includes the inference implementations for both hunyuan_v1_dense and hunyuan_v1_moe, along with their corresponding configurations and tokenizers.
For unit testing, we added a single-sample test for the hunyuan_v1_moe model using tencent/Hunyuan-A13B-Instruct. Unfortunately, the hunyuan_v1_dense model is not yet officially open-sourced, so we currently lack a testable model for it. We will update the tests upon model release.
This is my first PR submission. After carefully studying the Contribute to 🤗 Transformers guide, I've modified my code to pass all make fixup checks.
I'd greatly appreciate any feedback if additional changes or improvements are needed - please don't hesitate to point them out!
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.