
HunYuan opensource#39606

Merged
ArthurZucker merged 46 commits into huggingface:main from yjc9696:hunyuan_opensource
Aug 22, 2025


@yjc9696 yjc9696 commented Jul 23, 2025

Contributor

What does this PR do?

Fixes # (issue)
This PR primarily aims to add support for the Hunyuan series of models in inference. We noticed that previous Hunyuan models relied on trust_remote_code for inference, which makes version maintenance difficult and often leads to outdated inference code. To address this, we are integrating the inference code into the Transformers library to support continuous updates for future open-source releases.

The submitted code includes the inference implementations for both hunyuan_v1_dense and hunyuan_v1_moe, along with their corresponding configurations and tokenizers.

For unit testing, we added a single-sample test for the hunyuan_v1_moe model using tencent/Hunyuan-A13B-Instruct. Unfortunately, the hunyuan_v1_dense model is not yet officially open-sourced, so we currently lack a testable model for it; we will update this once the model is released.
This is my first PR submission. After carefully studying the Contribute to 🤗 Transformers guide, I've modified my code to pass all make fixup checks.

I'd greatly appreciate any feedback if additional changes or improvements are needed - please don't hesitate to point them out!

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yjc9696 yjc9696 changed the title from "Hunyuan opensource" to "HunYuan opensource" Jul 23, 2025
@yjc9696

yjc9696 commented Jul 24, 2025

Contributor Author
[image] How can I solve this problem?

pridejcyang and others added 8 commits July 24, 2025 17:17
Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style
Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring
@yjc9696 yjc9696 force-pushed the hunyuan_opensource branch from 319d9a1 to 70711e5 Compare July 24, 2025 09:17
@Rocketknight1

Member

cc @ArthurZucker for new text models, but let me know if you want me or someone else to take the initial review!

@yjc9696

yjc9696 commented Jul 24, 2025

Contributor Author

> cc @ArthurZucker for new text models, but let me know if you want me or someone else to take the initial review!

Thank you for your attention and response. As this is my first submission, I'm not entirely certain which experts I should approach for code review. Would you be able to offer some suggestions or guidance on this matter?

@yjc9696 yjc9696 force-pushed the hunyuan_opensource branch from f244201 to aed87b5 Compare July 25, 2025 09:55

@ArthurZucker ArthurZucker left a comment

Collaborator

Hey! The main point of guidance is to properly isolate the differences in architecture between your model and the other models supported in the library! This way you can use inheritance to write the modeling code with modular transformers: https://huggingface.co/docs/transformers/en/modular_transformers
🤗

@ArthurZucker

Collaborator

Otherwise very cool to see this contribution! 🤗

@ArthurZucker ArthurZucker left a comment

Collaborator

Last nits on the MoE: it's pretty standard now, so it is important to isolate what you are adding / what is new! From what I can see, it might be the token capacity?
If not, then let's just use what we have already standardized, which we know passes compile / potentially TP etc.!



def topkgating(logits: Tensor, topk: int):
    if topk == 1:

Collaborator

isn't top_k always 1?

"""Implements Top1Gating on logits."""
# everything is in fp32 in this function
logits = logits.float()
gates = F.softmax(logits, dim=1)

Collaborator

Any reason not to use gpt_oss or mixtral gating? It should be absolutely equivalent, and it's now fairly standard!

Collaborator

We also have other implementations with token capacity as well!

Contributor Author

Hello! We've adopted a more community-standard MoE implementation by referencing other open-source models. This solution delivers identical performance to the original one, and we hope it can effectively resolve the current issue.
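
The softmax-then-top-k routing discussed in this thread can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the code merged in this PR and not the Mixtral or gpt_oss implementation: the helper names `softmax` and `topk_route` and the sample logits are hypothetical, and the renormalization of the kept probabilities is assumed to correspond to the `norm_topk_prob` behavior mentioned later in the review.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(logits, top_k):
    """Return (expert_indices, weights) for a single token.

    Softmax over all experts, keep the top_k largest probabilities,
    then renormalize the kept probabilities so they sum to 1.
    """
    probs = softmax(logits)
    # Indices of the top_k largest probabilities, highest first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = [probs[i] for i in chosen]
    norm = sum(kept)
    return chosen, [p / norm for p in kept]


# Hypothetical router logits for one token over four experts.
indices, weights = topk_route([2.0, 0.5, 1.0, -1.0], top_k=2)
```

Here the token is routed to the two experts with the largest logits, and the two weights sum to 1 after renormalization.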

Comment on lines +263 to +272
chunks = dispatched_input.chunk(self.num_experts, dim=0)
expert_outputs = []
for chunk, expert in zip(chunks, self.experts):
    expert_outputs.append(expert(chunk))

expert_output = torch.cat(expert_outputs, dim=0)
# combined_output = torch.einsum("sec,ecm->sm", combine_weights.type_as(hidden_states), expert_output)
combine_exp = combine_weights.type_as(hidden_states).unsqueeze(3) # (s, e, c, 1)
expert_exp = expert_output.unsqueeze(0) # (1, e, c, m)
combined_output = (combine_exp * expert_exp).sum(dim=(1, 2)) # (s, m)
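
The commented-out einsum (`"sec,ecm->sm"`) and the broadcast-multiply-then-sum that replaces it describe the same contraction over the expert (`e`) and capacity (`c`) axes. The following is a minimal pure-Python check of that equivalence on tiny nested lists; the function names and the sample values are made up for illustration and are not part of the PR.

```python
def combine_einsum(weights, expert_out):
    """out[s][m] = sum over e, c of weights[s][e][c] * expert_out[e][c][m]."""
    S, E = len(weights), len(expert_out)
    C, M = len(expert_out[0]), len(expert_out[0][0])
    return [
        [
            sum(weights[s][e][c] * expert_out[e][c][m] for e in range(E) for c in range(C))
            for m in range(M)
        ]
        for s in range(S)
    ]


def combine_broadcast(weights, expert_out):
    """Same result, written as an explicit (s, e, c, m) product summed over (e, c)."""
    S, E = len(weights), len(expert_out)
    C, M = len(expert_out[0]), len(expert_out[0][0])
    # Materialize the full (s, e, c, m) product first, as the broadcast form does.
    prod = [
        [[[weights[s][e][c] * expert_out[e][c][m] for m in range(M)] for c in range(C)] for e in range(E)]
        for s in range(S)
    ]
    return [
        [sum(prod[s][e][c][m] for e in range(E) for c in range(C)) for m in range(M)]
        for s in range(S)
    ]


# Two tokens, two experts, capacity 1, hidden size 2 (all values hypothetical).
weights = [[[0.5], [0.0]], [[0.0], [0.25]]]       # shape (s, e, c)
expert_out = [[[2.0, 4.0]], [[8.0, 16.0]]]        # shape (e, c, m)
out_a = combine_einsum(weights, expert_out)
out_b = combine_broadcast(weights, expert_out)
```

The broadcast form materializes an intermediate of shape `(s, e, c, m)`, which is the main practical difference from the einsum form.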

Collaborator

Well, this is rigorously equivalent to the approach we have in llama4, with scattering that uses the dispatch index. I don't think there is a difference with llama4 here, no? Let's standardize, please.

Contributor Author

Hello! We've adopted a more community-standard MoE implementation by referencing other open-source models. This solution delivers identical performance to the original one, and we hope it can effectively resolve the current issue.

@ArthurZucker

Collaborator

run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe

@github-actions

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/auto', 'models/hunyuan_v1_dense', 'models/hunyuan_v1_moe']
quantizations: [] ...

@yjc9696

yjc9696 commented Aug 19, 2025

Contributor Author

run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe

Collaborator

Do you know why you had to do this? Happy to help fix it.

Contributor Author

Since the tokenizer part is not intended to be included in the open-source code, the approach of using trust_remote_code is adopted.

@ArthurZucker ArthurZucker Aug 20, 2025

Collaborator

I can help you convert it, no?

Collaborator

Using this will allow not relying on remote code!

Contributor Author

This is a historical issue. The script you provided has already been used for our dense series models, but the MoE model was open-sourced before that. As a result, the tiktoken tokenizer file is placed in the model file directory, so users can use the trust_remote_code approach for compatibility.

Collaborator

Sorry, I am not sure I understand, but transformers users (as well as vllm) expect AutoTokenizer to work without having to add trust_remote_code=True once the model is merged into transformers.

Contributor Author

OK, I understand what you mean. We will remove this test case until our model is ready for the fast tokenizer.

Contributor Author

We have modified the tokenizer in the model card by removing the trust_remote_code dependency and re-added the corresponding tests.
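
The expectation described in this thread, that `AutoTokenizer` works without `trust_remote_code=True`, might look like the sketch below. It assumes an installed transformers release that includes this PR and that the Hub repository ships tokenizer files loadable without custom code, as the comment above states; the helper name `load_hunyuan_tokenizer` is hypothetical, and the model ID is the one named in the PR description.

```python
def load_hunyuan_tokenizer(model_id: str = "tencent/Hunyuan-A13B-Instruct"):
    """Load the tokenizer through AutoTokenizer without enabling remote code."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoTokenizer

    # trust_remote_code is deliberately not passed, so no code from the Hub
    # repository is executed when loading.
    return AutoTokenizer.from_pretrained(model_id)
```

Calling `load_hunyuan_tokenizer()` downloads the tokenizer files from the Hub, so it needs network access or a populated local cache.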

yjc9696 and others added 5 commits August 20, 2025 19:25
fix moe & gate
* add norm_topk_prob
* fix&skip test
* skip testcase
@ArthurZucker

Collaborator

run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe

@github-actions

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/auto', 'models/hunyuan_v1_dense', 'models/hunyuan_v1_moe']
quantizations: [] ...

@ArthurZucker ArthurZucker left a comment

Collaborator

Kudos! My last comment is on norm_topk_prob: if it is always true, let's hardcode it to remove code paths!

bsz, seq_len, hidden_size = hidden_states.shape
self.shared_mlp = HunYuanMoEV1MLP(config, layer_idx=layer_idx, is_shared_mlp=True)

def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:

Collaborator

a lot better thanks! 🚀

* hardcode norm_topk_prob

* fix testcase
@yjc9696

yjc9696 commented Aug 21, 2025

Contributor Author

> Kudos! My last comment is on norm_topk_prob: if it is always true, let's hardcode it to remove code paths!

Already fixed~

@github-actions

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, hunyuan_v1_dense, hunyuan_v1_moe

@ArthurZucker ArthurZucker enabled auto-merge (squash) August 22, 2025 07:51
@ArthurZucker ArthurZucker merged commit cf487cd into huggingface:main Aug 22, 2025
@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


5 participants