Document tensor parallelism configuration with Trainer #42876

Open
Likhita-17 wants to merge 1 commit into huggingface:main from Likhita-17:patch-1

Conversation

@Likhita-17

This PR adds a concise documentation section explaining how to configure tensor parallelism (TP) when training with Trainer.

It clarifies that TP must be managed by Accelerate rather than during model loading, and explains why device_map="auto" should not be used with distributed training. A minimal, runnable TP-only Trainer example is included.

Fixes #41141

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

Who can review?

@stevhliu

Added a section on configuring tensor parallelism with Trainer, including an example script for TP-only training.

@stevhliu stevhliu left a comment

Member

thanks!

  • i would put this under the Trainer optimization section here
  • add a link to tensor parallelism docs
  • i think it's possible to just set tp_plan="auto" in from_pretrained without explicitly using ParallelismConfig. i believe Trainer creates this for you here right @SunMarc? 🙏

@SunMarc

SunMarc commented Jan 8, 2026

Member

> i think it's possible to just set tp_plan="auto" in from_pretrained without explicitly using ParallelismConfig. i believe Trainer creates this for you here right @SunMarc? 🙏

Yeah, you can pass tp_plan="auto", which shards the model across the GPUs you have, and Trainer will create the correct ParallelismConfig. You can also pass a device_mesh created from a ParallelismConfig when initializing the model. As for device_map="auto", if you run with torchrun it will also enable TP when loading the model.
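
For readers following the thread, the tp_plan="auto" route described above might look roughly like the sketch below. It is not the example from the PR. The checkpoint name is a hypothetical placeholder, the helper functions and the tiny in-memory dataset are illustrative assumptions, and whether a given model can be sharded depends on its architecture shipping a TP plan.

```python
import os

# Hypothetical placeholder: substitute a causal LM checkpoint whose
# architecture defines a tensor-parallel plan.
MODEL_ID = "your-org/your-causal-lm"


def launched_by_torchrun() -> bool:
    """torchrun exports RANK, LOCAL_RANK and WORLD_SIZE to every worker."""
    return all(key in os.environ for key in ("RANK", "LOCAL_RANK", "WORLD_SIZE"))


def model_load_kwargs() -> dict:
    """Keyword arguments for from_pretrained (illustrative helper).

    Only tp_plan="auto" is set; per the thread, Trainer is expected to
    build the matching ParallelismConfig. device_map is left unset here.
    """
    return {"tp_plan": "auto"}


def main() -> None:
    # Heavy imports live inside main() so the helpers above stay cheap to import.
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # Shard the model across the GPUs of this torchrun job.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **model_load_kwargs())

    # Tiny synthetic dataset, just enough to run a few optimizer steps.
    texts = ["Tensor parallelism splits each layer across GPUs."] * 32
    dataset = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
        batched=True,
        remove_columns=["text"],
    )

    args = TrainingArguments(
        output_dir="tp-trainer-demo",
        per_device_train_batch_size=2,
        max_steps=5,
        logging_steps=1,
        report_to="none",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )
    trainer.train()


if __name__ == "__main__":
    if launched_by_torchrun():
        main()
    else:
        print("Launch with: torchrun --nproc-per-node <num_gpus> this_script.py")
```

Run directly with python, the script only prints the launch hint. Training starts when it is launched through torchrun, which sets the rank environment variables the guard checks for.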

@KrishnaKarunya-2K6

Thanks very much for the review and constructive feedback!

I’m happy to make the suggested adjustments:

  • Move the TP configuration section under the existing Trainer optimization area as recommended.
  • Add a link to the tensor parallelism docs for additional context.
  • Clarify that in many cases, users can simply pass tp_plan="auto" in from_pretrained() and Trainer will create the correct parallelism configuration for TP-only training.

If it’s helpful, I can update the doc text to reflect these points — just let me know if there’s a preferred location/heading for the Trainer optimization section.

Thanks again for guiding this consolidated effort — I’m glad the content has come together nicely and addresses the original issue (#41141)!

@stevhliu

stevhliu commented Jan 9, 2026

Member

sounds good, you can add a "tensor parallelism" section after this

@KrishnaKarunya-2K6

Thanks for the guidance! I’ve added the Tensor Parallelism section immediately after the torch.compile heading. Please let me know if you'd like any refinements or additional detail.

Development

Successfully merging this pull request may close these issues.

Need a concise example of Tensor Parallelism (TP) training using Trainer/SFTTrainer.

4 participants