(Part 2) feat: allow for tp_size attr for tplizing the model (#37054)

* feat: custom tp_size, new transformers tp interface

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: review cmt - error when tp_plan not set for tp_size

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: nit in docs

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-04-10 21:14:09 +05:30
parent dac443414e
commit 7d76876498
7 changed files with 27 additions and 120 deletions
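
Note on the second bullet of the commit message ("error when tp_plan not set for tp_size"): the rule it describes can be pictured with the minimal sketch below. This is not the code from this commit. `check_tp_args` is a hypothetical helper, and the extra positivity check on `tp_size` is an assumption added for illustration only.

```python
from typing import Optional


def check_tp_args(tp_plan: Optional[str], tp_size: Optional[int]) -> None:
    """Hypothetical helper (not the library's code): validate the
    tp_plan / tp_size combination described in the commit message."""
    # The rule named in the commit message: a tp_size without a tp_plan
    # is an error, because there is no plan to shard the model with.
    if tp_size is not None and tp_plan is None:
        raise ValueError("tp_plan must be set when tp_size is passed")
    # Assumption for illustration: a tensor-parallel size must be positive.
    if tp_size is not None and tp_size < 1:
        raise ValueError("tp_size must be a positive integer")


# Accepted: a plan with an explicit size, or neither argument.
check_tp_args("auto", 4)
check_tp_args(None, None)
```

Calling `check_tp_args(None, 4)` raises `ValueError` under this sketch, which mirrors the behavior the review fix describes.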
--- a/docs/source/en/trainer.md
+++ b/docs/source/en/trainer.md
@@ -341,29 +341,9 @@ use_cpu: false
 ```

 </hfoption>
-<hfoption id="Tensor parallelism with PyTorch 2">
-
-```yaml
-compute_environment: LOCAL_MACHINE
-tp_config:
-  tp_size: 4
-distributed_type: TP
-downcast_bf16: 'no'
-machine_rank: 0
-main_training_function: main
-mixed_precision: 'no'
-num_machines: 1
-num_processes: 4
-rdzv_backend: static
-same_network: true
-tpu_env: []
-tpu_use_cluster: false
-tpu_use_sudo: false
-use_cpu: false
-```
-
 </hfoptions>

+
 Run [accelerate_launch](https://hf.co/docs/accelerate/package_reference/cli#accelerate-launch) to start training with the configurations set in `config_file.yaml`. This file is saved to the Accelerate cache folder and automatically loaded when you run `accelerate_launch`.

 The example below launches the [run_glue.py](../../../examples/pytorch/text-classification/run_glue) script with the FSDP configuration shown earlier. Parameters from the `config_file.yaml` file can also be directly set in the command line.