[docs] refine the doc for train with a script (#33423)

* add xpu note

* add one more case

* add more

* Update docs/source/en/run_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Fanli Lin
2024-09-13 01:16:12 +08:00
committed by GitHub
parent 5c6257d1fc
commit a05ce550bf
3 changed files with 6 additions and 6 deletions

@@ -126,7 +126,7 @@ python examples/tensorflow/summarization/run_summarization.py \
 The [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) supports distributed training and mixed precision, which means you can also use it in a script. To enable both of these features:
-- Add the `fp16` argument to enable mixed precision.
+- Add the `fp16` or `bf16` argument to enable mixed precision. XPU devices only supports `bf16` for mixed precision training.
 - Set the number of GPUs to use with the `nproc_per_node` argument.
 ```bash
@@ -287,7 +287,7 @@ Another helpful option to enable is resuming training from a previous checkpoint
 The first method uses the `output_dir previous_output_dir` argument to resume training from the latest checkpoint stored in `output_dir`. In this case, you should remove `overwrite_output_dir`:
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \
@@ -304,7 +304,7 @@ python examples/pytorch/summarization/run_summarization.py
 The second method uses the `resume_from_checkpoint path_to_specific_checkpoint` argument to resume training from a specific checkpoint folder.
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \
@@ -334,7 +334,7 @@ To give your repository a specific name, use the `push_to_hub_model_id` argument
 The following example shows how to upload a model with a specific repository name:
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \

@@ -205,7 +205,7 @@ At this point, only three steps remain:
 ... save_total_limit=3,
 ... num_train_epochs=4,
 ... predict_with_generate=True,
-... fp16=True,
+... fp16=True, #change to bf16=True for XPU
 ... push_to_hub=True,
 ... )

@@ -212,7 +212,7 @@ At this point, only three steps remain:
 ... save_total_limit=3,
 ... num_train_epochs=2,
 ... predict_with_generate=True,
-... fp16=True,
+... fp16=True, #change to bf16=True for XPU
 ... push_to_hub=True,
 ... )