[docs] refine the doc for train with a script (#33423)
* add xpu note
* add one more case
* add more
* Update docs/source/en/run_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -126,7 +126,7 @@ python examples/tensorflow/summarization/run_summarization.py \

 The [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) supports distributed training and mixed precision, which means you can also use it in a script. To enable both of these features:

-- Add the `fp16` argument to enable mixed precision.
+- Add the `fp16` or `bf16` argument to enable mixed precision. XPU devices only support `bf16` for mixed precision training.
 - Set the number of GPUs to use with the `nproc_per_node` argument.

 ```bash
@@ -287,7 +287,7 @@ Another helpful option to enable is resuming training from a previous checkpoint
 The first method uses the `output_dir previous_output_dir` argument to resume training from the latest checkpoint stored in `output_dir`. In this case, you should remove `overwrite_output_dir`:

 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \
@@ -304,7 +304,7 @@ python examples/pytorch/summarization/run_summarization.py
 The second method uses the `resume_from_checkpoint path_to_specific_checkpoint` argument to resume training from a specific checkpoint folder.

 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \
@@ -334,7 +334,7 @@ To give your repository a specific name, use the `push_to_hub_model_id` argument
 The following example shows how to upload a model with a specific repository name:

 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
 --model_name_or_path google-t5/t5-small \
 --do_train \
 --do_eval \
@@ -205,7 +205,7 @@ At this point, only three steps remain:
 ... save_total_limit=3,
 ... num_train_epochs=4,
 ... predict_with_generate=True,
-... fp16=True,
+... fp16=True,  # change to bf16=True for XPU
 ... push_to_hub=True,
 ... )

@@ -212,7 +212,7 @@ At this point, only three steps remain:
 ... save_total_limit=3,
 ... num_train_epochs=2,
 ... predict_with_generate=True,
-... fp16=True,
+... fp16=True,  # change to bf16=True for XPU
 ... push_to_hub=True,
 ... )

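
The XPU note added in this change comes down to one rule: pass `bf16` instead of `fp16` when the target device is an XPU. A minimal sketch of that rule follows. The helper name `mixed_precision_flags` and the device-type strings are hypothetical and not part of the commit or of any library. The sketch also assumes that a CUDA GPU keeps the `fp16=True` setting shown in the diff, and it refuses to guess for any other device.

```python
def mixed_precision_flags(device_type: str) -> dict:
    """Return the mixed-precision keyword argument for a device type.

    Hypothetical helper for illustration only; it is not a library API.
    """
    if device_type == "xpu":
        # Per the note in this change: XPU devices only support bf16.
        return {"bf16": True}
    if device_type == "cuda":
        # Assumption: CUDA GPUs keep the fp16 setting used in the docs.
        return {"fp16": True}
    # Other devices are out of scope for this sketch, so do not guess.
    raise ValueError(f"no mixed-precision default for device type: {device_type!r}")


if __name__ == "__main__":
    for device in ("cuda", "xpu"):
        print(device, mixed_precision_flags(device))
```

The returned dictionary could be unpacked into the training-arguments call from the diff (for example `**mixed_precision_flags("xpu")` in place of the literal `fp16=True`), so the same script would run on either device type without editing the argument list.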