From a05ce550bf1cda8ae3bffcc62607597770460a42 Mon Sep 17 00:00:00 2001
From: Fanli Lin
Date: Fri, 13 Sep 2024 01:16:12 +0800
Subject: [PATCH] [docs] refine the doc for `train with a script` (#33423)

* add xpu note

* add one more case

* add more

* Update docs/source/en/run_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/run_scripts.md         | 8 ++++----
 docs/source/en/tasks/summarization.md | 2 +-
 docs/source/en/tasks/translation.md   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/source/en/run_scripts.md b/docs/source/en/run_scripts.md
index f602cde409..b7a8955919 100644
--- a/docs/source/en/run_scripts.md
+++ b/docs/source/en/run_scripts.md
@@ -126,7 +126,7 @@ python examples/tensorflow/summarization/run_summarization.py \
 
 The [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) supports distributed training and mixed precision, which means you can also use it in a script. To enable both of these features:
 
-- Add the `fp16` argument to enable mixed precision.
+- Add the `fp16` or `bf16` argument to enable mixed precision. XPU devices only supports `bf16` for mixed precision training.
 - Set the number of GPUs to use with the `nproc_per_node` argument.
 
 ```bash
@@ -287,7 +287,7 @@ Another helpful option to enable is resuming training from a previous checkpoint
 The first method uses the `output_dir previous_output_dir` argument to resume training from the latest checkpoint stored in `output_dir`. In this case, you should remove `overwrite_output_dir`:
 
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
     --model_name_or_path google-t5/t5-small \
     --do_train \
     --do_eval \
@@ -304,7 +304,7 @@ python examples/pytorch/summarization/run_summarization.py
 The second method uses the `resume_from_checkpoint path_to_specific_checkpoint` argument to resume training from a specific checkpoint folder.
 
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
     --model_name_or_path google-t5/t5-small \
     --do_train \
     --do_eval \
@@ -334,7 +334,7 @@ To give your repository a specific name, use the `push_to_hub_model_id` argument
 The following example shows how to upload a model with a specific repository name:
 
 ```bash
-python examples/pytorch/summarization/run_summarization.py
+python examples/pytorch/summarization/run_summarization.py \
     --model_name_or_path google-t5/t5-small \
     --do_train \
     --do_eval \
diff --git a/docs/source/en/tasks/summarization.md b/docs/source/en/tasks/summarization.md
index 76e750ed3b..b79415996c 100644
--- a/docs/source/en/tasks/summarization.md
+++ b/docs/source/en/tasks/summarization.md
@@ -205,7 +205,7 @@ At this point, only three steps remain:
 ...     save_total_limit=3,
 ...     num_train_epochs=4,
 ...     predict_with_generate=True,
-...     fp16=True,
+...     fp16=True, #change to bf16=True for XPU
 ...     push_to_hub=True,
 ... )
 
diff --git a/docs/source/en/tasks/translation.md b/docs/source/en/tasks/translation.md
index 1028e4c6cf..a4b544fe68 100644
--- a/docs/source/en/tasks/translation.md
+++ b/docs/source/en/tasks/translation.md
@@ -212,7 +212,7 @@ At this point, only three steps remain:
 ...     save_total_limit=3,
 ...     num_train_epochs=2,
 ...     predict_with_generate=True,
-...     fp16=True,
+...     fp16=True, #change to bf16=True for XPU
 ...     push_to_hub=True,
 ... )