[examples] document resuming (#10776)

* document resuming in examples

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* put trainer code last, adjust notes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman
2021-03-17 12:48:35 -07:00
committed by GitHub
parent 85a114ef47
commit 393739194e

@@ -95,6 +95,21 @@ Coming soon!
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | ✅ | - | - | -
## Resuming training

You can resume training from a previous checkpoint in one of two ways:

1. Pass `--output_dir previous_output_dir` without `--overwrite_output_dir` to resume training from the latest checkpoint in `output_dir` (what you would use if the training was interrupted, for instance).
2. Pass `--model_name_or_path path_to_a_specific_checkpoint` to resume training from that checkpoint folder.

Should you want to turn an example into a notebook, where you no longer have access to the command line, 🤗 Trainer supports resuming from a checkpoint via `trainer.train(resume_from_checkpoint)`:

1. If `resume_from_checkpoint` is `True`, it looks for the last checkpoint in the `output_dir` passed via `TrainingArguments`.
2. If `resume_from_checkpoint` is a path to a specific checkpoint, it resumes training from that saved checkpoint folder.
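
Both routes above depend on what counts as the "last" checkpoint in `output_dir`. The following is a minimal, standard-library-only sketch of that lookup, not the library's own code. It assumes checkpoints are saved as sub-folders named `checkpoint-<step>`, and the helper names `find_last_checkpoint` and `resolve_checkpoint` are hypothetical, introduced only for illustration.

```python
import os
import re
import tempfile

# Assumption: each checkpoint lives in a sub-folder named "checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir):
    """Return the checkpoint folder with the highest step number, or None."""
    steps = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            # Compare steps as integers so that 1000 sorts after 500.
            steps.append((int(match.group(1)), name))
    if not steps:
        return None
    return os.path.join(output_dir, max(steps)[1])


def resolve_checkpoint(resume_from_checkpoint, output_dir):
    """Hypothetical helper mirroring the two cases described above."""
    if resume_from_checkpoint is True:
        checkpoint = find_last_checkpoint(output_dir)
        if checkpoint is None:
            raise ValueError(f"No checkpoint found in {output_dir}")
        return checkpoint
    # Otherwise the value is already a path to a specific checkpoint folder.
    return resume_from_checkpoint


# Small demonstration with empty, throwaway checkpoint folders.
with tempfile.TemporaryDirectory() as output_dir:
    for step in (500, 1000, 1500):
        os.makedirs(os.path.join(output_dir, f"checkpoint-{step}"))
    print(os.path.basename(resolve_checkpoint(True, output_dir)))  # checkpoint-1500
```

In a notebook you would not need such a helper: calling `trainer.train(resume_from_checkpoint=True)` or passing the checkpoint path directly is enough. The sketch only makes explicit what "last checkpoint" is taken to mean here.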
## Distributed training and mixed precision
All the PyTorch scripts mentioned above work out of the box with distributed training and mixed precision, thanks to
@@ -104,7 +119,7 @@ use the following command:
```bash
python -m torch.distributed.launch \
    --nproc_per_node number_of_gpu_you_have path_to_script.py \
    --all_arguments_of_the_script
```
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
@@ -148,7 +163,7 @@ regular training script with its arguments (this is similar to the `torch.distri
```bash
python xla_spawn.py --num_cores num_tpu_you_have \
    path_to_script.py \
    --all_arguments_of_the_script
```
As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text