Update the README of the text classification example (#9237)
* Update the README of the text classification example

* Update examples/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adapt comment from review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@@ -69,6 +69,43 @@ Coming soon!
**Coming soon!**
-->

## Distributed training and mixed precision

All the PyTorch scripts mentioned above work out of the box with distributed training and mixed precision, thanks to
the [Trainer API](https://huggingface.co/transformers/main_classes/trainer.html). To launch one of them on _n_ GPUs,
use the following command:

```bash
python -m torch.distributed.launch \
    --nproc_per_node number_of_gpu_you_have path_to_script.py \
    --all_arguments_of_the_script
```
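
If the number of GPUs is not known in advance, it can be derived just before launching. The sketch below is only an illustration: it assumes the GPUs to use are listed in the `CUDA_VISIBLE_DEVICES` environment variable (set by hand here), `num_gpus` and `launch_cmd` are hypothetical variable names, and the command is printed rather than executed.

```bash
# Assumption: the GPUs to use are listed in CUDA_VISIBLE_DEVICES (set by hand here).
CUDA_VISIBLE_DEVICES="0,1,2,3"

# Count the comma-separated device ids (num_gpus is a hypothetical variable name).
num_gpus=$(printf '%s\n' "${CUDA_VISIBLE_DEVICES}" | tr ',' '\n' | grep -c .)

# Build the launch command as a string for a dry run; path_to_script.py is a placeholder.
launch_cmd="python -m torch.distributed.launch --nproc_per_node ${num_gpus} path_to_script.py"
echo "${launch_cmd}"
```

Printing the command first makes it easy to check the process count before starting a long training run.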

As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
classification MNLI task using the `run_glue` script, with 8 GPUs:

```bash
python -m torch.distributed.launch \
    --nproc_per_node 8 text-classification/run_glue.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name mnli \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/mnli_output/
```
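
With one process per GPU, each process draws its own batch, so the number of examples seen per optimizer step is the per-device batch size multiplied by the number of processes (and by the number of gradient accumulation steps, if any). A quick check of that arithmetic for the command above, assuming a single machine and no gradient accumulation; the variable names are illustrative only:

```bash
nproc_per_node=8               # value passed to --nproc_per_node above
per_device_train_batch_size=8  # value passed to --per_device_train_batch_size above
gradient_accumulation_steps=1  # assumption: no gradient accumulation

total_train_batch_size=$(( nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps ))
echo "Total train batch size: ${total_train_batch_size}"  # prints "Total train batch size: 64"
```

This matters when comparing against single-GPU runs: a learning rate tuned for a batch of 8 may behave differently with an effective batch of 64.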

If you have a GPU with mixed precision capabilities (architecture Pascal or more recent), you can use mixed precision
training with PyTorch 1.6.0 or later, or by installing the [Apex](https://github.com/NVIDIA/apex) library for previous
versions. Just add the flag `--fp16` to the command launching one of the scripts mentioned above!

Using mixed precision training usually results in a 2x speedup for training with the same final results (as shown in
[this table](https://github.com/huggingface/transformers/tree/master/examples/text-classification#mixed-precision-training)
for text classification).
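
One way to keep a single launch script for machines with and without suitable GPUs is to add the flag conditionally. This is a minimal sketch, not something the examples provide: `USE_FP16`, `extra_args` and `cmd` are hypothetical names, and the command is printed rather than executed.

```bash
USE_FP16=1  # hypothetical switch: set to 0 on GPUs without mixed precision support

extra_args=""
if [ "${USE_FP16}" = "1" ]; then
    extra_args=" --fp16"
fi

# path_to_script.py and --all_arguments_of_the_script are the placeholders used earlier.
cmd="python path_to_script.py --all_arguments_of_the_script${extra_args}"
echo "${cmd}"  # prints "python path_to_script.py --all_arguments_of_the_script --fp16"
```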

## Running on TPUs

When using TensorFlow, TPUs are supported out of the box as a `tf.distribute.Strategy`.
@@ -76,27 +113,34 @@ When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Str
When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to set up your TPU environment, refer to Google's documentation and to the
very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).

In this repo, we provide a very simple launcher script named [xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for torch.distributed).
Note that this approach does not work for examples that use `pytorch-lightning`.

For example, for `run_glue`:

In this repo, we provide a very simple launcher script named
[xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our
example scripts on multiple TPU cores without any boilerplate. Just pass a `--num_cores` flag to this script, then your
regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for
`torch.distributed`):

```bash
python examples/xla_spawn.py --num_cores 8 \
    examples/text-classification/run_glue.py \
    --model_name_or_path bert-base-cased \
    --task_name mnli \
    --data_dir ./data/glue_data/MNLI \
    --output_dir ./models/tpu \
    --overwrite_output_dir \
    --do_train \
    --do_eval \
    --num_train_epochs 1 \
    --save_steps 20000

python xla_spawn.py --num_cores num_tpu_you_have \
    path_to_script.py \
    --all_arguments_of_the_script
```
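
The parallel between the two launchers can be made explicit with a small wrapper. The function below is a hypothetical helper (`build_launch_cmd` is not part of the repo) that only prints the launch prefix for a given device type and worker count, using the flags shown in this README.

```bash
# Hypothetical helper: print the launch prefix for "gpu" or "tpu" and a worker count.
build_launch_cmd() {
    device="$1"
    workers="$2"
    case "${device}" in
        gpu) echo "python -m torch.distributed.launch --nproc_per_node ${workers}" ;;
        tpu) echo "python xla_spawn.py --num_cores ${workers}" ;;
        *)   echo "unknown device: ${device}" >&2; return 1 ;;
    esac
}

build_launch_cmd gpu 8  # prints "python -m torch.distributed.launch --nproc_per_node 8"
build_launch_cmd tpu 8  # prints "python xla_spawn.py --num_cores 8"
```

In both cases the training script and its arguments follow the printed prefix unchanged.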

Feedback and more use cases and benchmarks involving TPUs are welcome; please share them with the community.

As an example, here is how you would fine-tune the BERT large model (with whole word masking) on the text
classification MNLI task using the `run_glue` script, with 8 TPU cores:

```bash
python xla_spawn.py --num_cores 8 \
    text-classification/run_glue.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name mnli \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/mnli_output/
```

## Logging & Experiment tracking