per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown * [docs] Restore examples.md symlink * Correct absolute links so that symlink to the doc works correctly * Update src/transformers/hf_argparser.py Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Warning + reorder * Docs * Style * not for squad Co-authored-by: Julien Chaumond <chaumond@gmail.com>
This commit is contained in:
@@ -16,17 +16,17 @@ This is still a work-in-progress – in particular documentation is still sparse
|
||||
|
||||
| Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab
|
||||
|---|---|:---:|:---:|:---:|:---:|
|
||||
| [**`language-modeling`**](./language-modeling) | Raw text | ✅ | - | - | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
|
||||
| [**`text-classification`**](./text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
|
||||
| [**`token-classification`**](./token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
|
||||
| [**`multiple-choice`**](./multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
|
||||
| [**`question-answering`**](./question-answering) | SQuAD | - | ✅ | - | -
|
||||
| [**`text-generation`**](./text-generation) | - | - | - | - | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
|
||||
| [**`distillation`**](./distillation) | All | - | - | - | -
|
||||
| [**`summarization`**](./summarization) | CNN/Daily Mail | - | - | - | -
|
||||
| [**`translation`**](./translation) | WMT | - | - | - | -
|
||||
| [**`bertology`**](./bertology) | - | - | - | - | -
|
||||
| [**`adversarial`**](./adversarial) | HANS | - | - | - | -
|
||||
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling) | Raw text | ✅ | - | - | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
|
||||
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
|
||||
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
|
||||
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
|
||||
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | - | ✅ | - | -
|
||||
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | - | - | - | [](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
|
||||
| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation) | All | - | - | - | -
|
||||
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/summarization) | CNN/Daily Mail | - | - | - | -
|
||||
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/translation) | WMT | - | - | - | -
|
||||
| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology) | - | - | - | - | -
|
||||
| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial) | HANS | - | - | - | -
|
||||
|
||||
|
||||
<br>
|
||||
@@ -57,7 +57,7 @@ When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Str
|
||||
When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to setup your TPU environment refer to Google's documentation and to the
|
||||
very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).
|
||||
|
||||
In this repo, we provide a very simple launcher script named [xla_spawn.py](./xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
|
||||
In this repo, we provide a very simple launcher script named [xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
|
||||
Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for torch.distributed).
|
||||
|
||||
For example for `run_glue`:
|
||||
|
||||
@@ -19,7 +19,7 @@ python ./examples/multiple-choice/run_multiple_choice.py \
|
||||
--max_seq_length 80 \
|
||||
--output_dir models_bert/swag_base \
|
||||
--per_gpu_eval_batch_size=16 \
|
||||
--per_gpu_train_batch_size=16 \
|
||||
--per_device_train_batch_size=16 \
|
||||
--gradient_accumulation_steps 2 \
|
||||
--overwrite_output
|
||||
```
|
||||
@@ -46,7 +46,7 @@ python ./examples/multiple-choice/run_tf_multiple_choice.py \
|
||||
--max_seq_length 80 \
|
||||
--output_dir models_bert/swag_base \
|
||||
--per_gpu_eval_batch_size=16 \
|
||||
--per_gpu_train_batch_size=16 \
|
||||
--per_device_train_batch_size=16 \
|
||||
--logging-dir logs \
|
||||
--gradient_accumulation_steps 2 \
|
||||
--overwrite_output
|
||||
|
||||
@@ -61,8 +61,8 @@ class ExamplesTests(unittest.TestCase):
|
||||
--do_train
|
||||
--do_eval
|
||||
--output_dir ./tests/fixtures/tests_samples/temp_dir
|
||||
--per_gpu_train_batch_size=2
|
||||
--per_gpu_eval_batch_size=1
|
||||
--per_device_train_batch_size=2
|
||||
--per_device_eval_batch_size=1
|
||||
--learning_rate=1e-4
|
||||
--max_steps=10
|
||||
--warmup_steps=2
|
||||
|
||||
@@ -68,7 +68,7 @@ python run_glue.py \
|
||||
--do_eval \
|
||||
--data_dir $GLUE_DIR/$TASK_NAME \
|
||||
--max_seq_length 128 \
|
||||
--per_gpu_train_batch_size 32 \
|
||||
--per_device_train_batch_size 32 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/$TASK_NAME/
|
||||
@@ -141,7 +141,7 @@ python run_glue.py \
|
||||
--do_eval \
|
||||
--data_dir $GLUE_DIR/MRPC/ \
|
||||
--max_seq_length 128 \
|
||||
--per_gpu_train_batch_size 32 \
|
||||
--per_device_train_batch_size 32 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mrpc_output/
|
||||
@@ -166,7 +166,7 @@ python run_glue.py \
|
||||
--do_eval \
|
||||
--data_dir $GLUE_DIR/MRPC/ \
|
||||
--max_seq_length 128 \
|
||||
--per_gpu_train_batch_size 32 \
|
||||
--per_device_train_batch_size 32 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mrpc_output/ \
|
||||
@@ -189,7 +189,7 @@ python -m torch.distributed.launch \
|
||||
--do_eval \
|
||||
--data_dir $GLUE_DIR/MRPC/ \
|
||||
--max_seq_length 128 \
|
||||
--per_gpu_train_batch_size 8 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir /tmp/mrpc_output/
|
||||
@@ -221,7 +221,7 @@ python -m torch.distributed.launch \
|
||||
--do_eval \
|
||||
--data_dir $GLUE_DIR/MNLI/ \
|
||||
--max_seq_length 128 \
|
||||
--per_gpu_train_batch_size 8 \
|
||||
--per_device_train_batch_size 8 \
|
||||
--learning_rate 2e-5 \
|
||||
--num_train_epochs 3.0 \
|
||||
--output_dir output_dir \
|
||||
@@ -280,7 +280,7 @@ python run_xnli.py \
|
||||
--do_train \
|
||||
--do_eval \
|
||||
--data_dir $XNLI_DIR \
|
||||
--per_gpu_train_batch_size 32 \
|
||||
--per_device_train_batch_size 32 \
|
||||
--learning_rate 5e-5 \
|
||||
--num_train_epochs 2.0 \
|
||||
--max_seq_length 128 \
|
||||
|
||||
@@ -69,7 +69,7 @@ python3 run_ner.py --data_dir ./ \
|
||||
--output_dir $OUTPUT_DIR \
|
||||
--max_seq_length $MAX_LENGTH \
|
||||
--num_train_epochs $NUM_EPOCHS \
|
||||
--per_gpu_train_batch_size $BATCH_SIZE \
|
||||
--per_device_train_batch_size $BATCH_SIZE \
|
||||
--save_steps $SAVE_STEPS \
|
||||
--seed $SEED \
|
||||
--do_train \
|
||||
@@ -91,7 +91,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
|
||||
"output_dir": "germeval-model",
|
||||
"max_seq_length": 128,
|
||||
"num_train_epochs": 3,
|
||||
"per_gpu_train_batch_size": 32,
|
||||
"per_device_train_batch_size": 32,
|
||||
"save_steps": 750,
|
||||
"seed": 1,
|
||||
"do_train": true,
|
||||
|
||||
Reference in New Issue
Block a user