per_device instead of per_gpu/error thrown when argument unknown (#4618)

* per_device instead of per_gpu/error thrown when argument unknown * [docs] Restore examples.md symlink * Correct absolute links so that symlink to the doc works correctly * Update src/transformers/hf_argparser.py Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Warning + reorder * Docs * Style * not for squad Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
parent 1381b6d01d
commit 6a17688021
10 changed files with 73 additions and 686 deletions
--- a/examples/README.md
+++ b/examples/README.md
@@ -16,17 +16,17 @@ This is still a work-in-progress – in particular documentation is still sparse

 | Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab
 |---|---|:---:|:---:|:---:|:---:|
-| [**`language-modeling`**](./language-modeling)       | Raw text        | ✅ | -  | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
-| [**`text-classification`**](./text-classification)   | GLUE, XNLI      | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
-| [**`token-classification`**](./token-classification) | CoNLL NER       | ✅ | ✅ | ✅ | -
-| [**`multiple-choice`**](./multiple-choice)           | SWAG, RACE, ARC | ✅ | ✅ | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
-| [**`question-answering`**](./question-answering)     | SQuAD           | -  | ✅ | -  | -
-| [**`text-generation`**](./text-generation)     | -           | -  | - | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
-| [**`distillation`**](./distillation)       | All               | -  | -  | -  | -
-| [**`summarization`**](./summarization)     | CNN/Daily Mail    | -  | -  | -  | -
-| [**`translation`**](./translation)         | WMT               | -  | -  | -  | -
-| [**`bertology`**](./bertology)             | -                 | -  | -  | -  | -
-| [**`adversarial`**](./adversarial)         | HANS              | -  | -  | -  | -
+| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling)       | Raw text        | ✅ | -  | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
+| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification)   | GLUE, XNLI      | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
+| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER       | ✅ | ✅ | ✅ | -
+| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice)           | SWAG, RACE, ARC | ✅ | ✅ | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
+| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering)     | SQuAD           | -  | ✅ | -  | -
+| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation)     | -           | -  | - | -  | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
+| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation)       | All               | -  | -  | -  | -
+| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/summarization)     | CNN/Daily Mail    | -  | -  | -  | -
+| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/translation)         | WMT               | -  | -  | -  | -
+| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology)             | -                 | -  | -  | -  | -
+| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial)         | HANS              | -  | -  | -  | -


 <br>
@@ -57,7 +57,7 @@ When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Str
 When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to setup your TPU environment refer to Google's documentation and to the
 very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).

-In this repo, we provide a very simple launcher script named [xla_spawn.py](./xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
+In this repo, we provide a very simple launcher script named [xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
 Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for torch.distributed).

 For example for `run_glue`:
--- a/examples/multiple-choice/README.md
+++ b/examples/multiple-choice/README.md
@@ -19,7 +19,7 @@ python ./examples/multiple-choice/run_multiple_choice.py \
 --max_seq_length 80 \
 --output_dir models_bert/swag_base \
 --per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=16 \
+--per_device_train_batch_size=16 \
 --gradient_accumulation_steps 2 \
 --overwrite_output
 ```
@@ -46,7 +46,7 @@ python ./examples/multiple-choice/run_tf_multiple_choice.py \
 --max_seq_length 80 \
 --output_dir models_bert/swag_base \
 --per_gpu_eval_batch_size=16 \
--per_gpu_train_batch_size=16 \
+--per_device_train_batch_size=16 \
 --logging-dir logs \
 --gradient_accumulation_steps 2 \
 --overwrite_output
--- a/examples/test_examples.py
+++ b/examples/test_examples.py
@@ -61,8 +61,8 @@ class ExamplesTests(unittest.TestCase):
            --do_train
            --do_eval
            --output_dir ./tests/fixtures/tests_samples/temp_dir
-            --per_gpu_train_batch_size=2
-            --per_gpu_eval_batch_size=1
+            --per_device_train_batch_size=2
+            --per_device_eval_batch_size=1
            --learning_rate=1e-4
            --max_steps=10
            --warmup_steps=2
--- a/examples/text-classification/README.md
+++ b/examples/text-classification/README.md
@@ -68,7 +68,7 @@ python run_glue.py \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
-  --per_gpu_train_batch_size 32 \
+  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
@@ -141,7 +141,7 @@ python run_glue.py \
  --do_eval \
  --data_dir $GLUE_DIR/MRPC/ \
  --max_seq_length 128 \
-  --per_gpu_train_batch_size 32 \
+  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/mrpc_output/
@@ -166,7 +166,7 @@ python run_glue.py \
  --do_eval \
  --data_dir $GLUE_DIR/MRPC/ \
  --max_seq_length 128 \
-  --per_gpu_train_batch_size 32 \
+  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/mrpc_output/ \
@@ -189,7 +189,7 @@ python -m torch.distributed.launch \
    --do_eval \
    --data_dir $GLUE_DIR/MRPC/ \
    --max_seq_length 128 \
-    --per_gpu_train_batch_size 8 \
+    --per_device_train_batch_size 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir /tmp/mrpc_output/
@@ -221,7 +221,7 @@ python -m torch.distributed.launch \
    --do_eval \
    --data_dir $GLUE_DIR/MNLI/ \
    --max_seq_length 128 \
-    --per_gpu_train_batch_size 8 \
+    --per_device_train_batch_size 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --output_dir output_dir \
@@ -280,7 +280,7 @@ python run_xnli.py \
  --do_train \
  --do_eval \
  --data_dir $XNLI_DIR \
-  --per_gpu_train_batch_size 32 \
+  --per_device_train_batch_size 32 \
  --learning_rate 5e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 128 \
--- a/examples/token-classification/README.md
+++ b/examples/token-classification/README.md
@@ -69,7 +69,7 @@ python3 run_ner.py --data_dir ./ \
 --output_dir $OUTPUT_DIR \
 --max_seq_length  $MAX_LENGTH \
 --num_train_epochs $NUM_EPOCHS \
--per_gpu_train_batch_size $BATCH_SIZE \
+--per_device_train_batch_size $BATCH_SIZE \
 --save_steps $SAVE_STEPS \
 --seed $SEED \
 --do_train \
@@ -91,7 +91,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
    "output_dir": "germeval-model",
    "max_seq_length": 128,
    "num_train_epochs": 3,
-    "per_gpu_train_batch_size": 32,
+    "per_device_train_batch_size": 32,
    "save_steps": 750,
    "seed": 1,
    "do_train": true,