From ddf3c64654989197d38cb1bd90720d94446c1407 Mon Sep 17 00:00:00 2001
From: Stas Bekman <stas00@users.noreply.github.com>
Date: Thu, 26 Nov 2020 14:06:27 -0800
Subject: [PATCH] potpourri of small fixes (#8807)

---
 examples/seq2seq/README.md                         | 14 +++++++-------
 .../builtin_trainer/train_mbart_cc25_enro.sh       |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index 450fbb3636..c1d599983f 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -203,30 +203,30 @@ model = AutoModelForSeq2SeqLM.from_pretrained(f'{output_dir}/best_tfmr')
 ```
 
 ### Fine-tuning using Seq2SeqTrainer
-To use `Seq2SeqTrainer` for fine-tuning you should use the `finetune_trainer.py` script. It subclasses `Trainer` to extend it for seq2seq training. Except the `Trainer` releated `TrainingArguments`, it shares the same argument names as that of `finetune.py` file. One notable difference is that, calculating generative metrics (BLEU, ROUGE) is optional and is controlled using the `--predict_with_generate` argument, set this argument to calculate BLEU and ROUGE metrics.
+To use `Seq2SeqTrainer` for fine-tuning you should use the `finetune_trainer.py` script. It subclasses `Trainer` to extend it for seq2seq training. Except for the `Trainer`-related `TrainingArguments`, it shares the same argument names as the `finetune.py` script. One notable difference is that calculating generative metrics (BLEU, ROUGE) is optional and is controlled using the `--predict_with_generate` argument.
 
 With PyTorch 1.6+ it'll automatically use `native AMP` when `--fp16` is set.
 
 To see all the possible command line options, run:
 
 ```bash
-./builtin_trainer/finetune.sh --help # This calls python finetune_trainer.py --help
+python finetune_trainer.py --help
 ```
 
 **At the moment, `Seq2SeqTrainer` does not support *with teacher* distillation.**
 
-All `Seq2SeqTrainer` based fine-tuning scripts are included in the `builtin_trainer` directory.
+All `Seq2SeqTrainer`-based fine-tuning scripts are included in the `builtin_trainer` directory.
 
 #### TPU Training
 `Seq2SeqTrainer` supports TPU training with few caveats
-1. As `generate` method does not work on TPU at the moment, `predict_with_generate` can not be used. You should use `--prediction_loss_only` to only calculate loss, and do not set `--do_predict` and `--predict_with_generate`.
-2. All sequences should be padded to be of equal length otherwise it leads to extremely slow training. (`finetune_trainer.py` does this automatically when running on TPU.)
+1. As the `generate` method does not work on TPU at the moment, `predict_with_generate` cannot be used. You should use `--prediction_loss_only` to only calculate loss, and do not set `--do_predict` and `--predict_with_generate`.
+2. All sequences should be padded to be of equal length to avoid extremely slow training. (`finetune_trainer.py` does this automatically when running on TPU.)
 
-We provide a very simple launcher script named `xla_spawn.py` that lets you run our example scripts on multiple TPU cores without any boilerplate. Just pass a --num_cores flag to this script, then your regular training script with its arguments (this is similar to the torch.distributed.launch helper for torch.distributed).
+We provide a very simple launcher script named `xla_spawn.py` that lets you run our example scripts on multiple TPU cores without any boilerplate. Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for `torch.distributed`).
 
 `builtin_trainer/finetune_tpu.sh` script provides minimal arguments needed for TPU training.
 
-Following command fine-tunes `sshleifer/student_marian_en_ro_6_3` on TPU V3-8 and should complete one epoch in ~5-6 mins.
+The following command fine-tunes `sshleifer/student_marian_en_ro_6_3` on TPU V3-8 and should complete one epoch in ~5-6 mins.
 
 ```bash
 ./builtin_trainer/train_distil_marian_enro_tpu.sh
diff --git a/examples/seq2seq/builtin_trainer/train_mbart_cc25_enro.sh b/examples/seq2seq/builtin_trainer/train_mbart_cc25_enro.sh
index e8cd841d72..7a2a5c7220 100644
--- a/examples/seq2seq/builtin_trainer/train_mbart_cc25_enro.sh
+++ b/examples/seq2seq/builtin_trainer/train_mbart_cc25_enro.sh
@@ -16,6 +16,6 @@ python finetune_trainer.py \
     --num_train_epochs 6 \
     --save_steps 25000 --eval_steps 25000 --logging_steps 1000 \
     --do_train --do_eval --do_predict --evaluate_during_training \
-    --predict_with_generate --logging_first_step 
+    --predict_with_generate --logging_first_step \
     --task translation \
     "$@"