examples/seq2seq/run_eval.py fixes and docs (#5322)

2020-06-26 19:20:43 -04:00
parent 5543b30aa6
commit 393b8dc09a
5 changed files with 79 additions and 27 deletions
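
The README text added in this commit states that the metric depends on the task name: BLEU when the name contains `translation`, ROUGE otherwise. A minimal sketch of that rule follows; the function name `pick_metric` and the returned strings are hypothetical, chosen only for illustration, and this is not the actual code in `run_eval.py`.

```python
def pick_metric(task: str) -> str:
    """Return the metric family implied by a task name.

    Hypothetical helper illustrating the rule described in the README:
    any task name containing "translation" is scored with BLEU,
    everything else with ROUGE.
    """
    return "bleu" if "translation" in task else "rouge"


if __name__ == "__main__":
    # Task names taken from the example commands in the README diff.
    for task in ("translation_en_to_ro", "translation", "summarization"):
        print(task, "->", pick_metric(task))
```

Under this rule, both `translation_en_to_ro` (the T5 form) and plain `translation` (the MBART form) select BLEU, while `summarization` selects ROUGE.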
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -37,13 +37,50 @@ export ENRO_DIR=${PWD}/wmt_en_ro
 If you are using your own data, it must be formatted as one directory with 6 files: train.source, train.target, val.source, val.target, test.source, test.target.  
 The `.source` files are the input, the `.target` files are the desired output.

-### Evaluation
+### Evaluation Commands

-To create summaries for each article in dataset, run:
+To create summaries for each article in the dataset, use `run_eval.py`. The commands below run evaluation for different tasks and models.
+If your task name contains `translation`, the computed metric is BLEU. Otherwise, ROUGE is used.
+
+For T5, you need to specify `--task translation_{src}_to_{tgt}`, as follows:
 ```bash
-python run_eval.py <path_to_test.source> test_generations.txt <model-name>  --score_path rouge_scores.txt
+export DATA_DIR=wmt_en_ro
+python run_eval.py t5-base \
+    $DATA_DIR/val.source t5_val_generations.txt \
+    --reference_path $DATA_DIR/val.target \
+    --score_path enro_bleu.json \
+    --task translation_en_to_ro \
+    --n_obs 100 \
+    --device cuda \
+    --fp16 \
+    --bs 32
+```
+
+This command works for MBART, although the BLEU score is suspiciously low.
+```bash
+export DATA_DIR=wmt_en_ro
+python run_eval.py facebook/mbart-large-en-ro $DATA_DIR/val.source mbart_val_generations.txt \
+    --reference_path $DATA_DIR/val.target \
+    --score_path enro_bleu.json \
+    --task translation \
+    --n_obs 100 \
+    --device cuda \
+    --fp16 \
+    --bs 32
+```
+
+Summarization (the command for XSum is very similar):
+```bash
+export DATA_DIR=cnn_dm
+python run_eval.py sshleifer/distilbart-cnn-12-6 $DATA_DIR/val.source dbart_val_generations.txt \
+    --reference_path $DATA_DIR/val.target \
+    --score_path cnn_rouge.json \
+    --task summarization \
+    --n_obs 100 \
+    --device cuda \
+    --fp16 \
+    --bs 32
 ```
-The default batch size, 4, fits in 16GB GPU memory, but may need to be adjusted to fit your system.


 ### Summarization Finetuning