[examples/seq2seq]: add --label_smoothing option (#5919)
@@ -27,8 +27,18 @@ this should make a directory called `cnn_dm/` with files like `test.source`.
```

WMT16 English-Romanian Translation Data:

This dataset comes in two formats. The "packed" version merges short training examples into examples of fewer than 200 tokens, which increases GPU utilization and also improves validation performance.

```bash
cd examples/seq2seq
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro_packed_train_200.tgz
tar -xzvf wmt_en_ro_packed_train_200.tgz
export ENRO_DIR=wmt_en_ro_packed_train_200
```
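
Before training, it can help to confirm that the extracted directory has the files the scripts expect. The sketch below is a hypothetical helper (`check_data_dir` is not part of the repository). It assumes the usual `train`/`val`/`test` splits with `.source` and `.target` files, in line with the `test.source` naming mentioned above; adjust the list if your archive differs.

```bash
# Hypothetical helper, not part of examples/seq2seq.
# Assumes the data directory holds {train,val,test}.{source,target}.
check_data_dir() {
  local dir=$1
  local missing=0
  local split ext
  for split in train val test; do
    for ext in source target; do
      if [ ! -f "$dir/$split.$ext" ]; then
        echo "missing: $dir/$split.$ext"
        missing=1
      fi
    done
  done
  # Returns 0 when every expected file exists, 1 otherwise.
  return "$missing"
}

# Usage: warn if the download or extraction did not produce the expected files.
check_data_dir "${ENRO_DIR:-wmt_en_ro_packed_train_200}" || echo "data directory looks incomplete"
```

The function only checks that the files exist; it does not validate their contents or that source and target have the same number of lines.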

The original data can also be downloaded with this command:
```bash
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
tar -xzvf wmt_en_ro.tar.gz
export ENRO_DIR=${PWD}/wmt_en_ro
@@ -84,16 +94,31 @@ The following command should work on a 16GB GPU:

First, follow the wmt_en_ro download instructions.
Then you can finetune mbart_cc25 on English-Romanian with the following command.
**Recommendation:** Read and potentially modify the fairly opinionated defaults in the `train_mbart_cc25_enro.sh` script before running it.

Best performing command:
```bash
export ENRO_DIR=${PWD}/wmt_en_ro # may need to be fixed depending on where you downloaded
export MAX_LEN=128
# optionally
export ENRO_DIR='wmt_en_ro_packed_train_200' # Download instructions above
# export WANDB_PROJECT="MT" # optional
export MAX_LEN=200
export BS=4
export GAS=8
./train_mbart_cc25_enro.sh --output_dir cc25_v1_frozen/
export GAS=8 # gradient accumulation steps
./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --label_smoothing 0.1 --fp16_opt_level=O1 --logger_name wandb --sortish_sampler
```
This should take less than 2 hours per epoch on a 16GB V100 and achieve a `val_avg_bleu` score above 25 (you can see it in wandb or `metrics.json`).
To get results in line with fairseq, you need to do some postprocessing.

Multi-GPU command
(using 8 GPUs as an example)
```bash
export ENRO_DIR='wmt_en_ro_packed_train_200' # Download instructions above
# export WANDB_PROJECT="MT" # optional
export MAX_LEN=200
export BS=4
export GAS=1 # gradient accumulation steps
./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --gpus 8 --logger_name wandb
```
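
Note that `GAS` drops from 8 in the single-GPU command to 1 here. Assuming `BS` is the per-GPU batch size (an assumption about the script, not something stated in this README), the two settings process the same number of examples per optimizer step. A quick sketch of the arithmetic, using a hypothetical helper name:

```bash
# Hypothetical helper: examples per optimizer step,
# assuming BS is the per-GPU batch size.
effective_batch_size() {
  local bs=$1 gas=$2 gpus=$3
  echo $(( bs * gas * gpus ))
}

effective_batch_size 4 8 1   # single GPU: BS=4, GAS=8
effective_batch_size 4 1 8   # 8 GPUs:     BS=4, GAS=1
```

Both calls print 32, so under that assumption the multi-GPU run keeps the effective batch size of the single-GPU run while taking an optimizer step after every forward/backward pass.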
### Finetuning Outputs
As you train, `output_dir` will be filled with files that look roughly like this (comments are mine).
Some of them are metrics, some of them are checkpoints, some of them are metadata. Here is a quick tour:
@@ -108,7 +133,7 @@ output_dir
│ ├── tokenizer_config.json
│ └── vocab.json
├── git_log.json # repo, branch, and commit hash
├── val_avg_rouge2=0.1984-step_count=11.ckpt # this is a pytorch lightning checkpoint associated with the best val score.
├── val_avg_rouge2=0.1984-step_count=11.ckpt # this is a pytorch lightning checkpoint associated with the best val score. (it will be called BLEU for MT)
├── metrics.json # new validation metrics will continually be appended to this
├── student # this is a huggingface checkpoint generated by SummarizationDistiller. It is the student before it gets finetuned.
│ ├── config.json