[s2s] Document better mbart finetuning command (#6229)

* Document better MT command

* improve multigpu command
Sam Shleifer
2020-08-03 18:22:31 -04:00
committed by GitHub
parent 0513f8d275
commit 57eb1cb68d
2 changed files with 5 additions and 7 deletions


@@ -113,22 +113,20 @@ Best performing command:
 # optionally
 export ENRO_DIR='wmt_en_ro' # Download instructions above
 # export WANDB_PROJECT="MT" # optional
-export MAX_LEN=200
+export MAX_LEN=128
 export BS=4
-export GAS=8 # gradient accumulation steps
 ./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --label_smoothing 0.1 --fp16_opt_level=O1 --logger_name wandb --sortish_sampler
 ```
-This should take < 6h/epoch on a 16GB v100 and achieve val_avg_ BLEU score above 25. (you can see metrics in wandb or metrics.json).
-To get results in line with fairseq, you need to do some postprocessing.
+This should take < 6h/epoch on a 16GB v100 and achieve test BLEU above 26
+To get results in line with fairseq, you need to do some postprocessing. (see `romanian_postprocessing.md`)
 MultiGPU command
 (using 8 GPUS as an example)
 ```bash
 export ENRO_DIR='wmt_en_ro' # Download instructions above
 # export WANDB_PROJECT="MT" # optional
-export MAX_LEN=200
+export MAX_LEN=128
 export BS=4
-export GAS=1 # gradient accumulation steps
 ./train_mbart_cc25_enro.sh --output_dir enro_finetune_baseline --gpus 8 --logger_name wandb
 ```
 ### Finetuning Outputs


@@ -10,7 +10,7 @@ python finetune.py \
 --num_train_epochs 6 --src_lang en_XX --tgt_lang ro_RO \
 --data_dir $ENRO_DIR \
 --max_source_length $MAX_LEN --max_target_length $MAX_LEN --val_max_target_length $MAX_LEN --test_max_target_length $MAX_LEN \
---train_batch_size=$BS --eval_batch_size=$BS --gradient_accumulation_steps=$GAS \
+--train_batch_size=$BS --eval_batch_size=$BS \
 --task translation \
 --warmup_steps 500 \
 --freeze_embeds \
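
Note on the removed `GAS` variable: dropping `--gradient_accumulation_steps=$GAS` from the script changes how many examples contribute to each optimizer step. The sketch below is only illustrative arithmetic. It assumes that `--train_batch_size` is a per-GPU value and that `finetune.py` falls back to an accumulation of 1 when the flag is absent; neither assumption is confirmed by this diff. The helper name `effective_batch_size` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): examples per optimizer step.
# Assumes: effective batch = per-GPU batch size * accumulation steps * number of GPUs.
effective_batch_size() {
  local bs=$1 gas=$2 gpus=$3
  echo $(( bs * gas * gpus ))
}

# Settings documented before this commit.
effective_batch_size 4 8 1   # single GPU, BS=4, GAS=8  -> prints 32
effective_batch_size 4 1 8   # 8 GPUs,    BS=4, GAS=1  -> prints 32

# Settings after this commit, ASSUMING the default accumulation is 1.
effective_batch_size 4 1 1   # single GPU, BS=4        -> prints 4
effective_batch_size 4 1 8   # 8 GPUs,    BS=4         -> prints 32
```

Under these assumptions the multi-GPU command is unchanged, while the single-GPU command now takes an optimizer step every 4 examples instead of every 32.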