[s2s] distillBART docs for paper replication (#8150)

Sam Shleifer
2020-10-29 12:01:15 -04:00
committed by GitHub
parent acf56408d8
commit 49e4fece5c
4 changed files with 199 additions and 56 deletions

@@ -33,7 +33,7 @@ python -m torch.distributed.launch --nproc_per_node=8 run_distributed_eval.py \
--type_path train
```
-+ These command takes a while to run. For example, pegasus_cnn_cnn_pls.tgz took 8 hours on 8 GPUs.
++ These commands takes a while to run. For example, `pegasus_cnn_cnn_pls.tgz` took 8 hours on 8 GPUs.
+ Pegasus does not work in fp16 :(, Bart, mBART and Marian do.
+ Even if you have 1 GPU, `run_distributed_eval.py` is 10-20% faster than `run_eval.py` because it uses `SortishSampler` to minimize padding computation.
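
The padding claim in the last bullet can be illustrated with a small sketch. This is not the `SortishSampler` implementation: it assumes a plain full sort by length, made-up sequence lengths, and a hypothetical helper `padding_tokens` that counts the pad tokens needed when every batch is padded to its longest example.

```python
import random


def padding_tokens(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest example."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total


# Made-up source lengths, for illustration only.
rng = random.Random(0)
lengths = [rng.randint(10, 512) for _ in range(1024)]

# Batches taken in dataset order vs. batches of similar-length examples.
unsorted_pad = padding_tokens(lengths, batch_size=8)
sorted_pad = padding_tokens(sorted(lengths, reverse=True), batch_size=8)

print("pad tokens, dataset order:   ", unsorted_pad)
print("pad tokens, sorted by length:", sorted_pad)
```

Grouping examples of similar length means each batch's longest example is close to the others, so fewer pad positions are computed and then thrown away. How much of the 10-20% speedup this accounts for on a real dataset depends on its length distribution.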