From 4c3d98dddcfbffc8a83329d54fbb08a4b1ae36eb Mon Sep 17 00:00:00 2001
From: Stas Bekman <stas00@users.noreply.github.com>
Date: Thu, 3 Dec 2020 16:05:55 -0800
Subject: [PATCH] [s2s finetune_trainer] add instructions for distributed
 training (#8884)

---
 examples/seq2seq/README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/examples/seq2seq/README.md b/examples/seq2seq/README.md
index c1d599983f..d025d46c97 100644
--- a/examples/seq2seq/README.md
+++ b/examples/seq2seq/README.md
@@ -213,6 +213,11 @@ To see all the possible command line options, run:
 python finetune_trainer.py --help
 ```
 
+For multi-GPU training, use `torch.distributed.launch`, e.g. with 2 GPUs:
+```bash
+python -m torch.distributed.launch --nproc_per_node=2 finetune_trainer.py ...
+```
+
 **At the moment, `Seq2SeqTrainer` does not support *with teacher* distillation.**
 
 All `Seq2SeqTrainer`-based fine-tuning scripts are included in the `builtin_trainer` directory.