Fix TPU Convergence bug introduced by PR#6151 (#6488)

With the bug introduced by PR#6151, the trainer takes two optimizer steps per batch on TPU: a global one, where `xm.optimizer_step` injects a cross-replica sum (CRS) across all training cores, and a second one without it. The TPU branch was not chained to the fp16/default branches, so the trailing `else` also ran. This has been hurting training accuracy (for example, XLNet GLUE on MNLI does not converge).
2020-08-14 09:47:37 -07:00
parent 895ed8f451
commit 24107c2c83
1 changed file with 1 addition and 1 deletion
--- a/src/transformers/trainer.py
+++ b/src/transformers/trainer.py
@@ -572,7 +572,7 @@ class Trainer:
                    if is_torch_tpu_available():
                        xm.optimizer_step(self.optimizer)
-                    if self.args.fp16 and _use_native_amp:
+                    elif self.args.fp16 and _use_native_amp:
                        self.scaler.step(self.optimizer)
                        self.scaler.update()
                    else:
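
The control-flow difference in the hunk above can be illustrated with a small standalone sketch. It does not use `torch_xla` or the real `Trainer`; the functions `count_steps_buggy` and `count_steps_fixed` and their boolean parameters are hypothetical stand-ins that only count how many step calls each branch structure would make per batch.

```python
def count_steps_buggy(on_tpu: bool, use_native_amp: bool) -> int:
    """Mimic the pre-fix structure: `if` / `if` / `else` (hypothetical stand-in)."""
    steps = 0
    if on_tpu:
        steps += 1  # stands in for xm.optimizer_step(self.optimizer)
    if use_native_amp:
        steps += 1  # stands in for self.scaler.step(self.optimizer)
    else:
        steps += 1  # stands in for the plain optimizer step in the `else` branch
    return steps


def count_steps_fixed(on_tpu: bool, use_native_amp: bool) -> int:
    """Mimic the post-fix structure: `if` / `elif` / `else` (hypothetical stand-in)."""
    steps = 0
    if on_tpu:
        steps += 1  # TPU path only
    elif use_native_amp:
        steps += 1  # native AMP path only
    else:
        steps += 1  # default path only
    return steps


if __name__ == "__main__":
    # On TPU without native AMP, the unchained `if` falls through to `else`.
    print("buggy, TPU:", count_steps_buggy(on_tpu=True, use_native_amp=False))
    print("fixed, TPU:", count_steps_fixed(on_tpu=True, use_native_amp=False))
```

With the unchained `if`, the TPU case counts two steps per batch; with `elif`, exactly one branch runs in every configuration.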