Fix TPU Convergence bug introduced by PR#6151 (#6488)
Currently with the bug introduced we're taking two optimizer steps per batch: one global one, where `xm.optimizer_step` injects a CRS between all cores in training, and one without. This has been affecting training accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
This commit is contained in:
committed by
GitHub
parent
895ed8f451
commit
24107c2c83
@@ -572,7 +572,7 @@ class Trainer:
|
|||||||
|
|
||||||
if is_torch_tpu_available():
|
if is_torch_tpu_available():
|
||||||
xm.optimizer_step(self.optimizer)
|
xm.optimizer_step(self.optimizer)
|
||||||
if self.args.fp16 and _use_native_amp:
|
elif self.args.fp16 and _use_native_amp:
|
||||||
self.scaler.step(self.optimizer)
|
self.scaler.step(self.optimizer)
|
||||||
self.scaler.update()
|
self.scaler.update()
|
||||||
else:
|
else:
|
||||||
|
|||||||
Reference in New Issue
Block a user