[trainer] add tf32-mode control (#14606)

* [trainer] add --tf32 support

* it's pt>=.17

* it's pt>=.17

* flip the default to True

* add experimental note

* simplify logic

* style

* switch to 3-state logic

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* re-style code

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman
2021-12-03 10:08:58 -08:00
committed by GitHub
parent aada989ad5
commit 71b1bf7ea8
5 changed files with 92 additions and 29 deletions


@@ -358,8 +358,13 @@ Like all cases with reduced precision this may or may not be satisfactory for yo
If you're already using fp16 or bf16 mixed precision it may help with the throughput as well.
You can enable this mode in the 🤗 Trainer with `--tf32`, or disable it with `--tf32 0` or `--no_tf32`.
If neither flag is passed, the PyTorch default is used.
Note: tf32 mode is internal to CUDA and can't be accessed directly via `tensor.to(dtype=torch.tf32)` as `torch.tf32` doesn't exist.
Note: you need `torch>=1.7` to enjoy this feature.
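
The three states described above (flag unset, enabled, disabled) can be sketched roughly as follows. This is a minimal illustration, not the Trainer's actual code: `apply_tf32_setting` is a hypothetical helper, and `fake_backend` is a stand-in object so that the sketch runs without PyTorch or a GPU.

```python
from types import SimpleNamespace
from typing import Optional


def apply_tf32_setting(tf32: Optional[bool], matmul_backend) -> bool:
    """Hypothetical helper: None keeps the backend's current value,
    True/False override it. Returns the value in effect afterwards."""
    if tf32 is not None:
        matmul_backend.allow_tf32 = tf32
    return matmul_backend.allow_tf32


# Stand-in for a backend object that exposes an `allow_tf32` attribute.
fake_backend = SimpleNamespace(allow_tf32=False)

apply_tf32_setting(None, fake_backend)   # flag unset: value is left alone
apply_tf32_setting(True, fake_backend)   # like --tf32: tf32 switched on
apply_tf32_setting(False, fake_backend)  # like --tf32 0: tf32 switched off
```

With a real installation of `torch>=1.7` you would presumably pass `torch.backends.cuda.matmul`, which exposes an `allow_tf32` attribute, in place of `fake_backend`; cuDNN convolutions have a separate `torch.backends.cudnn.allow_tf32` switch.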
### Gradient Checkpointing