[trainer] add tf32-mode control (#14606)

* [trainer] add --tf32 support
* it's pt>=.17
* it's pt>=.17
* flip the default to True
* add experimental note
* simplify logic
* style
* switch to 3-state logic
* doc
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* re-style code

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-03 10:08:58 -08:00
parent aada989ad5
commit 71b1bf7ea8
5 changed files with 92 additions and 29 deletions
--- a/docs/source/performance.md
+++ b/docs/source/performance.md
@@ -358,8 +358,13 @@ Like all cases with reduced precision this may or may not be satisfactory for yo

 If you're already using fp16 or bf16 mixed precision it may help with the throughput as well.

+You can enable this mode in the 🤗 Trainer with `--tf32`, or disable it with `--tf32 0` or `--no_tf32`.
+If neither flag is given, PyTorch's own default is left unchanged.
+
 Note: tf32 mode is internal to CUDA and can't be accessed directly via `tensor.to(dtype=torch.tf32)`, as `torch.tf32` doesn't exist.

+Note: you need `torch>=1.7` to enjoy this feature.
+

 ### Gradient Checkpointing
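The three-state tf32 behaviour added in the doc change above (explicitly on with `--tf32`, explicitly off with `--tf32 0` or `--no_tf32`, or left at PyTorch's default when no flag is given) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the 🤗 Trainer's actual implementation: the helpers `str_to_bool`, `parse_tf32_flag` and `apply_tf32` are hypothetical, and the standard library's `argparse` stands in for the Trainer's own argument parser. The two `torch.backends` attributes are the PyTorch (`torch>=1.7`) switches for tf32 in matmuls and cuDNN convolutions.

```python
import argparse
from typing import List, Optional


def str_to_bool(value: str) -> bool:
    # Hypothetical helper: accept common spellings of a boolean flag value.
    v = value.lower()
    if v in ("1", "true", "yes", "y"):
        return True
    if v in ("0", "false", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_tf32_flag(argv: List[str]) -> Optional[bool]:
    # Hypothetical parser returning True, False, or None ("not specified").
    parser = argparse.ArgumentParser()
    # `--tf32` alone means True; `--tf32 0` / `--tf32 1` gives an explicit value.
    parser.add_argument("--tf32", type=str_to_bool, nargs="?", const=True, default=None)
    # `--no_tf32` writes False into the same destination.
    parser.add_argument("--no_tf32", dest="tf32", action="store_false", default=None)
    args, _ = parser.parse_known_args(argv)
    return args.tf32


def apply_tf32(mode: Optional[bool]) -> None:
    # None means "leave PyTorch's own default untouched".
    if mode is None:
        return
    import torch  # imported lazily; these flags need torch>=1.7

    torch.backends.cuda.matmul.allow_tf32 = mode  # tf32 for matmuls
    torch.backends.cudnn.allow_tf32 = mode  # tf32 for cuDNN convolutions
```

For example, `apply_tf32(parse_tf32_flag(["--no_tf32"]))` turns tf32 off for both matmuls and cuDNN, while an empty argument list never touches PyTorch at all. Keeping `None` as a distinct third state is what lets the default track whatever PyTorch itself ships with, rather than hard-coding `True` or `False`.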