[docs] Typos - Single GPU efficient training features (#38964)

* Typos

- corrected bf16 training argument
- corrected header for SDPA

* improved readability for SDPA suggested by @stevhliu

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This commit is contained in:
casinca
2025-06-23 21:33:10 +02:00
committed by GitHub
parent f9be71b34d
commit 21cb353b7b

View File

@@ -31,7 +31,7 @@ Refer to the table below to quickly help you identify the features relevant to y
| data preloading | yes | no |
| torch_empty_cache_steps | no | yes |
| torch.compile | yes | no |
| PEFT | no | yes |
| scaled dot production attention (SDPA) | yes | yes |
## Trainer
@@ -128,7 +128,7 @@ fp16 isn't memory-optimized because the gradients that are computed in fp16 are
[bf16](https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus) trades off some precision for a much larger dynamic range, which is helpful for avoiding overflow and underflow errors. You can use bf16 without adding any loss scaling methods like you would with fp16. bf16 is supported by NVIDIAs Ampere architecture or newer.
Configure [`~TrainingArguments.fp16`] in [`TrainingArguments`] to enable mixed precision training with the bf16 data type.
Configure [`~TrainingArguments.bf16`] in [`TrainingArguments`] to enable mixed precision training with the bf16 data type.
```py
from transformers import TrainingArguments