typo: fix typos in CONTRIBUTING.md and deepspeed.mdx (#24184)

* typo: fix typos in CONTRIBUTING.md and deepspeed.mdx

* Update CONTRIBUTING.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Jacob
2023-06-12 22:43:58 +08:00
committed by GitHub
parent dadc9fb427
commit 97527898da
2 changed files with 3 additions and 3 deletions

@@ -760,7 +760,7 @@ time. "reuse distance" is a metric we are using to figure out when will a parame
use the `stage3_max_reuse_distance` to decide whether to throw away the parameter or to keep it. If a parameter is
going to be used again in near future (less than `stage3_max_reuse_distance`) then we keep it to reduce communication
overhead. This is super helpful when you have activation checkpointing enabled, where we do a forward recompute and
-backward passes a a single layer granularity and want to keep the parameter in the forward recompute till the backward
+backward passes a single layer granularity and want to keep the parameter in the forward recompute till the backward
The following configuration values depend on the model's hidden size: