docs: Resolve many typos in the English docs (#20088)

* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'

* docs: Resolve many typos in the English docs

Typos found via 'codespell ./docs/source/en'
This commit is contained in:
Tom Aarsen
2022-11-07 15:19:04 +01:00
committed by GitHub
parent b8112eddec
commit 3222fc645b
21 changed files with 29 additions and 29 deletions

View File

@@ -1499,7 +1499,7 @@ fp32_model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
<Tip>
Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be useable in the
Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be usable in the
DeepSpeed context of the same application. i.e. you will need to re-initialize the deepspeed engine, since
`model.load_state_dict(state_dict)` will remove all the DeepSpeed magic from it. So do this only at the very end
of the training.

View File

@@ -112,7 +112,7 @@ Additionally, the following method can be used to convert SQuAD examples into
[[autodoc]] data.processors.squad.squad_convert_examples_to_features
These processors as well as the aforementionned method can be used with files containing the data as well as with the
These processors as well as the aforementioned method can be used with files containing the data as well as with the
*tensorflow_datasets* package. Examples are given below.

View File

@@ -579,7 +579,7 @@ add `--fsdp "full_shard offload auto_wrap"` or `--fsdp "shard_grad_op offload au
This specifies the transformer layer class name (case-sensitive) to wrap ,e.g, `BertLayer`, `GPTJBlock`, `T5Block` ....
This is important because submodules that share weights (e.g., embedding layer) should not end up in different FSDP wrapped units.
Using this policy, wrapping happens for each block containing Multi-Head Attention followed by couple of MLP layers.
Remaining layers including the shared embeddings are conviniently wrapped in same outermost FSDP unit.
Remaining layers including the shared embeddings are conveniently wrapped in same outermost FSDP unit.
Therefore, use this for transformer based models.
- For size based auto wrap policy, please add `--fsdp_min_num_params <number>` to command line arguments.
It specifies FSDP's minimum number of parameters for auto wrapping.
@@ -620,7 +620,7 @@ please follow this nice medium article [GPU-Acceleration Comes to PyTorch on M1
**Usage**:
User has to just pass `--use_mps_device` argument.
For example, you can run the offical Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
For example, you can run the official Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
```bash
export TASK_NAME=mrpc