docs: Resolve many typos in the English docs (#20088)

* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance' * docs: Resolve many typos in the English docs Typos found via 'codespell ./docs/source/en'
2022-11-07 15:19:04 +01:00
parent b8112eddec
commit 3222fc645b
21 changed files with 29 additions and 29 deletions
--- a/docs/source/en/main_classes/deepspeed.mdx
+++ b/docs/source/en/main_classes/deepspeed.mdx
@@ -1499,7 +1499,7 @@ fp32_model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)

 <Tip>

-Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be useable in the
+Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be usable in the
 DeepSpeed context of the same application. i.e. you will need to re-initialize the deepspeed engine, since
 `model.load_state_dict(state_dict)` will remove all the DeepSpeed magic from it. So do this only at the very end
 of the training.
--- a/docs/source/en/main_classes/processors.mdx
+++ b/docs/source/en/main_classes/processors.mdx
@@ -112,7 +112,7 @@ Additionally, the following method can be used to convert SQuAD examples into
 [[autodoc]] data.processors.squad.squad_convert_examples_to_features


-These processors as well as the aforementionned method can be used with files containing the data as well as with the
+These processors as well as the aforementioned method can be used with files containing the data as well as with the
 *tensorflow_datasets* package. Examples are given below.


--- a/docs/source/en/main_classes/trainer.mdx
+++ b/docs/source/en/main_classes/trainer.mdx
@@ -579,7 +579,7 @@ add `--fsdp "full_shard offload auto_wrap"` or `--fsdp "shard_grad_op offload au
  This specifies the transformer layer class name (case-sensitive) to wrap ,e.g, `BertLayer`, `GPTJBlock`, `T5Block` ....
  This is important because submodules that share weights (e.g., embedding layer) should not end up in different FSDP wrapped units. 
  Using this policy, wrapping happens for each block containing Multi-Head Attention followed by couple of MLP layers. 
-  Remaining layers including the shared embeddings are conviniently wrapped in same outermost FSDP unit.
+  Remaining layers including the shared embeddings are conveniently wrapped in same outermost FSDP unit.
  Therefore, use this for transformer based models.
  - For size based auto wrap policy, please add `--fsdp_min_num_params <number>` to command line arguments.
  It specifies FSDP's minimum number of parameters for auto wrapping.
@@ -620,7 +620,7 @@ please follow this nice medium article [GPU-Acceleration Comes to PyTorch on M1

 **Usage**:
 User has to just pass `--use_mps_device` argument. 
-For example, you can run the offical Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
+For example, you can run the official Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:

 ```bash
 export TASK_NAME=mrpc