docs: Resolve many typos in the English docs (#20088)
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance' * docs: Resolve many typos in the English docs Typos found via 'codespell ./docs/source/en'
This commit is contained in:
@@ -1499,7 +1499,7 @@ fp32_model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
|
||||
|
||||
<Tip>
|
||||
|
||||
Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be useable in the
|
||||
Note, that once `load_state_dict_from_zero_checkpoint` was run, the `model` will no longer be usable in the
|
||||
DeepSpeed context of the same application. i.e. you will need to re-initialize the deepspeed engine, since
|
||||
`model.load_state_dict(state_dict)` will remove all the DeepSpeed magic from it. So do this only at the very end
|
||||
of the training.
|
||||
|
||||
@@ -112,7 +112,7 @@ Additionally, the following method can be used to convert SQuAD examples into
|
||||
[[autodoc]] data.processors.squad.squad_convert_examples_to_features
|
||||
|
||||
|
||||
These processors as well as the aforementionned method can be used with files containing the data as well as with the
|
||||
These processors as well as the aforementioned method can be used with files containing the data as well as with the
|
||||
*tensorflow_datasets* package. Examples are given below.
|
||||
|
||||
|
||||
|
||||
@@ -579,7 +579,7 @@ add `--fsdp "full_shard offload auto_wrap"` or `--fsdp "shard_grad_op offload au
|
||||
This specifies the transformer layer class name (case-sensitive) to wrap ,e.g, `BertLayer`, `GPTJBlock`, `T5Block` ....
|
||||
This is important because submodules that share weights (e.g., embedding layer) should not end up in different FSDP wrapped units.
|
||||
Using this policy, wrapping happens for each block containing Multi-Head Attention followed by couple of MLP layers.
|
||||
Remaining layers including the shared embeddings are conviniently wrapped in same outermost FSDP unit.
|
||||
Remaining layers including the shared embeddings are conveniently wrapped in same outermost FSDP unit.
|
||||
Therefore, use this for transformer based models.
|
||||
- For size based auto wrap policy, please add `--fsdp_min_num_params <number>` to command line arguments.
|
||||
It specifies FSDP's minimum number of parameters for auto wrapping.
|
||||
@@ -620,7 +620,7 @@ please follow this nice medium article [GPU-Acceleration Comes to PyTorch on M1
|
||||
|
||||
**Usage**:
|
||||
User has to just pass `--use_mps_device` argument.
|
||||
For example, you can run the offical Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
|
||||
For example, you can run the official Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
|
||||
|
||||
```bash
|
||||
export TASK_NAME=mrpc
|
||||
|
||||
Reference in New Issue
Block a user