chore: Fix typos in docs and examples (#36524)
Fix typos in docs and examples Signed-off-by: co63oc <co63oc@users.noreply.github.com>
This commit is contained in:
@@ -57,7 +57,7 @@ There is never more than two levels of abstraction for any model to keep the cod
|
||||
|
||||
Other important functions like the forward method are defined in the `modeling.py` file.
|
||||
|
||||
Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inherting from it to keep abstraction low.
|
||||
Specific model heads (for example, sequence classification or language modeling) should call the base model in the forward pass rather than inheriting from it to keep abstraction low.
|
||||
|
||||
New models require a configuration, for example `BrandNewLlamaConfig`, that is stored as an attribute of [`PreTrainedModel`].
|
||||
|
||||
@@ -233,7 +233,7 @@ If you run into issues, you'll need to choose one of the following debugging str
|
||||
This strategy relies on breaking the original model into smaller sub-components, such as when the code can be easily run in eager mode. While more difficult, there are some advantages to this approach.
|
||||
|
||||
1. It is easier later to compare the original model to your implementation. You can automatically verify that each individual component matches its corresponding component in the Transformers' implementation. This is better than relying on a visual comparison based on print statements.
|
||||
2. It is easier to port individal components instead of the entire model.
|
||||
2. It is easier to port individual components instead of the entire model.
|
||||
3. It is easier for understanding how a model works by breaking it up into smaller parts.
|
||||
4. It is easier to prevent regressions at a later stage when you change your code thanks to component-by-component tests.
|
||||
|
||||
@@ -328,7 +328,7 @@ def _init_weights(self, module):
|
||||
|
||||
The initialization scheme can look different if you need to adapt it to your model. For example, [`Wav2Vec2ForPreTraining`] initializes [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) in its last two linear layers.
|
||||
|
||||
The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overriden later. The `_init_weights` function won't be applied to these modules.
|
||||
The `_is_hf_initialized` flag makes sure the submodule is only initialized once. Setting `module.project_q` and `module.project_hid` to `True` ensures the custom initialization is not overridden later. The `_init_weights` function won't be applied to these modules.
|
||||
|
||||
```py
|
||||
def _init_weights(self, module):
|
||||
@@ -457,7 +457,7 @@ Don't be discouraged if your forward pass isn't identical with the output from t
|
||||
Your output should have a precision of *1e-3*. Ensure the output shapes and output values are identical. Common reasons for why the outputs aren't identical include:
|
||||
|
||||
- Some layers were not added (activation layer or a residual connection).
|
||||
- The word embedding matix is not tied.
|
||||
- The word embedding matrix is not tied.
|
||||
- The wrong positional embeddings are used because the original implementation includes an offset.
|
||||
- Dropout is applied during the forward pass. Fix this error by making sure `model.training` is `False` and passing `self.training` to [torch.nn.functional.dropout](https://pytorch.org/docs/stable/nn.functional.html?highlight=dropout#torch.nn.functional.dropout).
|
||||
|
||||
|
||||
@@ -159,7 +159,7 @@ Here are a few examples using notional tools:
|
||||
---
|
||||
{examples}
|
||||
|
||||
Above example were using notional tools that might not exist for you. You only have acces to those tools:
|
||||
Above example were using notional tools that might not exist for you. You only have access to those tools:
|
||||
<<tool_names>>
|
||||
You also can perform computations in the python code you generate.
|
||||
|
||||
|
||||
@@ -840,7 +840,7 @@ Unless you have a lot of free CPU memory, fp32 weights shouldn't be saved during
|
||||
<hfoptions id="save">
|
||||
<hfoption id="offline">
|
||||
|
||||
DeepSpeed provies a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
|
||||
DeepSpeed provides a [zero_to_fp32.py](https://github.com/microsoft/DeepSpeed/blob/91829476a8fd4d0d9268c03c1d56795d20a51c12/deepspeed/utils/zero_to_fp32.py#L14) script at the top-level checkpoint folder for extracting weights at any point. This is a standalone script and you don't need a config file or [`Trainer`].
|
||||
|
||||
For example, if your checkpoint folder looks like the one shown below, then you can run the following command to create and consolidate the fp32 weights from multiple GPUs into a single `pytorch_model.bin` file. The script automatically discovers the subfolder `global_step1` which contains the checkpoint.
|
||||
|
||||
@@ -942,7 +942,7 @@ import deepspeed
|
||||
ds_config = {...}
|
||||
# must run before instantiating the model to detect zero 3
|
||||
dschf = HfDeepSpeedConfig(ds_config) # keep this object alive
|
||||
# randomly intialize model weights
|
||||
# randomly initialize model weights
|
||||
config = AutoConfig.from_pretrained("openai-community/gpt2")
|
||||
model = AutoModel.from_config(config)
|
||||
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
|
||||
|
||||
@@ -50,7 +50,7 @@ The `streamer` parameter is compatible with any class with a [`~TextStreamer.put
|
||||
|
||||
Watermarking is useful for detecting whether text is generated. The [watermarking strategy](https://hf.co/papers/2306.04634) in Transformers randomly "colors" a subset of the tokens green. When green tokens are generated, they have a small bias added to their logits, and a higher probability of being generated. You can detect generated text by comparing the proportion of green tokens to the amount of green tokens typically found in human-generated text.
|
||||
|
||||
Watermarking is supported for any generative model in Transformers and doesn't require an extra classfication model to detect the watermarked text.
|
||||
Watermarking is supported for any generative model in Transformers and doesn't require an extra classification model to detect the watermarked text.
|
||||
|
||||
Create a [`WatermarkingConfig`] with the bias value to add to the logits and watermarking algorithm. The example below uses the `"selfhash"` algorithm, where the green token selection only depends on the current token. Pass the [`WatermarkingConfig`] to [`~GenerationMixin.generate`].
|
||||
|
||||
|
||||
@@ -87,7 +87,7 @@ You can customize [`~GenerationMixin.generate`] by overriding the parameters and
|
||||
model.generate(**inputs, num_beams=4, do_sample=True)
|
||||
```
|
||||
|
||||
[`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manupulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
|
||||
[`~GenerationMixin.generate`] can also be extended with external libraries or custom code. The `logits_processor` parameter accepts custom [`LogitsProcessor`] instances for manipulating the next token probability distribution. `stopping_criteria` supports custom [`StoppingCriteria`] to stop text generation. Check out the [logits-processor-zoo](https://github.com/NVIDIA/logits-processor-zoo) for more examples of external [`~GenerationMixin.generate`]-compatible extensions.
|
||||
|
||||
Refer to the [Generation strategies](./generation_strategies) guide to learn more about search, sampling, and decoding strategies.
|
||||
|
||||
|
||||
@@ -74,7 +74,7 @@ be installed as follows: `apt install libsndfile1-dev`
|
||||
For multilingual speech translation models, `eos_token_id` is used as the `decoder_start_token_id` and
|
||||
the target language id is forced as the first generated token. To force the target language id as the first
|
||||
generated token, pass the `forced_bos_token_id` parameter to the `generate()` method. The following
|
||||
example shows how to transate English speech to French text using the *facebook/s2t-medium-mustc-multilingual-st*
|
||||
example shows how to translate English speech to French text using the *facebook/s2t-medium-mustc-multilingual-st*
|
||||
checkpoint.
|
||||
|
||||
```python
|
||||
|
||||
@@ -111,7 +111,7 @@ def decode(container, sampling_rate, num_frames, clip_idx, num_clips, target_fps
|
||||
Returns:
|
||||
frames (tensor): decoded frames from the video.
|
||||
'''
|
||||
assert clip_idx >= -2, "Not a valied clip_idx {}".format(clip_idx)
|
||||
assert clip_idx >= -2, "Not a valid clip_idx {}".format(clip_idx)
|
||||
frames, fps = pyav_decode(container, sampling_rate, num_frames, clip_idx, num_clips, target_fps)
|
||||
clip_size = sampling_rate * num_frames / target_fps * fps
|
||||
index = np.linspace(0, clip_size - 1, num_frames)
|
||||
|
||||
@@ -355,7 +355,7 @@ class Olmo2Model(OlmoModel):
|
||||
)
|
||||
```
|
||||
|
||||
You only need to change the *type* of the `self.norm` attribute to use `RMSNorm` isntead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.
|
||||
You only need to change the *type* of the `self.norm` attribute to use `RMSNorm` instead of `LayerNorm`. This change doesn't affect the logic in the forward method (layer name and usage is identical to the parent class), so you don't need to overwrite it. The linter automatically unravels it.
|
||||
|
||||
### Model head
|
||||
|
||||
@@ -374,7 +374,7 @@ The logic is identical to `OlmoForCausalLM` which means you don't need to make a
|
||||
|
||||
The [modeling_olmo2.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmo2/modeling_olmo2.py) generated by the linter also contains some classes (`Olmo2MLP`, `Olmo2RotaryEmbedding`, `Olmo2PreTrainedModel`) that weren't explicitly defined in `modular_olmo2.py`.
|
||||
|
||||
Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of depdendency tracing. This is similar to how some functions were added to the `Attention` class without drrectly importing them.
|
||||
Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as a part of dependency tracing. This is similar to how some functions were added to the `Attention` class without directly importing them.
|
||||
|
||||
For example, `OlmoDecoderLayer` has an attribute defined as `self.mlp = OlmoMLP(config)`. This class was never explicitly redefined in `Olmo2MLP`, so the linter automatically created a `Olmo2MLP` class similar to `OlmoMLP`. It is identical to the code below if it was explicitly written in `modular_olmo2.py`.
|
||||
|
||||
|
||||
@@ -29,7 +29,7 @@ It is important the PSU has stable voltage otherwise it may not be able to suppl
|
||||
|
||||
## Cooling
|
||||
|
||||
An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full perfomance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
|
||||
An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
|
||||
|
||||
## Multi-GPU connectivity
|
||||
|
||||
|
||||
@@ -33,7 +33,7 @@ Use the [Model Memory Calculator](https://huggingface.co/spaces/hf-accelerate/mo
|
||||
|
||||
## Data parallelism
|
||||
|
||||
Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently proccesses their portion of the data. At the end, the results from each GPU are synchronized and combined.
|
||||
Data parallelism evenly distributes data across multiple GPUs. Each GPU holds a copy of the model and concurrently processes their portion of the data. At the end, the results from each GPU are synchronized and combined.
|
||||
|
||||
Data parallelism significantly reduces training time by processing data in parallel, and it is scalable to the number of GPUs available. However, synchronizing results from each GPU can add overhead.
|
||||
|
||||
|
||||
@@ -24,7 +24,7 @@ Tailor the [`Pipeline`] to your task with task specific parameters such as addin
|
||||
|
||||
Transformers has two pipeline classes, a generic [`Pipeline`] and many individual task-specific pipelines like [`TextGenerationPipeline`] or [`VisualQuestionAnsweringPipeline`]. Load these individual pipelines by setting the task identifier in the `task` parameter in [`Pipeline`]. You can find the task identifier for each pipeline in their API documentation.
|
||||
|
||||
Each task is configured to use a default pretrained model and preprocessor, but this can be overriden with the `model` parameter if you want to use a different model.
|
||||
Each task is configured to use a default pretrained model and preprocessor, but this can be overridden with the `model` parameter if you want to use a different model.
|
||||
|
||||
For example, to use the [`TextGenerationPipeline`] with [Gemma 2](./model_doc/gemma2), set `task="text-generation"` and `model="google/gemma-2-2b"`.
|
||||
|
||||
|
||||
@@ -220,7 +220,7 @@ Just run the following line to automatically test every docstring example in the
|
||||
```bash
|
||||
pytest --doctest-modules <path_to_file_or_dir>
|
||||
```
|
||||
If the file has a markdown extention, you should add the `--doctest-glob="*.md"` argument.
|
||||
If the file has a markdown extension, you should add the `--doctest-glob="*.md"` argument.
|
||||
|
||||
### Run only modified tests
|
||||
|
||||
|
||||
@@ -233,7 +233,7 @@ Here are a few examples using notional tools:
|
||||
---
|
||||
{examples}
|
||||
|
||||
Above example were using notional tools that might not exist for you. You only have acces to those tools:
|
||||
Above example were using notional tools that might not exist for you. You only have access to those tools:
|
||||
<<tool_names>>
|
||||
You also can perform computations in the python code you generate.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user