Minor docs typo fixes (#8797)
* Fix minor typos
* Additional typos
* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
@@ -125,7 +125,7 @@ Follow these steps to start contributing:
 $ git checkout -b a-descriptive-name-for-my-changes
 ```
 
-**do not** work on the `master` branch.
+**Do not** work on the `master` branch.
 
 4. Set up a development environment by running the following command in a virtual environment:
 
@@ -2,7 +2,6 @@ Preprocessing data
 =======================================================================================================================
 
 In this tutorial, we'll explore how to preprocess your data using 🤗 Transformers. The main tool for this is what we
-
 call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
 you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.
 
@@ -52,7 +51,7 @@ The tokenizer can decode a list of token ids in a proper sentence:
 "[CLS] Hello, I'm a single sentence! [SEP]"
 
 As you can see, the tokenizer automatically added some special tokens that the model expects. Not all models need
-special tokens; for instance, if we had used` gtp2-medium` instead of `bert-base-cased` to create our tokenizer, we
+special tokens; for instance, if we had used `gpt2-medium` instead of `bert-base-cased` to create our tokenizer, we
 would have seen the same sentence as the original one here. You can disable this behavior (which is only advised if you
 have added those special tokens yourself) by passing ``add_special_tokens=False``.
 
@@ -240,7 +240,9 @@ activations of the model.
 [ 0.08181786, -0.04179301]], dtype=float32)>,)
 
 The model can return more than just the final activations, which is why the output is a tuple. Here we only asked for
-the final activations, so we get a tuple with one element. .. note::
+the final activations, so we get a tuple with one element.
+
+.. note::
 
 All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final activation
 function (like SoftMax) since this final activation function is often fused with the loss.
@@ -70,8 +70,8 @@ inference.
 optimizations afterwards.
 
 .. note::
-For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github
-<https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
+For more information about the optimizations enabled by ONNXRuntime, please have a look at the `ONNXRuntime Github
+<https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_.
 
 Quantization
 -----------------------------------------------------------------------------------------------------------------------
@@ -20,14 +20,14 @@ DataCollator = NewType("DataCollator", Callable[[List[InputDataClass]], Dict[str
 
 def default_data_collator(features: List[InputDataClass]) -> Dict[str, torch.Tensor]:
 """
-Very simple data collator that simply collates batches of dict-like objects and erforms special handling for
+Very simple data collator that simply collates batches of dict-like objects and performs special handling for
 potential keys named:
 
 - ``label``: handles a single value (int or float) per object
 - ``label_ids``: handles a list of values per object
 
-Des not do any additional preprocessing: property names of the input object will be used as corresponding inputs to
-the model. See glue and ner for example of how it's useful.
+Does not do any additional preprocessing: property names of the input object will be used as corresponding inputs
+to the model. See glue and ner for example of how it's useful.
 """
 
 # In this function we'll make the assumption that all `features` in the batch
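
For readers skimming the last hunk, the behavior described in the `default_data_collator` docstring can be illustrated with a small sketch. This is not the library's implementation: `sketch_collate` is a hypothetical name, it returns plain Python lists instead of `torch.Tensor` values, storing both label keys under `labels` is an assumption of the sketch, and the token ids in the example are arbitrary.

```python
from typing import Any, Dict, List


def sketch_collate(features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical, simplified stand-in: batches dicts into lists, not tensors."""
    # Like the docstring, assume every feature in the batch has the same keys.
    first = features[0]
    batch: Dict[str, List[Any]] = {}

    # Special handling for the two label keys named in the docstring.
    # Collecting them under "labels" is an assumption of this sketch.
    if first.get("label") is not None:
        # `label`: a single int or float per object
        batch["labels"] = [f["label"] for f in features]
    elif first.get("label_ids") is not None:
        # `label_ids`: a list of values per object
        batch["labels"] = [list(f["label_ids"]) for f in features]

    # No other preprocessing: every remaining key is passed through by name.
    for key in first:
        if key not in ("label", "label_ids"):
            batch[key] = [f[key] for f in features]
    return batch


example = sketch_collate(
    [
        {"input_ids": [101, 7592, 102], "label": 1},
        {"input_ids": [101, 2088, 102], "label": 0},
    ]
)
print(example)
```

The point of the sketch is only the shape of the transformation: a list of per-example dicts becomes one dict of per-key batches, with the label keys treated specially.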