Adds a note to resize the token embedding matrix when adding special … (#11120)
* Adds a note to resize the token embedding matrix when adding special tokens
* Remove superfluous space
@@ -825,7 +825,13 @@ class SpecialTokensMixin:
         special tokens are NOT in the vocabulary, they are added to it (indexed starting from the last index of the
         current vocabulary).

-        Using : obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
+        Using :obj:`add_special_tokens` will ensure your special tokens can be used in several ways:

         - Special tokens are carefully handled by the tokenizer (they are never split).
         - You can easily refer to special tokens using tokenizer class attributes like :obj:`tokenizer.cls_token`. This
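The note added in this commit is easier to follow with a concrete picture of why the embedding matrix has to grow. The following is a toy sketch in plain Python, not the library's implementation: `toy_add_special_tokens` and `toy_resize_embeddings` are hypothetical helpers invented for illustration, and the vocabulary, dimension, and initialisation values are assumptions.

```python
import random


def toy_add_special_tokens(vocab, new_tokens):
    """Hypothetical helper: append tokens missing from vocab, return the count added.

    New tokens are indexed starting from the end of the current vocabulary.
    """
    added = 0
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # next free index
            added += 1
    return added


def toy_resize_embeddings(embeddings, new_size, dim, rng):
    """Hypothetical helper: return a matrix with new_size rows.

    Existing rows are copied unchanged; extra rows get small random values.
    """
    resized = [row[:] for row in embeddings[:new_size]]
    while len(resized) < new_size:
        resized.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return resized


rng = random.Random(0)
dim = 4  # assumed embedding size, chosen only for the example

# A toy vocabulary and a matching embedding matrix (one row per token id).
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}
embeddings = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]

# "[CLS]" is new and is appended; "hello" already exists and is skipped.
num_added = toy_add_special_tokens(vocab, ["[CLS]", "hello"])

# Without this step, looking up the new token id would index past the matrix.
resized = toy_resize_embeddings(embeddings, len(vocab), dim, rng)
```

With the library itself, the corresponding pattern is usually a call to `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, so that the number of embedding rows matches the tokenizer length.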