Adds a note to resize the token embedding matrix when adding special … (#11120)

* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
Lysandre Debut
2021-04-07 10:06:45 -04:00
committed by GitHub
parent 02f7c2fe66
commit c0d97cee13


@@ -825,6 +825,12 @@ class SpecialTokensMixin:
special tokens are NOT in the vocabulary, they are added to it (indexed starting from the last index of the
current vocabulary).
.. Note::

    When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
    the model so that its embedding matrix matches the tokenizer.

    In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
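Why the resize matters can be shown with a toy, library-free sketch. Everything below (``ToyTokenizer``, ``resize_embeddings``, the 4-dimensional embeddings, the ``0.02`` init scale) is a hypothetical stand-in for illustration and is not the transformers implementation. With the real library, the usual call after adding tokens is ``model.resize_token_embeddings(len(tokenizer))``.

```python
import random


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer: maps tokens to integer ids."""

    def __init__(self, vocab):
        self.vocab = {tok: i for i, tok in enumerate(vocab)}

    def add_special_tokens(self, tokens):
        """Append unseen tokens at the end of the vocabulary; return how many were added."""
        added = 0
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)
                added += 1
        return added

    def __len__(self):
        return len(self.vocab)


def resize_embeddings(matrix, new_size, dim, rng):
    """Return a copy with `new_size` rows: existing rows are kept, new rows are randomly initialised."""
    resized = [row[:] for row in matrix[:new_size]]
    while len(resized) < new_size:
        resized.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return resized


rng = random.Random(0)
dim = 4  # assumed embedding width, chosen only for the example

tokenizer = ToyTokenizer(["hello", "world", "[UNK]"])
embeddings = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(len(tokenizer))]

# Adding a special token grows the tokenizer, but not the embedding matrix.
num_added = tokenizer.add_special_tokens(["<CLS>"])
new_id = tokenizer.vocab["<CLS>"]
needs_resize = new_id >= len(embeddings)  # the new id has no embedding row yet

# Resize so that the matrix has one row per token id again.
old_rows = [row[:] for row in embeddings]
embeddings = resize_embeddings(embeddings, len(tokenizer), dim, rng)
```

Until the matrix is resized, looking up ``new_id`` would index past its last row; after resizing, the previously existing rows are unchanged and only the new row is freshly initialised.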
Using :obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
- Special tokens are carefully handled by the tokenizer (they are never split).