Tokenization tutorial (#5257)

* All done

* Link to the tutorial

* Typo fixes

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Add metnion of the return_xxx args

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
This commit is contained in:
Sylvain Gugger
2020-06-24 18:43:20 -04:00
committed by GitHub
parent 7ac9110711
commit d12ceb48ba
3 changed files with 375 additions and 1 deletions

View File

@@ -204,7 +204,7 @@ padding token the model was pretrained with. The attention mask is also adapted
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])}
You can learn more about tokenizers on their :doc:`doc page <main_classes/tokenizer>` (tutorial coming soon).
You can learn more about tokenizers :doc:`here <preprocessing>`.
Using the model
^^^^^^^^^^^^^^^