From 955fd4fea93e26ab5b04961a993fec3c6bbb89a1 Mon Sep 17 00:00:00 2001
From: Yaser Abdelaziz
Date: Mon, 4 Oct 2021 12:30:50 +0200
Subject: [PATCH] [docs/gpt-j] fix typo (#13851)

---
 docs/source/model_doc/gptj.rst | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/docs/source/model_doc/gptj.rst b/docs/source/model_doc/gptj.rst
index 53363ea0d9..f8ed3de412 100644
--- a/docs/source/model_doc/gptj.rst
+++ b/docs/source/model_doc/gptj.rst
@@ -53,11 +53,6 @@ Tips:
   size, the tokenizer for `GPT-J `__ contains 143 extra tokens
   ``<|extratoken_1|>... <|extratoken_143|>``, so the ``vocab_size`` of tokenizer also becomes 50400.
 
-- Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra
-  tokens are added for the sake of efficiency on TPUs. To avoid the mis-match between embedding matrix size and vocab
-  size, the tokenizer for [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B) contains 143 extra tokens
-  ``<|extratoken_1|>... <|extratoken_143|>``, so the ``vocab_size`` of tokenizer also becomes 50400.
-
 Generation
 _______________________________________________________________________________________________________________________