Tokenizers and Config classes are referenced.

2019-07-05 17:44:59 -04:00
parent df759114c9
commit 64fd986376
6 changed files with 43 additions and 95 deletions
--- a/docs/source/model_doc/gpt2.rst
+++ b/docs/source/model_doc/gpt2.rst
@@ -1,31 +1,18 @@
 OpenAI GPT2
 ----------------------------------------------------

+``GPT2Config``
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: pytorch_pretrained_bert.GPT2Config
+    :members:
+

 ``GPT2Tokenizer``
 ~~~~~~~~~~~~~~~~~~~~~

-``GPT2Tokenizer`` perform byte-level Byte-Pair-Encoding (BPE) tokenization.
-
-This class has three arguments:
-
-
-* ``vocab_file``\ : path to a vocabulary file.
-* ``merges_file``\ : path to a file containing the BPE merges.
-* ``errors``\ : How to handle unicode decoding errors. **Default = ``replace``\ **
-
-and two methods:
-
-
-* ``tokenize(text)``\ : convert a ``str`` in a list of ``str`` tokens by performing byte-level BPE.
-* ``convert_tokens_to_ids(tokens)``\ : convert a list of ``str`` tokens in a list of ``int`` indices in the vocabulary.
-* ``convert_ids_to_tokens(tokens)``\ : convert a list of ``int`` indices in a list of ``str`` tokens in the vocabulary.
-* ``set_special_tokens(self, special_tokens)``\ : update the list of special tokens (see above arguments)
-* ``encode(text)``\ : convert a ``str`` in a list of ``int`` tokens by performing byte-level BPE.
-* ``decode(tokens)``\ : convert back a list of ``int`` tokens in a ``str``.
-* `save_vocabulary(directory_path)`: save the vocabulary, merge and special tokens files to `directory_path`. Return the path to the three files: ``vocab_file_path``\ , ``merge_file_path``\ , ``special_tokens_file_path``. The vocabulary can be reloaded with ``OpenAIGPTTokenizer.from_pretrained('directory_path')``.
-
-Please refer to `\ ``tokenization_gpt2.py`` <./pytorch_pretrained_bert/tokenization_gpt2.py>`_ for more details on the ``GPT2Tokenizer``.
+.. autoclass:: pytorch_pretrained_bert.GPT2Tokenizer
+    :members:


 14. ``GPT2Model``