Doc new front (#14590)

* Convert PretrainedConfig doc to Markdown * Use syntax * Add necessary doc files (#14496) * Doc fixes (#14499) * Fixes for the new front * Convert DETR file for table * Title is needed * Simplify a bit * Even simpler * Remove imports * Fix typo in toctree (#14516) * Fix checkpoints badge * Update versions.yml format (#14517) * Doc new front github actions (#14512) * Doc new front github actions * Fix docstring * Fix feature extraction utils import (#14515) * Address Julien's comments * Push to doc-builder * Ready for merge * Remove old build and deploy * Doc misc fixes (#14583) * Rm versions.yml from doc * Fix converting.rst * Rm pretrained_models from toctree * Fix index links (#14567) * Fix links in README * Localized READMEs * Fix copy script * Fix find doc script * Update README_ko.md Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co> * Adapt build command to new CLI tools (#14578) * Fix typo * Fix doc interlinks (#14589) * Convert PretrainedConfig doc to Markdown * Use syntax * Rm pattern <[a-z]+(.html).*> * Rm huggingface.co/transformers/master * Rm .html * Rm .html from index.mdx * Rm .html from model_summary.rst * Update index.mdx rm html * Update remove .html * Fix inner doc links * Fix interlink in preprocssing.rst * Update pr_checks Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Convert PretrainedConfig doc to Markdown * Use syntax * Add necessary doc files (#14496) * Doc fixes (#14499) * Fixes for the new front * Convert DETR file for table * Title is needed * Simplify a bit * Even simpler * Remove imports * Fix checkpoints badge * Fix typo in toctree (#14516) * Update versions.yml format (#14517) * Doc new front github actions (#14512) * Doc new front github actions * Fix docstring * Fix feature extraction utils import (#14515) * Address Julien's comments * Push to doc-builder * Ready for merge * Remove old build and deploy * Doc misc fixes (#14583) * Rm versions.yml from doc * Fix converting.rst * Rm pretrained_models from toctree * Fix index links (#14567) * Fix links in README * Localized READMEs * Fix copy script * Fix find doc script * Update README_ko.md Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co> * Adapt build command to new CLI tools (#14578) * Fix typo * Fix doc interlinks (#14589) * Convert PretrainedConfig doc to Markdown * Use syntax * Rm pattern <[a-z]+(.html).*> * Rm huggingface.co/transformers/master * Rm .html * Rm .html from index.mdx * Rm .html from model_summary.rst * Update index.mdx rm html * Update remove .html * Fix inner doc links * Fix interlink in preprocssing.rst * Update pr_checks Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Styling Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Julien Chaumond <julien@huggingface.co>
2021-12-01 14:13:02 -05:00
parent 14cc50d081
commit 4df7d05a87
43 changed files with 1467 additions and 3185 deletions
--- a/docs/source/preprocessing.rst
+++ b/docs/source/preprocessing.rst
@@ -56,10 +56,9 @@ is its ``__call__``: you just need to feed your sentence to your tokenizer objec
     'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
     'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

-This returns a dictionary string to list of ints. The `input_ids <glossary.html#input-ids>`__ are the indices
-corresponding to each token in our sentence. We will see below what the `attention_mask
-<glossary.html#attention-mask>`__ is used for and in :ref:`the next section <sentence-pairs>` the goal of
-`token_type_ids <glossary.html#token-type-ids>`__.
+This returns a dictionary string to list of ints. The `input_ids <glossary#input-ids>`__ are the indices corresponding
+to each token in our sentence. We will see below what the `attention_mask <glossary#attention-mask>`__ is used for and
+in :ref:`the next section <preprocessing-pairs-of-sentences>` the goal of `token_type_ids <glossary#token-type-ids>`__.

 The tokenizer can decode a list of token ids in a proper sentence:

@@ -132,8 +131,8 @@ You can do all of this by using the following options when feeding your list of
                               [1, 1, 1, 1, 1, 1, 1, 1, 0]])}

 It returns a dictionary with string keys and tensor values. We can now see what the `attention_mask
-<glossary.html#attention-mask>`__ is all about: it points out which tokens the model should pay attention to and which
-ones it should not (because they represent padding in this case).
+<glossary#attention-mask>`__ is all about: it points out which tokens the model should pay attention to and which ones
+it should not (because they represent padding in this case).


 Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
@@ -166,8 +165,8 @@ This will once again return a dict string to list of ints:
     'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1], 
     'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

-This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
-of the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
+This shows us what the `token_type_ids <glossary#token-type-ids>`__ are for: they indicate to the model which part of
+the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
 `token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
 its associated model expects. You can force the return (or the non-return) of any of those special arguments by using
 ``return_input_ids`` or ``return_token_type_ids``.