documentation: some minor clean up (#16850)

Yang Ming
2022-04-27 04:56:08 +08:00
committed by GitHub
parent aaee4038c3
commit 10dfa126b7
3 changed files with 4 additions and 5 deletions

@@ -18,9 +18,7 @@ Rust library [🤗 Tokenizers](https://github.com/huggingface/tokenizers). The "
 1. a significant speed-up in particular when doing batched tokenization and
 2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
-index of the token comprising a given character or the span of characters corresponding to a given token). Currently
-no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
-and XLNet models).
+index of the token comprising a given character or the span of characters corresponding to a given token).
 The base classes [`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`]
 implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and

@@ -60,11 +60,11 @@ class DebertaV2Tokenizer(PreTrainedTokenizer):
 contains the vocabulary necessary to instantiate a tokenizer.
 do_lower_case (`bool`, *optional*, defaults to `False`):
 Whether or not to lowercase the input when tokenizing.
-bos_token (`string`, *optional*, defaults to "[CLS]"):
+bos_token (`string`, *optional*, defaults to `"[CLS]"`):
 The beginning of sequence token that was used during pre-training. Can be used a sequence classifier token.
 When building a sequence using special tokens, this is not the token that is used for the beginning of
 sequence. The token used is the `cls_token`.
-eos_token (`string`, *optional*, defaults to "[SEP]"):
+eos_token (`string`, *optional*, defaults to `"[SEP]"`):
 The end of sequence token. When building a sequence using special tokens, this is not the token that is
 used for the end of sequence. The token used is the `sep_token`.
 unk_token (`str`, *optional*, defaults to `"[UNK]"`):

@@ -59,3 +59,4 @@ src/transformers/models/wav2vec2/modeling_wav2vec2.py
 src/transformers/models/wav2vec2/tokenization_wav2vec2.py
 src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
 src/transformers/models/wavlm/modeling_wavlm.py
+src/transformers/models/ctrl/modeling_ctrl.py