[LlamaTokenizerFast] Update documentation (#24132)
* Update documentation
* nits
@@ -65,6 +65,7 @@ This model was contributed by [zphang](https://huggingface.co/zphang) with contr
 - build_inputs_with_special_tokens
 - get_special_tokens_mask
 - create_token_type_ids_from_sequences
+- update_post_processor
 - save_vocabulary
 
 ## LlamaModel
@@ -48,6 +48,12 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
     >>> [1, 15043, 445, 338, 263, 1243]
     ```
 
+    If you want to change the `bos_token` or the `eos_token`, make sure to specify them when initializing the model, or
+    call `tokenizer.update_post_processor()` to make sure that the post-processing is correctly done (otherwise the
+    values of the first token and final token of an encoded sequence will not be correct). For more details, check out
+    the [post-processors](https://huggingface.co/docs/tokenizers/api/post-processors) documentation.
+
+
     This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should
     refer to this superclass for more information regarding those methods.
 
@@ -108,6 +114,9 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
         self.can_save_slow_tokenizer = False if not self.vocab_file else True
 
     def update_post_processor(self):
+        """
+        Updates the underlying post processor with the current `bos_token` and `eos_token`.
+        """
         bos = self.bos_token
         bos_token_id = self.bos_token_id
 
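
Note on the docstring added above: it says the post processor must be refreshed after `bos_token` or `eos_token` changes, otherwise the first and final ids of an encoded sequence will be wrong. The sketch below illustrates that idea in plain Python. It is not the library's implementation: `build_single_template`, `apply_template`, and the ids 1 and 2 for `<s>` and `</s>` are hypothetical names and values chosen for illustration, and the `$A` placeholder is only assumed to mirror the template style described in the post-processors documentation.

```python
from typing import Dict, List, Optional


def build_single_template(bos: Optional[str], eos: Optional[str]) -> str:
    """Build a template such as "<s> $A </s>", where $A marks the encoded text."""
    parts = []
    if bos is not None:
        parts.append(bos)
    parts.append("$A")
    if eos is not None:
        parts.append(eos)
    return " ".join(parts)


def apply_template(template: str, ids: List[int], special_ids: Dict[str, int]) -> List[int]:
    """Expand the template: $A becomes the text ids, every other piece a special-token id."""
    out: List[int] = []
    for piece in template.split():
        if piece == "$A":
            out.extend(ids)
        else:
            out.append(special_ids[piece])
    return out


# Hypothetical ids for the special tokens (assumed for this example only).
special_ids = {"<s>": 1, "</s>": 2}
text_ids = [15043, 445, 338, 263, 1243]

# The template is built once from the current tokens. If the tokens change and
# the template is not rebuilt, encoded sequences keep the stale first/last ids.
bos_only = apply_template(build_single_template("<s>", None), text_ids, special_ids)
bos_and_eos = apply_template(build_single_template("<s>", "</s>"), text_ids, special_ids)

print(bos_only)     # [1, 15043, 445, 338, 263, 1243]
print(bos_and_eos)  # [1, 15043, 445, 338, 263, 1243, 2]
```

Because the template is computed once and then reused, rebuilding it is the step that `update_post_processor()` is documented to perform for the real tokenizer.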