VLM: special multimodal Tokenizer (#34461)
* kinda works
* update
* add tests
* update
* use special tokens in processors
* typo
* fix copies
* fix
* fix moshi after rebase
* update
* fix tests
* update
* Update docs/source/en/main_classes/tokenizer.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update docs
* test for load time adding tokens
* fix some more tests which are now fetched better
* one more fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
committed by GitHub
parent ef976a7e18
commit 187439c3fa
@@ -51,6 +51,25 @@ token space (e.g., getting the index of the token comprising a given character o
to a given token).

# Multimodal Tokenizer

In addition, any tokenizer can be a "multimodal" tokenizer, which means that it holds all relevant special tokens
as tokenizer attributes for easier access. For example, if the tokenizer is loaded from a vision-language model like LLaVA, you can
access `tokenizer.image_token_id` to obtain the ID of the special image token used as a placeholder.

To enable extra special tokens for any type of tokenizer, pass them as `extra_special_tokens` when loading the tokenizer, then save it. Extra special tokens do not
have to be modality related and can be anything that the model often needs access to. In the code below, the tokenizer, once saved to `output_dir`, will have direct access
to three more special tokens.

```python
from transformers import AutoTokenizer

# Register three extra special tokens as named tokenizer attributes at load time.
vision_tokenizer = AutoTokenizer.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    extra_special_tokens={"image_token": "<image>", "boi_token": "<image_start>", "eoi_token": "<image_end>"}
)
print(vision_tokenizer.image_token, vision_tokenizer.image_token_id)
# Output: <image> 32000
```
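
To make the attribute-access idea concrete, here is a minimal, library-free sketch of how named special tokens and their `_id` counterparts could be exposed as attributes. `ToyTokenizer` and its vocabulary are hypothetical and exist only for illustration; this is not the `transformers` implementation, and it assumes vocabulary IDs are contiguous starting at 0.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer (illustration only)."""

    def __init__(self, vocab, extra_special_tokens=None):
        self.vocab = dict(vocab)
        self.extra_special_tokens = dict(extra_special_tokens or {})
        # Give every special token missing from the vocabulary the next free ID.
        for token in self.extra_special_tokens.values():
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        extras = self.__dict__.get("extra_special_tokens", {})
        if name in extras:
            # e.g. toy.image_token -> "<image>"
            return extras[name]
        if name.endswith("_id") and name[:-3] in extras:
            # e.g. toy.image_token_id -> ID of "<image>" in the vocabulary
            return self.vocab[extras[name[:-3]]]
        raise AttributeError(name)


toy = ToyTokenizer(
    vocab={"<unk>": 0, "hello": 1, "world": 2},
    extra_special_tokens={
        "image_token": "<image>",
        "boi_token": "<image_start>",
        "eoi_token": "<image_end>",
    },
)
print(toy.image_token, toy.image_token_id)  # <image> 3
print(toy.boi_token_id, toy.eoi_token_id)   # 4 5
```

The keys of `extra_special_tokens` become attribute names, so downstream code can ask for `image_token_id` by name instead of hard-coding a token string or a numeric ID.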

## PreTrainedTokenizer

[[autodoc]] PreTrainedTokenizer