@@ -41,7 +41,7 @@ The main tool for preprocessing textual data is a [tokenizer](main_classes/token
 
 <Tip>
 
-If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer. This ensures the text is split the same way as the pretraining corpus, and uses the same corresponding tokens-to-index (usually referrred to as the *vocab*) during pretraining.
+If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer. This ensures the text is split the same way as the pretraining corpus, and uses the same corresponding tokens-to-index (usually referred to as the *vocab*) during pretraining.
 
 </Tip>
 