@@ -41,7 +41,7 @@ The main tool for preprocessing textual data is a [tokenizer](main_classes/token

 <Tip>

-If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer. This ensures the text is split the same way as the pretraining corpus, and uses the same corresponding tokens-to-index (usually referrred to as the *vocab*) during pretraining.
+If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer. This ensures the text is split the same way as the pretraining corpus, and uses the same corresponding tokens-to-index (usually referred to as the *vocab*) during pretraining.

 </Tip>
