Tokenization behaves the same as the original XLM preprocessing for most languages except zh, ja and th; change API to allow specifying the language in tokenize
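
As a rough illustration of the API change described above, the sketch below shows a `tokenize` function that takes a language code and picks a pre-tokenization path from it. Everything here is an assumption made for illustration: the function names, the default `lang="en"`, and both splitting helpers are hypothetical stand-ins, not the code from this commit. The commit message only says that zh, ja and th are treated differently; how they are handled is not shown on this page.

```python
import re

# Languages the commit message lists as exceptions to the default path.
_SPECIAL_LANGS = {"zh", "ja", "th"}


def _word_punct_split(text):
    # Hypothetical stand-in for a Moses-style tokenizer:
    # runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def _char_split(text):
    # Hypothetical stand-in for a language-specific segmenter:
    # one token per non-whitespace character.
    return [ch for ch in text if not ch.isspace()]


def tokenize(text, lang="en"):
    """Tokenize `text`, choosing the pre-tokenizer from `lang` (assumed API shape)."""
    if lang in _SPECIAL_LANGS:
        return _char_split(text)
    return _word_punct_split(text)
```

Calling `tokenize("Hello, world!")` takes the default path, while `tokenize(text, lang="zh")` takes the special-case path. Passing the language per call, rather than fixing it at construction time, lets one tokenizer instance serve mixed-language input.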
@@ -9,4 +9,6 @@ requests
 # For OpenAI GPT
 regex
 # For XLNet
-sentencepiece
+sentencepiece
+# For XLM
+sacremoses