Add mLUKE (#14640)
* implement MLukeTokenizer and LukeForMaskedLM * update tests * update docs * add LukeForMaskedLM to check_repo.py * update README * fix test and specify the entity pad id in tokenization_(m)luke * fix EntityPredictionHeadTransform
This commit is contained in:
@@ -20,7 +20,7 @@ Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fa
|
||||
1. a significant speed-up in particular when doing batched tokenization and
|
||||
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
|
||||
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
|
||||
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
|
||||
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
|
||||
and XLNet models).
|
||||
|
||||
The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
|
||||
|
||||
Reference in New Issue
Block a user