Add a TF in-graph tokenizer for BERT (#17701)
* Add a TF in-graph tokenizer for BERT * Add from_pretrained * Add proper truncation, option handling to match other tokenizers * Add proper imports and guards * Add test, fix all the bugs exposed by said test * Fix truncation of paired texts in graph mode, more test updates * Small fixes, add a (very careful) test for savedmodel * Add tensorflow-text dependency, make fixup * Update documentation * Update documentation * make fixup * Slight changes to tests * Add some docstring examples * Update tests * Update tests and add proper lowercasing/normalization * make fixup * Add docstring for padding! * Mark slow tests * make fixup * Fall back to BertTokenizerFast if BertTokenizer is unavailable * Fall back to BertTokenizerFast if BertTokenizer is unavailable * make fixup * Properly handle tensorflow-text dummies
This commit is contained in:
@@ -58,6 +58,10 @@ This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The o
|
||||
|
||||
[[autodoc]] BertTokenizerFast
|
||||
|
||||
## TFBertTokenizer
|
||||
|
||||
[[autodoc]] TFBertTokenizer
|
||||
|
||||
## Bert specific outputs
|
||||
|
||||
[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput
|
||||
|
||||
Reference in New Issue
Block a user