add in layer gpt2 tokenizer (#20421)

* add minimal working gpt2 tokenizer
* graph mode and output equivalence tests working
* not today tensorflow. serialization test passing!
* fix style, documentation, docstrings and all that jazz
* passing consistency checks
* move keras nlp to tf dependencies
* fix tf modeling utils and gpt2 attention to enable compiling
* fix (I hope) keras nlp dependencies
* revert changes on generation
* remove debug prints
* remove redundant tf dummy objects
* add from config, get config and max length settings to address review
* let flake ignore the error on distillation you are welcome
* test from config
* add padding test
* address sgugger review
2022-11-29 12:02:40 -03:00
parent e8d448edcf
commit fb2b45e562
11 changed files with 297 additions and 4 deletions
--- a/docs/source/en/model_doc/gpt2.mdx
+++ b/docs/source/en/model_doc/gpt2.mdx
@@ -138,6 +138,10 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h

 [[autodoc]] modeling_tf_outputs.TFSequenceClassifierOutputWithPast

+## TFGPT2Tokenizer
+
+[[autodoc]] TFGPT2Tokenizer
+
 ## FlaxGPT2Model

 [[autodoc]] FlaxGPT2Model