TF port of ESM (#19587)

* Partial TF port for ESM model

* Add ESM-TF tests

* Add the various imports for TF-ESM

* TF weight conversion almost ready

* Stop ignoring the decoder weights in PT

* Add tests and lots of fixes

* fix-copies

* Fix imports, add model docs

* Add get_vocab() to tokenizer

* Fix vocab links for pretrained files

* Allow multiple inputs with a sep

* Use EOS as SEP token because ESM vocab lacks SEP

* Correctly return special tokens mask from ESM tokenizer

* make fixup

* Stop testing unsupported embedding resizing

* Handle TF bias correctly

* Skip all models with slow tokenizers in the token classification test

* Fixing the batch/unbatcher of pipelines to accomodate the `None` being

passed around.

* Fixing pipeline bug caused by slow tokenizer  being different.

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/models/esm/modeling_tf_esm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update set_input_embeddings and the copyright notices

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
This commit is contained in:
Matt
2022-10-17 14:16:16 +01:00
committed by GitHub
parent d7754c43d0
commit 3b3024da70
15 changed files with 1898 additions and 84 deletions

View File

@@ -44,6 +44,8 @@ class TokenClassificationPipelineTests(unittest.TestCase, metaclass=PipelineTest
def run_pipeline_test(self, token_classifier, _):
model = token_classifier.model
tokenizer = token_classifier.tokenizer
if not tokenizer.is_fast:
return # Slow tokenizers do not return offsets mappings, so this test will fail
outputs = token_classifier("A simple string")
self.assertIsInstance(outputs, list)