[split_special_tokens] Add support for split_special_tokens argument to encode (#25081)

* draft changes

* update and add tests

* styling for now

* move test

* path to usable model

* update test

* small update

* update BERT-based tokenizers

* don't use kwargs for _tokenize

* don't use kwargs for _tokenize

* fix copies

* update

* update test for special tokenizers

* fixup

* skip two tests

* remove pdb breakpoint()

* wowo

* rewrite custom tests

* nits

* revert change in target keys

* fix MarkupLM

* update documentation of the argument
Arthur
2023-08-18 13:26:27 +02:00
committed by GitHub
parent 9d7afd2536
commit 30b3c46ff5
18 changed files with 122 additions and 24 deletions
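
To illustrate what the `split_special_tokens` argument named in the commit title controls, here is a minimal sketch using a toy tokenizer. It is not the transformers implementation: `toy_tokenize`, `SPECIAL_TOKENS`, and the regex-based word splitting are hypothetical names and simplifications introduced only for this example. The assumption is that the flag defaults to keeping special tokens whole, and that setting it to true lets them be tokenized like ordinary text.

```python
import re

# Hypothetical toy tokenizer for illustration only; this is NOT how
# transformers implements the feature.
SPECIAL_TOKENS = ["[CLS]", "[SEP]"]


def toy_tokenize(text, split_special_tokens=False):
    """Split text into word and punctuation tokens.

    When split_special_tokens is False, entries of SPECIAL_TOKENS are
    kept as single tokens. When True, they get no protection and are
    tokenized like any other text.
    """
    word_re = r"\w+|[^\w\s]"
    if split_special_tokens:
        # Special tokens are treated as plain text.
        return re.findall(word_re, text)

    # Cut the text around special tokens; the capture group makes
    # re.split keep the matched special tokens in the result.
    special_re = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
    tokens = []
    for piece in re.split(special_re, text):
        if piece in SPECIAL_TOKENS:
            tokens.append(piece)
        else:
            tokens.extend(re.findall(word_re, piece))
    return tokens


kept = toy_tokenize("[CLS] hello world [SEP]")
split = toy_tokenize("[CLS] hello world [SEP]", split_special_tokens=True)
print(kept)
print(split)
```

In the first call the special tokens survive as `[CLS]` and `[SEP]`; in the second they are broken into brackets and letters, which is the behavior a `test_split_special_tokens` test would be expected to check for a real tokenizer.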

@@ -264,6 +264,10 @@ class LayoutLMv3TokenizationTest(TokenizerTesterMixin, unittest.TestCase):
     def test_right_and_left_truncation(self):
         pass
 
+    @unittest.skip("Not implemented")
+    def test_split_special_tokens(self):
+        pass
+
     def test_encode_plus_with_padding(self):
         tokenizers = self.get_tokenizers(do_lower_case=False)
         for tokenizer in tokenizers: