clean_up_tokenization_spaces=False if unset (#31938)

* clean_up_tokenization_spaces=False if unset

* deprecate warning

* updating param for old models

* update models

* make fix-copies

* fix-copies and update bert models

* warning msg

* update prophet and clvp

* updating test since space before is arbitrarily removed

* remove warning for 4.45
This commit is contained in:
Ita Zaporozhets
2024-09-26 13:38:20 -04:00
committed by GitHub
parent 3557f9a14a
commit 6730485b02
16 changed files with 67 additions and 7 deletions

View File

@@ -249,7 +249,7 @@ class Wav2Vec2PhonemeCTCTokenizerTest(TokenizerTesterMixin, unittest.TestCase):
# fmt: on
batch_tokens = tokenizer.batch_decode(sample_ids)
self.assertEqual(batch_tokens, ["k s ɾ ɾ l ɭʲ!?!? $$$", "j ð s j ð s oːɹ $$$"])
self.assertEqual(batch_tokens, ["k s ɾ ɾ l ɭʲ ! ? ! ? $$$", "j ð s j ð s oːɹ $$$"])
@staticmethod
def get_from_offsets(offsets, key):