Fix for #3865. PreTrainedTokenizer mapped " do not" into " don't" when .decode(...) was called. Removed the " do not" --> " don't" mapping from clean_up_tokenization(...). (#4024)

Denis
2020-05-13 14:32:57 +02:00
committed by GitHub
parent 241759101e
commit 1e51bb717c


@@ -2195,7 +2195,6 @@ class PreTrainedTokenizer(SpecialTokensMixin):
 .replace(" ' ", "'")
 .replace(" n't", "n't")
 .replace(" 'm", "'m")
-.replace(" do not", " don't")
 .replace(" 's", "'s")
 .replace(" 've", "'ve")
 .replace(" 're", "'re")
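
The effect of dropping this mapping can be illustrated with a minimal standalone sketch. It is not the library's implementation: clean_up_sketch and its legacy_do_not flag are hypothetical names introduced here, and the function reproduces only the replacements visible in the hunk above, not the full chain in clean_up_tokenization(...).

```python
def clean_up_sketch(text: str, legacy_do_not: bool = False) -> str:
    """Hypothetical stand-in for the replace chain shown in the hunk above."""
    out = text.replace(" ' ", "'").replace(" n't", "n't").replace(" 'm", "'m")
    if legacy_do_not:
        # The mapping this commit removes: it rewrote the user's actual words.
        out = out.replace(" do not", " don't")
    return out.replace(" 's", "'s").replace(" 've", "'ve").replace(" 're", "'re")


decoded = "i do not know what you 're doing"
print(clean_up_sketch(decoded, legacy_do_not=True))  # i don't know what you're doing
print(clean_up_sketch(decoded))                      # i do not know what you're doing
```

The remaining replacements only re-attach contraction suffixes that tokenization split off with a space, whereas the removed one changed the wording of the decoded text, so it did not round-trip the original input.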