RafaelWO
cb276b41de
Transformer-XL: Improved tokenization with sacremoses ( #6322 )
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
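
The number handling mentioned in the commit message can be illustrated with a minimal sketch. It assumes the WikiText-103 convention of marking separators inside numbers as `@,@` and `@.@` tokens. The helper names `split_numbers` and `join_numbers` are hypothetical and are not taken from tokenization_transfo_xl.py. The sketch uses only the standard library and leaves out the sacremoses step itself.

```python
import re

# Assumption: a comma or period counts as part of a number only when it sits
# directly between two digits (e.g. "1,000" or "3.14").
NUMBER_SEPARATOR = re.compile(r"(?<=\d)[,.](?=\d)")


def split_numbers(tokens):
    """Split each in-number separator into its own '@,@' or '@.@' token."""
    result = []
    for token in tokens:
        # "1,000" becomes "1 @,@ 000", which is then split on whitespace.
        result.extend(NUMBER_SEPARATOR.sub(r" @\g<0>@ ", token).split())
    return result


def join_numbers(text):
    """Undo split_numbers on a space-joined string."""
    return text.replace(" @,@ ", ",").replace(" @.@ ", ".")


tokens = split_numbers(["It", "costs", "1,000.50", "dollars", "."])
restored = join_numbers(" ".join(tokens))
```

A trailing period such as the final `"."` token is left alone because no digit follows it, so only separators inside numbers are rewritten.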