HuggingFace_transformer/tests
RafaelWO cb276b41de Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses

 * The TransfoXLTokenizer is now using sacremoses for tokenization
 * Added tokenization of comma-separated and floating point numbers.
 * Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
 * Added corresponding tests
 * Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
 * Added deprecation warning to TransfoXLTokenizerFast

* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
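
As a rough illustration of what "tokenization of comma-separated and floating point numbers" can look like, here is a minimal standard-library sketch. It assumes the WikiText-103 convention of marking separators inside numbers as `@,@` and `@.@`; the helper names `split_number_tokens` and `join_number_tokens` are hypothetical and are not taken from this commit, and the sacremoses step itself is not shown.

```python
import re

# Assumption: a "," or "." with a digit on both sides is a number separator.
# Lookbehind/lookahead keep the digits out of the match itself.
NUMBER_SEPARATOR = re.compile(r"(?<=\d)[,.](?=\d)")


def split_number_tokens(tokens):
    """Split tokens such as "5,000" or "1.73" around their separators.

    Hypothetical helper: each separator becomes its own token,
    written as "@,@" or "@.@".
    """
    result = []
    for token in tokens:
        # "\g<0>" re-inserts the matched separator between the "@" signs.
        marked = NUMBER_SEPARATOR.sub(r" @\g<0>@ ", token)
        result.extend(marked.split())
    return result


def join_number_tokens(text):
    """Undo the split on a whitespace-joined token string."""
    return text.replace(" @,@ ", ",").replace(" @.@ ", ".")


tokens = split_number_tokens(["$", "5,000", "and", "1.73", "m"])
restored = join_number_tokens(" ".join(tokens))
```

Punctuation that is not between two digits (for example the final period of a sentence) is left untouched by this sketch; in the commit that part is delegated to sacremoses.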