25 Commits

Author SHA1 Message Date
Thomas Wolf
5b322a36db Merge pull request #1811 from huggingface/special-tokens
Fix special tokens addition in decoder #1807
2019-11-14 22:17:24 +01:00
Thomas Wolf
1a237d7f42 Merge pull request #1831 from iedmrc/gpt2-tokenization-sum-func-replacement
sum() is replaced by itertools.chain.from_iterable()
2019-11-14 22:11:54 +01:00
Lysandre
a67e747889 Reorganized max_len warning 2019-11-14 10:30:22 -05:00
İbrahim Ethem Demirci
7627dde1f8 sum() is the leanest method to flatten a string list, so it's been replaced by itertools.chain.from_iterable() 2019-11-14 17:06:15 +03:00
Lysandre
74d0bcb6ff Fix special tokens addition in decoder 2019-11-12 15:27:57 -05:00
Lysandre
b5d330d118 Fix #1784 2019-11-11 10:15:14 -05:00
Lysandre
7d709e55ed Remove 2019-10-22 14:12:33 -04:00
thomwolf
a5997dd81a better error messages 2019-10-10 11:31:01 +02:00
Lysandre Debut
e84470ef81 Merge pull request #1384 from huggingface/encoding-qol
Quality of life enhancements in encoding + patch MLM masking
2019-10-09 11:18:24 -04:00
thomwolf
78ef1a9930 fixes 2019-10-04 17:59:44 -04:00
thomwolf
6c1d0bc066 update encode_plus - add truncation strategies 2019-10-04 17:38:38 -04:00
thomwolf
92c0f2fb90 Merge remote-tracking branch 'origin/julien_multiple-choice' into encoding-qol 2019-10-04 15:48:06 -04:00
LysandreJik
7bddb45a6f Decode documentaton 2019-10-04 14:27:38 -04:00
LysandreJik
aebd83230f Update naming + remove f string in run_lm_finetuning example 2019-10-03 11:31:36 -04:00
LysandreJik
651bfb7ad5 always_truncate by default 2019-10-03 11:31:36 -04:00
LysandreJik
cc412edd42 Supports already existing special tokens 2019-10-03 11:31:36 -04:00
LysandreJik
2f259b228e Sequence IDS 2019-10-03 11:31:36 -04:00
LysandreJik
7c789c337d Always truncate argument in the encode method 2019-10-03 11:31:36 -04:00
danai-antoniou
a95158518d Moved duplicate token check 2019-10-02 07:44:15 +01:00
danai-antoniou
d73957899a Merge branch 'master' of https://github.com/danai-antoniou/pytorch-transformers into add-duplicate-tokens-error 2019-10-02 07:38:50 +01:00
thomwolf
391db836ab fix #1260 - remove special logic for decoding pairs of sequence 2019-10-01 19:09:13 -04:00
Julien Chaumond
b350662955 overflowing_tokens do not really make sense here, let's just return a number
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-30 16:37:09 -04:00
Julien Chaumond
f5bcde0b2f [multiple-choice] Simplify and use tokenizer.encode_plus 2019-09-30 16:04:55 -04:00
Julien Chaumond
d8b641c839 6 -> 8 models 2019-09-27 17:22:01 -04:00
thomwolf
31c23bd5ee [BIG] pytorch-transformers => transformers 2019-09-26 10:15:53 +02:00
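Commit 7627dde1f8 (merged via #1831) replaces sum() with itertools.chain.from_iterable() to flatten a list of string lists. The minimal Python sketch below compares the two approaches. The function names and sample tokens are hypothetical and are not taken from the repository. The practical difference is cost. sum(lists, []) builds a new list on every addition and copies everything accumulated so far, so it is quadratic in the total number of elements. chain.from_iterable() visits each element once.

```python
import itertools
from typing import Iterable, List


def flatten_with_sum(token_lists: Iterable[List[str]]) -> List[str]:
    # Hypothetical helper: sum() starts from [] and evaluates acc + sub_list
    # for each sub-list, allocating a new list and copying the whole
    # accumulated prefix every time (quadratic in total length).
    return sum(token_lists, [])


def flatten_with_chain(token_lists: Iterable[List[str]]) -> List[str]:
    # Hypothetical helper: chain.from_iterable() lazily yields the elements
    # of each sub-list in order; list() materializes them in one linear pass.
    return list(itertools.chain.from_iterable(token_lists))


# Illustrative input: per-word sub-token lists, as a subword tokenizer might produce.
sub_tokens = [["Hel", "lo"], ["world"], ["!"]]
assert flatten_with_sum(sub_tokens) == flatten_with_chain(sub_tokens)
print(flatten_with_chain(sub_tokens))  # ['Hel', 'lo', 'world', '!']
```

Both helpers return the same list. The chain-based version also accepts a generator of sub-lists without building intermediate lists. For that reason, the Python documentation for sum() recommends itertools.chain() for concatenating a series of iterables.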