Thomas Wolf
|
5b322a36db
|
Merge pull request #1811 from huggingface/special-tokens
Fix special tokens addition in decoder #1807
|
2019-11-14 22:17:24 +01:00 |
|
Thomas Wolf
|
1a237d7f42
|
Merge pull request #1831 from iedmrc/gpt2-tokenization-sum-func-replacement
sum() is replaced by itertools.chain.from_iterable()
|
2019-11-14 22:11:54 +01:00 |
|
Lysandre
|
a67e747889
|
Reorganized max_len warning
|
2019-11-14 10:30:22 -05:00 |
|
İbrahim Ethem Demirci
|
7627dde1f8
|
sum() is a slow way to flatten a string list, so it's been replaced by itertools.chain.from_iterable()
|
2019-11-14 17:06:15 +03:00 |
|
Lysandre
|
74d0bcb6ff
|
Fix special tokens addition in decoder
|
2019-11-12 15:27:57 -05:00 |
|
Lysandre
|
b5d330d118
|
Fix #1784
|
2019-11-11 10:15:14 -05:00 |
|
Lysandre
|
7d709e55ed
|
Remove
|
2019-10-22 14:12:33 -04:00 |
|
thomwolf
|
a5997dd81a
|
better error messages
|
2019-10-10 11:31:01 +02:00 |
|
Lysandre Debut
|
e84470ef81
|
Merge pull request #1384 from huggingface/encoding-qol
Quality of life enhancements in encoding + patch MLM masking
|
2019-10-09 11:18:24 -04:00 |
|
thomwolf
|
78ef1a9930
|
fixes
|
2019-10-04 17:59:44 -04:00 |
|
thomwolf
|
6c1d0bc066
|
update encode_plus - add truncation strategies
|
2019-10-04 17:38:38 -04:00 |
|
thomwolf
|
92c0f2fb90
|
Merge remote-tracking branch 'origin/julien_multiple-choice' into encoding-qol
|
2019-10-04 15:48:06 -04:00 |
|
LysandreJik
|
7bddb45a6f
|
Decode documentation
|
2019-10-04 14:27:38 -04:00 |
|
LysandreJik
|
aebd83230f
|
Update naming + remove f string in run_lm_finetuning example
|
2019-10-03 11:31:36 -04:00 |
|
LysandreJik
|
651bfb7ad5
|
always_truncate by default
|
2019-10-03 11:31:36 -04:00 |
|
LysandreJik
|
cc412edd42
|
Supports already existing special tokens
|
2019-10-03 11:31:36 -04:00 |
|
LysandreJik
|
2f259b228e
|
Sequence IDS
|
2019-10-03 11:31:36 -04:00 |
|
LysandreJik
|
7c789c337d
|
Always truncate argument in the encode method
|
2019-10-03 11:31:36 -04:00 |
|
danai-antoniou
|
a95158518d
|
Moved duplicate token check
|
2019-10-02 07:44:15 +01:00 |
|
danai-antoniou
|
d73957899a
|
Merge branch 'master' of https://github.com/danai-antoniou/pytorch-transformers into add-duplicate-tokens-error
|
2019-10-02 07:38:50 +01:00 |
|
thomwolf
|
391db836ab
|
fix #1260 - remove special logic for decoding pairs of sequences
|
2019-10-01 19:09:13 -04:00 |
|
Julien Chaumond
|
b350662955
|
overflowing_tokens do not really make sense here, let's just return a number
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
|
2019-09-30 16:37:09 -04:00 |
|
Julien Chaumond
|
f5bcde0b2f
|
[multiple-choice] Simplify and use tokenizer.encode_plus
|
2019-09-30 16:04:55 -04:00 |
|
Julien Chaumond
|
d8b641c839
|
6 -> 8 models
|
2019-09-27 17:22:01 -04:00 |
|
thomwolf
|
31c23bd5ee
|
[BIG] pytorch-transformers => transformers
|
2019-09-26 10:15:53 +02:00 |
|