HuggingFace_transformer

Author	SHA1	Message	Date
Thomas Wolf	5afca00b47	Merge pull request #1724 from huggingface/fix_encode_plus Fix encode_plus	2019-11-27 17:14:49 +01:00
Thomas Wolf	21637d4924	Merge branch 'master' into do_lower_case	2019-11-27 17:04:39 +01:00
Julien Chaumond	8742baa531	Improve test protocol for inputs_embeds in TF	2019-11-26 14:39:47 -05:00
Julien Chaumond	cf62bdc962	Improve test protocol for inputs_embeds in TF cc @lysandrejik	2019-11-26 14:37:32 -05:00
Lysandre	f2f329408d	Fix input embeddings	2019-11-26 13:08:12 -05:00
Lysandre	b18509c208	Tests for ALBERT in TF2 + fixes	2019-11-26 13:08:12 -05:00
Lysandre	9d5c49546f	Tests for AlbertForQuestionAnswering AlbertForSequenceClassification	2019-11-26 13:08:12 -05:00
Lysandre	16263f9685	Headmasking	2019-11-26 13:08:12 -05:00
Lysandre	abb23a78ba	Head pruning for ALBERT	2019-11-26 13:08:12 -05:00
Lysandre	c14a22272f	ALBERT passes all tests	2019-11-26 13:08:12 -05:00
Lysandre	870320a24e	Early tests	2019-11-26 13:08:12 -05:00
Lysandre	1e5b31c388	Several fixes and improvements	2019-11-26 13:08:12 -05:00
Lysandre	ee20201d33	Tokenization tests + fixes + init	2019-11-26 13:08:12 -05:00
Thomas Wolf	74ce8de7d8	Merge pull request #1792 from stefan-it/distilbert-for-token-classification DistilBERT for token classification	2019-11-14 22:47:53 +01:00
Thomas Wolf	5b322a36db	Merge pull request #1811 from huggingface/special-tokens Fix special tokens addition in decoder #1807	2019-11-14 22:17:24 +01:00
Thomas Wolf	df99f8c5a1	Merge pull request #1832 from huggingface/memory-leak-schedulers replace LambdaLR scheduler wrappers by function	2019-11-14 22:10:31 +01:00
Rémi Louf	022525b003	replace LambdaLR scheduler wrappers by function Custom schedulers are currently instantiated by wrapping PyTorch's LambdaLR class and passing a method of the wrapping class to the __init__ function of LambdaLR. This approach is not appropriate for several reasons: 1. one does not need to define a class when it only defines an __init__() method; 2. instantiating the parent class by passing a method of the child class creates a cyclical reference, which leads to memory leaks. See issues #1742 and #1134. In this commit we replace the wrapper classes with functions that instantiate `LambdaLR` with a custom learning rate function. We use a closure to specify the parameters of the latter. We also do a bit of renaming within the functions to make the behaviour explicit, and remove docstrings that are no longer necessary.	2019-11-14 15:39:08 +01:00
Lysandre	74d0bcb6ff	Fix special tokens addition in decoder	2019-11-12 15:27:57 -05:00
Julien Chaumond	155c782a2c	[inputs_embeds] All TF models + tests	2019-11-12 11:29:21 -05:00
Julien Chaumond	2aef2f0bbc	[common attributes] Fix previous commit for transfo-xl	2019-11-12 11:29:21 -05:00
Julien Chaumond	2f17464266	[common attributes] Slightly sharper test coverage	2019-11-12 11:29:21 -05:00
Julien Chaumond	9d2398fd99	Ooopsie	2019-11-12 11:29:21 -05:00
Julien Chaumond	70d97ddd60	[TF models] Common attributes as per #1721	2019-11-12 11:29:21 -05:00
Michael Watkins	7246d3c2f9	Consider do_lower_case in PreTrainedTokenizer As pointed out in #1545, when using an uncased model and adding a new uncased token, the tokenizer does not correctly identify this token when the input text contains it in a cased format. For instance, if we load bert-base-uncased into BertTokenizer, and then use .add_tokens() to add "cool-token", we get the expected result for .tokenize('this is a cool-token'). However, we get a possibly unexpected result for .tokenize('this is a cOOl-Token'), which in fact mirrors the result for the former from before the new token was added. This commit adds: (a) functionality in PreTrainedTokenizer to handle this situation when a tokenizer (currently Bert, DistilBert, and XLNet) has the do_lower_case=True kwarg, by 1) lowercasing tokens added with .add_tokens() and 2) lowercasing text at the beginning of .tokenize(); (b) a new common test case for tokenizers. See https://github.com/huggingface/transformers/issues/1545	2019-11-12 13:08:30 +02:00
Stefan Schweter	94e55253ae	tests: add test case for DistilBertForTokenClassification implementation	2019-11-11 16:20:15 +01:00
Julien Chaumond	27e015bd54	[tests] Flag to test on cuda	2019-11-06 14:03:47 -05:00
Julien Chaumond	13d9135fa5	[tests] get rid of warning cf. https://docs.pytest.org/en/latest/example/simple.html	2019-11-06 14:03:47 -05:00
Julien Chaumond	00337e9687	[inputs_embeds] All PyTorch models	2019-11-05 00:39:18 +00:00
thomwolf	8d6b9d717c	fix #1532 and encode_plus	2019-11-04 17:07:51 +01:00
thomwolf	b340a910ed	fix tests - flagged as slow all the tests downloading from AWS	2019-11-04 16:03:36 +01:00
thomwolf	f02805da6f	fix tests	2019-11-04 15:42:23 +01:00
thomwolf	1724cee8c4	switch from properties to methods	2019-11-04 15:34:10 +01:00
thomwolf	9b45d0f878	Add common properties input_embeddings and output_embeddings	2019-11-04 12:28:56 +01:00
Thomas Wolf	3df4367244	Merge pull request #1601 from huggingface/clean-roberta Clean roberta model & all tokenizers now add special tokens by default (breaking change)	2019-10-30 17:00:40 +01:00
Thomas Wolf	36174696cc	Merge branch 'master' into clean-roberta	2019-10-30 16:51:06 +01:00
Thomas Wolf	228cdd6a6e	Merge branch 'master' into conditional-generation	2019-10-30 16:40:35 +01:00
Rémi Louf	a88a0e4413	add tests to encoder-decoder model	2019-10-30 16:06:29 +01:00
Rémi Louf	3f07cd419c	update test on Bert to include decoder mode	2019-10-30 15:09:53 +01:00
Matt Maybeno	66085a1321	RoBERTa token classification [WIP] copy paste bert token classification for roberta	2019-10-24 14:32:48 -04:00
Lysandre	7d709e55ed	Remove	2019-10-22 14:12:33 -04:00
Rémi Louf	33c01368b1	remove Bert2Rnd test	2019-10-16 18:13:05 +02:00
thomwolf	898ce064f8	add tests on TF2.0 & PT checkpoint => model conversion functions	2019-10-15 10:04:19 +02:00
thomwolf	18a3cef7d5	no nans	2019-10-11 16:09:42 +02:00
thomwolf	1f5d9513d8	fix test	2019-10-11 15:55:01 +02:00
thomwolf	0f9fc4fbde	adding option to deactivate past/memory outputs	2019-10-11 15:47:08 +02:00
Rémi Louf	1e68c28670	add test for initialization of Bert2Rnd	2019-10-10 18:07:11 +02:00
thomwolf	da26bae61b	adding more tests on TF and pytorch serialization - updating configuration for better serialization	2019-10-10 14:30:48 +02:00
thomwolf	bb04edb45b	Add tests that TF 2.0 model can be integrated with other Keras modules	2019-10-10 13:08:24 +02:00
Lysandre Debut	2431fea98a	Merge pull request #1383 from keskarnitish/master Adding CTRL	2019-10-09 11:31:05 -04:00
thomwolf	07d055f849	higher tolerance	2019-10-09 17:10:04 +02:00
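
Note on commit 022525b003 (replace LambdaLR scheduler wrappers by function): the closure pattern that message describes can be sketched roughly as below. This is an illustration only, not the code from the commit. The function name make_linear_warmup_lambda, its parameters, and the linear warmup/decay shape are assumptions chosen for the example. The sketch is kept free of PyTorch so it runs on its own; the returned function is the kind of callable one would pass as lr_lambda to torch.optim.lr_scheduler.LambdaLR.

```python
def make_linear_warmup_lambda(num_warmup_steps, num_training_steps):
    """Return a step -> learning-rate multiplier function.

    Hypothetical helper: the schedule parameters are captured by the
    closure, so no wrapper class (and no bound method that refers back
    to the scheduler object) is needed.
    """

    def lr_lambda(current_step):
        # Linear warmup from 0 to 1 over the first num_warmup_steps steps.
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        # Linear decay from 1 to 0 over the remaining steps.
        remaining = num_training_steps - current_step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

    return lr_lambda


lr_lambda = make_linear_warmup_lambda(num_warmup_steps=10, num_training_steps=100)
multipliers = [lr_lambda(step) for step in (0, 5, 10, 55, 100)]

# With PyTorch installed, the same callable could be handed to the stock class:
#   scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Because lr_lambda is a plain function rather than a method of a LambdaLR subclass, the scheduler does not hold a reference back to itself, which is the cycle the commit message identifies as the source of the leak.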


68 Commits
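
Note on commit 7246d3c2f9 (Consider do_lower_case in PreTrainedTokenizer): the two steps listed in that message, lowercasing tokens passed to .add_tokens() and lowercasing text at the start of .tokenize(), can be illustrated with a toy stand-in. ToyTokenizer below is hypothetical and is not the library's PreTrainedTokenizer; splitting on whitespace and hyphens is only an assumed substitute for real subword tokenization.

```python
class ToyTokenizer:
    """Hypothetical stand-in used only to illustrate the do_lower_case handling."""

    def __init__(self, do_lower_case=True):
        self.do_lower_case = do_lower_case
        self.added_tokens = set()

    def add_tokens(self, new_tokens):
        # Step 1: store added tokens lowercased when the tokenizer is uncased.
        for token in new_tokens:
            self.added_tokens.add(token.lower() if self.do_lower_case else token)

    def tokenize(self, text):
        # Step 2: lowercase the input before matching against added tokens.
        if self.do_lower_case:
            text = text.lower()
        pieces = []
        for word in text.split():
            if word in self.added_tokens:
                # Added tokens are kept whole.
                pieces.append(word)
            else:
                # Assumed stand-in for subword splitting: break on hyphens.
                pieces.extend(part for part in word.split("-") if part)
        return pieces


uncased = ToyTokenizer(do_lower_case=True)
uncased.add_tokens(["cool-token"])
lower_result = uncased.tokenize("this is a cool-token")
mixed_result = uncased.tokenize("this is a cOOl-Token")

cased = ToyTokenizer(do_lower_case=False)
cased.add_tokens(["cool-token"])
cased_result = cased.tokenize("this is a cOOl-Token")
```

In this sketch the uncased tokenizer returns the same pieces for both spellings, while the cased one no longer matches the added token and falls back to splitting it.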