HuggingFace_transformer

Author	SHA1	Message	Date
Rémi Louf	c0443df593	remove beam search	2019-12-09 20:37:55 -05:00
Rémi Louf	4735c2af07	tweaks to the BeamSearch API	2019-12-09 20:37:55 -05:00
Rémi Louf	9660ba1cbd	Add beam search	2019-12-09 20:37:55 -05:00
Aymeric Augustin	0cb163865a	Remove pytest dependency. (#2093)	2019-12-07 07:46:14 -05:00
Michael Watkins	2670b0d682	Fix bug which lowercases special tokens	2019-12-06 16:15:53 -05:00
Aymeric Augustin	35401fe50f	Remove dependency on pytest for running tests (#2055) * Switch to plain unittest for skipping slow tests. Add a RUN_SLOW environment variable for running them. * Switch to plain unittest for PyTorch dependency. * Switch to plain unittest for TensorFlow dependency. * Avoid leaking open files in the test suite. This prevents spurious warnings when running tests. * Fix unicode warning on Python 2 when running tests. The warning was: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal * Support running PyTorch tests on a GPU. Reverts `27e015bd`. * Tests no longer require pytest. * Make tests pass on cuda	2019-12-06 13:57:38 -05:00
Thomas Wolf	5482822a2b	Merge pull request #2046 from jplu/tf2-ner-example Add NER TF2 example.	2019-12-06 12:12:22 +01:00
Thomas Wolf	bebaa14039	Merge pull request #2045 from aaugustin/remove-dead-code Remove dead code in tests.	2019-12-05 14:41:56 +01:00
thomwolf	3268ebd229	fix xlnet test	2019-12-05 13:35:29 +01:00
Julien Plu	9200a759d7	Add few tests on the TF optimization file with some info in the documentation. Complete the README.	2019-12-05 12:56:43 +01:00
Thomas Wolf	1eaf44e713	Merge pull request #2007 from roskoN/xlnet_attention_fix fixed XLNet attention output for both attention streams whenever target_mapping is provided	2019-12-05 12:32:39 +01:00
Thomas Wolf	d425a4d60b	Merge pull request #1870 from alexzubiaga/xlnet-for-token-classification XLNet for Token classification	2019-12-05 09:54:09 +01:00
Julien Chaumond	96fa9a8a70	Python 2 + Post mime-type to S3	2019-12-04 17:22:50 -05:00
Aymeric Augustin	40255ab002	Remove dead code in tests.	2019-12-04 08:21:02 +01:00
Julien Chaumond	e4fbf3e2cc	CLI for authenticated file sharing	2019-12-04 00:52:23 -05:00
Aymeric Augustin	5ab93083e4	Mark tests in TFAutoModelTest as slow. Each test forces downloading the same 536MB file, which is slow even with a decent internet connection.	2019-12-01 18:25:15 +01:00
Rostislav Nedelchev	76c0bc06d5	[XLNet] Changed post-processing of attention w.r.t. target_mapping Whenever target_mapping is provided to the input, XLNet outputs two different attention streams. Based on that, the attention output would be one of the two: - a list of tensors (usual case for most transformers) - a list of 2-tuples of tensors, one tensor for each of the attention streams Docs and unit tests have been updated	2019-11-30 21:01:04 +01:00
Thomas Wolf	5afca00b47	Merge pull request #1724 from huggingface/fix_encode_plus Fix encode_plus	2019-11-27 17:14:49 +01:00
Thomas Wolf	21637d4924	Merge branch 'master' into do_lower_case	2019-11-27 17:04:39 +01:00
Julien Chaumond	8742baa531	Improve test protocol for inputs_embeds in TF	2019-11-26 14:39:47 -05:00
Julien Chaumond	cf62bdc962	Improve test protocol for inputs_embeds in TF cc @lysandrejik	2019-11-26 14:37:32 -05:00
Lysandre	f2f329408d	Fix input embeddings	2019-11-26 13:08:12 -05:00
Lysandre	b18509c208	Tests for ALBERT in TF2 + fixes	2019-11-26 13:08:12 -05:00
Lysandre	9d5c49546f	Tests for AlbertForQuestionAnswering AlbertForSequenceClassification	2019-11-26 13:08:12 -05:00
Lysandre	16263f9685	Headmasking	2019-11-26 13:08:12 -05:00
Lysandre	abb23a78ba	Head pruning for ALBERT	2019-11-26 13:08:12 -05:00
Lysandre	c14a22272f	ALBERT passes all tests	2019-11-26 13:08:12 -05:00
Lysandre	870320a24e	Early tests	2019-11-26 13:08:12 -05:00
Lysandre	1e5b31c388	Several fixes and improvements	2019-11-26 13:08:12 -05:00
Lysandre	ee20201d33	Tokenization tests + fixes + init	2019-11-26 13:08:12 -05:00
alexzubiaga	4193aa9f81	add TFXLNetForTokenClassification implementation and unit test add XLNetForTokenClassification implementation and unit tests	2019-11-19 12:47:54 +01:00
Thomas Wolf	74ce8de7d8	Merge pull request #1792 from stefan-it/distilbert-for-token-classification DistilBERT for token classification	2019-11-14 22:47:53 +01:00
Thomas Wolf	5b322a36db	Merge pull request #1811 from huggingface/special-tokens Fix special tokens addition in decoder #1807	2019-11-14 22:17:24 +01:00
Thomas Wolf	df99f8c5a1	Merge pull request #1832 from huggingface/memory-leak-schedulers replace LambdaLR scheduler wrappers by function	2019-11-14 22:10:31 +01:00
Rémi Louf	022525b003	replace LambdaLR scheduler wrappers by function Custom schedulers are currently initiated by wrapping PyTorch's LambdaLR class and passing a method of the wrapping class to the __init__ function of LambdaLR. This approach is not appropriate for several reasons: 1. one does not need to define a class when it only defines an __init__() method; 2. instantiating the parent class by passing a method of the child class creates a cyclical reference which leads to memory leaks. See issues #1742 and #1134. In this commit we replace the wrapper classes with functions that instantiate `LambdaLR` with a custom learning rate function. We use a closure to specify the parameters of the latter. We also do a bit of renaming within the function to make the behaviour explicit and remove docstrings that were subsequently not necessary.	2019-11-14 15:39:08 +01:00
Lysandre	74d0bcb6ff	Fix special tokens addition in decoder	2019-11-12 15:27:57 -05:00
Julien Chaumond	155c782a2c	[inputs_embeds] All TF models + tests	2019-11-12 11:29:21 -05:00
Julien Chaumond	2aef2f0bbc	[common attributes] Fix previous commit for transfo-xl	2019-11-12 11:29:21 -05:00
Julien Chaumond	2f17464266	[common attributes] Slightly sharper test coverage	2019-11-12 11:29:21 -05:00
Julien Chaumond	9d2398fd99	Ooopsie	2019-11-12 11:29:21 -05:00
Julien Chaumond	70d97ddd60	[TF models] Common attributes as per #1721	2019-11-12 11:29:21 -05:00
Michael Watkins	7246d3c2f9	Consider do_lower_case in PreTrainedTokenizer As pointed out in #1545, when using an uncased model, and adding a new uncased token, the tokenizer does not correctly identify this in the case that the input text contains the token in a cased format. For instance, if we load bert-base-uncased into BertTokenizer, and then use .add_tokens() to add "cool-token", we get the expected result for .tokenize('this is a cool-token'). However, we get a possibly unexpected result for .tokenize('this is a cOOl-Token'), which in fact mirrors the result for the former from before the new token was added. This commit adds - functionality to PreTrainedTokenizer to handle this situation in case a tokenizer (currently Bert, DistilBert, and XLNet) has the do_lower_case=True kwarg by: 1) lowercasing tokens added with .add_tokens() 2) lowercasing text at the beginning of .tokenize() - new common test case for tokenizers https://github.com/huggingface/transformers/issues/1545	2019-11-12 13:08:30 +02:00
Stefan Schweter	94e55253ae	tests: add test case for DistilBertForTokenClassification implementation	2019-11-11 16:20:15 +01:00
Julien Chaumond	27e015bd54	[tests] Flag to test on cuda	2019-11-06 14:03:47 -05:00
Julien Chaumond	13d9135fa5	[tests] get rid of warning cf. https://docs.pytest.org/en/latest/example/simple.html	2019-11-06 14:03:47 -05:00
Julien Chaumond	00337e9687	[inputs_embeds] All PyTorch models	2019-11-05 00:39:18 +00:00
thomwolf	8d6b9d717c	fix #1532 and encode_plus	2019-11-04 17:07:51 +01:00
thomwolf	b340a910ed	fix tests - flagged as slow all the tests downloading from AWS	2019-11-04 16:03:36 +01:00
thomwolf	f02805da6f	fix tests	2019-11-04 15:42:23 +01:00
thomwolf	1724cee8c4	switch from properties to methods	2019-11-04 15:34:10 +01:00
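
Commit 022525b003 above describes replacing the LambdaLR wrapper classes with functions that build the learning-rate function as a closure. The following is a minimal sketch of that closure pattern only, not the code from the commit: the name `make_linear_warmup_lambda`, its parameters, and the linear warmup/decay shape are assumptions chosen for illustration. It deliberately avoids importing PyTorch so it runs on its own.

```python
def make_linear_warmup_lambda(num_warmup_steps, num_training_steps):
    """Return a function mapping a step index to a learning-rate multiplier.

    Hypothetical helper for illustration; the schedule parameters are
    captured by the closure instead of being stored on a wrapper class.
    """

    def lr_lambda(current_step):
        # Linear warmup from 0 to 1 over the first num_warmup_steps steps.
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        # Linear decay from 1 to 0 over the remaining steps, clamped at 0.
        remaining = num_training_steps - current_step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

    return lr_lambda


lr_lambda = make_linear_warmup_lambda(num_warmup_steps=10, num_training_steps=110)
print([lr_lambda(step) for step in (0, 5, 10, 60, 110)])
```

With PyTorch installed, a function like this could presumably be passed as the `lr_lambda` argument of `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)`. Because no object holds a reference to one of its own bound methods, the cyclical reference mentioned in the commit message does not arise.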

86 Commits
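
Commit 35401fe50f in the log mentions switching to plain unittest for skipping slow tests and adding a RUN_SLOW environment variable to run them. A rough sketch of how such a switch might look with the standard library alone is given below; the decorator name `slow`, the accepted truthy values, and the example test class are assumptions, not taken from the repository.

```python
import os
import unittest


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    Hypothetical decorator; the environment variable is read once, at
    decoration time.
    """
    run_slow = os.environ.get("RUN_SLOW", "").lower() in ("1", "true", "yes")
    return unittest.skipUnless(run_slow, "set RUN_SLOW=1 to run slow tests")(test_case)


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        # Always runs.
        self.assertEqual(1 + 1, 2)

    @slow
    def test_large_download(self):
        # Stands in for a test that would download a large model file.
        self.assertTrue(True)
```

Saved as a test module, this could be run with `python -m unittest`, which would report `test_large_download` as skipped unless `RUN_SLOW=1` is set in the environment.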