HuggingFace_transformer

Author	SHA1	Message	Date
Aymeric Augustin	bb3bfa2d29	Distribute tests from the same file to the same worker. This should prevent two issues: - hitting API rate limits for tests that hit the HF API - multiplying the cost of expensive test setups	2019-12-21 08:43:19 +01:00
Aymeric Augustin	29cbab98f0	Parallelize tests on Circle CI. Set the number of CPUs manually based on the Circle CI resource class, or else we're getting 36 CPUs, which is far too much (perhaps that's the underlying hardware and not what Circle CI allocates to us). Don't parallelize the custom tokenizers tests because they take less than one second to run and parallelization actually makes them slower.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	a4c9338b83	Prevent parallel downloads of the same file with a lock. Since the file is written to the filesystem, a filesystem lock is the way to go here. Add a dependency on the third-party filelock library to get cross-platform functionality.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	b670c26684	Take advantage of the cache when running tests. Caching models across test cases and across runs of the test suite makes slow tests somewhat more bearable. Use gettempdir() instead of /tmp in tests. This makes it easier to change the location of the cache with semi-standard TMPDIR/TEMP/TMP environment variables. Fix #2222.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	b67fa1a8d2	Download models directly to cache_dir. This allows moving the file instead of copying it, which is more reliable. Also it avoids writing large amounts of data to /tmp, which may not be large enough to accommodate it. Refs #2222.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	286d5bb6b7	Use a random temp dir for writing pruned models in tests.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	478e456e83	Use a random temp dir for writing file in tests.	2019-12-21 08:43:19 +01:00
Aymeric Augustin	12726f8556	Remove redundant torch.jit.trace in tests. This looks like it could be expensive, so don't run it twice.	2019-12-21 08:43:19 +01:00
Julien Chaumond	ac1b449cc9	[doc] move distilroberta to more appropriate place cc @lysandrejik	2019-12-21 00:09:01 -05:00
Julien Chaumond	3e52915fa7	[RoBERTa] Embeddings: fix dimensionality bug	2019-12-20 19:01:27 -05:00
Dom Hudson	228f52867c	Bug fix: 1764	2019-12-20 18:27:35 -05:00
Francesco	a80778f40e	small refactoring (only esthetic, not functional)	2019-12-20 17:21:24 -05:00
Francesco	3df1d2d144	- Create the output directory (whose name is passed by the user in the "save_directory" parameter), where the encoder and decoder will be saved, if it does not exist. - Empty the output directory if it contains any files or subdirectories. - Create the "encoder" directory inside "save_directory" if it does not exist. - Create the "decoder" directory inside "save_directory" if it does not exist. - Save the encoder and the decoder in those two directories, respectively.	2019-12-20 17:21:24 -05:00
Lysandre	a436574bfd	Release: v2.3.0	2019-12-20 16:22:20 -05:00
Thomas Wolf	d0f8b9a978	Merge pull request #2244 from huggingface/fix-tok-pipe Fix Camembert and XLM-R `decode` method - Fix NER pipeline alignment	2019-12-20 22:10:39 +01:00
Thomas Wolf	a557836a70	Merge pull request #2191 from huggingface/fix_sp_np Numpy compatibility for sentence piece	2019-12-20 22:08:08 +01:00
thomwolf	655fd06853	clean up	2019-12-20 21:57:49 +01:00
thomwolf	e5812462fc	clean up debug and less verbose tqdm	2019-12-20 21:51:48 +01:00
thomwolf	4775ec354b	add overwrite - fix ner decoding	2019-12-20 21:47:15 +01:00
Lysandre	cb6d54bfda	Numpy compatibility for sentence piece convert to int earlier	2019-12-20 15:06:28 -05:00
thomwolf	f79a7dc661	fix NER pipeline	2019-12-20 20:57:45 +01:00
thomwolf	a241011057	fix pipeline NER	2019-12-20 20:43:48 +01:00
thomwolf	e37ca8e11a	fix camembert and XLM-R tokenizer	2019-12-20 20:43:42 +01:00
thomwolf	ceae85ad60	fix mc loading	2019-12-20 19:52:24 +01:00
thomwolf	71883b6ddc	update link in readme	2019-12-20 19:40:23 +01:00
Thomas Wolf	8d5a47c79b	Merge pull request #2243 from huggingface/fix-xlm-roberta fixing xlm-roberta tokenizer max_length and automodels	2019-12-20 19:34:08 +01:00
thomwolf	79e4a6a25c	update serving API	2019-12-20 19:33:12 +01:00
thomwolf	bbaaec046c	fixing CLI pipeline	2019-12-20 19:19:20 +01:00
thomwolf	1c12ee0e55	fixing xlm-roberta tokenizer max_length and automodels	2019-12-20 18:28:27 +01:00
Lysandre	65c75fc587	Clean special tokens test	2019-12-20 11:34:16 -05:00
Lysandre	fb393ad994	Added test for all special tokens	2019-12-20 11:29:58 -05:00
Dirk Groeneveld	90debb9ff2	Keep even the first of the special tokens intact while lowercasing.	2019-12-20 11:29:43 -05:00
Morgan Funtowicz	b98ff88544	Added pipelines quick tour in README	2019-12-20 15:52:50 +01:00
Thomas Wolf	3a2c4e6f63	Merge pull request #1548 from huggingface/cli [2.2] - Command-line interface - Pipeline class	2019-12-20 15:28:29 +01:00
Rémi Louf	4e3f745ba4	add example for Model2Model in quickstart	2019-12-20 09:12:31 -05:00
thomwolf	db0795b5d0	defaults models for tf and pt - update tests	2019-12-20 15:07:00 +01:00
Morgan Funtowicz	7f74084528	Fix leading axis added when saving through the command run	2019-12-20 14:47:04 +01:00
thomwolf	c37815f130	clean up PT <=> TF 2.0 conversion and config loading	2019-12-20 14:35:40 +01:00
thomwolf	73fcebf7ec	update serving command	2019-12-20 13:47:35 +01:00
Thomas Wolf	59941c5d1f	Merge pull request #2189 from stefan-it/xlmr Add support for XLM-RoBERTa	2019-12-20 13:26:38 +01:00
thomwolf	15dda5ea32	remove python 2 tests for circle-ci cc @aaugustin @julien-c @LysandreJik	2019-12-20 13:20:41 +01:00
thomwolf	01ffc65e9b	update tests to remove unittest.patch	2019-12-20 13:16:23 +01:00
thomwolf	825697cad4	fix tests	2019-12-20 12:51:10 +01:00
thomwolf	1fa93ca1ea	Clean up framework handling	2019-12-20 12:34:19 +01:00
thomwolf	ca6bdb28f6	fix pipelines and rename model_card => modelcard	2019-12-20 12:10:40 +01:00
Morgan Funtowicz	61d9ee45e3	All tests are green.	2019-12-20 11:47:56 +01:00
Thomas Wolf	ff36e6d8d7	Merge pull request #2231 from huggingface/requests_user_agent [http] customizable requests user-agent	2019-12-20 10:28:10 +01:00
Morgan Funtowicz	e516a34a15	Use BasicTokenizer to split over whitespaces.	2019-12-20 09:38:08 +01:00
Morgan Funtowicz	9d0d1cd339	Filter out entity for NER task.	2019-12-20 09:30:37 +01:00
Julien Chaumond	15d897ff4a	[http] customizable requests user-agent	2019-12-19 18:29:22 -05:00

2942 Commits