HuggingFace_transformer

Author	SHA1	Message	Date
Julien Chaumond	d4c2cb402d	Kill model archive maps (#4636) * Kill model archive maps * Fixup * Also kill model_archive_map for MaskedBertPreTrainedModel * Unhook config_archive_map * Tokenizers: align with model id changes * make style && make quality * Fix CI	2020-06-02 09:39:33 -04:00
Bijay Gurung	e19b978151	Add Type Hints to modeling_utils.py Closes #3911 (#3948) * Add Type Hints to modeling_utils.py Closes #3911 Add Type Hints to methods in `modeling_utils.py` Note: The coverage isn't 100%. Mostly skipped internal methods. * Reformat according to `black` and `isort` * Use typing.Iterable instead of Sequence * Parameterize Iterable by its generic type * Use typing.Optional when None is the default value * Adhere to style guideline * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_utils.py Co-authored-by: Julien Chaumond <chaumond@gmail.com>	2020-05-22 19:10:22 -04:00
Patrick von Platen	aa925a52fa	[Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468) * fix gpu slow tests in pytorch * change model to device syntax	2020-05-19 21:35:04 +02:00
Julien Chaumond	4c06893610	Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300) * Test case for #3936 * multigpu tests pass on pytorch 1.4.0 * Fixup * multigpu tests pass on pytorch 1.5.0 * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_utils.py * rename multigpu to require_multigpu * mode doc	2020-05-18 20:34:50 -04:00
Patrick von Platen	026a5d0888	[T5 fp16] Fix fp16 in T5 (#4436) * fix fp16 in t5 * make style * refactor invert_attention_mask fn * fix typo	2020-05-18 17:25:58 +02:00
Patrick von Platen	a27c795908	fix (#4419)	2020-05-18 15:51:40 +02:00
Sam Shleifer	9a687ebb77	[Marian Fixes] prevent predicting pad_token_id before softmax, support language codes, name multilingual models (#4290)	2020-05-13 17:29:41 -04:00
Patrick von Platen	dca34695d0	Reformer (#3351) * first copy & past commit from Bert and morgans LSH code * add easy way to compare to trax original code * translate most of function * make trax lsh self attention deterministic with numpy seed + copy paste code * add same config * add same config * make layer init work * implemented hash_vectors function for lsh attention * continue reformer translation * hf LSHSelfAttentionLayer gives same output as trax layer * refactor code * refactor code * refactor code * refactor * refactor + add reformer config * delete bogus file * split reformer attention layer into two layers * save intermediate step * save intermediate step * make test work * add complete reformer block layer * finish reformer layer * implement causal and self mask * clean reformer test and refactor code * fix merge conflicts * fix merge conflicts * update init * fix device for GPU * fix chunk length init for tests * include morgans optimization * improve memory a bit * improve comment * factorize num_buckets * better testing parameters * make whole model work * make lm model work * add t5 copy paste tokenizer * add chunking feed forward * clean config * add improved assert statements * make tokenizer work * improve test * correct typo * extend config * add complexer test * add new axial position embeddings * add local block attention layer * clean tests * refactor * better testing * save intermediate progress * clean test file * make shorter input length work for model * allow variable input length * refactor * make forward pass for pretrained model work * add generation possibility * finish dropout and init * make style * refactor * add first version of RevNet Layers * make forward pass work and add convert file * make uploaded model forward pass work * make uploaded model forward pass work * refactor code * add namedtuples and cache buckets * correct head masks * refactor * made reformer more flexible * make style * remove set max length * add attention masks * fix up tests * fix lsh attention mask * make random seed optional for the moment * improve memory in reformer * add tests * make style * make sure masks work correctly * detach gradients * save intermediate * correct backprob through gather * make style * change back num hashes * rename to labels * fix rotation shape * fix detach * update * fix trainer * fix backward dropout * make reformer more flexible * fix conflict * fix * fix * add tests for fixed seed in reformer layer * fix trainer typo * fix typo in activations * add fp16 tests * add fp16 training * support fp16 * correct gradient bug in reformer * add fast gelu * re-add dropout for embedding dropout * better naming * better naming * renaming * finalize test branch * finalize tests * add more tests * finish tests * fix * fix type trainer * fix fp16 tests * fix tests * fix tests * fix tests * fix issue with dropout * fix dropout seeds * correct random seed on gpu * finalize random seed for dropout * finalize random seed for dropout * remove duplicate line * correct half precision bug * make style * refactor * refactor * docstring * remove sinusoidal position encodings for reformer * move chunking to modeling_utils * make style * clean config * make style * fix tests * fix auto tests * pretrained models * fix docstring * update conversion file * Update pretrained_models.rst * fix rst * fix rst * update copyright * fix test path * fix test path * fix small issue in test * include reformer in generation tests * add docs for axial position encoding * finish docs * Update convert_reformer_trax_checkpoint_to_pytorch.py * remove isort * include sams comments * remove wrong comment in utils * correct typos * fix typo * Update reformer.rst * applied morgans optimization * make style * make gpu compatible * remove bogus file * big test refactor * add example for chunking * fix typo * add to README	2020-05-07 10:17:01 +02:00
Julien Chaumond	455c639093	CDN urls (#4030) * [file_utils] use_cdn + documentation * Move to cdn. urls for weights * [urls] Hotfix for bert-base-japanese	2020-04-28 20:27:14 -04:00
Sam Shleifer	847e7f3379	MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') (#3908) Co-Authored-By: Stefan Schweter <stefan@schweter.it>	2020-04-28 18:22:37 -04:00
Patrick von Platen	fa49b9afea	Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (#3383) * change encoder decoder style to bart & t5 style * make encoder decoder generation dummy work for bert * make style * clean init config in encoder decoder * add tests for encoder decoder models * refactor and add last tests * refactor and add last tests * fix attn masks for bert encoder decoder * make style * refactor prepare inputs for Bert * refactor * finish encoder decoder * correct typo * add docstring to config * finish * add tests * better naming * make style * fix flake8 * clean docstring * make style * rename	2020-04-28 15:11:09 +02:00
sshleifer	41750a6cff	Fix typos	2020-04-27 13:25:53 -04:00
Sam Shleifer	dbd041243d	[cleanup] factor out get_head_mask, invert_attn_mask, get_exten… (#3806) * Delete some copy pasted code	2020-04-16 09:55:25 -04:00
Patrick von Platen	01c37dcdb5	[Config, Caching] Remove `output_past` everywhere and replace by `use_cache` argument (#3734) * remove output_past from pt * make style * add optional input length for gpt2 * add use cache to prepare input * save memory in gpt2 * correct gpt2 test inputs * make past input optional for gpt2 * finish use_cache for all models * make style * delete modeling_gpt2 change in test file * correct docstring * correct is true statements for gpt2	2020-04-14 14:40:28 -04:00
Jin Young Sohn	551b450527	Add `run_glue_tpu.py` that trains models on TPUs (#3702) * Initial commit to get BERT + run_glue.py on TPU * Add README section for TPU and address comments. * Cleanup TPU bits from run_glue.py (#3) TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * Cleanup TPU bits from run_glue.py TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested. * No need to call `xm.mark_step()` explicitly (#4) Since for gradient accumulation we're accumulating on batches from `ParallelLoader` instance which on next() marks the step itself. * Resolve R/W conflicts from multiprocessing (#5) * Add XLNet in list of models for `run_glue_tpu.py` (#6) * Add RoBERTa to list of models in TPU GLUE (#7) * Add RoBERTa and DistilBert to list of models in TPU GLUE (#8) * Use barriers to reduce duplicate work/resources (#9) * Shard eval dataset and aggregate eval metrics (#10) * Shard eval dataset and aggregate eval metrics Also, instead of calling `eval_loss.item()` every time do summation with tensors on device. * Change defaultdict to float * Reduce the pred, label tensors instead of metrics As brought up during review some metrics like f1 cannot be aggregated via averaging. GLUE task metrics depends largely on the dataset, so instead we sync the prediction and label tensors so that the metrics can be computed accurately on those instead. * Only use tb_writer from master (#11) * Apply huggingface black code formatting * Style * Remove `--do_lower_case` as example uses cased * Add option to specify tensorboard logdir This is needed for our testing framework which checks regressions against key metrics writtern by the summary writer. * Using configuration for `xla_device` * Prefix TPU specific comments. * num_cores clarification and namespace eval metrics * Cache features file under `args.cache_dir` Instead of under `args.data_dir`. This is needed as our test infra uses data_dir with a read-only filesystem. * Rename `run_glue_tpu` to `run_tpu_glue` Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-04-10 12:53:54 -04:00
Patrick von Platen	ce2298fb5f	[T5, generation] Add decoder caching for T5 (#3682) * initial commit to add decoder caching for T5 * better naming for caching * finish T5 decoder caching * correct test * added extensive past testing for T5 * clean files * make tests cleaner * improve docstring * improve docstring * better reorder cache * make style * Update src/transformers/modeling_t5.py Co-Authored-By: Yacine Jernite <yjernite@users.noreply.github.com> * make set output past work for all layers * improve docstring * improve docstring Co-authored-by: Yacine Jernite <yjernite@users.noreply.github.com>	2020-04-10 01:02:50 +02:00
Patrick von Platen	390c128592	[Encoder-Decoder] Force models outputs to always have batch_size as their first dim (#3536) * solve conflicts * improve comments	2020-04-02 15:18:33 +02:00
Patrick von Platen	b815edf69f	[T5, Testst] Add extensive hard-coded integration tests and make sure PT and TF give equal results (#3550) * add some t5 integration tests * finish summarization and translation integration tests for T5 - results loook good * add tf test * fix == vs is bug * fix tf beam search error and make tf t5 tests pass	2020-04-01 18:01:33 +02:00
Patrick von Platen	b38d552a92	[Generate] Add bad words list argument to the generate function (#3367) * add bad words list * make style * add bad_words_tokens * make style * better naming * make style * fix typo	2020-03-31 18:42:31 +02:00
Patrick von Platen	75ec6c9e3a	[T5] make decoder input ids optional for t5 training (#3521) * make decoder input ids optional for t5 training * lm_lables should not be shifted in t5 * add tests * finish shift right functionality for PT T5 * move shift right to correct class * cleaner code * replace -100 values with pad token id * add assert statement * remove unnecessary for loop * make style	2020-03-30 13:45:26 +02:00
Sam Shleifer	2b2a2f8df2	[Bart] Fix: put dummy_inputs on correct device (#3398) * Dummy inputs to model.device * Move self.device to ModuleUtilsMixin	2020-03-26 18:42:09 -04:00
Sam Shleifer	1a5aefc95c	[Seq2Seq Generation] Call encoder before expanding input_ids (#3370)	2020-03-26 18:41:19 -04:00
Patrick von Platen	ffa17fe322	Extend config with task specific configs. (#3433) * add new default configs * change prefix default to None	2020-03-25 21:32:04 +01:00
Patrick von Platen	95e00d0808	Clean special token init in modeling_....py (#3264) * make style * fix conflicts	2020-03-20 21:41:04 +01:00
Patrick von Platen	bbf26c4e61	Support T5 Generation (#3228) * fix conflicts * update bart max length test * correct spelling mistakes * implemented model specific encode function * fix merge conflicts * better naming * save intermediate state -> need to rethink strucuture a bit * leave tf problem as it is for now * current version * add layers.pop * remove ipdb * make style * clean return cut decoding * remove ipdbs * Fix restoring layers in the decoders that doesnt exists. * push good intermediate solution for now * fix conflicts * always good to refuse to merge conflicts when rebasing * fix small bug * improve function calls * remove unused file * add correct scope behavior for t5_generate Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>	2020-03-19 23:18:23 +01:00
Patrick von Platen	ddb10c6447	improve doctstring (#3327)	2020-03-18 13:24:09 +01:00
Patrick von Platen	e8f44af5bf	[generate] do_sample default back to False (#3298) * change do_samples back * None better default as boolean * adapt do_sample to True in test example * make style	2020-03-17 10:52:37 -04:00
Thomas Wolf	2187c49f5c	CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186) * memory benchmark rss * have both forward pass and line-by-line mem tracing * cleaned up tracing * refactored and cleaning up API * no f-strings yet... * add GPU mem logging * fix GPU memory monitoring * style and quality * clean up and doc * update with comments * Switching to python 3.6+ * fix quality	2020-03-17 10:17:11 -04:00
Sam Shleifer	11573231c6	[BART] generation_mode as a kwarg not a class attribute (#3278)	2020-03-16 12:47:53 -04:00
Patrick von Platen	6047f46b19	re-add eos token to get good bart results	2020-03-12 20:17:50 +01:00
Patrick von Platen	c11160114a	small clean-up	2020-03-12 20:02:35 +01:00
Patrick von Platen	a332cc9f7f	finalize generation merge	2020-03-11 11:53:36 +01:00
Patrick von Platen	d997ac7810	fix typo	2020-03-11 11:06:56 +01:00
Patrick von Platen	7351a8dbaf	re-add scoring filtering	2020-03-11 11:06:56 +01:00
Patrick von Platen	374deef48d	fixed typo	2020-03-11 11:06:56 +01:00
Patrick von Platen	ca2047bc35	refactor variable naming and improve tf generate in line with torch generate	2020-03-11 11:06:56 +01:00
patrickvonplaten	41b437ea3a	add draft version of propsoed changes for ROGUE score	2020-03-11 11:06:56 +01:00
patrickvonplaten	629aac92ec	do not allow do_sample and weird force bos token things	2020-03-11 11:06:56 +01:00
patrickvonplaten	d880a5fbde	finalized PR	2020-03-11 11:06:56 +01:00
patrickvonplaten	2acfe63964	best current version and make style	2020-03-11 11:06:56 +01:00
patrickvonplaten	c62444da39	fix conflicts	2020-03-11 11:06:56 +01:00
Patrick von Platen	333affcb81	add current changes	2020-03-11 11:06:56 +01:00
Patrick von Platen	7a11e925cf	work in progress	2020-03-11 11:06:56 +01:00
Patrick von Platen	7cba11fb9b	better naming	2020-03-11 11:06:56 +01:00
Patrick von Platen	ff648221bd	fix conflicts	2020-03-11 11:06:56 +01:00
Patrick von Platen	c0d9dd3ba9	refactored code a bit and made more generic	2020-03-11 11:06:56 +01:00
Patrick von Platen	d8e2b3c547	fix conflicts	2020-03-11 11:06:56 +01:00
Lysandre Debut	146c521235	Merge branch 'master' into add_models_special_tokens_to_specific_configs	2020-03-05 17:24:42 -05:00
Lysandre Debut	0001d05686	Correct missing keys + test (#3143)	2020-03-05 17:01:54 -05:00
Patrick von Platen	e33ed12c3b	uncomment expression	2020-03-05 13:41:04 +01:00

101 Commits