HuggingFace_transformer

Author	SHA1	Message	Date
Stas Bekman	2df34f4aba	[trainer] deepspeed integration (#9211 ) * deepspeed integration * style * add test * ds wants to do its own backward * fp16 assert * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * for clarity extract what args are being passed to deepspeed * introduce the concept of self.wrapped_model * s/self.wrapped_model/self.model_wrapped/ * complete transition to self.wrapped_model / self.model * fix * doc * give ds its own init * add custom overrides, handle bs correctly * fix test * clean up model_init logic, fix small bug * complete fix * collapse --deepspeed_config into --deepspeed * style * start adding doc notes * style * implement hf2ds optimizer and scheduler configuration remapping * oops * call get_num_training_steps absolutely when needed * workaround broken auto-formatter * deepspeed_config arg is no longer needed - fixed in deepspeed master * use hf's fp16 args in config * clean * start on the docs * rebase cleanup * finish up --fp16 * clarify the supported stages * big refactor thanks to discovering deepspeed.init_distributed * cleanup * revert fp16 part * add checkpoint-support * more init ds into integrations * extend docs * cleanup * unfix docs * clean up old code * imports * move docs * fix logic * make it clear which file it's referring to * document nodes/gpus * style * wrong format * style * deepspeed handles gradient clipping * easier to read * major doc rewrite * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * docs * switch to AdamW optimizer * style * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * clarify doc Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2021-01-12 19:05:18 -08:00
Stas Bekman	33b7422839	[trainer] remove `--model_parallel` (#9451 ) * fix bad merge - dropped code * remove --model_parallel * Deal with TrainingArguments * Use a private attr and fix batch sizes * fix _n_gpu * add is_parallel helper wrapper * fix attribute * introduce a new attribute is_model_parallel * docs * docs * Put back init False and rearrange doc * Ignore non-init args in HFArgumentParser Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>	2021-01-11 09:39:28 -05:00
Stas Bekman	29acabd886	[trainer] group fp16 args together (#9409 ) * [t5 doc] typos a few run away backticks @sgugger * style * [trainer] put fp16 args together this PR proposes a purely cosmetic change that puts all the fp16 args together - so they are easier to manager/read @sgugger * style	2021-01-05 09:39:38 -05:00
Stas Bekman	748006c0b3	[trainer] --model_parallel hasn't been implemented for most models (#9347 ) * --model_parallel hasn't been implemented for most models * make the help clear as well * implement is_parallelizable; use it * oops * remove property	2021-01-05 04:01:30 -05:00
Sylvain Gugger	490b39e614	Seq2seq trainer (#9241 ) * Add label smoothing in Trainer * Add options for scheduler and Adafactor in Trainer * Put Seq2SeqTrainer in the main lib * Apply suggestions from code review Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments and adapt scripts * Documentation * Move test not using script to tests folder Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>	2020-12-22 11:33:44 -05:00
Sylvain Gugger	1198ba8fba	Add timing inside Trainer (#9196 ) * Add timing inside Trainer * Fix tests * Add n_objs for train * Sort logs	2020-12-18 15:10:39 -05:00
Sylvain Gugger	9a67185344	Experimental support for fairscale ShardedDDP (#9139 ) * Experimental stupport for fairscale ShardedDDP * Add import error if fairscale not available * Address review comments * Fix seq2seq trainer	2020-12-16 13:47:48 -05:00
Sylvain Gugger	51adb97cd6	Fix fp16_backend field	2020-12-15 17:14:37 -05:00
Sylvain Gugger	ad895af98d	Add possibility to switch between APEX and AMP in Trainer (#9137 ) * Add possibility to switch between APEX and AMP in Trainer * Update src/transformers/training_args.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Address review comments * Update src/transformers/training_args.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>	2020-12-15 16:38:10 -05:00
lewtun	ed1845ef4c	Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks (#9076 ) * Clarify impact of disable_tqdm on Jupyter Notebooks * Add weblink to argparse * Replace "dev set" with more common "validation set" in do_eval * Tweak prediction_loss_only * Tweak description of Adam hyperparameters * Add weblink to TensorBoard * Capitalise apex * Tweak local_rank description * Add weblink for wandb * Replace nlp with datasets * Tweak grammar in model_parallel * Capitalise apex * Update TensorFlow training args to match PyTorch ones * Fix style * Fix underscore in weblink Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix underscore in weblink Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix underscore in weblink Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix underscore in weblink Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Add obj to datasets.Dataset Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-12-15 09:00:19 -05:00
Navjot	d6af344c9e	correct var name in TrainingArguments docstring (#9096 )	2020-12-14 09:02:54 -05:00
Sylvain Gugger	00aa9dbca2	Copyright (#8970 ) * Add copyright everywhere missing * Style	2020-12-07 18:36:34 -05:00
Sylvain Gugger	b08843cf4d	Add a `parallel_mode` property to TrainingArguments (#8877 ) * Add a `distributed_env` property to TrainingArguments * Change name * Address comment	2020-12-01 13:46:09 -05:00
Sylvain Gugger	7c10dd22ae	Better support for resuming training (#8878 )	2020-12-01 13:45:21 -05:00
Sylvain Gugger	49759c0cda	Document new training argument	2020-11-23 15:02:59 -05:00
alexorona	1cd9be2aeb	gpt2 and t5 parallel modeling (#8696 ) * gpt2 and t5 parallel modeling * model_parallel utils update * adding missing model_parallel_utils Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5 * training_args reformat Reformatted training_args * style formatting Style formatting doc string length on training_args and model_parallel_utils * style changes make style && make quality for training_args and model_parallel_utils. * adding tests * minor change in trainer reverts loss calculation * Update training_args.py * Update training_args.py added back docstring language for adam_beta1 and adam_beta2 * Update trainer.py * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Fix style & rebase Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-11-23 14:41:23 -05:00
Sylvain Gugger	63e91f5fde	Document adam betas TrainingArguments (#8688 )	2020-11-20 09:27:25 -05:00
Sylvain Gugger	dd52804f5f	Remove deprecated (#8604 ) * Remove old deprecated arguments Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr> * Remove needless imports * Fix tests Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>	2020-11-17 15:11:29 -05:00
Philip May	6a064447f2	improve documentation of training_args.py (#8270 ) * improve documentation of training_args.py - do_train - do_eval - do_predict * fix line too long * fix style with black on training_args.py * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix line length with utils/style_doc * black reformatting Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-11-03 15:57:17 -05:00
Abi See	8f1c960ee7	Fix two bugs with --logging_first_step (#8193 ) * make sure that logging_first_step evaluates * fix bug with incorrect loss on logging_first_step * fix style * logging_first_step only logs, not evals	2020-10-30 16:45:38 -04:00
Santiago Castro	969859d5f6	Fix doc errors and typos across the board (#8139 ) * Fix doc errors and typos across the board * Fix a typo * Fix the CI * Fix more typos * Fix CI * More fixes * Fix CI * More fixes * More fixes	2020-10-29 10:33:33 -04:00
Sylvain Gugger	c42596bc07	Doc styling fixes (#8074 ) * Fix a few docstrings * More fixes * Styling	2020-10-27 07:54:50 -04:00
Sylvain Gugger	08f534d2da	Doc styling (#8067 ) * Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy	2020-10-26 18:26:02 -04:00
Lysandre Debut	3a10764574	Fix TF training arguments instantiation (#8063 )	2020-10-26 14:39:25 -04:00
Bram Vanroy	55bcd0cb59	Raise error when using AMP on non-CUDA device (#7869 ) * Raise error when using AMP on non-CUDA device * make style * make style	2020-10-19 15:59:30 -04:00
Sylvain Gugger	bb9559a7f9	Don't use `store_xxx` on optional bools (#7786 ) * Don't use `store_xxx` on optional bools * Refine test * Refine test	2020-10-14 12:05:02 -04:00
Sylvain Gugger	a1d1b332d0	Add predict step accumulation (#7767 ) * Add eval_accumulation_step and clean distributed eval * Add TPU test * Add TPU stuff * Fix arg name * Fix Seq2SeqTrainer * Fix total_size * Update src/transformers/trainer_pt_utils.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Doc and add test to TPU * Add unit test * Adapt name Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-10-14 11:41:45 -04:00
Tiger	7e73c12805	fixed lots of typos. (#7758 )	2020-10-13 10:00:20 -04:00
Sylvain Gugger	08ba4b4902	Trainer callbacks (#7596 ) * Initial callback proposal * Finish various callbacks * Post-rebase conflicts * Fix tests * Don't use something that's not set * Documentation * Remove unwanted print. * Document all models can work * Add tests + small fixes * Update docs/source/internal/trainer_utils.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * Fix TF tests * Real fix this time * This one should work * Fix typo * Really fix typo Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-10-07 10:50:21 -04:00
Sylvain Gugger	ca05c2a47d	Fix post_init of some TrainingArguments (#7525 )	2020-10-05 09:19:16 -04:00
Sylvain Gugger	a97a73e0ee	Small QOL improvements to TrainingArguments (#7475 ) * Small QOL improvements to TrainingArguments * With the self.	2020-09-30 12:12:03 -04:00
Sylvain Gugger	52e8392b7e	Add automatic best model loading to Trainer (#7431 ) * Add automatic best model loading to Trainer * Some small fixes * Formatting	2020-09-29 10:41:18 -04:00
Sylvain Gugger	f5518e5631	Formatting	2020-09-22 14:55:12 -04:00
Chady Kamar	17099ebd58	Add num workers cli arg (#7322 ) * Add dataloader_num_workers to TrainingArguments This argument is meant to be used to set the number of workers for the PyTorch DataLoader. * Pass num_workers argument on DataLoader init	2020-09-22 14:44:42 -04:00
Sylvain Gugger	89edf504bf	Add possibility to evaluate every epoch (#7302 ) * Add possibility to evaluate every epoch * Remove multitype arg * Remove needless import * Use a proper enum * Apply suggestions from @LysandreJik Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * One else and formatting Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-09-22 09:52:29 -04:00
Sylvain Gugger	492bb6aa48	Trainer multi label (#7191 ) * Trainer accep multiple labels * Missing import * Fix dosctrings	2020-09-17 08:15:37 -04:00
Sylvain Gugger	08de989a0a	Trainer with grad accum (#6930 ) * Add warning for gradient accumulation * Formatting	2020-09-07 04:54:00 -04:00
Lysandre	a75c64d80c	Black 20 release	2020-08-26 17:20:22 +02:00
Lysandre Debut	77abd1e79f	Centralize logging (#6434 ) * Logging * Style * hf_logging > utils.logging * Address @thomwolf's comments * Update test * Update src/transformers/benchmark/benchmark_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert bad change Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-08-26 11:10:36 -04:00
Sylvain Gugger	3a7fdd3f52	Add hyperparameter search to Trainer (#6576 ) * Add optuna hyperparameter search to Trainer * @julien-c suggestions Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Make compute_objective an arg function * Formatting * Rework to make it easier to add ray * Formatting * Initial support for Ray * Formatting * Polish and finalize * Add trial id to checkpoint with Ray * Smaller default * Use GPU in ray if available * Formatting * Fix test * Update install instruction Co-authored-by: Richard Liaw <rliaw@berkeley.edu> * Address review comments * Formatting post-merge Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2020-08-24 11:48:45 -04:00
Sylvain Gugger	b30879fe0c	Don't reset the dataset type + plug for rm unused columns (#6683 ) * Don't reset the type of the dataset * Formatting * Update trainer.py Co-authored-by: Teven <teven.lescao@gmail.com>	2020-08-24 09:22:03 -04:00
Sylvain Gugger	573bdb0a5d	Add tests to Trainer (#6605 ) * Add tests to Trainer * Test if removing long breaks everything * Remove ugly hack * Fix distributed test * Use float for number of epochs	2020-08-20 11:13:50 -04:00
Sylvain Gugger	34fabe1697	Move prediction_loss_only to TrainingArguments (#6426 )	2020-08-12 08:03:45 -04:00
Teven	bd0eab351a	Trainer + wandb quality of life logging tweaks (#6241 ) * added `name` argument for wandb logging, also logging model config with trainer arguments * Update src/transformers/training_args.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * added tf, post-review changes Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>	2020-08-05 09:05:52 -04:00
Jay Mody	cedc547e7e	Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. (#5331 ) * Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict() output * Update wandb config logging to use to_sanitized_dict * removed n_gpu from sanitized dict * fix quality check errors	2020-08-03 09:00:39 -04:00
Gong Linyuan	b21993b362	Allow to set Adam beta1, beta2 in TrainingArgs (#5592 ) * Add Adam beta1, beta2 to trainier * Make style consistent	2020-07-27 05:31:37 -04:00
Alan deLevie	223bad242d	fix typo in (#5893 )	2020-07-20 03:53:03 -04:00
Sylvain Gugger	734a28a767	Clean up diffs in Trainer/TFTrainer (#5417 ) * Cleanup and unify Trainer/TFTrainer * Forgot to adapt TFTrainingArgs * In tf scripts n_gpu -> n_replicas * Update src/transformers/training_args.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Address review comments * Formatting * Fix typo Co-authored-by: Lysandre Debut <lysandre@huggingface.co>	2020-07-01 11:00:20 -04:00
Sylvain Gugger	64e3d966b1	Add support for past states (#5399 ) * Add support for past states * Style and forgotten self * You mean, documenting is not enough? I have to actually add it too? * Add memory support during evaluation * Fix tests in eval and add TF support * No need to change this line anymore	2020-07-01 08:11:55 -04:00
Sylvain Gugger	87716a6d07	Documentation for the Trainer API (#5383 ) * Documentation for the Trainer API * Address review comments * Address comments	2020-06-30 11:43:43 -04:00

60 Commits