Stas Bekman
2df34f4aba
[trainer] deepspeed integration ( #9211 )
...
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2021-01-12 19:05:18 -08:00
Stas Bekman
33b7422839
[trainer] remove --model_parallel ( #9451 )
...
* fix bad merge - dropped code
* remove --model_parallel
* Deal with TrainingArguments
* Use a private attr and fix batch sizes
* fix _n_gpu
* add is_parallel helper wrapper
* fix attribute
* introduce a new attribute is_model_parallel
* docs
* docs
* Put back init False and rearrange doc
* Ignore non-init args in HFArgumentParser
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com >
2021-01-11 09:39:28 -05:00
Stas Bekman
29acabd886
[trainer] group fp16 args together ( #9409 )
...
* [t5 doc] typos
a few runaway backticks
@sgugger
* style
* [trainer] put fp16 args together
this PR proposes a purely cosmetic change that puts all the fp16 args together - so they are easier to manage/read
@sgugger
* style
2021-01-05 09:39:38 -05:00
Stas Bekman
748006c0b3
[trainer] --model_parallel hasn't been implemented for most models ( #9347 )
...
* --model_parallel hasn't been implemented for most models
* make the help clear as well
* implement is_parallelizable; use it
* oops
* remove property
2021-01-05 04:01:30 -05:00
Sylvain Gugger
490b39e614
Seq2seq trainer ( #9241 )
...
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2020-12-22 11:33:44 -05:00
Sylvain Gugger
1198ba8fba
Add timing inside Trainer ( #9196 )
...
* Add timing inside Trainer
* Fix tests
* Add n_objs for train
* Sort logs
2020-12-18 15:10:39 -05:00
Sylvain Gugger
9a67185344
Experimental support for fairscale ShardedDDP ( #9139 )
...
* Experimental support for fairscale ShardedDDP
* Add import error if fairscale not available
* Address review comments
* Fix seq2seq trainer
2020-12-16 13:47:48 -05:00
Sylvain Gugger
51adb97cd6
Fix fp16_backend field
2020-12-15 17:14:37 -05:00
Sylvain Gugger
ad895af98d
Add possibility to switch between APEX and AMP in Trainer ( #9137 )
...
* Add possibility to switch between APEX and AMP in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com >
* Address review comments
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com >
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com >
2020-12-15 16:38:10 -05:00
lewtun
ed1845ef4c
Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks ( #9076 )
...
* Clarify impact of disable_tqdm on Jupyter Notebooks
* Add weblink to argparse
* Replace "dev set" with more common "validation set" in do_eval
* Tweak prediction_loss_only
* Tweak description of Adam hyperparameters
* Add weblink to TensorBoard
* Capitalise apex
* Tweak local_rank description
* Add weblink for wandb
* Replace nlp with datasets
* Tweak grammar in model_parallel
* Capitalise apex
* Update TensorFlow training args to match PyTorch ones
* Fix style
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Add obj to datasets.Dataset
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2020-12-15 09:00:19 -05:00
Navjot
d6af344c9e
correct var name in TrainingArguments docstring ( #9096 )
2020-12-14 09:02:54 -05:00
Sylvain Gugger
00aa9dbca2
Copyright ( #8970 )
...
* Add copyright everywhere missing
* Style
2020-12-07 18:36:34 -05:00
Sylvain Gugger
b08843cf4d
Add a parallel_mode property to TrainingArguments ( #8877 )
...
* Add a `distributed_env` property to TrainingArguments
* Change name
* Address comment
2020-12-01 13:46:09 -05:00
Sylvain Gugger
7c10dd22ae
Better support for resuming training ( #8878 )
2020-12-01 13:45:21 -05:00
Sylvain Gugger
49759c0cda
Document new training argument
2020-11-23 15:02:59 -05:00
alexorona
1cd9be2aeb
gpt2 and t5 parallel modeling ( #8696 )
...
* gpt2 and t5 parallel modeling
* model_parallel utils update
* adding missing model_parallel_utils
Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5
* training_args reformat
Reformatted training_args
* style formatting
Style formatting doc string length on training_args and model_parallel_utils
* style changes
make style && make quality for training_args and model_parallel_utils.
* adding tests
* minor change in trainer
reverts loss calculation
* Update training_args.py
* Update training_args.py
added back docstring language for adam_beta1 and adam_beta2
* Update trainer.py
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Fix style & rebase
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr >
2020-11-23 14:41:23 -05:00
Sylvain Gugger
63e91f5fde
Document adam betas TrainingArguments ( #8688 )
2020-11-20 09:27:25 -05:00
Sylvain Gugger
dd52804f5f
Remove deprecated ( #8604 )
...
* Remove old deprecated arguments
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr >
* Remove needless imports
* Fix tests
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr >
2020-11-17 15:11:29 -05:00
Philip May
6a064447f2
improve documentation of training_args.py ( #8270 )
...
* improve documentation of training_args.py
- do_train
- do_eval
- do_predict
* fix line too long
* fix style with black on training_args.py
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* fix line length with utils/style_doc
* black reformatting
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2020-11-03 15:57:17 -05:00
Abi See
8f1c960ee7
Fix two bugs with --logging_first_step ( #8193 )
...
* make sure that logging_first_step evaluates
* fix bug with incorrect loss on logging_first_step
* fix style
* logging_first_step only logs, not evals
2020-10-30 16:45:38 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board ( #8139 )
...
* Fix doc errors and typos across the board
* Fix a typo
* Fix the CI
* Fix more typos
* Fix CI
* More fixes
* Fix CI
* More fixes
* More fixes
2020-10-29 10:33:33 -04:00
Sylvain Gugger
c42596bc07
Doc styling fixes ( #8074 )
...
* Fix a few docstrings
* More fixes
* Styling
2020-10-27 07:54:50 -04:00
Sylvain Gugger
08f534d2da
Doc styling ( #8067 )
...
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
* Styling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
2020-10-26 18:26:02 -04:00
Lysandre Debut
3a10764574
Fix TF training arguments instantiation ( #8063 )
2020-10-26 14:39:25 -04:00
Bram Vanroy
55bcd0cb59
Raise error when using AMP on non-CUDA device ( #7869 )
...
* Raise error when using AMP on non-CUDA device
* make style
* make style
2020-10-19 15:59:30 -04:00
Sylvain Gugger
bb9559a7f9
Don't use store_xxx on optional bools ( #7786 )
...
* Don't use `store_xxx` on optional bools
* Refine test
* Refine test
2020-10-14 12:05:02 -04:00
Sylvain Gugger
a1d1b332d0
Add predict step accumulation ( #7767 )
...
* Add eval_accumulation_step and clean distributed eval
* Add TPU test
* Add TPU stuff
* Fix arg name
* Fix Seq2SeqTrainer
* Fix total_size
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* Doc and add test to TPU
* Add unit test
* Adapt name
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2020-10-14 11:41:45 -04:00
Tiger
7e73c12805
fixed lots of typos. ( #7758 )
2020-10-13 10:00:20 -04:00
Sylvain Gugger
08ba4b4902
Trainer callbacks ( #7596 )
...
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2020-10-07 10:50:21 -04:00
Sylvain Gugger
ca05c2a47d
Fix post_init of some TrainingArguments ( #7525 )
2020-10-05 09:19:16 -04:00
Sylvain Gugger
a97a73e0ee
Small QOL improvements to TrainingArguments ( #7475 )
...
* Small QOL improvements to TrainingArguments
* With the self.
2020-09-30 12:12:03 -04:00
Sylvain Gugger
52e8392b7e
Add automatic best model loading to Trainer ( #7431 )
...
* Add automatic best model loading to Trainer
* Some small fixes
* Formatting
2020-09-29 10:41:18 -04:00
Sylvain Gugger
f5518e5631
Formatting
2020-09-22 14:55:12 -04:00
Chady Kamar
17099ebd58
Add num workers cli arg ( #7322 )
...
* Add dataloader_num_workers to TrainingArguments
This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.
* Pass num_workers argument on DataLoader init
2020-09-22 14:44:42 -04:00
Sylvain Gugger
89edf504bf
Add possibility to evaluate every epoch ( #7302 )
...
* Add possibility to evaluate every epoch
* Remove multitype arg
* Remove needless import
* Use a proper enum
* Apply suggestions from @LysandreJik
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* One else and formatting
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2020-09-22 09:52:29 -04:00
Sylvain Gugger
492bb6aa48
Trainer multi label ( #7191 )
...
* Trainer accepts multiple labels
* Missing import
* Fix docstrings
2020-09-17 08:15:37 -04:00
Sylvain Gugger
08de989a0a
Trainer with grad accum ( #6930 )
...
* Add warning for gradient accumulation
* Formatting
2020-09-07 04:54:00 -04:00
Lysandre
a75c64d80c
Black 20 release
2020-08-26 17:20:22 +02:00
Lysandre Debut
77abd1e79f
Centralize logging ( #6434 )
...
* Logging
* Style
* hf_logging > utils.logging
* Address @thomwolf's comments
* Update test
* Update src/transformers/benchmark/benchmark_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* Revert bad change
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2020-08-26 11:10:36 -04:00
Sylvain Gugger
3a7fdd3f52
Add hyperparameter search to Trainer ( #6576 )
...
* Add optuna hyperparameter search to Trainer
* @julien-c suggestions
Co-authored-by: Julien Chaumond <chaumond@gmail.com >
* Make compute_objective an arg function
* Formatting
* Rework to make it easier to add ray
* Formatting
* Initial support for Ray
* Formatting
* Polish and finalize
* Add trial id to checkpoint with Ray
* Smaller default
* Use GPU in ray if available
* Formatting
* Fix test
* Update install instruction
Co-authored-by: Richard Liaw <rliaw@berkeley.edu >
* Address review comments
* Formatting post-merge
Co-authored-by: Julien Chaumond <chaumond@gmail.com >
Co-authored-by: Richard Liaw <rliaw@berkeley.edu >
2020-08-24 11:48:45 -04:00
Sylvain Gugger
b30879fe0c
Don't reset the dataset type + plug for rm unused columns ( #6683 )
...
* Don't reset the type of the dataset
* Formatting
* Update trainer.py
Co-authored-by: Teven <teven.lescao@gmail.com >
2020-08-24 09:22:03 -04:00
Sylvain Gugger
573bdb0a5d
Add tests to Trainer ( #6605 )
...
* Add tests to Trainer
* Test if removing long breaks everything
* Remove ugly hack
* Fix distributed test
* Use float for number of epochs
2020-08-20 11:13:50 -04:00
Sylvain Gugger
34fabe1697
Move prediction_loss_only to TrainingArguments ( #6426 )
2020-08-12 08:03:45 -04:00
Teven
bd0eab351a
Trainer + wandb quality of life logging tweaks ( #6241 )
...
* added `name` argument for wandb logging, also logging model config with trainer arguments
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
* added tf, post-review changes
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com >
2020-08-05 09:05:52 -04:00
Jay Mody
cedc547e7e
Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. ( #5331 )
...
* Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict() output
* Update wandb config logging to use to_sanitized_dict
* removed n_gpu from sanitized dict
* fix quality check errors
2020-08-03 09:00:39 -04:00
Gong Linyuan
b21993b362
Allow to set Adam beta1, beta2 in TrainingArgs ( #5592 )
...
* Add Adam beta1, beta2 to trainer
* Make style consistent
2020-07-27 05:31:37 -04:00
Alan deLevie
223bad242d
fix typo in ( #5893 )
2020-07-20 03:53:03 -04:00
Sylvain Gugger
734a28a767
Clean up diffs in Trainer/TFTrainer ( #5417 )
...
* Cleanup and unify Trainer/TFTrainer
* Forgot to adapt TFTrainingArgs
* In tf scripts n_gpu -> n_replicas
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
* Address review comments
* Formatting
* Fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co >
2020-07-01 11:00:20 -04:00
Sylvain Gugger
64e3d966b1
Add support for past states ( #5399 )
...
* Add support for past states
* Style and forgotten self
* You mean, documenting is not enough? I have to actually add it too?
* Add memory support during evaluation
* Fix tests in eval and add TF support
* No need to change this line anymore
2020-07-01 08:11:55 -04:00
Sylvain Gugger
87716a6d07
Documentation for the Trainer API ( #5383 )
...
* Documentation for the Trainer API
* Address review comments
* Address comments
2020-06-30 11:43:43 -04:00