Philipp Schmid
805c5200dc
Removes overwrites for output_dir (#10521)
* removed overwrites
* remove default value for output_dir
* adjusted typing
2021-03-04 17:12:37 +01:00
Sylvain Gugger
b70f441b72
Smp grad accum (#10488)
* Fix gradient accumulation for SM Model Parallelism
* Style and divide loss by grad accum steps
2021-03-03 12:13:29 -05:00
Tanmay Garg
256482ac92
Introduce save_strategy training argument (#10286)
* Introduce save_strategy training argument
* deprecate EvaluationStrategy
* collapse EvaluationStrategy and LoggingStrategy into a single IntervalStrategy enum
* modify tests to use modified enum
2021-02-27 19:34:22 -05:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale
* Quality
* Rework from review comments
* Add doc
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Tanmay Garg
709c86b5a9
Introduce logging_strategy training argument (#10267)
Introduce logging_strategy training argument in TrainingArguments and TFTrainingArguments (#9838)
2021-02-19 11:49:22 -05:00
Stas Bekman
4eddc459a9
[trainer] implement support for full fp16 in evaluation/predict (#10268)
* implement --fp16_full_eval
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* add test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 17:02:35 -08:00
Stas Bekman
97e688bc22
[Trainer] memory tracker metrics (#10225)
* memory tracker metrics
* go back to eval for somewhat consistency
* handle no-gpu case
* deal with stackable eval calls
* restore callback order
* style
* simplify the API
* add test
* docs
* consistently use eval_ prefix
* improve docs
* Update src/transformers/trainer_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename method
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-02-18 09:27:32 -08:00
Tanmay Garg
d7f38c5d1d
Introduce warmup_ratio training argument (#10229)
Introduce warmup_ratio training argument in both TrainingArguments and TFTrainingArguments classes (#6673)
2021-02-18 12:23:33 -05:00
Sylvain Gugger
31245775e5
Add SageMakerTrainer for model parallelism (#10122)
* Refactor things out of main train
* Store signature
* Add SageMakerTrainer
* Init + Copyright
* Address review comments
2021-02-11 18:44:18 -05:00
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab (#10130)
* init devices/setup explicitly
* docs + test
* simplify
* cleanup
* cleanup
* cleanup
* correct the required dist setup
* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Sylvain Gugger
77c0ce8c0c
Fix some edge cases in report_to and add deprecation warnings (#10100)
2021-02-09 10:38:12 -05:00
lewtun
22121e813e
Clarify definition of seed argument in TrainingArguments (#9903)
* Clarify definition of seed argument in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args_tf.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix style
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-31 11:09:31 -05:00
Stas Bekman
1420b5ff67
refactor deepspeed setup devices (#9880)
2021-01-29 08:18:04 -08:00
Sylvain Gugger
7eadfe166e
When on sagemaker use their env variables for saves (#9876)
* When on sagemaker use their env variables for saves
* Address review comments
* Quality
2021-01-29 09:52:26 -05:00
abhishek thakur
bc109ae5b8
pin_memory -> dataloader_pin_memory (#9874)
2021-01-28 21:10:46 +01:00
abhishek thakur
25fcb5c171
Pin memory in Trainer by default (#9857)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-01-28 08:50:46 +01:00
Sylvain Gugger
c7b7bd9963
Add a flag for find_unused_parameters (#9820)
* Add a flag for find_unused_parameters
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Remove negation
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-01-27 06:18:06 -05:00
Sylvain Gugger
0d0efd3a0e
Smdistributed trainer (#9798)
* Add a debug print
* Adapt Trainer to use smdistributed if available
* Forgotten parenthesis
* Real check for sagemaker
* Don't forget to define device...
* Woopsie, local_rank is defined differently
* Update since local_rank has the proper value
* Remove debug statement
* More robust check for smdistributed
* Quality
* Deal with key not present error
2021-01-26 10:28:21 -05:00
Sylvain Gugger
82d46febeb
Add report_to training arguments to control the reporting integrations used (#9735)
2021-01-22 10:34:34 -05:00
Sylvain Gugger
2a703773aa
Fix style
2021-01-20 12:17:40 -05:00
Gunjan Chhablani
538245b0c2
Fix Trainer and Args to mention AdamW, not Adam. (#9685)
* Fix Trainer and Args to mention AdamW, not Adam.
* Update the docs for Training Arguments.
* Change arguments adamw_* to adam_*
* Fixed links to AdamW in TrainingArguments docs
* Fix line length in Training Args docs.
2021-01-20 11:59:31 -05:00
Sylvain Gugger
329fe2746a
Upstream (and rename) sortish sampler (#9574)
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-14 10:38:14 -05:00
Sylvain Gugger
5e1bea4f16
Fix Trainer with a parallel model (#9578)
* Fix Trainer with a parallel model
* More clean up
2021-01-14 03:23:41 -05:00
Sylvain Gugger
04dc65e5c6
Fix data parallelism in Trainer (#9566)
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-13 09:54:41 -05:00
Stas Bekman
2df34f4aba
[trainer] deepspeed integration (#9211)
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
Stas Bekman
33b7422839
[trainer] remove --model_parallel (#9451)
* fix bad merge - dropped code
* remove --model_parallel
* Deal with TrainingArguments
* Use a private attr and fix batch sizes
* fix _n_gpu
* add is_parallel helper wrapper
* fix attribute
* introduce a new attribute is_model_parallel
* docs
* docs
* Put back init False and rearrange doc
* Ignore non-init args in HFArgumentParser
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-01-11 09:39:28 -05:00
Stas Bekman
29acabd886
[trainer] group fp16 args together (#9409)
* [t5 doc] typos
a few runaway backticks
@sgugger
* style
* [trainer] put fp16 args together
This PR proposes a purely cosmetic change that puts all the fp16 args together, so they are easier to manage and read.
@sgugger
* style
2021-01-05 09:39:38 -05:00
Stas Bekman
748006c0b3
[trainer] --model_parallel hasn't been implemented for most models (#9347)
* --model_parallel hasn't been implemented for most models
* make the help clear as well
* implement is_parallelizable; use it
* oops
* remove property
2021-01-05 04:01:30 -05:00
Sylvain Gugger
490b39e614
Seq2seq trainer (#9241)
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-22 11:33:44 -05:00
Sylvain Gugger
1198ba8fba
Add timing inside Trainer (#9196)
* Add timing inside Trainer
* Fix tests
* Add n_objs for train
* Sort logs
2020-12-18 15:10:39 -05:00
Sylvain Gugger
9a67185344
Experimental support for fairscale ShardedDDP (#9139)
* Experimental support for fairscale ShardedDDP
* Add import error if fairscale not available
* Address review comments
* Fix seq2seq trainer
2020-12-16 13:47:48 -05:00
Sylvain Gugger
51adb97cd6
Fix fp16_backend field
2020-12-15 17:14:37 -05:00
Sylvain Gugger
ad895af98d
Add possibility to switch between APEX and AMP in Trainer (#9137)
* Add possibility to switch between APEX and AMP in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-12-15 16:38:10 -05:00
lewtun
ed1845ef4c
Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks (#9076)
* Clarify impact of disable_tqdm on Jupyter Notebooks
* Add weblink to argparse
* Replace "dev set" with more common "validation set" in do_eval
* Tweak prediction_loss_only
* Tweak description of Adam hyperparameters
* Add weblink to TensorBoard
* Capitalise apex
* Tweak local_rank description
* Add weblink for wandb
* Replace nlp with datasets
* Tweak grammar in model_parallel
* Capitalise apex
* Update TensorFlow training args to match PyTorch ones
* Fix style
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add obj to datasets.Dataset
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-15 09:00:19 -05:00
Navjot
d6af344c9e
correct var name in TrainingArguments docstring (#9096)
2020-12-14 09:02:54 -05:00
Sylvain Gugger
00aa9dbca2
Copyright (#8970)
* Add copyright everywhere missing
* Style
2020-12-07 18:36:34 -05:00
Sylvain Gugger
b08843cf4d
Add a parallel_mode property to TrainingArguments (#8877)
* Add a `distributed_env` property to TrainingArguments
* Change name
* Address comment
2020-12-01 13:46:09 -05:00
Sylvain Gugger
7c10dd22ae
Better support for resuming training (#8878)
2020-12-01 13:45:21 -05:00
Sylvain Gugger
49759c0cda
Document new training argument
2020-11-23 15:02:59 -05:00
alexorona
1cd9be2aeb
gpt2 and t5 parallel modeling (#8696)
* gpt2 and t5 parallel modeling
* model_parallel utils update
* adding missing model_parallel_utils
Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5
* training_args reformat
Reformatted training_args
* style formatting
Style formatting doc string length on training_args and model_parallel_utils
* style changes
make style && make quality for training_args and model_parallel_utils.
* adding tests
* minor change in trainer
reverts loss calculation
* Update training_args.py
* Update training_args.py
added back docstring language for adam_beta1 and adam_beta2
* Update trainer.py
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix style & rebase
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-23 14:41:23 -05:00
Sylvain Gugger
63e91f5fde
Document adam betas TrainingArguments (#8688)
2020-11-20 09:27:25 -05:00
Sylvain Gugger
dd52804f5f
Remove deprecated (#8604)
* Remove old deprecated arguments
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Remove needless imports
* Fix tests
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
Philip May
6a064447f2
improve documentation of training_args.py (#8270)
* improve documentation of training_args.py
- do_train
- do_eval
- do_predict
* fix line too long
* fix style with black on training_args.py
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix line length with utils/style_doc
* black reformatting
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-03 15:57:17 -05:00
Abi See
8f1c960ee7
Fix two bugs with --logging_first_step (#8193)
* make sure that logging_first_step evaluates
* fix bug with incorrect loss on logging_first_step
* fix style
* logging_first_step only logs, not evals
2020-10-30 16:45:38 -04:00
Santiago Castro
969859d5f6
Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board
* Fix a typo
* Fix the CI
* Fix more typos
* Fix CI
* More fixes
* Fix CI
* More fixes
* More fixes
2020-10-29 10:33:33 -04:00
Sylvain Gugger
c42596bc07
Doc styling fixes (#8074)
* Fix a few docstrings
* More fixes
* Styling
2020-10-27 07:54:50 -04:00
Sylvain Gugger
08f534d2da
Doc styling (#8067)
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
* Styling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
2020-10-26 18:26:02 -04:00
Lysandre Debut
3a10764574
Fix TF training arguments instantiation (#8063)
2020-10-26 14:39:25 -04:00
Bram Vanroy
55bcd0cb59
Raise error when using AMP on non-CUDA device (#7869)
* Raise error when using AMP on non-CUDA device
* make style
* make style
2020-10-19 15:59:30 -04:00
Sylvain Gugger
bb9559a7f9
Don't use store_xxx on optional bools (#7786)
* Don't use `store_xxx` on optional bools
* Refine test
* Refine test
2020-10-14 12:05:02 -04:00