Sylvain Gugger
3081d3868e
Push to hub when saving checkpoints ( #13503 )
...
* Push to hub when saving checkpoints
* Add model card
* Revert partial model card
* Small fix for checkpoint
* Add tests
* Add documentation
* Fix tests
* Bump huggingface_hub
* Fix test
2021-09-14 08:02:15 -04:00
Mohan Zhang
41cd52a768
fixed document ( #13414 )
2021-09-08 11:48:00 -04:00
arfy slowy
01977466f4
fix: typo spelling grammar ( #13212 )
...
* fix: typo spelling grammar
* fix: make fixup
2021-08-30 08:09:14 -04:00
Stas Bekman
4a872caef4
remove extra white space from log format ( #12360 )
2021-06-25 13:20:14 -07:00
Stas Bekman
ebe5413589
[trainer] 2 bug fixes and a rename ( #12309 )
...
* bug fixes and a rename
* add extended DDP test
2021-06-22 11:13:23 -07:00
Stas Bekman
dad414d5f9
[trainer + examples] set log level from CLI ( #12276 )
...
* set log level from CLI
* add log_level_replica + test + extended docs
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename datasets objects to allow datasets module
* improve the doc
* style
* doc improve
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-21 19:30:50 -07:00
Stas Bekman
040283170c
consistent nn. and nn.functional: part 5 docs ( #12161 )
2021-06-14 13:34:32 -07:00
Stas Bekman
640318befa
[deepspeed] Move code and doc into standalone files ( #11984 )
...
* move code and docs
* style
* moved
* restore
2021-06-02 09:56:00 -07:00
Stas Bekman
79712e7e7a
[deepspeed] docs ( #11940 )
...
* deepspeed docs
* cleanup
* cleanup
2021-06-01 09:21:21 -07:00
Stas Bekman
c065025c47
[trainer] document resume randomness ( #11588 )
...
* document resume randomness
* fix link
* reword
* fix
* reword
* style
2021-05-04 14:17:11 -07:00
Stas Bekman
4e7bf94e72
[DeepSpeed] fp32 support ( #11499 )
...
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
2021-04-30 12:51:48 -07:00
Stas Bekman
bc2571e61c
[Deepspeed] ZeRO-Infinity integration plus config revamp ( #11418 )
...
* adding Z-inf
* revamp config process
* up version requirement
* wip
* massive rewrite
* cleanup
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* consistent json commas
* act on suggestions
* leave this feature for 0.3.16
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
Sylvain Gugger
dabeb15292
Examples reorg ( #11350 )
...
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Sylvain Gugger
f38cd4373f
Indent code block in the documentation ( #11233 )
...
* Indent code block
* Indent code blocks version 2
* Quality
2021-04-13 15:36:36 -04:00
Stas Bekman
0311ba2153
typo ( #11152 )
...
* typo
* style
2021-04-08 19:47:31 -07:00
Stas Bekman
c2e0fd5283
[setup] make fairscale and deepspeed setup extras ( #11151 )
...
* make fairscale and deepspeed setup extras
* fix default
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* no reason not to ask for the good version
* update the CIs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-08 15:46:54 -07:00
Stas Bekman
66446909b2
[tests] relocate core integration tests ( #11146 )
...
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps
2021-04-08 13:13:17 -07:00
Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 ( #10753 )
...
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00
Cheng Li
c83fbc5f2d
[Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed ( #10464 )
...
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composite argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdated information
* refactor duplicated code
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-16 15:51:09 -07:00
Théo Matussière
6f840990a7
split seq2seq script into summarization & translation ( #10611 )
...
* split seq2seq script, update docs
* needless diff
* fix readme
* remove test diff
* s/summarization/translation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* cr
* fix arguments & better mbart/t5 refs
* copyright
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* reword readme
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* s/summarization/translation
* short script names
* fix tests
* fix isort, include mbart doc
* delete old script, update tests
* automate source prefix
* automate source prefix for translation
* s/translation/trans
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* fix script name (short version)
* typos
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* exact parameter
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* remove superfluous source_prefix calls in docs
* rename scripts & warn for source prefix
* black
* flake8
Co-authored-by: theo <theo@matussie.re>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-03-15 09:11:42 -04:00
Stas Bekman
4c32f9f26e
AdamW is now supported by default ( #9624 )
2021-03-12 13:40:07 -08:00
Sylvain Gugger
e8246f78f9
Add auto_wrap option in fairscale integration ( #10673 )
...
* Add auto_wrap option in fairscale integration
* Style
2021-03-12 07:50:20 -05:00
Sylvain Gugger
26a33cfd8c
Document Trainer limitation on custom models ( #10635 )
2021-03-10 14:58:22 -05:00
lewtun
12b66215cf
Fix example of custom Trainer to reflect signature of compute_loss ( #10537 )
2021-03-05 07:44:53 -05:00
Sylvain Gugger
9d14be5c20
Add support for ZeRO-2/3 and ZeRO-offload in fairscale ( #10354 )
...
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale
* Quality
* Rework from review comments
* Add doc
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2021-02-25 11:07:53 -05:00
Stas Bekman
eab0afc19c
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration ( #10310 )
...
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
2021-02-22 11:15:59 -08:00
Stas Bekman
5da7c78ed8
update to new script; notebook notes ( #10241 )
2021-02-17 15:58:08 -08:00
Stas Bekman
b54cb0bd82
[DeepSpeed in notebooks] Jupyter + Colab ( #10130 )
...
* init devices/setup explicitly
* docs + test
* simplify
* cleanup
* cleanup
* cleanup
* correct the required dist setup
* derive local_rank from env LOCAL_RANK
2021-02-11 14:02:05 -08:00
Stas Bekman
7c07a47dfb
[DeepSpeed docs] new information ( #9610 )
...
* how to specify a specific gpu
* new paper
* expand on buffer sizes
* style
* where to find config examples
* specific example
* small updates
2021-02-09 22:16:20 -08:00
Stas Bekman
82498cbc37
[deepspeed doc] install issues + 1-gpu deployment ( #9582 )
...
* [doc] install + 1-gpu deployment
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improvements
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-14 11:05:04 -08:00
Stas Bekman
2df34f4aba
[trainer] deepspeed integration ( #9211 )
...
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 19:05:18 -08:00
Sylvain Gugger
490b39e614
Seq2seq trainer ( #9241 )
...
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-22 11:33:44 -05:00
Sylvain Gugger
00aa9dbca2
Copyright ( #8970 )
...
* Add copyright everywhere missing
* Style
2020-12-07 18:36:34 -05:00
Chengxi Guo
d65e0bfea3
Fix doc bug ( #8500 )
...
* fix doc bug
Signed-off-by: mymusise <mymusise1@gmail.com>
* fix example bug
Signed-off-by: mymusise <mymusise1@gmail.com>
2020-11-12 11:47:23 -05:00
Sylvain Gugger
08f534d2da
Doc styling ( #8067 )
...
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
* Styling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
2020-10-26 18:26:02 -04:00
Tiger
7e73c12805
fixed lots of typos. ( #7758 )
2020-10-13 10:00:20 -04:00
Sylvain Gugger
08ba4b4902
Trainer callbacks ( #7596 )
...
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
Sylvain Gugger
3323146e90
Models doc ( #7345 )
...
* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FlauBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
Sylvain Gugger
4cbd50e611
Compute loss method ( #7074 )
2020-09-11 12:06:31 -04:00
Sylvain Gugger
86caab1e0b
Harmonize both Trainers API ( #6157 )
...
* Harmonize both Trainers API
* Fix test
* main_process -> process_zero
2020-07-31 09:43:23 -04:00
Sylvain Gugger
87716a6d07
Documentation for the Trainer API ( #5383 )
...
* Documentation for the Trainer API
* Address review comments
* Address comments
2020-06-30 11:43:43 -04:00