Sylvain Gugger
|
9d14be5c20
|
Add support for ZeRO-2/3 and ZeRO-offload in fairscale (#10354)
* Add support for ZeRO-2/3 and ZeRO-offload in fairscale
* Quality
* Rework from review comments
* Add doc
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
|
2021-02-25 11:07:53 -05:00 |
|
Sehoon Kim
|
63645b3b11
|
I-BERT model support (#10153)
* IBertConfig, IBertTokenizer added
* IBert Model names modified
* tokenizer bugfix
* embedding -> QuantEmbedding
* quant utils added
* quant_mode added to configuration
* QuantAct added, Embedding layer + QuantAct addition
* QuantAct added
* unused path removed, QKV quantized
* self attention layer all quantized, except softmax
* temporary commit
* all linear layers quantized
* quant_utils bugfix
* bugfix: requantization missing
* IntGELU added
* IntSoftmax added
* LayerNorm implemented
* LayerNorm implemented all
* names changed: roberta->ibert
* config does not inherit from RoBERTa
* No support for CausalLM
* static quantization added, quantize_model.py removed
* import modules uncommented
* copyrights fixed
* minor bugfix
* quant_modules, quant_utils merged as one file
* import * fixed
* unused runfile removed
* make style run
* configuration.py docstring fixed
* refactoring: comments removed, function name fixed
* unused dependency removed
* typo fixed
* comments(Copied from), assertion string added
* refactoring: super(..) -> super(), etc.
* refactoring
* refactoring
* make style
* refactoring
* cuda -> to(x.device)
* weight initialization removed
* QuantLinear set_param removed
* QuantEmbedding set_param removed
* IntLayerNorm set_param removed
* assert string added
* assertion error message fixed
* is_decoder removed
* enc-dec arguments/functions removed
* Converter removed
* quant_modules docstring fixed
* convert_slow_tokenizer rolled back
* quant_utils docstring fixed
* unused arguments (e.g. use_cache) removed from config
* weight initialization condition fixed
* x_min, x_max initialized with small values to avoid div-zero exceptions
* testing code for ibert
* test emb, linear, gelu, softmax added
* test ln and act added
* style reformatted
* force_dequant added
* error tests overridden
* make style
* Style + Docs
* force dequant tests added
* Fix fast tokenizer in init
* Fix doc
* Remove space
* docstring, IBertConfig, chunk_size
* test_modeling_ibert refactoring
* quant_modules.py refactoring
* e2e integration test added
* tokenizers removed
* IBertConfig added to tokenizer_auto.py
* bugfix
* fix docs & test
* fix style num 2
* final fixes
Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2021-02-25 10:06:42 -05:00 |
|
Patrick von Platen
|
cb38ffcc5e
|
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324)
* push to show
* small improvement
* small improvement
* Update src/transformers/feature_extraction_utils.py
* Update src/transformers/feature_extraction_utils.py
* implement base
* add common tests
* make all tests pass for wav2vec2
* make padding work & add more tests
* finalize feature extractor utils
* add call method to feature extraction
* finalize feature processor
* finish tokenizer
* finish general processor design
* finish tests
* typo
* remove bogus file
* finish docstring
* add docs
* finish docs
* small fix
* correct docs
* save intermediate
* load changes
* apply changes
* apply changes to doc
* change tests
* apply Suraj's recommendations
* final changes
* Apply suggestions from code review
* fix typo
* fix import
* correct docstring
|
2021-02-25 17:42:46 +03:00 |
|
abhishek thakur
|
9dc7825744
|
Remove unused variable in example for Q&A (#10392)
|
2021-02-25 09:18:47 -05:00 |
|
Lysandre
|
3591844306
|
v4.3.3 docs
|
2021-02-24 15:19:01 -05:00 |
|
Stas Bekman
|
eab0afc19c
|
[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)
* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
|
2021-02-22 11:15:59 -08:00 |
|
Sylvain Gugger
|
9e147d31f6
|
Deprecate prepare_seq2seq_batch (#10287)
* Deprecate prepare_seq2seq_batch
* Fix last tests
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* More review comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
|
2021-02-22 12:36:16 -05:00 |
|
Lysandre Debut
|
cd8c4c3fc2
|
DeBERTa-v2 fixes (#10328)
Co-authored-by: Pengcheng He <penhe@microsoft.com>
|
2021-02-22 07:45:18 -05:00 |
|
Pengcheng He
|
9a7e63729f
|
Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018)
* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;
* DeBERTa-v2
* Fix v2 model loading issue (#10129)
* Doc members
* Update src/transformers/models/deberta/modeling_deberta.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address Sylvain's comments
* Address Patrick's comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2021-02-19 18:34:44 -05:00 |
|
Sylvain Gugger
|
f6e53e3c2b
|
Fix example links in the task summary (#10291)
|
2021-02-19 18:04:15 -05:00 |
|
Stas Bekman
|
5da7c78ed8
|
update to new script; notebook notes (#10241)
|
2021-02-17 15:58:08 -08:00 |
|
Joe Davison
|
4210cd96fc
|
fix add_token_positions fn (#10217)
|
2021-02-16 14:00:05 -05:00 |
|
Suraj Patil
|
6fc940ed09
|
Add mBART-50 (#10154)
* add tokenizer for mBART-50
* update tokenizers
* make src_lang and tgt_lang optional
* update tokenizer test
* add setter
* update docs
* update conversion script
* update docs
* update conversion script
* update tokenizer
* update test
* update docs
* doc
* address Sylvain's suggestions
* fix test
* fix formatting
* nits
|
2021-02-15 20:58:54 +05:30 |
|
Sylvain Gugger
|
803498318c
|
[Doc] Fix version control in internal pages (#10124)
|
2021-02-13 08:52:30 -05:00 |
|
Stas Bekman
|
b54cb0bd82
|
[DeepSpeed in notebooks] Jupyter + Colab (#10130)
* init devices/setup explicitly
* docs + test
* simplify
* cleanup
* cleanup
* cleanup
* correct the required dist setup
* derive local_rank from env LOCAL_RANK
|
2021-02-11 14:02:05 -08:00 |
|
Tanmay Thakur
|
2f3b5f4dcc
|
Add new community notebook - Blenderbot (#10126)
* Update:community.md, new nb add
* feat: updated grammar on nb description
* Update: Train summarizer for BlenderBotSmall
|
2021-02-11 12:53:40 +03:00 |
|
Stas Bekman
|
7c07a47dfb
|
[DeepSpeed docs] new information (#9610)
* how to specify a specific gpu
* new paper
* expand on buffer sizes
* style
* where to find config examples
* specific example
* small updates
|
2021-02-09 22:16:20 -08:00 |
|
Boris Dayma
|
7c7962ba89
|
doc: update W&B related doc (#10086)
* doc: update W&B related doc
* doc(wandb): mention report_to
* doc(wandb): commit suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* doc(wandb): fix typo
* doc(wandb): remove WANDB_DISABLED
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-02-09 14:47:52 -05:00 |
|
Sylvain Gugger
|
0c3d23dff7
|
Add patch releases to the doc
|
2021-02-09 14:17:09 -05:00 |
|
Lysandre Debut
|
78f4a0e7e5
|
Logging propagation (#10092)
* Enable propagation by default
* Document enable/disable default handler
|
2021-02-09 10:27:49 -05:00 |
|
Patrick von Platen
|
b972125ced
|
Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC (#10089)
* add wav2vec2CTC and deprecate for maskedlm
* remove from docs
|
2021-02-09 03:49:02 -05:00 |
|
Juan Cruz-Benito
|
e4bf9910dc
|
Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py (#10066)
* Removing run_pl_glue.py from seq classification docs
* Adding run_tf_text_classification.py
* Using :prefix_link: to refer local files
* Applying "make style" to the branch
* Update docs/source/task_summary.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removing last underscores
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-02-08 13:04:21 -05:00 |
|
Lysandre
|
0dd579c9cf
|
Docs for v4.3.0
|
2021-02-08 18:53:24 +01:00 |
|
Sylvain Gugger
|
45aaf5f7ab
|
A few fixes in the documentation (#10033)
|
2021-02-08 05:02:01 -05:00 |
|
Patrick von Platen
|
89be094e29
|
[Templates] Add template "call-for-model" markdown and "call-for-big-bird" markdown (#9921)
* add big bird
* change teacher to mentor
* add proposal template
* adapt template
* delete old template
* correct some links
* finish template
* create big bird from template
* add big bird
* improve boxes
* finish boxes
* add pointers for BigBird
* finish big bird
* up
* up
* up
* up
* apply lysandres and sylvains suggestions
* delete bogus file
* correct markdown
* try different style
* try different style
* finalize
|
2021-02-05 15:47:54 +03:00 |
|
Sylvain Gugger
|
3be965c5db
|
Update doc for pre-release (#10014)
* Update doc for pre-release
* Use stable as default
* Use the right commit :facepalms:
|
2021-02-04 16:52:27 -05:00 |
|
Sylvain Gugger
|
b72f16b3ec
|
Fix doc for TFConvBertModel
|
2021-02-04 10:14:46 -05:00 |
|
demSd
|
00031785a8
|
BartForCausalLM analogs to ProphetNetForCausalLM (#9128)
* initialize bart4causalLM
* create BartDecoderWrapper, setters/getters
* delete spaces
* forward and additional methods
* update cache function, loss function, remove ngram* params in data class.
* add bartcausallm, bartdecoder testing
* correct bart for causal lm
* remove at
* add mbart as well
* up
* fix typo
* up
* correct
* add pegasusforcausallm
* add blenderbotforcausallm
* add blenderbotsmallforcausallm
* add marianforcausallm
* add test for MarianForCausalLM
* add Pegasus test
* add BlenderbotSmall test
* add blenderbot test
* fix a fail
* fix an import fail
* a fix
* fix
* Update modeling_pegasus.py
* fix models
* fix inputs_embeds setting getter
* adapt tests
* correct repo utils check
* finish test improvement
* fix tf models as well
* make style
* make fix-copies
* fix copies
* run all tests
* last changes
* fix all tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2021-02-04 11:56:12 +03:00 |
|
yylun
|
5442a11f5f
|
fix steps_in_epoch variable in trainer when using max_steps (#9969)
* fix steps_in_epoch variable when using max_steps
* redundant sentence
* Revert "redundant sentence"
This reverts commit ad5c0e9b6e66d65732dee2239cdc9c76dfa0dc5a.
* remove redundant sentence
Co-authored-by: wujindou <wujindou@sogou-inc.com>
|
2021-02-03 09:30:37 -05:00 |
|
Patrick von Platen
|
d6217fb30c
|
Wav2Vec2 (#9659)
* add raw scaffold
* implement feat extract layers
* make style
* remove +
* correctly convert weights
* make feat extractor work
* make feature extraction proj work
* run forward pass
* finish forward pass
* Successful decoding example
* remove unused files
* more changes
* add wav2vec tokenizer
* add new structure
* fix run forward
* add other layer norm architecture
* finish 2nd structure
* add model tests
* finish tests for tok and model
* clean-up
* make style
* finish docstring for model and config
* make style
* correct docstring
* correct tests
* change checkpoints to fairseq
* fix examples
* finish wav2vec2
* make style
* apply sylvains suggestions
* apply lysandres suggestions
* change print to log.info
* re-add assert statement
* add input_values as required input name
* finish wav2vec2 tokenizer
* Update tests/test_tokenization_wav2vec2.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* apply sylvains suggestions
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
|
2021-02-02 15:52:10 +03:00 |
|
Sylvain Gugger
|
de38a6e4d2
|
Fix 9918 (#9932)
* Initial work
* Fix doc styler and other models
|
2021-02-02 05:22:20 -05:00 |
|
Patrick von Platen
|
0e3be1ac8f
|
Add new model docs (#9667)
* add new model logic
* fix docs
* change structure
* improve add_new_model
* push new changes
* up
* up
* correct spelling
* improve docstring
* correct line length
* update readme
* correct links
* correct typos
* only add rst file for now
* Apply suggestions from code review 1
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
* Apply suggestions from code review
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
* finish adding all suggestions
* make style
* apply Niels feedback
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply sylvains suggestions
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-02-01 17:55:10 +03:00 |
|
Stas Bekman
|
40cfc355f1
|
[doc] nested markup is invalid in rst (#9898)
Apparently nested markup in RST is invalid: https://docutils.sourceforge.io/FAQ.html#is-nested-inline-markup-possible
So currently this line doesn't get rendered properly, leaving inner markdown unrendered, resulting in:
```
https://docutils.sourceforge.io/FAQ.html#is-nested-inline-markup-possible
```
This PR removes the bold which fixes the link.
|
2021-01-30 09:59:19 -05:00 |
|
Stas Bekman
|
15e4ce353a
|
[docs] expand install instructions (#9817)
* expand install instructions
* fix
* white space
* rewrite as discussed in the PR
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change the wording to encourage issue report
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-01-28 09:36:46 -08:00 |
|
Joe Davison
|
caddf9126b
|
tutorial typo
|
2021-01-28 09:21:58 -05:00 |
|
Stefan Schweter
|
5ed5a54684
|
ADD BORT (#9813)
* tests: add integration tests for new Bort model
* bort: add conversion script from Gluonnlp to Transformers 🚀
* bort: minor cleanup (BORT -> Bort)
* add docs
* make fix-copies
* clean doc a bit
* correct docs
* Update docs/source/model_doc/bort.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/bort.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct dialogpt doc
* correct link
* Update docs/source/model_doc/bort.rst
* Update docs/source/model_doc/dialogpt.rst
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-01-27 21:25:11 +03:00 |
|
abhishek thakur
|
f617490e71
|
ConvBERT Model (#9717)
* finalize convbert
* finalize convbert
* fix
* fix
* fix
* push
* fix
* tf image patches
* fix torch model
* tf tests
* conversion
* everything aligned
* remove print
* tf tests
* fix tf
* make tf tests pass
* everything works
* fix init
* fix
* special treatment for sepconv1d
* style
* 🙏🏽
* add doc and cleanup
* add electra test again
* fix doc
* fix doc again
* fix doc again
* Update src/transformers/modeling_tf_pytorch_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/conv_bert/configuration_conv_bert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/conv_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/conv_bert/configuration_conv_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* conv_bert -> convbert
* more fixes from review
* add conversion script
* dont use pretrained embed
* unused config
* suggestions from julien
* some more fixes
* p -> param
* fix copyright
* fix doc
* Update src/transformers/models/convbert/configuration_convbert.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* comments from reviews
* fix-copies
* fix style
* revert shape_list
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2021-01-27 03:20:09 -05:00 |
|
Yusuke Mori
|
cb73ab5a38
|
Fix broken links in the converting tf ckpt document (#9791)
* Fix broken links in the converting tf ckpt document
* Update docs/source/converting_tensorflow_models.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Reflect the review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-01-26 03:37:57 -05:00 |
|
Sylvain Gugger
|
7acfa95afb
|
Add missing new line
|
2021-01-20 14:13:16 -05:00 |
|
Darigov Research
|
5a307ece82
|
Adds flashcards to Glossary & makes small corrections (#8949)
* fix: Makes small typo corrections & standardises glossary
* feat: Adds introduction & links to transformer flashcards
* feat: Adds attribution & adjustments requested in #8949
* feat: Adds flashcards to community.md
* refactor: Removes flashcards from glossary
|
2021-01-20 13:28:40 -05:00 |
|
NielsRogge
|
88583d4958
|
Add notebook (#9696)
|
2021-01-20 10:19:26 -05:00 |
|
NielsRogge
|
d1370d29b1
|
Add DeBERTa head models (#9691)
* Add DebertaForMaskedLM, DebertaForTokenClassification, DebertaForQuestionAnswering
* Add docs and fix quality
* Fix Deberta not having pooler
|
2021-01-20 10:18:50 -05:00 |
|
acul3
|
8940c7662d
|
Add t5 convert to transformers-cli (#9654)
* Update run_mlm.py
* add t5 model to transformers-cli convert
* update run_mlm.py same as master
* update converting model docs
* update converting model docs
* Update convert.py
* Trigger notification
* update import sorted
* fix typo t5
|
2021-01-20 09:34:27 -05:00 |
|
Sylvain Gugger
|
76f36e183a
|
Add a community page to the docs (#9682)
|
2021-01-20 04:54:36 -05:00 |
|
Stas Bekman
|
82498cbc37
|
[deepspeed doc] install issues + 1-gpu deployment (#9582)
* [doc] install + 1-gpu deployment
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improvements
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
|
2021-01-14 11:05:04 -08:00 |
|
Lysandre
|
e43f3b6190
|
v4.2.1 in docs
|
2021-01-14 14:25:30 +01:00 |
|
Lysandre
|
33a8497db8
|
v4.2.0 documentation
|
2021-01-13 16:15:40 +01:00 |
|
Lysandre
|
7d9a9d0c72
|
Release: v4.2.0
|
2021-01-13 16:01:51 +01:00 |
|
Julien Chaumond
|
247a7b2029
|
Doc: Update pretrained_models wording (#9545)
* Update pretrained_models.rst
To clarify things cf. this tweet for instance https://twitter.com/RTomMcCoy/status/1349094111505211395
* format
|
2021-01-13 05:58:05 -05:00 |
|
Stas Bekman
|
2df34f4aba
|
[trainer] deepspeed integration (#9211)
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
|
2021-01-12 19:05:18 -08:00 |
|