Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00