Stas Bekman
c6d664849b
[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* style
* remove invalid param
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-08 09:53:01 -07:00