[DeepSpeed] ZeRO Stage 3 (#10753)

* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* workout the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Stas Bekman

2021-04-08 09:53:01 -07:00

committed by

GitHub

parent acc851e1ff

commit c6d664849b

10 changed files with 1307 additions and 268 deletions

									
tests/test_trainer.py (1 line changed)
@@ -132,6 +132,7 @@ class RegressionModelConfig(PretrainedConfig):
         self.a = a
         self.b = b
         self.double_output = double_output
+        self.hidden_size = 1


 if is_torch_available():
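
The hunk above gives the regression test config a hidden_size attribute, which lines up with the commit message item "automatically configure zero3 params that rely on hidden size". The following is only a minimal sketch of what such auto-configuration could look like, not the code from this commit: the helper name fill_zero3_params, the ToyConfig class, and the specific multipliers are assumptions made for illustration. Only the three config key names are taken from the DeepSpeed ZeRO configuration.

```python
# Hypothetical helper: NOT the implementation from this commit.
# The formulas below are illustrative assumptions; only the key names
# (reduce_bucket_size, stage3_prefetch_bucket_size,
# stage3_param_persistence_threshold) are real DeepSpeed config keys.
def fill_zero3_params(zero_cfg: dict, hidden_size: int) -> dict:
    """Return a copy of a ZeRO stage 3 config section with the
    size-dependent values derived from the model's hidden size."""
    filled = dict(zero_cfg)  # do not mutate the caller's dict
    derived = {
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }
    for key, value in derived.items():
        # Values the user set explicitly are kept as-is.
        filled.setdefault(key, value)
    return filled


class ToyConfig:
    """Stand-in for a model config that exposes hidden_size (hypothetical)."""

    def __init__(self, hidden_size: int = 1):
        self.hidden_size = hidden_size


small = fill_zero3_params({"stage": 3}, ToyConfig().hidden_size)
large = fill_zero3_params(
    {"stage": 3, "reduce_bucket_size": 1000}, ToyConfig(512).hidden_size
)
```

Under these assumptions, a config object without hidden_size would have nothing to derive the values from, which is one plausible reason the test config needed the new attribute.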
