[DeepSpeed] ZeRO Stage 3 (#10753)
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* workout the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
@@ -132,6 +132,7 @@ class RegressionModelConfig(PretrainedConfig):
         self.a = a
         self.b = b
         self.double_output = double_output
+        self.hidden_size = 1
 
 
 if is_torch_available():
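
The commit message mentions automatically configuring the ZeRO-3 parameters that rely on hidden size, which is presumably why the regression test config gains a `hidden_size` attribute in the hunk above. The following is a minimal sketch of that idea, not the code from this commit: the helper name `fill_zero3_auto_params`, the `"auto"` placeholder convention, and the scaling formulas are all assumptions made for illustration. Only the three DeepSpeed config keys are real.

```python
import copy


def fill_zero3_auto_params(ds_config, hidden_size):
    """Hypothetical helper: return a copy of ds_config in which ZeRO-3 values
    marked "auto" are replaced by numbers derived from the model's hidden size.

    The formulas below are illustrative assumptions, not taken from the commit.
    """
    config = copy.deepcopy(ds_config)  # leave the caller's dict untouched
    zero = config.setdefault("zero_optimization", {})
    derived = {
        "reduce_bucket_size": hidden_size * hidden_size,
        "stage3_prefetch_bucket_size": int(0.9 * hidden_size * hidden_size),
        "stage3_param_persistence_threshold": 10 * hidden_size,
    }
    for key, value in derived.items():
        # Only fill in values the user explicitly left as "auto".
        if zero.get(key) == "auto":
            zero[key] = value
    return config


# With hidden_size = 1, as in the test config above.
example = fill_zero3_auto_params(
    {
        "zero_optimization": {
            "stage": 3,
            "reduce_bucket_size": "auto",
            "stage3_param_persistence_threshold": "auto",
            "stage3_prefetch_bucket_size": 5,  # explicit value, left as-is
        }
    },
    hidden_size=1,
)
print(example["zero_optimization"])
```

A config object without a `hidden_size` attribute would have nothing to derive these values from, so a test model that goes through such a code path needs the attribute even if it never uses it otherwise.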