Fix typos (#25936)
* fix typo * fix typo * fix typo * fix typos * fix typos * fix typo * fix typo * fix typo * fix typos * fix typo * fix typo * fix typo * fix typos * fix typos
This commit is contained in:
@@ -23,11 +23,11 @@ from PyTorch is:
|
||||
2. Load your pretrained weights.
|
||||
3. Put those pretrained weights in your random model.
|
||||
|
||||
Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you got our of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
|
||||
Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you get out of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
|
||||
|
||||
<Tip>
|
||||
|
||||
Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instatiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
|
||||
Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instantiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
|
||||
|
||||
</Tip>
|
||||
|
||||
@@ -120,4 +120,4 @@ If you want to directly load such a sharded checkpoint inside a model without us
|
||||
|
||||
Sharded checkpoints reduce the memory usage during step 2 of the workflow mentioned above, but in order to use that model in a low memory setting, we recommend leveraging our tools based on the Accelerate library.
|
||||
|
||||
Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)
|
||||
Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)
|
||||
|
||||
Reference in New Issue
Block a user