Speedup model init on CPU (by 10x+ for llama-3-8B as one example) (#31771)
* 1,100%! * Clean * Don't touch DS * Experiment with dtype allocation * skip test_load_save_without_tied_weights test * A little faster * Include proper upscaling? * Fixup tests * Potentially skip? * Let's see if this fixes git history * Maintain new dtype * Fin * Rm hook idea for now * New approach, see what breaks * stage * Clean * Stash * Should be fin now, just need to mark failing models * Clean up * Simplify * Deal with weird models * Enc/Dec * Skip w/ reason * Adjust test * Fix test * one more test * Keep experimenting * Fix ref * TO REMOVE: testing feedback CI * Right push * Update tests/utils/test_modeling_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * disable * Add new func * Test nits from Amy * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Adjust comment * Adjust comment on skip * make private * Fin * Should be a not flag * Clarify and rename test --------- Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
This commit is contained in:
@@ -40,6 +40,10 @@ for text generation, [`~generation.GenerationMixin`] (for the PyTorch models),
|
||||
- push_to_hub
|
||||
- all
|
||||
|
||||
Custom models should also include a `_supports_assign_param_buffer`, which determines if superfast init can apply
|
||||
on the particular model. Signs that your model needs this are if `test_save_and_load_from_pretrained` fails. If so,
|
||||
set this to `False`.
|
||||
|
||||
## ModuleUtilsMixin
|
||||
|
||||
[[autodoc]] modeling_utils.ModuleUtilsMixin
|
||||
|
||||
Reference in New Issue
Block a user