Commit Graph

694 Commits

Author SHA1 Message Date
huismiling
d0b65bb479 [MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159)
* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker

* Cambricon support SDPA and flash_attn

* MLU devices : Checks if `mlu` is available via an `cndev-based` check which won't trigger the drivers and leave mlu

* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.

* fix testing errors.

* Merge branch 'hf/main' into main

* fix get_device_count error.

* fix mlu testing utils.

* fix code quality and style.

* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
jiqing-feng
286393fbb1 enable tp on CPU (#36299)
* enable tp on CPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* get rank from cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable TP tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* em print

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix model id

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix conflict

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix index and add doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
Cyril Vallez
2bea6bf24e Fix AttentionInterface following feedback (#37010)
* up

* typo

* update doc

* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Cyril Vallez
a86dad56bc Fix state_dict map location when quantized (#37086)
* Update modeling_utils.py

* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
Yih-Dar
581cf96e0c fix tied weigths issue (#37031)
* fix

* comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 16:36:44 +01:00
Minho Ryu
eca74d1367 [WIP] add deepseek-v3 (#35926)
* init commit

* style

* take comments into account

* add deepseekv3 modeling

* remove redundant code

* apply make style

* apply fix-copies

* make format

* add init files

* rename deepseekv3 into deepseek_v3 based on its model_type

* rename deepseekv3 into deepseek_v3 based on its model_type

* deepseek-v3 not deepseek_v3

* set model_type as deepseek_v3

* use default docs

* apply make

* fill type and docstring

* add rope_config_validation

* use custom DeepseekV3MLP

* hold code only for checkpoints congifuration; remove redundant

* revise rope yarn for DeepSeek variation

* rename DeepSeek-V3

* some refactoring

* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral

* fix attention forward

* use -1 for not-changing dim when to use exapnd

* refactor DeepseekV3TopkRouter

* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim

* register pre_hook and hook both

* make style

* use n_shared_experts

* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add test file

* update modeling_file according to modular file

* make style

* add mapping for DeepseekV3ForSequenceClassification

* remove aux_loss_alpha

* add deepseek_v3 for perf

* add deepseek_v3

* rename test as deepseekv3

* use tiny-deepseek-v3

* remove DeepseekV3ForSequenceClassification

* cache before padding

* remote output_router_logits

* Revert "remote output_router_logits"

This reverts commit f264f800d04950390db8413b9efb24cef8186330.

* remove output_router_logits

* make e_score_correction_bias as buffer

* skip tests not compatible

* make style

* make e_score_correction_bias as buffer

* use rope_interleave instead of load_hook

* skip tests not compatible with MLA

* add doc for rope_interleave

* fix typo

* remove torch.no_grad for selecting topk

* fix post merge issue

* mrege with main and simplify

* nits

* final

* small fixes

* fix

* support TP better

* stash

* changes currently requires

* remove synch

* more fixes for TP

* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used

* updates to have generation work!

* push most of the changes

* reorder functions + call for contributions!

* update readme

* nits

* update

* ruff was updated on main

* merge with main and fix copies

* revert unrelated changes

* route all tokens to all experts when testing to avoid no gradient iddues

* finish fixing all tests

* fixup

* nit

* clean config

* last readme changes

* nit

* do cnit

* typo

* last nit

* one more one more

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
cyyever
41a0e58e5b Set weights_only in torch.load (#36991) 2025-03-27 14:55:50 +00:00
Kyle Sayers
d6d930a64b [Modeling] Load FP8 safetensors such as DeepSeek (#36828)
support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-27 10:47:10 +01:00
Mohamed Mekkouri
13d36e89fe Fix device_map check for ggml files (#37003)
fix
2025-03-26 16:24:57 +01:00
Cyril Vallez
788e1092e9 Allow easy registration of custom attention functions (#36889)
* Update modeling_utils.py

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* add to init

* Update modeling_utils.py

* style

* update

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Add some doc

* Update _toctree.yml

* readd it for tgi/vllm compat

* CIs

* CIs
2025-03-26 16:15:06 +01:00
Yih-Dar
c6814b4ee8 Update ruff to 0.11.2 (#36962)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
Marc Sun
80b4c5dcc9 Fix cuda index issue in cache allocator (#36937)
fix
2025-03-25 11:51:41 +01:00
Mohamed Mekkouri
be2c0e7bff Fixing _pre_quantization_dtype when torch_dtype is None (#36930)
fix
2025-03-25 10:43:27 +01:00
Mohamed Mekkouri
2b8a15cc3f Disallow Offload to disk for gguf files (#36933)
update

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-24 19:30:01 +01:00
Cyril Vallez
dd3933dd65 Simplify keep_in_fp32_modules logic (#36722)
* better regex everywhere

* fix

* Update test_modeling_instructblip.py

* BC with explanations this time otherwise it makes no sense at all

* Update test_modeling_instructblip.py

* style

* CIs

* update _keep_in_fp32_modules in blip2

* Update modeling_utils.py

* Update modeling_utils.py

* style

* CIs

* add check

* trigger CIs

* Update modeling_utils.py

* trigger CIs
2025-03-21 16:12:59 +01:00
Raushan Turganbay
523f6e743c Fix: dtype cannot be str (#36262)
* fix

* this wan't supposed to be here, revert

* refine tests a bit more
2025-03-21 13:27:47 +01:00
Pavel Iakubovskii
66291778dd Refactor Attention implementation for ViT-based models (#36545)
* Refactor vit attention

* Refactor ViT-based models

* 🚨🚨🚨 Fix prefix for DPT

* Update params order

* trigger tests

* Fix Dinov2 attention

* Fix DPT attention impl propagation for backbone config

* Common test fix: config is modif. inplace - avoid it

* view->reshape

* Fixup

* Fixup

* Enable IJepa FA2

* Add FA2 in corresponding model docs
2025-03-20 15:15:01 +00:00
fxmarty-amd
1a374799ce Support loading Quark quantized models in Transformers (#36372)
* add quark quantizer

* add quark doc

* clean up doc

* fix tests

* make style

* more style fixes

* cleanup imports

* cleaning

* precise install

* Update docs/source/en/quantization/quark.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/quark_integration/test_quark.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* remove import guard as suggested

* update copyright headers

* add quark to transformers-quantization-latest-gpu Dockerfile

* make tests pass on transformers main + quark==0.7

* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-20 15:40:51 +01:00
Pavel Iakubovskii
cf8091c017 Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh" (#36768)
* Fix device_mesh

* Remove rebase leftover
2025-03-20 11:55:47 +00:00
Joao Gante
b47d9b2f8a [generate] clarify docstrings: when to inherit GenerationMixin (#36605) 2025-03-20 10:58:54 +00:00
Artem Kudisov
63380b77d4 Pass state dict (#35234)
* Pass state_dict argument to get_peft_model_state_dict

* Style fix

* Change arguments order
2025-03-20 11:54:59 +01:00
Marc Sun
14b597f518 Fix casting dtype for qunatization (#36799)
* fix

* remove print
2025-03-18 18:46:03 +01:00
Cyril Vallez
db1d4c5a0b Loading optimizations (#36742)
* improvements

* Update modeling_utils.py

* add some doc about loading

* Update modeling_utils.py
2025-03-18 16:38:44 +01:00
Cyril Vallez
2c2495cc7b Fix post_init() code duplication (#36727)
* Update modeling_utils.py

* CIs
2025-03-14 17:36:02 +01:00
Joao Gante
3bd1a0ddf1 [model loading] don't gc.collect() if only 1 shard is used (#36721)
* don't gc collect if 1 shard is used

* delete state dict anyways
2025-03-14 12:56:56 +00:00
Isotr0py
b070025aa6 Add GGUF support to T5-Encoder (#36700)
* add gguf support to t5encoder

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove gguf from model_kwargs

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 17:57:33 +01:00
Mohamed Mekkouri
4a60bae8e2 Handling an exception related to HQQ quantization in modeling (#36702)
* adding exception

* style

* add types
2025-03-13 17:53:36 +01:00
wineandchord
bb965d8e87 fix type annotation for ALL_ATTENTION_FUNCTIONS (#36690)
Corrects the type annotation to match actual usage. The variable was typed as
Dict[str, Dict[str, Callable]] but is actually used as Dict[str, Callable]
where keys are attention mechanism names and values are the corresponding
attention functions directly. This change makes the type annotation consistent
with how the dictionary is used in the codebase.
2025-03-13 14:27:50 +00:00
Marc Sun
bc3d5781e7 Fix slicing for 0-dim param (#36580)
* fix

* switch to ellipsis instead

* Add co-author
Co-authored-by: fxmarty-amd <fxmarty-amd@users.noreply.github.com>

* Add co-author second try
Co-authored-by: fxmarty-amd <felmarty@amd.com>
2025-03-13 12:16:13 +01:00
Marc Sun
fbb18ce68b Update config.torch_dtype correctly (#36679)
* fix

* style

* new test
2025-03-13 12:08:02 +01:00
Matt
c7eb95581a Don't accidentally mutate the base_model_tp_plan (#36677)
* Don't accidentally mutate the base_model_tp_plan

* Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Trigger tests

* Marking grad accum test as slow

* Add a flaky decorator

* Add a flaky decorator

* Use cyril's codeblock

* Don't copy() when it's None

* Use cyril's new codeblock

* make fixup
2025-03-12 18:59:13 +00:00
Cyril Vallez
071a161d3e [core] Large/full refactor of from_pretrained (#36033)
* squash everything together
start to simplify inner logic

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

continue refactor

fix

small fixes

add type hints/docstring

Update modeling_utils.py

remove _fast_init

keep improving

Update modeling_utils.py

Update modeling_utils.py

new first tp loading version

style

fix weird in-place op

trigger CIs

Update modeling_utils.py

much clearer renaming of keys

fix

update

Update test_modeling_common.py

trigger CIs

update

update

style

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

fix

fast download first prototype

remove old function

remove old functions

Remove unused function and move back _get_tp_registry

fix tp plan registry

simplify

CIs

Update hub.py

Update modeling_utils.py

simplify

simplify renaming logic

remove unused check

add sanity check back (a test depends on it)

Update modeling_utils.py

finalize sound renaming logic

style

add forgotten check

Update modeling_utils.py

add key_mapping keyword

style

Update modeling_utils.py

add comment

minor updates

minor change for clarity

fix small prefix issue and simplify

style

trigger CIs

typo fix

Post rebase fix

post rebase cleanup

simplify tp

typo

oupsi

typo

correctly escape

improvements based on Marc's review

finalize Marc's review comments

 squash everything

* improve

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Update modeling_utils.py

* simplify

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix dtype issue

* Update modeling_utils.py

* style

* remove test that does not make sense

* style

* small fixes

* style

* fix

* cleanup after rebase

* style

* typo

* escape

* tp for task specific top modules

* Update modeling_utils.py

* Update modeling_utils.py

* fix allocation

* CIs

* CIs

* CIs

* improve docstring

* CIs

* Update modeling_utils.py

* fix
2025-03-12 13:39:25 +01:00
Marc Sun
7652804d23 Fix bnb regression due to empty state dict (#36663)
fix
2025-03-12 11:40:46 +01:00
Arthur
2829013d2d fix block mask typing (#36661)
* fix block mask typing

* updated

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* gemma

* fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-12 11:29:11 +01:00
Ryan Mullins
50d3530aa0 Gemma3 (#36658)
* Fix converter

* [Broken] Adds Gemma 3 to Hugging Face Transformers

* Consolidating Config and Processor params across impls

* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.

* Additional plumbing for CausalLM and ConditionalGeneration variants

* incomplete draft of Orbax conversion script

* More complete checkpoint conversion

* Supporting Gemma 3 1B checkpoints

* Updating RoPE for multiple frequencies

* Adjustments to rotary embedder

* Proof of life for text-only operation

* Updating the conversion script to handle multimodal projection weights

* Fixing tet-only conversions

* Cleaner conversion script with multimodal support and a simpler processor

* Additional refatcors to the Gemma3Processor

* Simplified Processor to work over text representations

* Updated conversion script to join text and vision embeddings at converion time

* Logging for debugging

* Update src/transformers/models/gemma2/modeling_gemma2.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Removed extraneous Config params

* Switching to fast tokenizer for checkpoint conversions

* isolating siglip for performance tetsing

* Minor changes for debugging tests against baselines

* Adding average pooling for soft tokens

* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts

* Updating conversion script for ShieldGemma 2 conversion compatibility

* Allow disable_compile to be provided as a kwarg

* Refresh from modular

* Updated conversion script and corrected sliding window

* Fix type mismatch in cache_position (#4)

* Fix dtype (#5)

* Fix type mismatch in cache_position

* Actually fix in the modular file

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor

* Adding 2D pooling for image embeddings

* Revert "Adding 2D pooling for image embeddings"

This reverts commit 65350cf531296f050b2078a5b8e46f61642b2648.

* Gemma3 average pooling changed from 1D to 2D

* Major refactor to Gemma3MultimodalInputProjection

* Updating Gemm 3 Auto* registrations

* Add option to save Gemma 3 chat template with tokenizer during weights conversion

* Removing unused imports

* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration

* Removing duplicate config property

* Removing final logit softcapping and 1-indexing of position ids

* Fixing image processor config and none --> None typo

* Fixing sliding window size for 1B

* Updating image_mean and image_std in Image Processor

* Attention masking changed to lower triangular

* Moving image special tokens to conversion script

* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs

* Remove special token variables from symbol space

* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration

* tie lm_head and embedding weights

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Correct tied weights in Gemma3CausalLM

* iterative bidirectional attention

* resolving merge conflicts

* Reverting to Gemma 2 HybridCache with sldiing window support and a sliding_window_pattern of 6

* Correcting RoPE scaling

* clean up first pass, dummy model geenration works

* final clean up before fixing tests

* causal lm test works, so fine

* Fix conversion

* Update src/transformers/models/gemma3/processing_gemma3.py

* model tests are happy

* processor tests are happy

* image processing tests added

* fixup

* Fix pre-processing in conversion

* Inputs merging

* Do not normalize vision embeddings

* Apply Ryan's (and team) changes to attention

* token type ids + mask

* template

* move embed scale, add rope scale, fix tests

* Add chat template to tokenizer

* Use prefix for causal model loading

* use existing code for sliding mask from gemma2

* self.embed_tokens already normalizes

* Correcting Gemma3TextConfig parameters in conversion script

* typo, modular overwrites my fixes

* enable device map for text model

* Conversion updates

* ultra nit: no einsums

* update image token

* copy deepcopy config + some docs

* add some test, still WIP

* Refactoring --include_chat_tempalte logic in converter

* Update src/transformers/models/gemma3/modular_gemma3.py

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Add eos tokens for instruct models

* dump so i can work on dgx

* Removing add_bos by default

* dump

* add fast im proc

* docs for PaS + fixup

* another fixup

* one more fixup

* fix tests

* Inverting prior BOS change

* ultra nit

* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS

* resize embeds, remove sqrt, add slow test outputs

* FA2 but quality is meh

* nit

* skip FA2, no idea what happened

* last bit for green CI

* please, green CI for docs

* T_T

* Fix for Gemma3 logits

* Support both options for system prompt

* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Docs updates now that assets are live

* Style fixes

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
2025-03-12 09:06:17 +01:00
Matt
556d2c23c6 Remove remote code warning (#36285)
* Remove redundant pipeline warning

* Remove redundant pipeline warning
2025-03-11 13:29:15 +00:00
Arthur
1c4b62b219 Refactor some core stuff (#36539)
* some config changes

* update

* current state

* update

* update

* updates and cleanup

* something that works

* fixup

* fixes

* nits

* nit

* nits and fix

* Update src/transformers/integrations/tensor_parallel.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/integrations/tensor_parallel.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* cleanup

* style

* safe import

* fix

* updates

* rename stuff an clean

* style

* small updates

* ups

* oups

* nit

* protect imports

* update tp

* rodfl

* arf

* turbo nit on init

* fix import error

* frumble gumbgle

* try to fix the import error

* should fix the non model test

* update keep in float32

* update

* fix

* nits

* fix subvconfigs

* test was weird

* nit

* fix failing test

* fix instruct blip

* fixes

* style

* x.com

* fix overwrite

* ok last bit of failing test

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-03-11 09:26:28 +01:00
湛露先生
9e385109cf Delete redundancy if case in model_utils (#36559)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-06 11:36:11 +00:00
Afanti
9e84b38135 chore: enhance message descriptions in parameters,comments,logs and docstrings (#36554)
* chore: enhance message descriptons in parameters,comments,logs and docstrings

* chore: enhance message descriptons in parameters,comments,logs and docstrings

* Update src/transformers/hf_argparser.py

* Update src/transformers/keras_callbacks.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-06 11:02:35 +00:00
Marc Sun
752ef3fd4e guard torch version for uint16 (#36520)
* u16

* style

* fix
2025-03-05 11:27:01 +01:00
Marc Sun
0463901c92 fix torch_dtype, contiguous, and load_state_dict regression (#36512)
* fix regression

* fix param

* fix load_state_dict

* style

* better fix for module

* fix tests

* quick fix for now

* rm print
2025-03-03 18:35:37 +01:00
Zach Mueller
4d8259d245 Fix loading zero3 weights (#36455)
* Check if fixes

* Fix zero3 loading

* Quality

* Fix marc nit

* Add fast tests

* Migrate to integrations.deepspeed rather than modeling_utils

* Style
2025-03-03 15:05:58 +01:00
hlky
dcbdf7e962 Fix _load_state_dict_into_meta_model with device_map=None (#36488)
* Fix _load_state_dict_into_meta_model with device_map=None

* Update src/transformers/modeling_utils.py
2025-03-02 08:33:36 +01:00
Marc Sun
a40f1ac602 Fix couples of issues from #36335 (#36453)
* fix

* style

* better allocation

* fix

* fix

* style

* revert disk

* exit

* style

* return if nothing to cache

* dtensor guard

* fix regressiion

* fix regression

* fix

* fix
2025-03-01 07:12:17 +01:00
Pavel Iakubovskii
02776d2c6a Fix loading models with mismatched sizes (#36463)
* Fix loading model with mismatched sizes

* trigger tests
2025-02-28 11:48:59 +01:00
wejoncy
17792556b2 [save_pretrained ] Skip collecting duplicated weight (#36409)
* Skip collecting duplicated weight

* format
2025-02-27 10:57:11 +01:00
Marc Sun
8ede897c30 restrict cache allocator to non quantized model (#36428) 2025-02-26 22:16:15 +01:00
Arthur
1603018e7a Update form pretrained to make TP a first class citizen (#36335)
* clean code

* oups

* fix merge

* yups

* fix if

* now you can play

* fix shape issue

* try non blocking

* fix

* updates

* up

* updates

* fix most of thetests

* update

* update

* small updates

* up

* fix the remaining bug?

* update

* rename when you read from the file

* buffer issues

* current status

* cleanup

* properly allocate dumb memory

* update a small bug

* fix colwise rep issue

* fix keep in float 32 that was keeping everything in float 32

* typo

* more fixes with keep_in_fp32_modules as we use to serach on it

* fix ROPE dtype for TP

* remove what's breaking the tests

* updates

* update and fixes

* small cleanup after merging

* allocate 2x to be safe

* style, auto

* update

* yup nit

* fix

* remove slow as fuck torch api :(

* work

* fixup

* update

* brting the fix back

* fix and update

* fixes

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* updates because some suggestions were wrong 👀

* update?

* fuck this bloated function

* typo

* fix the dumb prefix thing once and forall

* fixes here and there

* updates

* remove prints

* fix strict cases

* styel

* properly fix keys on load!

* update

* fix base model prefix issue

* style

* update

* fix all?

* remoce 1 print

* fix the final etsts

* fixup

* last nits

* fix the detach issue which cause a 2x slowdown

* fixup

* small fixes

* ultra nit

* fix

* fix

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 20:12:38 +01:00
zheliuyu
6513e5e402 add recommendations for NPU using flash_attn (#36383)
* add recommendations for Ascend NPU using flash_attn

* update recommend_message_npu

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 14:51:08 +01:00
Cyril Vallez
4b5cf5496d Load models much faster on accelerator devices!! (#36380)
* caching allocator warmup

* Update modeling_utils.py

* reuse expanded map

* style
2025-02-25 09:41:22 +01:00