Cyril Vallez
58e5e976e0
Small fix on context manager detection ( #37562 )
...
* small fixes
* Update modeling_utils.py
* test
* Update test_modeling_common.py
* Update test_modeling_timm_backbone.py
* more general
* simpler
2025-04-17 15:39:44 +02:00
Bowen Bao
e3d3b54638
Keep Quark loading through meta device ( #37538 )
2025-04-16 14:19:56 +02:00
Cyril Vallez
7dafcd0077
More appropriate cuda warmup in resource-constrained hardware ( #37550 )
...
* better allocation in resource constrained env
* Update modeling_utils.py
* CIs
2025-04-16 13:40:02 +02:00
Cyril Vallez
c8e0e603de
Detect and use device context manager or global device in from_pretrained ( #37216 )
...
* Update modeling_utils.py
* improve
* Update modeling_utils.py
* Update test_modeling_common.py
* Update test_modeling_timm_backbone.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* CIs
2025-04-15 09:59:20 +02:00
Cyril Vallez
4e53840920
Detect and fix most _init_weights() issues - make it work for composite models ( #37070 )
...
* Update test_modeling_common.py
* Fix Llama and its modular children
* Update test_modeling_common.py
* qwen3
* first try at prioritizing models
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* test
* fix
* fix
* more models
* more
* more
* more
* smarter init for composite models!
* fix post rebase
* smol
* fix missing args
* more
* typo
* Super elegant and efficient init for submodels
* Update modeling_utils.py
* style
* last fixes
* cleanup
* finalize cleanup
* CIs
* improve docstring
* Update modeling_utils.py
* llama4
* style
* CIs
* style
* add dpt
* granite speech
* qwen 2.5 omni
* better fix
* Parse the config file instead
* CIs
2025-04-14 16:19:04 +02:00
Bowen Bao
6cef03ba66
[Regression] Fix Quark quantized model loading after refactorization ( #37407 )
2025-04-11 13:43:36 +02:00
cyyever
371c44d0ef
Remove old code for PyTorch, Accelerator and tokenizers ( #37234 )
...
* Remove unneeded library version checks
Signed-off-by: cyy <cyyever@outlook.com >
* Remove PyTorch condition
Signed-off-by: cyy <cyyever@outlook.com >
* Remove PyTorch condition
Signed-off-by: cyy <cyyever@outlook.com >
* Fix ROCm get_device_capability
Signed-off-by: cyy <cyyever@outlook.com >
* Revert "Fix ROCm get_device_capability"
This reverts commit 0e756434bd7e74ffd73de5500476072b096570a6.
* Remove unnecessary check
Signed-off-by: cyy <cyyever@outlook.com >
* Revert changes
Signed-off-by: cyy <cyyever@outlook.com >
---------
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-10 20:54:21 +02:00
Mehant Kammakomati
7d76876498
(Part 2) feat: allow for tp_size attr for tplizing the model ( #37054 )
...
* feat: custom tp_size, new transformers tp interface
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* fix: review cmt - error when tp_plan not set for tp_size
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* fix: nit in docs
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
---------
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com >
2025-04-10 17:44:09 +02:00
Mohamed Mekkouri
0ea1151222
Llama Kernel integration ( #37092 )
...
* initial commit
* style
* update
* change approach attention
* clean up
* fix import
* update
* update
* fix style
* change method
* attention
* add mlp back
* change name
* update name
* fix copies
* fix config
* fix
2025-04-10 17:13:25 +02:00
Wang, Yi
ae5ce22664
from_pretrained should handle xpu case ( #37382 )
...
* from_pretrained should handle xpu case
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
* fmt
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
2025-04-10 13:23:17 +02:00
Joao Gante
4321b0648c
[core] remove GenerationMixin inheritance by default in PreTrainedModel ( #37173 )
2025-04-08 16:42:05 +01:00
Cyril Vallez
cdfb018d03
A bit of cleaning 🧹 🧹 ( #37215 )
...
* cleaning
* CIs
2025-04-08 14:33:58 +02:00
Cyril Vallez
48e179857c
Remove HQQ from caching allocator warmup ( #37347 )
...
Update modeling_utils.py
2025-04-07 18:33:48 +02:00
Yih-Dar
e7ad077012
byebye torch 2.0 ( #37277 )
...
* bump Torch 2.1 with broken compatibility `torch.compile`
* dep table
* remove usage of is_torch_greater_or_equal_than_2_1
* remove usage of is_torch_greater_or_equal_than_2_1
* remove if is_torch_greater_or_equal("2.1.0")
* remove torch >= "2.1.0"
* deal with 2.0.0
* PyTorch 2.0+ --> PyTorch 2.1+
* ruff 1
* difficult ruff
* address comment
* address comment
---------
Co-authored-by: Jirka B <j.borovec+github@gmail.com >
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-04-07 15:19:47 +02:00
Cyril Vallez
08f36771b3
Fix init empty weights without accelerate ( #37337 )
...
* add the integration
* Update accelerate.py
* Update accelerate.py
* add find_tied_params as well
* Update accelerate.py
* add where copied from
* simplify
* add error
2025-04-07 11:37:29 +02:00
Cyril Vallez
9db31ea585
Fix deepspeed with quantization ( #37324 )
...
* Update modeling_utils.py
* Update modeling_utils.py
2025-04-07 11:36:44 +02:00
Arthur
25b7f27234
Add llama4 ( #37307 )
...
* remove one of the last deps
* update fast image processor after refactor
* styling
* more quality of life improvements
* nit
* update
* cleanups
* some cleanups
* vllm updates
* update fake image token
* [convert] Fix typo
* [convert] Strip extraneous bytes from shards
* [convert] Minor fixes
* [convert] Use num_experts
* multi-image fixes in modeling + processor
* fixup size
* 128 experts
* Use default rope
* Unfuse mlp
* simplify a lot inputs embeds merging
* remove .item() 👀
* fix from review
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* set seed
* return aspect ratios and bug fixes
* Moe 128 rebased (#8 )
* 128 experts
* Use default rope
* Unfuse mlp
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* Meta/llama quant compat (#7 )
* add quant compatible model & conversion code for llama4
* fix a few issues
* fix a few issues
* minor type mapping fix
---------
Co-authored-by: Lu Fang <fanglu@fb.com >
* use a new config parameter to determine which model definition to use for MoE
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
Co-authored-by: Lu Fang <fanglu@fb.com >
* un-comment write_tokenizer from converting script
* remove un-used imports
* [llama4] Pop aspect_ratios from image processor output in Llama4Processor
Signed-off-by: Jon Swenson <jmswen@gmail.com >
* Fix parameter_count name
* Update src/transformers/models/llama4/configuration_llama4.py
* nit
* Add changes for no_rope, moe_layers, chunked attention. Just need to test all
* Update src/transformers/models/llama4/image_processing_llama4_fast.py
* nit
* fix post merge with main
* support flex attention
* fixes
* fix
* add layer
* small updates
* rebase and delete llm_compressor
* nit
* [llama4/mm] Add back <|image|> token that delimits global tile
* [llama4/mm] Fix Llama 4 image processing unit tests
* add explicit dtype
Signed-off-by: Jon Swenson <jmswen@gmail.com >
* sdpa works
* comment todo small
* fix model loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* revert
* nits
* small fix for TP on 1 node
* Read new params from config
* Add <|eom|>
* lol don't know how this got here
* adding fp8
* Save processor, fix chat template
* style
* Add boi/eoi tokens
We don't use them.
* fixes for now flex seems to work :)
* updates
* nits
* updates
* missing keys
* add context parallel
* update
* update
* fix
* nits
* add worldsize and make eager attn work for vision
* Ignore new key present in base models
* add tp_plan
* fix nope
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* minor fix
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* Clean up Llama4 vision model
* current updates
* add support for `attn_temperature_tuning`
* add floor scale
* add missing attn scales
* push what works, dirty trick for the device synch
* oups
* Fix pad_token_id
See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.
* fix causallml loading
* rm
* fix tied-weights
* fix sdpa
* push current version
* should work with both short and long
* add compressed_tensors & fix fbgemm tp
* Fix flex impl
* style
* chunking
* try to revert the potentially breaking change
* fix auto factory
* fix shapes in general
* rm processing
* commit cache utils cleanup
* Fix context length
* fix
* allocate
* update tp_plan
* fix SDPA!
* Add support for sparse `Llama4TextMoe` layer from the kernel hub
* cleanup
* better merge
* update
* still broken fixing now
* nits
* revert print
* Write max_position_embeddings and max_model_length
* Update modeling_llama4.py
* Save attention_chunk_size
* Sync eos terminators
* Read initializer_range
* style
* remove `dict`
* fix
* eager should use `chunked_attention_mask`
* revert
* fixup
* fix config
* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
This reverts commit ccda19f050867dd42ea143c5de60f3dec81375f0, reversing
changes made to a515579aed8c0fe9bf529b6c40446a289406d5d6.
* Fix typo and remove warning with compiled flex and chunked prefill
* Fix MoE vs FF (#41 )
* fix
* Use correct no_rope_layers if provided one is empty list
* update tests
* fix
* skipping some tests
* fix fp8 loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* fix text generation pipeline
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* eager needs 4D mask
* fix
* Some cleanup
* fix
* update
* fix
* replace correctly module
* patch
* modulelist
* update
* update
* clean up
* Don't move to `cuda:0` in distributed mode
* restrict to compressed tensors for now
* rm print
* Docs!
* Fixes
* Update docs/source/en/model_doc/llama4.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
* Fixes
* cuda graph fix
* revert some stuff
* fixup
* styling
* Update src/transformers/models/llama4/modeling_llama4.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* fixup
* commit licence, cleanup here and there and style
* more styling changes
* fix dummies
* fix and clean docstrings
* remove comment
* remove warning
* Only fast image processor is supported
* nit
* trigger CI
* fix issue with flex encoder
* fix dynamic cache
* Code quality
* Code quality
* fix more tests for now
* Code quality
* Code quality
* Nuke bunch of failing stuff
* Code quality
* Code quality
* cleanup removal of slow image processor
* ruff fix fast image processor
* fix
* fix styling
* Docs
* Repo consistency
* Repo consistency
* fix sliding window issue
* separate llama cache
* styling
* Repo consistency
* Repo consistency
* push what works
* L4 Repo consistency
* Docs
* fix last last alst alst alst alstsaltlsltlaslt
---------
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com >
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com >
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
Co-authored-by: Keyun Tong <tongkeyun@gmail.com >
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com >
Co-authored-by: Jon Swenson <jmswen@gmail.com >
Co-authored-by: jmswen <jmswen@users.noreply.github.com >
Co-authored-by: MekkCyber <mekk.cyber@gmail.com >
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com >
Co-authored-by: Yong Hoon Shin <yhshin@meta.com >
Co-authored-by: Marc Sun <marc@huggingface.co >
Co-authored-by: drisspg <drisspguessous@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
Co-authored-by: Daniël de Kok <me@danieldk.eu >
Co-authored-by: Lysandre <hi@lysand.re >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-04-05 22:02:22 +02:00
Cyril Vallez
e94571580b
Fix deepspeed loading (part 2) ( #37306 )
...
* fix
* Update modeling_utils.py
* Update modeling_utils.py
* oups remove print
2025-04-05 20:41:42 +02:00
Cyril Vallez
84aa13dd85
Fix deepspeed loading ( #37281 )
...
* Update modeling_utils.py
* Update modeling_utils.py
* fix and remove all imports
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Update modeling_utils.py
2025-04-05 17:05:45 +02:00
Rahul Tuli
ebe47ce3e9
Fix: Unexpected Keys, Improve run_compressed, Rename Test Folder ( #37077 )
2025-04-04 21:30:11 +02:00
Matt
cbfa14823b
No more dtype_byte_size() ( #37144 )
...
* No more dtype_byte_size()
* Remove function once again
* Fix rebase cruft
* Trigger tests
2025-04-02 14:58:38 +01:00
Jerry Zhang
a165458901
Add device workaround for int4 weight only quantization after API update ( #36980 )
...
* merge
* fix import
* format
* reformat
* reformat
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
2025-04-02 12:42:22 +02:00
Cyril Vallez
41f5c3216c
Revert #37031 ( #37178 )
...
Update modeling_utils.py
2025-04-01 19:48:15 +02:00
Cyril Vallez
bc2dea3f54
Fix meta state dict loading with quantizers ( #37136 )
...
Update modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-04-01 18:45:58 +02:00
Cyril Vallez
f304318f5f
Remove low_cpu_mem_usage and _fast_init ( #36963 )
...
* Remove low_cpu_mem_usage and _fast_init
* Update deepspeed.py
* Update modeling_utils.py
* remove the first 2 tests everywhere
* Update test_modeling_common.py
* remove what was remaining about fast_init
* fix logic and simplify
* mismatched keys logic update
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix 2 models init_weights
* extend to others
* remove grad
* Update modeling_fsmt.py
* init weights in tests
* style
* Update test_modeling_fsmt.py
* more old models
* fix more init_weights
* copies
* fix
* style
* Update modeling_lxmert.py
* fix inits
* more and more
* more
* should finalize
* style
* Update modeling_dinov2_with_registers.py
* fix
* Update modeling_encoder_decoder.py
* fix
* style
* Update modeling_lxmert.py
* post rebase cleanup
* Update modeling_informer.py
* back to start for device
* fix
* add test to detect all failing cases correctly
* Update test_modeling_common.py
* fix
* fix
* sam
* style
* Update modeling_maskformer_swin.py
* CIs
* CIs
* remove test - will add it on separate PR
* fix
* fix
* Update modeling_sam.py
* CIs
* CIs
* CIs
* convnext
* suggestions
* CIs
* fix copies after merge
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
2025-03-31 17:18:43 +02:00
Zhen
e686fed635
[Feature] Support using FlashAttention2 on Ascend NPU ( #36696 )
...
* [Feature] Support using flash-attention on Ascend NPU
* Fix qwen3 and qwen3_moe modular conversion mismatch
2025-03-31 16:12:58 +02:00
huismiling
d0b65bb479
[MLU] Fix FA2 check error, remove deepspeed-mlu deps. ( #36159 )
...
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices: Checks if `mlu` is available via a `cndev-based` check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
jiqing-feng
286393fbb1
enable tp on CPU ( #36299 )
...
* enable tp on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* get rank from cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* enable TP tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* em print
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix model id
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix conflict
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix index and add doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
2025-03-31 10:55:47 +02:00
Cyril Vallez
2bea6bf24e
Fix AttentionInterface following feedback ( #37010 )
...
* up
* typo
* update doc
* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Cyril Vallez
a86dad56bc
Fix state_dict map location when quantized ( #37086 )
...
* Update modeling_utils.py
* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
Yih-Dar
581cf96e0c
fix tied weights issue ( #37031 )
...
* fix
* comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-03-28 16:36:44 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
...
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for unchanged dims when using expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d04950390db8413b9efb24cef8186330.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal >
2025-03-28 15:56:59 +01:00
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
Kyle Sayers
d6d930a64b
[Modeling] Load FP8 safetensors such as DeepSeek ( #36828 )
...
support loading fp8
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2025-03-27 10:47:10 +01:00
Mohamed Mekkouri
13d36e89fe
Fix device_map check for ggml files ( #37003 )
...
fix
2025-03-26 16:24:57 +01:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions ( #36889 )
...
* Update modeling_utils.py
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* add to init
* Update modeling_utils.py
* style
* update
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Add some doc
* Update _toctree.yml
* readd it for tgi/vllm compat
* CIs
* CIs
2025-03-26 16:15:06 +01:00
Yih-Dar
c6814b4ee8
Update ruff to 0.11.2 ( #36962 )
...
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-03-25 16:00:11 +01:00
Marc Sun
80b4c5dcc9
Fix cuda index issue in cache allocator ( #36937 )
...
fix
2025-03-25 11:51:41 +01:00
Mohamed Mekkouri
be2c0e7bff
Fixing _pre_quantization_dtype when torch_dtype is None ( #36930 )
...
fix
2025-03-25 10:43:27 +01:00
Mohamed Mekkouri
2b8a15cc3f
Disallow Offload to disk for gguf files ( #36933 )
...
update
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-03-24 19:30:01 +01:00
Cyril Vallez
dd3933dd65
Simplify keep_in_fp32_modules logic ( #36722 )
...
* better regex everywhere
* fix
* Update test_modeling_instructblip.py
* BC with explanations this time otherwise it makes no sense at all
* Update test_modeling_instructblip.py
* style
* CIs
* update _keep_in_fp32_modules in blip2
* Update modeling_utils.py
* Update modeling_utils.py
* style
* CIs
* add check
* trigger CIs
* Update modeling_utils.py
* trigger CIs
2025-03-21 16:12:59 +01:00
Raushan Turganbay
523f6e743c
Fix: dtype cannot be str ( #36262 )
...
* fix
* this wasn't supposed to be here, revert
* refine tests a bit more
2025-03-21 13:27:47 +01:00
Pavel Iakubovskii
66291778dd
Refactor Attention implementation for ViT-based models ( #36545 )
...
* Refactor vit attention
* Refactor ViT-based models
* 🚨 🚨 🚨 Fix prefix for DPT
* Update params order
* trigger tests
* Fix Dinov2 attention
* Fix DPT attention impl propagation for backbone config
* Common test fix: config is modif. inplace - avoid it
* view->reshape
* Fixup
* Fixup
* Enable IJepa FA2
* Add FA2 in corresponding model docs
2025-03-20 15:15:01 +00:00
fxmarty-amd
1a374799ce
Support loading Quark quantized models in Transformers ( #36372 )
...
* add quark quantizer
* add quark doc
* clean up doc
* fix tests
* make style
* more style fixes
* cleanup imports
* cleaning
* precise install
* Update docs/source/en/quantization/quark.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/quark_integration/test_quark.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* remove import guard as suggested
* update copyright headers
* add quark to transformers-quantization-latest-gpu Dockerfile
* make tests pass on transformers main + quark==0.7
* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Bowen Bao <bowenbao@amd.com >
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
2025-03-20 15:40:51 +01:00
Pavel Iakubovskii
cf8091c017
Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh" ( #36768 )
...
* Fix device_mesh
* Remove rebase leftover
2025-03-20 11:55:47 +00:00
Joao Gante
b47d9b2f8a
[generate] clarify docstrings: when to inherit GenerationMixin ( #36605 )
2025-03-20 10:58:54 +00:00
Artem Kudisov
63380b77d4
Pass state dict ( #35234 )
...
* Pass state_dict argument to get_peft_model_state_dict
* Style fix
* Change arguments order
2025-03-20 11:54:59 +01:00
Marc Sun
14b597f518
Fix casting dtype for quantization ( #36799 )
...
* fix
* remove print
2025-03-18 18:46:03 +01:00
Cyril Vallez
db1d4c5a0b
Loading optimizations ( #36742 )
...
* improvements
* Update modeling_utils.py
* add some doc about loading
* Update modeling_utils.py
2025-03-18 16:38:44 +01:00
Cyril Vallez
2c2495cc7b
Fix post_init() code duplication ( #36727 )
...
* Update modeling_utils.py
* CIs
2025-03-14 17:36:02 +01:00