HuggingFace_transformer

Author	SHA1	Message	Date
Cyril Vallez	58e5e976e0	Small fix on context manager detection (#37562 ) * small fixes * Update modeling_utils.py * test * Update test_modeling_common.py * Update test_modeling_timm_backbone.py * more general * simpler	2025-04-17 15:39:44 +02:00
Bowen Bao	e3d3b54638	Keep Quark loading through meta device (#37538 )	2025-04-16 14:19:56 +02:00
Cyril Vallez	7dafcd0077	More appropriate cuda warmup in resource-constrained hardware (#37550 ) * better allocation in resource constrained env * Update modeling_utils.py * CIs	2025-04-16 13:40:02 +02:00
Cyril Vallez	c8e0e603de	Detect and use device context manager or global device in `from_pretrained` (#37216 ) * Update modeling_utils.py * improve * Update modeling_utils.py * Update test_modeling_common.py * Update test_modeling_timm_backbone.py * Update test_modeling_common.py * Update test_modeling_common.py * Update test_modeling_common.py * Update test_modeling_common.py * CIs	2025-04-15 09:59:20 +02:00
Cyril Vallez	4e53840920	Detect and fix most `_init_weights()` issues - make it work for composite models (#37070 ) * Update test_modeling_common.py * Fix Llama and its modular children * Update test_modeling_common.py * qwen3 * first try at prioritizing models * Update test_modeling_common.py * Update test_modeling_common.py * Update test_modeling_common.py * test * fix * fix * more models * more * more * more * smarter init for composite models! * fix post rebase * smol * fix missing args * more * typo * Super elegant and efficient init for submodels * Update modeling_utils.py * style * last fixes * cleanup * finalize cleanup * CIs * improve docstring * Update modeling_utils.py * llama4 * style * CIs * style * add dpt * granite speech * qwen 2.5 omni * better fix * Parse the config file instead * CIs	2025-04-14 16:19:04 +02:00
Bowen Bao	6cef03ba66	[Regression] Fix Quark quantized model loading after refactorization (#37407 )	2025-04-11 13:43:36 +02:00
cyyever	371c44d0ef	Remove old code for PyTorch, Accelerator and tokenizers (#37234 ) * Remove unneeded library version checks Signed-off-by: cyy <cyyever@outlook.com> * Remove PyTorch condition Signed-off-by: cyy <cyyever@outlook.com> * Remove PyTorch condition Signed-off-by: cyy <cyyever@outlook.com> * Fix ROCm get_device_capability Signed-off-by: cyy <cyyever@outlook.com> * Revert "Fix ROCm get_device_capability" This reverts commit 0e756434bd7e74ffd73de5500476072b096570a6. * Remove unnecessary check Signed-off-by: cyy <cyyever@outlook.com> * Revert changes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-04-10 20:54:21 +02:00
Mehant Kammakomati	7d76876498	(Part 2) feat: allow for tp_size attr for tplizing the model (#37054 ) * feat: custom tp_size, new transformers tp interface Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: review cmt - error when tp_plan not set for tp_size Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> * fix: nit in docs Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> --------- Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>	2025-04-10 17:44:09 +02:00
Mohamed Mekkouri	0ea1151222	Llama Kernel integration (#37092 ) * initial commit * style * update * change approach attention * clean up * fix import * update * update * fix style * change method * attention * add mlp back * change name * update name * fix copies * fix config * fix	2025-04-10 17:13:25 +02:00
Wang, Yi	ae5ce22664	from_pretrained should handle xpu case (#37382 ) * from_pretrained should handle xpu case Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * fmt Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-04-10 13:23:17 +02:00
Joao Gante	4321b0648c	[core] remove `GenerationMixin` inheritance by default in `PreTrainedModel` (#37173 )	2025-04-08 16:42:05 +01:00
Cyril Vallez	cdfb018d03	A bit of cleaning 🧹🧹 (#37215 ) * cleaning * CIs	2025-04-08 14:33:58 +02:00
Cyril Vallez	48e179857c	Remove HQQ from caching allocator warmup (#37347 ) Update modeling_utils.py	2025-04-07 18:33:48 +02:00
Yih-Dar	e7ad077012	byebye torch 2.0 (#37277 ) * bump Torch 2.1 with broken compatibility `torch.compile` * dep table * remove usage of is_torch_greater_or_equal_than_2_1 * remove usage of is_torch_greater_or_equal_than_2_1 * remove if is_torch_greater_or_equal("2.1.0") * remove torch >= "2.1.0" * deal with 2.0.0 * PyTorch 2.0+ --> PyTorch 2.1+ * ruff 1 * difficult ruff * address comment * address comment --------- Co-authored-by: Jirka B <j.borovec+github@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-07 15:19:47 +02:00
Cyril Vallez	08f36771b3	Fix `init empty weights` without accelerate (#37337 ) * add the integration * Update accelerate.py * Update accelerate.py * add find_tied_params as well * Update accelerate.py * add where copied from * simplify * add error	2025-04-07 11:37:29 +02:00
Cyril Vallez	9db31ea585	Fix deepspeed with quantization (#37324 ) * Update modeling_utils.py * Update modeling_utils.py	2025-04-07 11:36:44 +02:00
Arthur	25b7f27234	Add llama4 (#37307 ) * remove one of the last deps * update fast image processor after refactor * styling * more quality of life improvements * nit * update * cleanups * some cleanups * vllm updates * update fake image token * [convert] Fix typo * [convert] Strip extraneous bytes from shards * [convert] Minor fixes * [convert] Use num_experts * multi-image fixes in modeling + processor * fixup size * 128 experts * Use default rope * Unfuse mlp * simplify a lot inputs embeds merging * remove .item() 👀 * fix from review * Address feedback * Use None "default" for rope_scaling. Add eot. * set seed * return aspect ratios and bug fixes * Moe 128 rebased (#8) * 128 experts * Use default rope * Unfuse mlp * Address feedback * Use None "default" for rope_scaling. Add eot. * Meta/llama quant compat (#7) * add quant compatible model & conversion code for llama4 * fix a few issues * fix a few issues * minor type mapping fix --------- Co-authored-by: Lu Fang <fanglu@fb.com> * use a new config parameter to determine which model definition to use for MoE --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Lu Fang <fanglu@fb.com> * un-comment write_tokenizer from converting script * remove un-used imports * [llama4] Pop aspect_ratios from image processor output in Llama4Processor Signed-off-by: Jon Swenson <jmswen@gmail.com> * Fix parameter_count name * Update src/transformers/models/llama4/configuration_llama4.py * nit * Add changes for no_rope, moe_layers, chunked attention. Just need to test all * Update src/transformers/models/llama4/image_processing_llama4_fast.py * nit * fix post merge with main * support flex attention * fixes * fix * add layer * small updates * rebase and delete llm_compressor * nit * [llama4/mm] Add back <\|image\|> token that delimits global tile * [llama4/mm] Fix Llama 4 image processing unit tests * add explicit dtype Signed-off-by: Jon Swenson <jmswen@gmail.com> * sdpa works * comment todo small * fix model loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * revert * nits * small fix for TP on 1 node * Read new params from config * Add <\|eom\|> * lol don't know how this got here * adding fp8 * Save processor, fix chat template * style * Add boi/eoi tokens We don't use them. * fixes for now flex seems to work :) * updates * nits * updates * missking keys * add context parallel * update * update * fix * nits * add worldsize and make eager attn work for vision * Ignore new key present in base models * add tp_plan * fix nope Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * minor fix Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * Clean up Llama4 vision model * current updates * add support for `attn_temperature_tuning` * add floor scale * add missing attn scales * push what works, dirty trick for the device synch * oups * Fix pad_token_id See https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files Confirmed in the original codebase. * fix causallml loading * rm * fix tied-weights * fix sdpa * push current version * should work with both short and long * add compressed_tensos & fix fbgemm tp * Fix flex impl * style * chunking * try to revert the potentially breaking change * fix auto factory * fix shapes in general * rm processing * commit cache utils cleanup * Fix context length * fix * allocate * update tp_plan * fix SDPA! * Add support for sparse `Llama4TextMoe` layer from the kernel hub * cleanup * better merge * update * still broken fixing now * nits * revert print * Write max_position_embeddings and max_model_length * Update modeling_llama4.py * Save attention_chunk_size * Sync eos terminators * Read initializer_range * style * remove `dict` * fix * eager should use `chunked_attention_mask` * revert * fixup * fix config * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe" This reverts commit ccda19f050867dd42ea143c5de60f3dec81375f0, reversing changes made to a515579aed8c0fe9bf529b6c40446a289406d5d6. * Fix typo and remove warning with compiled flex and chunked prefill * Fix MoE vs FF (#41) * fix * Use correct no_rope_layers if provided one is empty list * update tests * fix * skipping some tests * fix fp8 loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * fix text geneartion pipeline Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * eager needs 4D mask * fix * Some cleanup * fix * update * fix * replace correctly module * patch * modulelist * update * update * clean up * Don't move to `cuda:0` in distributed mode * restrict to compressed tensors for now * rm print * Docs! * Fixes * Update docs/source/en/model_doc/llama4.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fixes * cuda graph fix * revert some stuff * fixup * styling * Update src/transformers/models/llama4/modeling_llama4.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * commit licence, cleanup here and there and style * more styling changes * fix dummies * fix and clean docstrings * remove comment * remove warning * Only fast image processor is supported * nit * trigger CI * fix issue with flex encoder * fix dynamic cache * Code quality * Code quality * fix more tests for now * Code quality * Code quality * Nuke bunch of failing stuff * Code quality * Code quality * cleanup removal of slow image processor * ruff fix fast image processor * fix * fix styling * Docs * Repo consistency * Repo consistency * fix sliding window issue * separate llama cache * styling * Repo consistency * Repo consistency * push waht works * L4 Repo consistency * Docs * fix last last alst alst alst alstsaltlsltlaslt --------- Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: Jon Swenson <jmswen@gmail.com> Co-authored-by: jmswen <jmswen@users.noreply.github.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> Co-authored-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-04-05 22:02:22 +02:00
Cyril Vallez	e94571580b	Fix deepspeed loading (part 2) (#37306 ) * fix * Update modeling_utils.py * Update modeling_utils.py * oups remove print	2025-04-05 20:41:42 +02:00
Cyril Vallez	84aa13dd85	Fix deepspeed loading (#37281 ) * Update modeling_utils.py * Update modeling_utils.py * fix and remove all imports * Update modeling_utils.py * Update modeling_utils.py * style * Update modeling_utils.py	2025-04-05 17:05:45 +02:00
Rahul Tuli	ebe47ce3e9	Fix: Unexpected Keys, Improve `run_compressed`, Rename Test Folder (#37077 )	2025-04-04 21:30:11 +02:00
Matt	cbfa14823b	No more dtype_byte_size() (#37144 ) * No more dtype_byte_size() * Remove function once again * Fix rebase cruft * Trigger tests	2025-04-02 14:58:38 +01:00
Jerry Zhang	a165458901	Add device workaround for int4 weight only quantization after API update (#36980 ) * merge * fix import * format * reformat * reformat --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-04-02 12:42:22 +02:00
Cyril Vallez	41f5c3216c	Revert #37031 (#37178 ) Update modeling_utils.py	2025-04-01 19:48:15 +02:00
Cyril Vallez	bc2dea3f54	Fix meta state dict loading with quantizers (#37136 ) Update modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-04-01 18:45:58 +02:00
Cyril Vallez	f304318f5f	Remove low_cpu_mem_usage and _fast_init (#36963 ) * Remove low_cpu_mem_usage and _fast_init * Update deepspeed.py * Update modeling_utils.py * remove the first 2 tests everywhere * Update test_modeling_common.py * remove what was remaining about fast_init * fix logic and simplify * mismatched keys logic update * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * fix 2 models init_weights * extend to others * remove grad * Update modeling_fsmt.py * init weights in tests * style * Update test_modeling_fsmt.py * more old models * fix more init_weights * copies * fix * style * Update modeling_lxmert.py * fix inits * more and more * more * should finalize * style * Update modeling_dinov2_with_registers.py * fix * Update modeling_encoder_decoder.py * fix * style * Update modeling_lxmert.py * post rebase cleanup * Update modeling_informer.py * back to start for device * fix * add test to detect all failing cases correctly * Update test_modeling_common.py * fix * fix * sam * style * Update modeling_maskformer_swin.py * CIs * CIs * remove test - will add it on separate PR * fix * fix * Update modeling_sam.py * CIs * CIs * CIs * convnext * suggestions * CIs * fix copies after merge --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-03-31 17:18:43 +02:00
Zhen	e686fed635	[Feature] Support using FlashAttention2 on Ascend NPU (#36696 ) * [Feature] Support using flash-attention on Ascend NPU * Fix qwen3 and qwen3_moe moduler conversion mismatch	2025-03-31 16:12:58 +02:00
huismiling	d0b65bb479	[MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159 ) * add Cambricon MLUs support * fix mlu device rng state * up for quality check * up mlu to support fp16 * fix mlu device dependency error * fix mlu device dependency error * enable mlu device for bf16 * fix mlu device memory tracker * Cambricon support SDPA and flash_attn * MLU devices : Checks if `mlu` is available via an `cndev-based` check which won't trigger the drivers and leave mlu * Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support. * fix testing errors. * Merge branch 'hf/main' into main * fix get_device_count error. * fix mlu testing utils. * fix code quality and style. * switch to @require_torch_multi_accelerator	2025-03-31 11:02:49 +02:00
jiqing-feng	286393fbb1	enable tp on CPU (#36299 ) * enable tp on CPU Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * get rank from cpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable TP tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * em print Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix model id Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix conflict Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix index and add doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-03-31 10:55:47 +02:00
Cyril Vallez	2bea6bf24e	Fix AttentionInterface following feedback (#37010 ) * up * typo * update doc * Update attention_interface.md	2025-03-28 18:00:35 +01:00
Cyril Vallez	a86dad56bc	Fix state_dict map location when quantized (#37086 ) * Update modeling_utils.py * Update modeling_utils.py	2025-03-28 17:57:16 +01:00
Yih-Dar	581cf96e0c	fix tied weigths issue (#37031 ) * fix * comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-03-28 16:36:44 +01:00
Minho Ryu	eca74d1367	[WIP] add deepseek-v3 (#35926 ) * init commit * style * take comments into account * add deepseekv3 modeling * remove redundant code * apply make style * apply fix-copies * make format * add init files * rename deepseekv3 into deepseek_v3 based on its model_type * rename deepseekv3 into deepseek_v3 based on its model_type * deepseek-v3 not deepseek_v3 * set model_type as deepseek_v3 * use default docs * apply make * fill type and docstring * add rope_config_validation * use custom DeepseekV3MLP * hold code only for checkpoints congifuration; remove redundant * revise rope yarn for DeepSeek variation * rename DeepSeek-V3 * some refactoring * revise load_hook to work properly; make moe func trainable; use llama instead of mixtral * fix attention forward * use -1 for not-changing dim when to use exapnd * refactor DeepseekV3TopkRouter * use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim * register pre_hook and hook both * make style * use n_shared_experts * Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add test file * update modeling_file according to modular file * make style * add mapping for DeepseekV3ForSequenceClassification * remove aux_loss_alpha * add deepseek_v3 for perf * add deepseek_v3 * rename test as deepseekv3 * use tiny-deepseek-v3 * remove DeepseekV3ForSequenceClassification * cache before padding * remote output_router_logits * Revert "remote output_router_logits" This reverts commit f264f800d04950390db8413b9efb24cef8186330. * remove output_router_logits * make e_score_correction_bias as buffer * skip tests not compatible * make style * make e_score_correction_bias as buffer * use rope_interleave instead of load_hook * skip tests not compatible with MLA * add doc for rope_interleave * fix typo * remove torch.no_grad for selecting topk * fix post merge issue * mrege with main and simplify * nits * final * small fixes * fix * support TP better * stash * changes currently requires * remove synch * more fixes for TP * temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used * updates to have generation work! * push most of the changes * reorder functions + call for contributions! * update readme * nits * update * ruff was updated on main * merge with main and fix copies * revert unrelated changes * route all tokens to all experts when testing to avoid no gradient iddues * finish fixing all tests * fixup * nit * clean config * last readme changes * nit * do cnit * typo * last nit * one more one more --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>	2025-03-28 15:56:59 +01:00
cyyever	41a0e58e5b	Set weights_only in torch.load (#36991 )	2025-03-27 14:55:50 +00:00
Kyle Sayers	d6d930a64b	[Modeling] Load FP8 safetensors such as DeepSeek (#36828 ) support loading fp8 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-03-27 10:47:10 +01:00
Mohamed Mekkouri	13d36e89fe	Fix device_map check for ggml files (#37003 ) fix	2025-03-26 16:24:57 +01:00
Cyril Vallez	788e1092e9	Allow easy registration of custom attention functions (#36889 ) * Update modeling_utils.py * style * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * add to init * Update modeling_utils.py * style * update * Update modeling_utils.py * Update modeling_utils.py * style * Add some doc * Update _toctree.yml * readd it for tgi/vllm compat * CIs * CIs	2025-03-26 16:15:06 +01:00
Yih-Dar	c6814b4ee8	Update ruff to `0.11.2` (#36962 ) * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-03-25 16:00:11 +01:00
Marc Sun	80b4c5dcc9	Fix cuda index issue in cache allocator (#36937 ) fix	2025-03-25 11:51:41 +01:00
Mohamed Mekkouri	be2c0e7bff	Fixing _pre_quantization_dtype when torch_dtype is None (#36930 ) fix	2025-03-25 10:43:27 +01:00
Mohamed Mekkouri	2b8a15cc3f	Disallow Offload to disk for gguf files (#36933 ) update Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-03-24 19:30:01 +01:00
Cyril Vallez	dd3933dd65	Simplify keep_in_fp32_modules logic (#36722 ) * better regex everywhere * fix * Update test_modeling_instructblip.py * BC with explanations this time otherwise it makes no sense at all * Update test_modeling_instructblip.py * style * CIs * update _keep_in_fp32_modules in blip2 * Update modeling_utils.py * Update modeling_utils.py * style * CIs * add check * trigger CIs * Update modeling_utils.py * trigger CIs	2025-03-21 16:12:59 +01:00
Raushan Turganbay	523f6e743c	Fix: dtype cannot be str (#36262 ) * fix * this wan't supposed to be here, revert * refine tests a bit more	2025-03-21 13:27:47 +01:00
Pavel Iakubovskii	66291778dd	Refactor Attention implementation for ViT-based models (#36545 ) * Refactor vit attention * Refactor ViT-based models * 🚨🚨🚨 Fix prefix for DPT * Update params order * trigger tests * Fix Dinov2 attention * Fix DPT attention impl propagation for backbone config * Common test fix: config is modif. inplace - avoid it * view->reshape * Fixup * Fixup * Enable IJepa FA2 * Add FA2 in corresponding model docs	2025-03-20 15:15:01 +00:00
fxmarty-amd	1a374799ce	Support loading Quark quantized models in Transformers (#36372 ) * add quark quantizer * add quark doc * clean up doc * fix tests * make style * more style fixes * cleanup imports * cleaning * precise install * Update docs/source/en/quantization/quark.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update tests/quantization/quark_integration/test_quark.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/utils/quantization_config.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * remove import guard as suggested * update copyright headers * add quark to transformers-quantization-latest-gpu Dockerfile * make tests pass on transformers main + quark==0.7 * add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Bowen Bao <bowenbao@amd.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-03-20 15:40:51 +01:00
Pavel Iakubovskii	cf8091c017	Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh" (#36768 ) * Fix device_mesh * Remove rebase leftover	2025-03-20 11:55:47 +00:00
Joao Gante	b47d9b2f8a	[generate] clarify docstrings: when to inherit `GenerationMixin` (#36605 )	2025-03-20 10:58:54 +00:00
Artem Kudisov	63380b77d4	Pass state dict (#35234 ) * Pass state_dict argument to get_peft_model_state_dict * Style fix * Change arguments order	2025-03-20 11:54:59 +01:00
Marc Sun	14b597f518	Fix casting dtype for qunatization (#36799 ) * fix * remove print	2025-03-18 18:46:03 +01:00
Cyril Vallez	db1d4c5a0b	Loading optimizations (#36742 ) * improvements * Update modeling_utils.py * add some doc about loading * Update modeling_utils.py	2025-03-18 16:38:44 +01:00
Cyril Vallez	2c2495cc7b	Fix post_init() code duplication (#36727 ) * Update modeling_utils.py * CIs	2025-03-14 17:36:02 +01:00

1 2 3 4 5 ...

720 Commits