HuggingFace_transformer

Author	SHA1	Message	Date
Lysandre Debut	de5ca373ac	Responses API in `transformers serve` (#39155 ) * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * Responses API (to be merged into #39155) (#39338) * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * use openai * validate request, including detecting unused fields * dict indexing * dict var access * tmp commit (tests failing) * add slow * use oai output type in completions * (little rebase errors) * working spec? * guard type hint * type hints. fix state (CB can now load different models) * type hints; fn names; error type * add docstrings * responses + kv cache * metadata support; fix kv cache; error event * add output_index and content_index * docstrings * add test_build_response_event * docs/comments * gate test requirements; terminate cb manager on model switch * nasty type hints * more type hints * disable validation by default; enable force models * todo --------- Co-authored-by: Lysandre <hi@lysand.re> * Slight bugfixes * PR comments from #39338 * make fixup --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2025-07-16 14:16:16 +02:00
Raushan Turganbay	c8524aeb07	[cache] make all classes cache compatible finally (#38635 ) * dump * push other models * fix simple greedy generation * xmod * add fmst and clean up some mentions of old cache format * gpt-bigcode now follows standards * delete tuple cache reference in generation * fix some models * fix some models * fix mambas and support cache in tapas * fix some more tests * fix copies * delete `_reorder_cache` * another fix copies * fix typos and delete unnecessary test * fix rag generate, needs special cache reordering * fix tapas and superglue * reformer create special cache * recurrent gemma `reorder_cache` was a no-op, delete * fix-copies * fix blio and musicgen pipeline tests * fix reformer * fix reformer, again... * delete `_supports_cache_class` * delete `supports_quantized_cache` * fix failing tests * fix copies * some minor clean up * style * style * fix copies * fix tests * fix copies * create causal mask now needs positions? * fixc copies * style * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * clean-up of non-generative model after merging main * check `is_decoder` for cache * delete transpose for scores * remove tuple cache from docs everywhere * fix tests * fix copies * fix copies once more * properly deprecate `encoder_attention_mask` in Bert-like models * import `deprecate_kwarg` where needed * fix copies again * fix copies * delete `nex_decoder_cache` * fix copies asks to update for PLM * fix copies * rebasing had a few new models, fix them and merge asap! * fix copies once more * fix slow tests * fix tests and updare PLM checkpoint * add read token and revert accidentally removed line * oh com -on, style * just skip it, read token has no access to PLM yet --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-16 14:00:17 +02:00
Marc Sun	bfc9ddf5c6	Add StableAdamW Optimizer (#39446 ) * Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour. * Fixed issue with * Added docs for StableAdamW. Also fixed a typo in schedule free optimizers --------- Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>	2025-07-16 13:35:53 +02:00
richardodliu	e048d48bd0	Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870 ) * add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer * Update src/transformers/optimization.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update optimization.py fix the error of the unclosed "(" * Update optimization.py remove whitespace in line 402 in order to pass the quality test * Update src/transformers/optimization.py * Update src/transformers/optimization.py * Apply style fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-16 12:01:08 +02:00
Yuanyuan Chen	ae4e306a40	Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358 ) * Defaults to adamw_torch_fused for latest Pytorch Signed-off-by: cyy <cyyever@outlook.com> * Fix test Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-16 09:52:33 +00:00
Raushan Turganbay	d33a1c389f	[chat template] add a testcase for kwargs (#39415 ) add a testcase	2025-07-16 11:31:35 +02:00
Kyle Sayers	31d81943c9	[Core] [Offloading] Fix saving offloaded submodules (#39280 ) * fix counting meta tensors, fix onloading meta tensors Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unrelated fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unrelated change Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add clarifying comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add test_save_offloaded_model_with_direct_params Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix merge conflict, add decorators Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-16 08:44:40 +00:00
Raushan Turganbay	9f41f67135	[vlm] fix loading of retrieval VLMs (#39242 ) * fix vlm with retrieval * we can't use AutoModel because new ColQwen was released after refactor * no need for colqwen * tied weight keys are necessary, if using IMageTextToText * need to apply renaming in tied weights, only for ColPali * overwrite tied keys in ColPali * fix copies, modular can't handle if-statements	2025-07-15 17:23:54 +02:00
Dario Salvati	67f42928f0	Remove residual quantization attribute from dequantized models (#39373 ) * fix: removing quantization trace attribute from dequantized model Fixes #39295 * add: test `to(dtype=torch.float16)` after dequantization	2025-07-15 17:16:10 +02:00
Matt	a989bf8d84	Fix bugs from pipeline preprocessor overhaul (#39425 ) * Correct load classes for VideoClassificationPipeline * Correct load classes for the ASR pipeline	2025-07-15 14:28:59 +01:00
44670	2b79f14375	support loading qwen3 gguf (#38645 ) * support loading qwen3 gguf * Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion * Add testcase * Fix formatting	2025-07-15 09:53:41 +00:00
Orion Weller	0e4b7938d0	Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * working locally; need to style and test * added docs and initial tests; need to debug and flesh out * fixed tests * working long context; batches * working fa2 and eager * update tests * add missing confnigs * remove default autoset * fix spacing * fix most tests * fixed tests * fix to init * refactor to match new transformers updates * remove static cache option * fa2 fix * fix docs * in progress * working on tests * fixed issue with attn outputs * remove debug * fix local config attr * update doc string * fix docstring * add docs to toc * correct typo in toc * add new updates from main w.r.t. ModernBERT RoPE * fix local param --------- Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster> Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>	2025-07-15 10:40:41 +02:00
Raushan Turganbay	8d6259b0b8	[refactor] set attention implementation (#38974 ) * update * fix some tests * init from config, changes it in-place, add deepcopy in tests * fix modernbert * don't delete thsi config attr * update * style and copies * skip tests in generation * fix style * accidentally removed flash-attn-3, revert * docs * forgot about flags set to False * fix copies * address a few comments * fix copies * custom code BC	2025-07-15 09:34:06 +02:00
Cyril Vallez	8165c703ab	Fix Lfm2 and common tests (#39398 ) * fix * better fix * typo	2025-07-14 12:02:59 +02:00
Raushan Turganbay	66cd995618	[shieldgemma] fix checkpoint loading (#39348 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-14 08:34:58 +02:00
Yoni Gozlan	a1ad9197c5	Fix overriding Fast Image/Video Processors instance attributes affect other instances (#39363 ) * fix and add tests * nit	2025-07-12 23:39:06 +00:00
Julien Denize	70e57e4710	Add mistral common support (#38906 ) * wip: correct docstrings * Add mistral-common support. * quality * wip: add requested methods * wip: fix tests * wip: add internally some methods not being supported in mistral-common * wip * wip: add opencv dependency and update test list * wip: add mistral-common to testing dependencies * wip: revert some test changes * wip: ci * wip: ci * clean * check * check * check * wip: add hf image format to apply_chat_template and return pixel_values * wip: make mistral-common non-installed safe * wip: clean zip * fix: from_pretrained * fix: path and base64 * fix: path and import root * wip: add docs * clean * clean * revert --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-07-11 16:26:58 +00:00
Yih-Dar	24f771a043	fix failing `test_sdpa_can_dispatch_on_flash` (#39259 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-11 16:30:56 +02:00
Shuming Hu	bf607f6d3b	PerceptionLM (#37878 ) * plm template * A working plm with fixed image features * hacked processor * First version that reproduced PLM output using PE from timm. * Simplify and fix tie_word_embeddings * Use PIL resize. Simplify converstion. * First version that works with video input. * simplifed image preprocessing (not batched) * Minor fixes after rebasing on main. * Video processor based on new API. * Revert to use _preprocess for image processor. * refactor with modular * fix tie_word_embedding * Testing with timm PE * check in missed converstion from modular to model.py * First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences. * address review comments * Fixed batching if video and image examples mixed. * Simplify PE configuration. * Enable AutoModel for PerceptionEncoder. * Update PE config style. * update all headers * Minor fixes. * Move lm_head to PerceptionLMForConditionalGeneration. Fix vit_G model specification. * Fix for testing_modeling_perception_lm.py * Image processing refactoring to use more common parts. * Fix processor test. * update tests to use model from hub * More test fixes. * integration test GT update after rebasing; probably due to video preprocessing * update test media path to hub * Stop tracking local scripts * address some review comments * refactor image processing. * small fixes * update documentation and minor fixes * remove scripts * Minor fix for CI * Fix image processing * CI and doc fix * CI formatting fix * ruff fix * ruff formatting * ran utils/sort_auto_mappings.py * update docstring * more docstring udpates * add vision_input_type default fallback for image processing * more verbose variable naming * test update * Remove PE and PEConfig use AutoModel(TimmWrapper) instead * Minor cleanup. * Minor Fix: remove any ref to PE. Ruff format and check. * fix docstring * Fix modular/model consistency.Improvex docstringfor . * Fix PerceptionLMForConditionalGenerationModelTest * ruff fix * fix for check_repo * minor formatting * dummy size arg to fix for processor test. * Update docstring for PerceptionLMConfig * Minor fixes from review feedback. * Revert some minor changes per reviewer feedback. * update base_model_prefix * address reviewer feedback * fix comment in modeling file * address reviewer feedback * ruff format * Pre-merge test update. * reapply modular and fix checkpoint name * processor test path * use modular a bit more * remove dead code * add token decorator --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-11 11:07:32 +02:00
Pavel Iakubovskii	fe1a5b73e6	[modular] speedup check_modular_conversion with multiprocessing (#37456 ) * Change topological sort to return level-based output (lists of lists) * Update main for modular converter * Update test * update check_modular_conversion * Update gitignore * Fix missing conversion for glm4 * Update * Fix error msg * Fixup * fix docstring * update docs * Add comment * delete qwen3_moe	2025-07-10 19:07:59 +01:00
Cyril Vallez	571a8c2131	Add a default value for `position_ids` in masking_utils (#39310 ) * set default * Update masking_utils.py * add small test	2025-07-10 18:53:40 +02:00
Kyle Sayers	bdc8028cb3	[Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263 ) * fix counting meta tensors, fix onloading meta tensors Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unrelated fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add test Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-10 18:33:30 +02:00
Joao Gante	df49b399dc	[tests] tag serve tests as slow (#39343 ) * maybe they need more cpu resources? * add todo	2025-07-10 15:40:08 +00:00
Paul Pak	9682d07f92	LFM2 (#39340 ) * [modeling][lfm2] LFM2 model on 4.53.0 interface * [configuration] hook in LFM2 keys * [modeling][lfm2] update modeling interface for 4.53.1 * [modeling][lfm2] apply mask to hidden conv states * [misc] ruff format/lint * [modeling][lfm2] minor: NotImplemented legacy cache conversion * Create lfm2.md * create nice modular * style * Update modeling_auto.py * clean and start adding tests * style * Update test_modeling_lfm2.py * Update __init__.py * small test model size * config * small fix * fix * remove useless config attrs -> block_dim and conv_dim are hiden_size * fix prepare inputs * fix config * test * typo * skip tests accordingly * config docstrings * add doc to .md * skip config docstring check --------- Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-10 16:07:33 +02:00
Joao Gante	38c3931362	[server] add tests and fix passing a custom `generation_config` (#39230 ) * add tests; fix passing a custom generation_config * tool integration test * add install step * add accelerate as dep to serving * add todo	2025-07-10 13:41:38 +00:00
Yih-Dar	92043bde29	fix `phi3` tests (#39312 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-10 11:51:55 +02:00
Kingsley	520b9dcb42	fix Glm4v batch videos forward (#39172 ) * changes for video * update modular * change get_video_features * update video token replacement * update modular * add test and fix typo * lint * fix order * lint * fix * remove dependency * lint * lint * remove todo * resize video for test * lint.. * fix test * new a processor for video_test * fix test	2025-07-10 10:44:28 +02:00
Raushan Turganbay	bc161d5d06	Delete deprecated stuff (#38838 ) * delete deprecated stuff * fix copies * remove unused tests * fix modernbert and fuyu * Update src/transformers/cache_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * bye bye `seen_tokens` * address comments * update typings * ecnoder decoder models follow same pattern as whisper * fix copies * why is it set to False? * fix switch transformers * fix encoder decoder models shared weight * fix copies and RAG * remove `next_cache` * fix gptj/git * fix copies * fix copies * style... * another forgotten docsrting --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-10 05:18:44 +00:00
jiqing-feng	aff7df8436	enable static cache on TP model (#39164 ) * enable static cache on TP model Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * check tp size before init kv cache Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix docstring Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * add tp tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix comment Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix other cache head size Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-07-09 21:14:45 +00:00
Vladislav Bronzov	c980904204	Add DeepSeek V2 Model into Transformers (#36400 ) * add initial structure * doc fixes, add model base logic * update init files * some fixes to config and modular * some improvements for attention * format * remove unused attn * some fixes for moe layer and for decoder * adapt _compute_yarn_parameters for deepseek * format * small fix * fix for decoder forward * add tests, small refactoring * fix dummies * fix init * fix doc * fix config docs * add sequce doc, fix init for gate * fix issues in tests * fix config doc * remove unused args * some fixes and refactoring after review * fix doc for config * small fixes for config args * revert config refactoring * small refactoring * minor fixes after rebase * small fix after merge * fix modular * remove rotaryembd from public init * small test fix * some rotary pos calculation improvement * fix format * some improvements and fixes * fix config * some refactoring * adjust some unit tests * skip test * small fixes and tests adjustment * reapply modular * fix all tests except Integration * fix integration testzs * cleanup BC stuff * rope * fix integrations tests based on a10 * style --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-09 17:04:28 +02:00
Yih-Dar	4798c05c64	skip `test_torchscript_*` for now until the majority of the community ask for it (#39307 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-09 15:35:48 +02:00
JJJYmmm	25343aafee	Fix SDPA attention precision issue in Qwen2.5-VL (#37363 ) * solve conflicts and remove redundant attention_mask in qwenvit * update decoded text check * remove trailing whitespace	2025-07-09 07:03:44 +02:00
Yaswanth Gali	0e1c281745	[Tests] Update model_id in AIMv2 Tests (#39281 ) * Update model_id in tests * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-08 21:46:32 +02:00
Quentin Lhoest	1ecd52e50a	Add torchcodec in docstrings/tests for `datasets` 4.0 (#39156 ) * fix dataset run_object_detection * bump version * keep same dataset actually * torchcodec in docstrings and testing utils * torchcodec in dockerfiles and requirements * remove duplicate * add torchocodec to all the remaining docker files * fix tests * support torchcodec in audio classification and ASR * [commit to revert] build ci-dev images * [commit to revert] trigger circleci * [commit to revert] build ci-dev images * fix * fix modeling_hubert * backward compatible run_object_detection * revert ci trigger commits * fix mono conversion and support torch tensor as input * revert map_to_array docs + fix it * revert mono * nit in docstring * style * fix modular --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-08 17:06:12 +02:00
Yih-Dar	838a0268b8	fix flaky `test_generate_compile_model_forward` (#39276 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-08 15:36:05 +02:00
Yaswanth Gali	fbdaa7b099	Add Aimv2 model (#36625 ) * Model skelton * changes * temp push * changes * Added support for aimv2-native * More changes * More changes * Stupid mistake correction * Added config and refactor * Added vison model * update * Refactor for lit variant * Added Text Model * Minor fixes * nits * update * Preliminary tests * More fixes * Updated tests 🤗 * Refactor * Updated testcase * Updated config * make fixup * more fixes * Bug fix and updates * deadcode * Fixes * nit * up * Happy CI ✅ * Reduce LOC * nit * nit * make style * return_dict refactor * bug fix * fix * doc update * nit * make fixup * Minor update * _init_weigths modifcation * update tests * Minor fixes post review * Update w.r.t GradientCheckpointingLayer * docs update * update * nit * Use more Modular 😉 * Change name from AIMv2 to Aimv2 * Nit * make style * Add model doc pointer * make style * Update model doc section * updates * Modify attn mask and interface * update test * Final change * Utilize flash and flex attn * keep attn mask * camelcase model name in test file * Fix docstring * Fix config warning finally and create_causal_mask * disable torchscript * remove unused arg * remove from tests * balance model size for tests * fix device * tests * tests * flaky test * fix import --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-08 11:53:21 +02:00
Jingze Shi	d8590b4b0c	Add Doge model (#35891 ) * Add Doge Model * Fix code quality * Rollback an error commit * Fix config for open-source weights * Revert "Fix config for open-source weights" This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76. * Add modular_doge * Update Doge inherits from Llama * Fix import bug * [docs] Add usage of doge model * Fix Doge import pretrainedconfig from modeling_utils to configuration_utils * [docs] remove trust remote code from doge * Fix dynamo bug in doge model * Update docstrings * Import apply_rotary_pos_emb and repeat_kv from Llama * Fix all nits * Fix code quality * Fix some bugs * Fix code quality * Remove inherited `_update_causal_mask` from Llama This leads to incorrect weight initialization. * Fix the wrong tensor orderings in DogeCDMoE * Fix attention mask bug We have to provide attention_mask for dynamic mask computation * Modify most implementations to inherit from Llama But there are two problems: 1. `flex_attention_forward` is not updated properly 2. `Example` error in the forward method of DogeForCausalLM * Modify CDMoE for batch efficient implementation * Uniform MoE configuration names, just like QwenMoE * Fix code quality * Fix code quality * Fix code quality * Add tp plan of CDMoE Module * Hybird DMA with sliding window * Update valid tokens greater than window size * Fix code quality * Add `convert_doge_weights_to_hf` * Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py * Fix nits in modular_doge * Fix code quality * Fix all nits * Fix all nits * Make sure the attention function is updated inside the class * Fix code quality issues in the Doge model and add a test for it * Fix `test_generate` * Fix code quality * Fix nits fllowing suggestions * Fix code quality * Fix code quality issues * Fix nits * Fix code quality nits * Fix the missing parameters in the configuration. * Fix the missing parameters in the configuration. * Fix nits * Add initialization of attention * Fix last nits * Simplify dynamic mask generation logic * Rename router_logits to gate_logits for matching latest changes of MixtralModel * Rename typings for matching latest changes of MixtralModel * Fixes typo in comment * Update src/transformers/models/doge/modular_doge.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix code quality issues to match other modular * Fix code quality issues to match other modular * Fix the static compilation errors * Update model weights link * Fix code quality issues to match other modular * reapply modular and support for new outputs * style * simplify a lot * fix import location * reapply modular * fix * fix integration test --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-08 11:44:29 +02:00
Guang Yang	356fd68109	fix(generation): stop beam search per-instance when heuristic satisfied (#38778 ) * fix(decoding): stop beam search per-instance when heuristic satisfied Previously, when early_stopping is set to `False`, the early-stopping heuristic only halted generation when all batch instances reached the criterion. This caused instances that are impossible (suggested by the heuristic) to improve keep generating, leading to inconsistent and overlong outputs across the batch. Now we apply the heuristic per-instance: once a certain instance of batch has its all beams impossibe to improve, we mark that instance finished while letting others continue. This restores expected behavior and ensures consistency in batched generation. * Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic * Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied * Add comments for early stop heuristic * Update src/transformers/generation/utils.py --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-08 08:59:37 +00:00
Yih-Dar	a21557fa3e	Skip `test_eager_matches sdpa generate` and update an integration test for blip-like models (#39248 ) * skip * skip --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-08 10:38:25 +02:00
Yao Matrix	b2816da802	fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU (#39187 ) * chameleon xpu bnb groundtruth update on bnb triton backend since we are deprecating ipex backend Signed-off-by: YAO Matrix <matrix.yao@intel.com> * enable hqq uts on XPU, all passed Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix style Signed-off-by: YAO Matrix <matrix.yao@intel.com> * fix comment Signed-off-by: YAO Matrix <matrix.yao@intel.com> --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-07-08 10:18:26 +02:00
Yuxuan Zhang	17b3c96c00	Glm 4 doc (#39247 ) * update the glm4 model readme * update test * update GLM-4.1V model * update as format * update * fix some tests * fix the rest * fix on a10, not t4 * nit: dummy import --------- Co-authored-by: raushan <raushan@huggingface.co>	2025-07-08 08:22:04 +02:00
Yih-Dar	41e865bb8d	fix some flaky tests in `tests/generation/test_utils.py` (#39254 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-07 19:49:41 +02:00
Mikhail Moskovchenko	3993ee1e98	Add `segmentation_maps` support to MobileNetV2ImageProcessor (#37312 ) * Add `segmentation_maps` support to mobilenet_v2 image processor and `reduce_labels` to mobilevit * Changed mobilenetv2 tests to support fastimageprocessor * added `segmentation_maps` support to fast image processor * reverted to upstream/main * Add optional * Use autodocstring * Changed docs * Docs fix * Changed fp to match beit fp * Change typing imports * Fixed repo inconsistency * Added fast-slow equivalence tests * Removed unnecessary call * Add `reduce_labels` to Mobilevit fast processor --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-07 13:34:59 -04:00
kaixuanliu	c4e39ee59c	adjust input and output texts for test_modeling_recurrent_gemma.py (#39190 ) * adjust input and output texts for test_modeling_recurrent_gemma.py Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> * fix bug Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> * adjust Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> * update Expectation match Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> * fix --------- Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-07 15:13:25 +02:00
Yih-Dar	9b09fe479f	fix `fastspeech2_conformer` tests (#39229 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-07 15:04:26 +02:00
Cyril Vallez	056fa73fae	[modular] Simplify logic and docstring handling (#39185 ) * simplify a lot * Update modular_model_converter.py * finalize * remove outdated functions * apply it * and examples	2025-07-07 14:52:57 +02:00
Xavier Dupré	f16fbfb89a	Make _compute_dynamic_ntk_parameters exportable (#39171 ) * Make _compute_dynamic_ntk_parameters exportable * add unit test	2025-07-07 14:48:31 +02:00
Isotr0py	8570bc29f3	Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor (#39244 ) * fix missing fast tokenizer in whisper processor Signed-off-by: Isotr0py <2037008807@qq.com> * fix processor test Signed-off-by: Isotr0py <2037008807@qq.com> * fix qwen2.5 omni processor Signed-off-by: Isotr0py <2037008807@qq.com> --------- Signed-off-by: Isotr0py <2037008807@qq.com>	2025-07-07 13:54:18 +02:00
Rémi Ouazan	a325409a50	Expectations re-order and corrected FA3 skip (#39195 ) * Fix Expectations and a FA3 skip * Fixed docstring * Added context for Default expectation	2025-07-07 11:42:33 +02:00
Arthur	ca7e1a3756	Refactor the way we handle outputs for new llamas and new models (#39120 ) * just update 2 files * update other models as well just making fix-copies * also add the changes needed to modeling utils * put this on the pretrained model instead * nits and fixes * update generic, fix to use config value * update other modelings * use transformers kwargs instead * update * update * update other models * update * updates * update * update * update * fix * finally * very small nits * this fixes more tests * fix other models as well! * update modularqwen2 * update models based on qwen2 * update * update * remove the *flash stuff in favor of noraml kwargs update * propagate gemma? * remove output attentions * propagate * support cross attention edge case * same * test this * fixes * more fix * update * update * fix conflicts * update * fix emu3 * fix emu3 * move the fix a bit * quel enfer * some fixes, loss_kwargs should never had been * finish fixing gemma3n * fix small lm3 * fix another one * fix csm now * fux csm and mistral * fix mistral now * small fixes * fix janusss * only for some models * fixup * phix phi3 * more fixes? * dose this fix it? * update * holy shit it was just graph breaks * protect torch * updates * fix samhq? * fix moonshine * more moonshine fixes, 3 failures left! * nits * generic needs to support more * more fixes to moonshine! * fix cross attention outputs! * fix csm! * nits * fix stupid kosmos2 * current updates * fixes * use output recorder? * nicer! * a little bit of magic * update * fix protect * fix * small fixes * protect import * fix a bunch of more models * fix fixups * fix some of the last ones * nit * partly fix phi * update * fix import path * make something that is fullgraph compatible just to be sure * typing was wrong on llama so the rest was wrong as well * fucking ugly but at least it is still exportable * syle * supposed to fix moonshine, it still breaks * fix some default * fix the last bits of sam * update samhq * more fixes to am hq * nit * fix all output+hidden states and output_attentions! * fix? * fix diffllama * updates to fix initialization on the sam pips * ups there was a bug * fix the last sam hq test * fix gotocr * fix gotocr2! * fixes * skip stupid tests * there was one left :) * fixup * fix fix copies issues with this test file * fix copies for sam_hq * rm some comments * skip 2 more failing tests * fix * fix everything * Apply suggestions from code review Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * add more doc! * fix public init * fix modular qwen3 --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>	2025-07-05 11:34:28 +02:00

1 2 3 4 5 ...

5153 Commits