HuggingFace_transformer

Author	SHA1	Message	Date
SUMIN	ccb1d06ecf	Convert binary/image/model files to Git LFS pointers	2026-04-11 01:50:39 +09:00
SUMIN	e4b809e5b2	Add Git LFS tracking for binary/model/image files	2026-04-11 01:45:26 +09:00
ssum21	e52c5890d1	add_toctree.yml	2025-08-30 15:57:15 +09:00
SSUM	b80c173b8f	Update docs/source/ko/model_doc/deepseek_v3.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>	2025-08-27 18:54:00 +09:00
SSUM	15b4988bb7	Update docs/source/ko/model_doc/deepseek_v3.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>	2025-08-27 18:53:52 +09:00
SSUM	231653db22	Merge branch 'main' into ko-deepseek_v3.md	2025-08-27 13:54:56 +09:00
Yih-Dar	ff8b88a948	Fix nightly torch CI (#40469 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-26 22:02:15 +02:00
Yih-Dar	74ad608a2b	Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467 )	2025-08-26 20:53:24 +02:00
SowmiyaNarayanan G	c8c7623f20	Update SegFormer model card (#40417 ) * Update SegFormer model card * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update the segformer model card * Remove quantization example --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-26 08:27:25 -07:00
StevenBucaille	78f32c3917	[pipeline] Add Keypoint Matching pipeline (#39970 ) * feat: keypoint-matcher pipeline * docs: added keypoint-matcher pipeline in docs * fix: added missing statements for repo consistency * docs: updated SuperGlue, LightGlue and EfficientLoFTR docs * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * test: fixed run_pipeline_test * update pipeline typing and docs * update tests * update docs snippets * Fix import error * fix: pipeline init * pt framework --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-26 15:26:57 +01:00
Joao Gante	6451294f6f	[RoPE] explicit factor > implicit factor in YaRN (#40320 ) explicit factor > implicit factor	2025-08-26 14:58:28 +01:00
audioXD	5a8ba87ecf	[fast_image_processor] fix image normalization for resize (#40436 )	2025-08-26 13:49:51 +00:00
VED	0ce6709e70	deci gguf support (#38669 ) * deci gguf support * make style * tests for deci * try except removed * style * try except removed	2025-08-26 13:43:17 +00:00
Matt	263d06fedc	Fix extra template loading (#40455 ) * Fix extra template loading * Reformat * Trigger tests	2025-08-26 14:01:01 +01:00
Pedro Cuenca	58cebc848b	flash_paged: s_aux may not exist (#40434 ) Some implementations (i.e., https://huggingface.co/kernels-community/vllm-flash-attn3) support an `s_aux` arg for attention sinks, but others (https://huggingface.co/kernels-community/flash-attn) do not. If s_aux is present in the kwargs, we forward it, otherwise we don't. The user will still get an error if they use a model like gpt-oss-20b with an implementation that does not support `s_aux`, but models that don't use it won't error out. For example, [this is currently failing](`399cd5c04b/examples/pytorch/continuous_batching.py (L16)`) because we are sending `s_aux: None` in the dict.	2025-08-26 13:15:59 +02:00
Rémi Ouazan	34108a2230	Continuous batching refactor (#40426 ) * Rework of the CB example * Further rework of CB example * Refactor PA cache, slice on tokens, add debug prints -- WIP * Slice cache -- WIP * Added a mechanism to check batched outputs in CB script * Less logging, debug flag for slice, !better reset! -- WIP * QOL and safety margins * Refactor and style * Better saving of cb example * Fix * Fixes and QOL * Mor einformations about metrics * Further logging * Style * Licenses * Removed some comments * Add a slice input flag * Fix in example * Added back some open-telemetry deps * Removed some aux function * Added FA2 option to example script * Fixed math (all of it) * Added a simple example * Renamed core to classes * Made allocation of attention mask optionnal * Style	2025-08-26 13:01:42 +02:00
Manuel de Prada Corral	49e168ff08	🚨 Remove Contrastive Search decoding strategy (#40428 ) * delete go brrr * fix tests * review	2025-08-26 12:31:46 +02:00
Rémi Ouazan	b8184b7ce9	Make cache_config not mandatory (#40316 ) * Relaxed assumptions on cache_config * Review compliance * Style * Styyyle * Removed default and added args * Rebase mishapfix * Propagate args to TorchExportableModuleForDecoderOnlyLM * Fix the test I wanted fixed in this PR * Added some AMD expectation related to cache tests	2025-08-26 12:06:17 +02:00
Yao Matrix	32fcc24667	rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-08-26 09:56:35 +00:00
Raushan Turganbay	f690a2a1e0	[video processors] decode only sampled videos -> less RAM and faster processing (#39600 ) * draft update two models for now * batch update all VLMs first * update some more image processors * update * fix a few tests * just make CI green for now * fix copies * update once more * update * unskip the test * fix these two * fix torchcodec audio loading * maybe * yay, i fixed torchcodec installation and now can actually test it * fix copies deepseek * make sure the metadata is returrned when users request it * add docs * update * fixup * Update src/transformers/audio_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/glm4v/video_processing_glm4v.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * update * what if we set some metadata attr to `None` * fix CI * fix one test * fix 4 channel test * fix glm timestemps * rebase gone wrong * raise warning once * fixup * typo * fix copies * ifx smolvlm test * this is why torch's official benchmark was faster, set threads to `0` * Apply style fixes --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-08-26 11:38:02 +02:00
Xin Yao	64ae6e6b1d	fix qwen25-vl grad acc (#40333 ) * fix qwen25—vl grad acc * fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs * fix ci * fix ci * fix typo * fix CI	2025-08-26 09:30:06 +00:00
Kashif Rasul	6d2bb1e04d	[Trainer] accelerate contextparallel support in trainer (#40205 ) * initial context_parallel_size support in trainer * For context parallelism, use AVG instead of SUM to avoid over-accounting tokens * use parallelism_config.cp_enabled * add parallelism_config to trainer state * warn when auto-enabling FSDP * fix some reviews * WIP: somewhat matching loss * Feat: add back nested_gather * Feat: cleanup * Fix: raise on non-sdpa attn * remove context_parallel_size from TrainingArguments * if we have parallelism_config, we defer to get_state_dict from accelerate * fix form review * Feat: add parallelism config support * Chore: revert some unwanted formatting changes * Fix: check None * Check none 2 * Fix: remove duplicate import * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Fin * require accerelate 1.10.1 and higer --------- Co-authored-by: S1ro1 <matej.sirovatka@gmail.com> Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-26 09:28:48 +00:00
Pavel Iakubovskii	63caaea1fb	Refactor ViT-like models (#39816 ) * refactor vit * fix * fixup * turn off FX tests * AST * deit * dinov2 * dinov2_with_registers * dpt * depth anything (nit) * depth pro (nit) * ijepa * ijepa (modular) * prompt_depth_anything (nit) * vilt (nit) * zoedepth (nit) * videomae * vit_mae * vit_msn * vivit * yolos * eomt * vitpose * update auto backbone * disable `fx` and export tests (dnov2, dpt, ijepa, vit, vitpose) * fix kwargs for backbone * fix * convnext * fixup * update convnext layernorm * fix-copies layer_norm * convnextv2 * explicit output_hidden_states for models with backbones * explicit hidden states collection for dinov2 * tests fixed * fix DPT as well * fix dinov2 with registers * add comment	2025-08-26 11:14:06 +02:00
Yih-Dar	922e65b3fc	Fix non FA2 tests after FA2 installed in CI docker image (#40430 ) * up * up * up * up * up * up * up * up * up * up * up * up * up --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-26 10:36:50 +02:00
ivarflakstad	e68146fbe7	Fix collated reports model name entry (#40441 )	2025-08-25 20:36:01 +00:00
Ákos Hadnagy	8ce633cc75	InternVL MI325 test expectations (#40387 ) * Adjust ROCm expectations * MI355 --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>	2025-08-25 22:00:35 +02:00
ivarflakstad	7637d298b3	Fix collated reports uploading (#40440 )	2025-08-25 21:49:59 +02:00
id01	fa59cf9c9f	Fix https://github.com/huggingface/transformers/issues/40292 (#40439 ) * Fix https://github.com/huggingface/transformers/issues/40292 * Trigger tests --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-25 20:12:57 +01:00
ivarflakstad	f0e87b436d	Fix collated reports model directory traversal (#40437 ) Fix model dir traversal	2025-08-25 18:01:58 +00:00
Ákos Hadnagy	ef406902bf	Gemma3 text fixes: Add expectations for MI325 (#40384 ) * Add expectations for MI325 * Ruff * Adjust CUDA expectations as well * Another attempt for CUDA expectations	2025-08-25 19:57:50 +02:00
Judy	c81723d31b	🌐 [i18n-KO] Translated `models.md` to Korean (#39518 ) * docs: ko: models.md * feat: nmt draft * fix: manual edits * Resolved _toctree.yaml conflict during merge from main * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review * fix: update toctree * Update docs/source/ko/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-25 09:17:08 -07:00
ivarflakstad	6b5eab70e4	Remove working-dir from collated reports job (#40435 )	2025-08-25 18:14:35 +02:00
Joao Gante	1763ef2951	[docs] remove last references to `transformers` TF classes/methods (#40429 ) * halfway through tasks * complete * Update utils/check_docstrings.py	2025-08-25 16:30:59 +01:00
Olumayowa Akinkuehinmi	eac4f00bdf	Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349 ) (#40408 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:55 +00:00
Joshua Chin	d8f2edcc46	Add `tokenizer_kwargs` argument to the text generation pipeline (#40364 ) * Add `tokenizer_kwargs` arg to text generation pipeline. * chore: re-run CI * Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline * Fix `tokenizer_encode_kwargs` doc string. * Fix note related to `tokenizer _kwargs` in text generation pipeline --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:19 +00:00
ivarflakstad	1a35d07f56	Update collated reports working directory and --path (#40433 )	2025-08-25 15:18:26 +00:00
Cyril Vallez	399cd5c04b	Fix modular for modernbert-decoder (#40431 ) * fix the modular * CI	2025-08-25 16:50:49 +02:00
Manuel de Prada Corral	ea8d9c8f06	🚨 Remove DoLa decoding strategy (#40082 ) * remove dola generation strategy * add fast test	2025-08-25 16:33:27 +02:00
Arthur	6bf6f8490c	[`Mxfp4`] Add a way to save with a quantization method (#40176 ) * add a test * tempdir * fix import issue[ * wow I am tired * properly init * i am not super familiar with quantizer api :\| * set to TRUE fro now * full support * push current changes * will clean this later but the imports are a shitshow here * this correctly saves the block and scales but forward seems broken * quanitze was not correct * fix storage * why were bias even included * finally! * style * fix style * remove print * lazy import * up * not sure what happens this works now? * holy molly it was not so far * okay this seems to work! * workings!!! * allow save_pretrained to create PR * Apply suggestions from code review * fixup * add deqyabtze fakse as wek * working new * fix * rm swizzle and unswizzle during saving * rm print * Update src/transformers/modeling_utils.py * fix * style --------- Co-authored-by: Marc Sun <marc@huggingface.co>	2025-08-25 16:27:19 +02:00
Andrew Chauzov	04c2bae3a8	Fix label smoothing incompatibility with multi-label classification (#40296 ) * Fix label smoothing incompatibility with multi-label classification (#40258) * Improve label smoothing multi-label check based on reviewer feedback - Move check from LabelSmoother to Trainer.__init__() for better architecture - Use model.config.problem_type instead of tensor inference for robustness - Warn and disable smoothing instead of raising error for better UX - Update test to verify warning behavior	2025-08-25 14:23:31 +00:00
Raushan Turganbay	3b5b9f6518	Fix processing tests (#40379 ) * fix tests * skip failing test in generation as well * grounding dino was overwritten * one more overwritten code * clear comment	2025-08-25 14:50:54 +02:00
jiqing-feng	a0a37b3250	Gpt oss optim (#40304 ) * enable fast index selecting Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update model Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix gpt-oss tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix check tensor Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-08-25 14:36:33 +02:00
ρrαnαm	d73181b3fc	Fix UnboundLocalError in WER metric computation (#40402 ) Renamed wer metric variable to wer_metric to avoid naming conflict with local variable assignment in compute_metrics function. Co-authored-by: pranam-gf <pranam@goodfin.com>	2025-08-25 12:02:22 +00:00
Prawal Sharma	11e12a715a	Fix typo: 'seperator' to 'separator' in variable names (#40389 ) Fixed 4 instances of the typo "seperator" → "separator" in variable names: - 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py - 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py These typos were in variable names used for parsing path components in weight conversion scripts. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-25 11:56:30 +00:00
Cyril Vallez	40299134a8	Fix CI (hunyuan moe does not support fullgraph) (#40423 ) fix flag	2025-08-25 12:01:28 +02:00
Olumayowa Akinkuehinmi	a2b37bfd58	Fix typo: 'casual' -> 'causal' in code and documentation (#40371 ) (#40407 )	2025-08-25 09:32:15 +00:00
Joao Gante	0031c044f8	[docs] flax/jax purge (#40372 ) flax/jax purge	2025-08-25 10:25:00 +01:00
Du Wenjie	14b89fed24	fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194 ) * fix to the typings which are unmatched to FA function signature cumulative_seqlens_q/k -> cu_seq_lens_q/k: - in the FlashAttentionKwargs in modeling_flash_attention_utils - in the TransformersKwargs in generic - in the PagedAttentionArgs in continuous_batching It is BC, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor` * format changes by ruff * Update src/transformers/integrations/flash_paged.py unused function arg in `PagedAttentionCache.update` Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * revert continuous_batching signiture, which is more meaningful --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-08-25 11:00:13 +02:00
Pablo Montalvo	ba095d387d	🧹 🧹 🧹 Get set decoder cleanup (#39509 ) * simplify common get/set * remove some noise * change some 5 years old modeling utils * update examples * fix copies * revert some changes * fixes, gah * format * move to Mixin * remove smolvlm specific require grad * skip * force defaults * remodularise some stuff * remodularise more stuff * add safety for audio models * style * have a correct fallback, you daft donkey * remove this argh * change heuristic for audio models * fixup * revert * this works * this should be explicit * fix Nth ESM exception * tryout decoder * this as well * revert again * 🧠 * aaah ESM has two modelings aaah * broom broom * format * wrong copies * copies * modular cleanups * format * modularities * wrong mergefix * seriously * align with new model * new model	2025-08-25 10:57:56 +02:00
Cyril Vallez	2c55c7fc94	Reactivate a lot of tests skipped for no reason anymore (#40378 ) * reactivate all the tests * some tests still failing	2025-08-25 10:44:43 +02:00


20222 Commits