HuggingFace_transformer

Author	SHA1	Message	Date
ivarflakstad	e68146fbe7	Fix collated reports model name entry (#40441 )	2025-08-25 20:36:01 +00:00
Ákos Hadnagy	8ce633cc75	InternVL MI325 test expectations (#40387 ) * Adjust ROCm expectations * MI355 --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>	2025-08-25 22:00:35 +02:00
ivarflakstad	7637d298b3	Fix collated reports uploading (#40440 )	2025-08-25 21:49:59 +02:00
id01	fa59cf9c9f	Fix https://github.com/huggingface/transformers/issues/40292 (#40439 ) * Fix https://github.com/huggingface/transformers/issues/40292 * Trigger tests --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-25 20:12:57 +01:00
ivarflakstad	f0e87b436d	Fix collated reports model directory traversal (#40437 ) Fix model dir traversal	2025-08-25 18:01:58 +00:00
Ákos Hadnagy	ef406902bf	Gemma3 text fixes: Add expectations for MI325 (#40384 ) * Add expectations for MI325 * Ruff * Adjust CUDA expectations as well * Another attempt for CUDA expectations	2025-08-25 19:57:50 +02:00
Judy	c81723d31b	🌐 [i18n-KO] Translated `models.md` to Korean (#39518 ) * docs: ko: models.md * feat: nmt draft * fix: manual edits * Resolved _toctree.yaml conflict during merge from main * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review * fix: update toctree * Update docs/source/ko/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-25 09:17:08 -07:00
ivarflakstad	6b5eab70e4	Remove working-dir from collated reports job (#40435 )	2025-08-25 18:14:35 +02:00
Joao Gante	1763ef2951	[docs] remove last references to `transformers` TF classes/methods (#40429 ) * halfway through tasks * complete * Update utils/check_docstrings.py	2025-08-25 16:30:59 +01:00
Olumayowa Akinkuehinmi	eac4f00bdf	Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349 ) (#40408 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:55 +00:00
Joshua Chin	d8f2edcc46	Add `tokenizer_kwargs` argument to the text generation pipeline (#40364 ) * Add `tokenizer_kwargs` arg to text generation pipeline. * chore: re-run CI * Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline * Fix `tokenizer_encode_kwargs` doc string. * Fix note related to `tokenizer _kwargs` in text generation pipeline --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:19 +00:00
ivarflakstad	1a35d07f56	Update collated reports working directory and --path (#40433 )	2025-08-25 15:18:26 +00:00
Cyril Vallez	399cd5c04b	Fix modular for modernbert-decoder (#40431 ) * fix the modular * CI	2025-08-25 16:50:49 +02:00
Manuel de Prada Corral	ea8d9c8f06	🚨 Remove DoLa decoding strategy (#40082 ) * remove dola generation strategy * add fast test	2025-08-25 16:33:27 +02:00
Arthur	6bf6f8490c	[`Mxfp4`] Add a way to save with a quantization method (#40176 ) * add a test * tempdir * fix import issue[ * wow I am tired * properly init * i am not super familiar with quantizer api :\| * set to TRUE fro now * full support * push current changes * will clean this later but the imports are a shitshow here * this correctly saves the block and scales but forward seems broken * quanitze was not correct * fix storage * why were bias even included * finally! * style * fix style * remove print * lazy import * up * not sure what happens this works now? * holy molly it was not so far * okay this seems to work! * workings!!! * allow save_pretrained to create PR * Apply suggestions from code review * fixup * add deqyabtze fakse as wek * working new * fix * rm swizzle and unswizzle during saving * rm print * Update src/transformers/modeling_utils.py * fix * style --------- Co-authored-by: Marc Sun <marc@huggingface.co>	2025-08-25 16:27:19 +02:00
Andrew Chauzov	04c2bae3a8	Fix label smoothing incompatibility with multi-label classification (#40296 ) * Fix label smoothing incompatibility with multi-label classification (#40258) * Improve label smoothing multi-label check based on reviewer feedback - Move check from LabelSmoother to Trainer.__init__() for better architecture - Use model.config.problem_type instead of tensor inference for robustness - Warn and disable smoothing instead of raising error for better UX - Update test to verify warning behavior	2025-08-25 14:23:31 +00:00
Raushan Turganbay	3b5b9f6518	Fix processing tests (#40379 ) * fix tests * skip failing test in generation as well * grounding dino was overwritten * one more overwritten code * clear comment	2025-08-25 14:50:54 +02:00
jiqing-feng	a0a37b3250	Gpt oss optim (#40304 ) * enable fast index selecting Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update model Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix gpt-oss tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix check tensor Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-08-25 14:36:33 +02:00
ρrαnαm	d73181b3fc	Fix UnboundLocalError in WER metric computation (#40402 ) Renamed wer metric variable to wer_metric to avoid naming conflict with local variable assignment in compute_metrics function. Co-authored-by: pranam-gf <pranam@goodfin.com>	2025-08-25 12:02:22 +00:00
Prawal Sharma	11e12a715a	Fix typo: 'seperator' to 'separator' in variable names (#40389 ) Fixed 4 instances of the typo "seperator" → "separator" in variable names: - 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py - 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py These typos were in variable names used for parsing path components in weight conversion scripts. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-25 11:56:30 +00:00
Cyril Vallez	40299134a8	Fix CI (hunyuan moe does not support fullgraph) (#40423 ) fix flag	2025-08-25 12:01:28 +02:00
Olumayowa Akinkuehinmi	a2b37bfd58	Fix typo: 'casual' -> 'causal' in code and documentation (#40371 ) (#40407 )	2025-08-25 09:32:15 +00:00
Joao Gante	0031c044f8	[docs] flax/jax purge (#40372 ) flax/jax purge	2025-08-25 10:25:00 +01:00
Du Wenjie	14b89fed24	fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194 ) * fix to the typings which are unmatched to FA function signature cumulative_seqlens_q/k -> cu_seq_lens_q/k: - in the FlashAttentionKwargs in modeling_flash_attention_utils - in the TransformersKwargs in generic - in the PagedAttentionArgs in continuous_batching It is BC, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor` * format changes by ruff * Update src/transformers/integrations/flash_paged.py unused function arg in `PagedAttentionCache.update` Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * revert continuous_batching signiture, which is more meaningful --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-08-25 11:00:13 +02:00
Pablo Montalvo	ba095d387d	🧹 🧹 🧹 Get set decoder cleanup (#39509 ) * simplify common get/set * remove some noise * change some 5 years old modeling utils * update examples * fix copies * revert some changes * fixes, gah * format * move to Mixin * remove smolvlm specific require grad * skip * force defaults * remodularise some stuff * remodularise more stuff * add safety for audio models * style * have a correct fallback, you daft donkey * remove this argh * change heuristic for audio models * fixup * revert * this works * this should be explicit * fix Nth ESM exception * tryout decoder * this as well * revert again * 🧠 * aaah ESM has two modelings aaah * broom broom * format * wrong copies * copies * modular cleanups * format * modularities * wrong mergefix * seriously * align with new model * new model	2025-08-25 10:57:56 +02:00
Cyril Vallez	2c55c7fc94	Reactivate a lot of tests skipped for no reason anymore (#40378 ) * reactivate all the tests * some tests still failing	2025-08-25 10:44:43 +02:00
Yih-Dar	4f9b4e62bc	Run FA2 tests in CI (#40397 ) up Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-23 12:30:18 +02:00
Quentin Gallouédec	28ca27cb2b	HF papers in doc (#40381 ) * HF papers * clean * Update src/transformers/models/gemma3n/configuration_gemma3n.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * style --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-22 15:07:08 -07:00
tardc	7d88f57fc6	Update README_zh-hans.md (#40380 ) Fix a typo.	2025-08-22 18:22:26 +00:00
Cyril Vallez	29ddcacea3	Rework the Cache documentation (#40373 ) * start working the doc * remove gemma2 * review	2025-08-22 17:06:28 +02:00
Matt	dab66f15a1	Chat Template Doc Fixes (#40173 ) * draft commit * draft commit * Fixup chat_extras too * Update conversations.md * Update the toctree and titles * Update the writing guide! * Use @zucchini-nlp's suggestion * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-22 15:48:33 +01:00
amd-lalithnc	0a21e870c7	Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352 ) * bug fix - return_lse dynamically set * addressed compatibility with return type - flex_attention_forward * rename variables * revert changes to commits	2025-08-22 13:49:26 +00:00
Abdelrahman Kaseb	894b2d84b6	Add GptOssForTokenClassification for GPT-OSS models (#40190 ) * Add GptOssForTokenClassification for GPT-OSS models * After run make fixup	2025-08-22 15:14:46 +02:00
Fazzie	56d68c6706	Addiing ByteDance Seed Seed-OSS (#40272 ) add seed oss	2025-08-22 14:54:28 +02:00
Yonghye Kwon	8a6908c10d	fix(example): align parameter names with the latest function definition for gdino (#40369 )	2025-08-22 12:27:58 +00:00
Raushan Turganbay	7db228a92a	[configuration] allow to overwrite kwargs from subconfigs (#40241 ) allow to overwrite kwargs from subconfigs	2025-08-22 13:31:25 +02:00
Raushan Turganbay	19ffe0219d	[processor] move commonalities to mixin (#40339 ) * move commonalities to mixin * revert - unrelated * fix copies * fix style * comments	2025-08-22 13:04:43 +02:00
Cyril Vallez	d8f6d3790a	⚠️⚠️ Use `dtype` instead of `torch_dtype` everywhere! (#39782 ) * update everywhere * style * pipelines * switch it everywhere in tests * switch it everywhere in docs * switch in converters everywhere * update in examples * update in model docstrings * style * warnings * style * Update configuration_utils.py * fix * Update configuration_utils.py * fixes and add first test * add pipeline tests * Update test_pipelines_common.py * add config test * Update test_modeling_common.py * add new ones * post rebase * add new * post rebase adds	2025-08-22 12:34:16 +02:00
Joao Gante	9c25820978	[pipelines] add support to `skip_special_tokens` in the main text generation pipelines (#40356 ) * add support to skip_special_tokens in pipelines * add test * rm redundant	2025-08-22 10:12:46 +00:00
Raushan Turganbay	5c40e7a225	Change multimodal data links to HF hub (#40309 ) change multimodal data links to HF hub	2025-08-22 11:50:04 +02:00
Rémi Ouazan	e018b77c89	wav2vec2 fixes (#40341 ) * Changed datasets to avoid a datasets error * Changed back split to test	2025-08-22 11:32:29 +02:00
Isotr0py	d7fe3111ff	Fix idefics3 vision embeddings indices dtype (#40360 ) fix idefics3 vision embeddings Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-22 11:10:45 +02:00
yjc9696	cf487cdf1f	HunYuan opensource (#39606 ) * merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling (#5) * fix attention * use llamamodel * fix code * Fix qk (#6) * fix qk_norm * fix * fix modual * Fix moe (#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (#9) fix moe & gate * Fix gate norm (#10) * add norm_topk_prob * Fix testcase (#11) * fix&skip test * Fix testcase (#12) * skip testcase * Fix norm topk (#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com>	2025-08-22 07:59:58 +00:00
Huzaifa Jawad	8365f70e92	DOCS: Clarification on the use of `label_names` as an argument to TrainingArguments (#40353 ) * Update trainer.md * Update trainer.md Removed the detail about label_names argument usage from the tip/ warning section * Update training_args.py Added the label_names usage clarification in the docstring * Update trainer.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-21 17:19:04 -07:00
Yao Matrix	7c1169e21f	[4/N]more docs to device agnostic (#40355 ) * more docs to device agnostic Signed-off-by: YAO Matrix <matrix.yao@intel.com> * more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 1 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 2 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * Update vitpose.md * Update camembert.md * Update camembert.md --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-08-21 10:22:26 -07:00
Joao Gante	9568b506ed	[generate] handle support for cache classes when num enc layers != num dec layers (#40277 ) * handle support for cache classes when num enc layers != num dec layers * handle overwrites * one more corner case * Update src/transformers/generation/utils.py * Update src/transformers/generation/utils.py * Apply suggestions from code review * handle corner case :o	2025-08-21 17:35:18 +01:00
Ákos Hadnagy	7f38068ae0	Qwen2.5-VL test fixes for ROCm (#40308 )	2025-08-21 18:13:07 +02:00
Anton Vlasjuk	cb1df4d26a	[`FA`] Fix some model tests (#40350 ) * fix * cleanup, revert aimv2 fa changes * fix aria * i searched a long time but the cross dependency is for the recent models so... * this was something... evolla * fix modernbert decoder + make fa test more robust * nit	2025-08-21 18:08:21 +02:00
Yuanyuan Chen	f46f29dd7c	Remove more PyTorch 2.2 compatible code (#40337 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-21 15:19:53 +00:00
Aaron Keesing	128f42d370	[detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300 ) fix: use consistent dtype for sine positional embeddings	2025-08-21 15:49:56 +01:00

20193 Commits