HuggingFace_transformer

Author	SHA1	Message	Date
SUMIN	ccb1d06ecf	Convert binary/image/model files to Git LFS pointers	2026-04-11 01:50:39 +09:00
SUMIN	e4b809e5b2	Add Git LFS tracking for binary/model/image files	2026-04-11 01:45:26 +09:00
ssum21	e52c5890d1	add_toctree.yml	2025-08-30 15:57:15 +09:00
SSUM	b80c173b8f	Update docs/source/ko/model_doc/deepseek_v3.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>	2025-08-27 18:54:00 +09:00
SSUM	15b4988bb7	Update docs/source/ko/model_doc/deepseek_v3.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>	2025-08-27 18:53:52 +09:00
SSUM	231653db22	Merge branch 'main' into ko-deepseek_v3.md	2025-08-27 13:54:56 +09:00
Yih-Dar	ff8b88a948	Fix nightly torch CI (#40469 ) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-26 22:02:15 +02:00
Yih-Dar	74ad608a2b	Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467 )	2025-08-26 20:53:24 +02:00
SowmiyaNarayanan G	c8c7623f20	Update SegFormer model card (#40417 ) * Update SegFormer model card * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/segformer.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update the segformer model card * Remove quantization example --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-26 08:27:25 -07:00
StevenBucaille	78f32c3917	[pipeline] Add Keypoint Matching pipeline (#39970 ) * feat: keypoint-matcher pipeline * docs: added keypoint-matcher pipeline in docs * fix: added missing statements for repo consistency * docs: updated SuperGlue, LightGlue and EfficientLoFTR docs * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * test: fixed run_pipeline_test * update pipeline typing and docs * update tests * update docs snippets * Fix import error * fix: pipeline init * pt framework --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-26 15:26:57 +01:00
Joao Gante	6451294f6f	[RoPE] explicit factor > implicit factor in YaRN (#40320 ) explicit factor > implicit factor	2025-08-26 14:58:28 +01:00
audioXD	5a8ba87ecf	[fast_image_processor] fix image normalization for resize (#40436 )	2025-08-26 13:49:51 +00:00
VED	0ce6709e70	deci gguf support (#38669 ) * deci gguf support * make style * tests for deci * try except removed * style * try except removed	2025-08-26 13:43:17 +00:00
Matt	263d06fedc	Fix extra template loading (#40455 ) * Fix extra template loading * Reformat * Trigger tests	2025-08-26 14:01:01 +01:00
Pedro Cuenca	58cebc848b	flash_paged: s_aux may not exist (#40434 ) Some implementations (i.e., https://huggingface.co/kernels-community/vllm-flash-attn3) support an `s_aux` arg for attention sinks, but others (https://huggingface.co/kernels-community/flash-attn) do not. If s_aux is present in the kwargs, we forward it, otherwise we don't. The user will still get an error if they use a model like gpt-oss-20b with an implementation that does not support `s_aux`, but models that don't use it won't error out. For example, [this is currently failing](`399cd5c04b/examples/pytorch/continuous_batching.py (L16)`) because we are sending `s_aux: None` in the dict.	2025-08-26 13:15:59 +02:00
Rémi Ouazan	34108a2230	Continuous batching refactor (#40426 ) * Rework of the CB example * Further rework of CB example * Refactor PA cache, slice on tokens, add debug prints -- WIP * Slice cache -- WIP * Added a mechanism to check batched outputs in CB script * Less logging, debug flag for slice, !better reset! -- WIP * QOL and safety margins * Refactor and style * Better saving of cb example * Fix * Fixes and QOL * Mor einformations about metrics * Further logging * Style * Licenses * Removed some comments * Add a slice input flag * Fix in example * Added back some open-telemetry deps * Removed some aux function * Added FA2 option to example script * Fixed math (all of it) * Added a simple example * Renamed core to classes * Made allocation of attention mask optionnal * Style	2025-08-26 13:01:42 +02:00
Manuel de Prada Corral	49e168ff08	🚨 Remove Contrastive Search decoding strategy (#40428 ) * delete go brrr * fix tests * review	2025-08-26 12:31:46 +02:00
Rémi Ouazan	b8184b7ce9	Make cache_config not mandatory (#40316 ) * Relaxed assumptions on cache_config * Review compliance * Style * Styyyle * Removed default and added args * Rebase mishapfix * Propagate args to TorchExportableModuleForDecoderOnlyLM * Fix the test I wanted fixed in this PR * Added some AMD expectation related to cache tests	2025-08-26 12:06:17 +02:00
Yao Matrix	32fcc24667	rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363 ) Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-08-26 09:56:35 +00:00
Raushan Turganbay	f690a2a1e0	[video processors] decode only sampled videos -> less RAM and faster processing (#39600 ) * draft update two models for now * batch update all VLMs first * update some more image processors * update * fix a few tests * just make CI green for now * fix copies * update once more * update * unskip the test * fix these two * fix torchcodec audio loading * maybe * yay, i fixed torchcodec installation and now can actually test it * fix copies deepseek * make sure the metadata is returrned when users request it * add docs * update * fixup * Update src/transformers/audio_utils.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/glm4v/video_processing_glm4v.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * update * what if we set some metadata attr to `None` * fix CI * fix one test * fix 4 channel test * fix glm timestemps * rebase gone wrong * raise warning once * fixup * typo * fix copies * ifx smolvlm test * this is why torch's official benchmark was faster, set threads to `0` * Apply style fixes --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-08-26 11:38:02 +02:00
Xin Yao	64ae6e6b1d	fix qwen25-vl grad acc (#40333 ) * fix qwen25—vl grad acc * fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs * fix ci * fix ci * fix typo * fix CI	2025-08-26 09:30:06 +00:00
Kashif Rasul	6d2bb1e04d	[Trainer] accelerate contextparallel support in trainer (#40205 ) * initial context_parallel_size support in trainer * For context parallelism, use AVG instead of SUM to avoid over-accounting tokens * use parallelism_config.cp_enabled * add parallelism_config to trainer state * warn when auto-enabling FSDP * fix some reviews * WIP: somewhat matching loss * Feat: add back nested_gather * Feat: cleanup * Fix: raise on non-sdpa attn * remove context_parallel_size from TrainingArguments * if we have parallelism_config, we defer to get_state_dict from accelerate * fix form review * Feat: add parallelism config support * Chore: revert some unwanted formatting changes * Fix: check None * Check none 2 * Fix: remove duplicate import * Update src/transformers/trainer.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/training_args.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Fin * require accerelate 1.10.1 and higer --------- Co-authored-by: S1ro1 <matej.sirovatka@gmail.com> Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-26 09:28:48 +00:00
Pavel Iakubovskii	63caaea1fb	Refactor ViT-like models (#39816 ) * refactor vit * fix * fixup * turn off FX tests * AST * deit * dinov2 * dinov2_with_registers * dpt * depth anything (nit) * depth pro (nit) * ijepa * ijepa (modular) * prompt_depth_anything (nit) * vilt (nit) * zoedepth (nit) * videomae * vit_mae * vit_msn * vivit * yolos * eomt * vitpose * update auto backbone * disable `fx` and export tests (dnov2, dpt, ijepa, vit, vitpose) * fix kwargs for backbone * fix * convnext * fixup * update convnext layernorm * fix-copies layer_norm * convnextv2 * explicit output_hidden_states for models with backbones * explicit hidden states collection for dinov2 * tests fixed * fix DPT as well * fix dinov2 with registers * add comment	2025-08-26 11:14:06 +02:00
Yih-Dar	922e65b3fc	Fix non FA2 tests after FA2 installed in CI docker image (#40430 ) * up * up * up * up * up * up * up * up * up * up * up * up * up --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-26 10:36:50 +02:00
ivarflakstad	e68146fbe7	Fix collated reports model name entry (#40441 )	2025-08-25 20:36:01 +00:00
Ákos Hadnagy	8ce633cc75	InternVL MI325 test expectations (#40387 ) * Adjust ROCm expectations * MI355 --------- Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>	2025-08-25 22:00:35 +02:00
ivarflakstad	7637d298b3	Fix collated reports uploading (#40440 )	2025-08-25 21:49:59 +02:00
id01	fa59cf9c9f	Fix https://github.com/huggingface/transformers/issues/40292 (#40439 ) * Fix https://github.com/huggingface/transformers/issues/40292 * Trigger tests --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-25 20:12:57 +01:00
ivarflakstad	f0e87b436d	Fix collated reports model directory traversal (#40437 ) Fix model dir traversal	2025-08-25 18:01:58 +00:00
Ákos Hadnagy	ef406902bf	Gemma3 text fixes: Add expectations for MI325 (#40384 ) * Add expectations for MI325 * Ruff * Adjust CUDA expectations as well * Another attempt for CUDA expectations	2025-08-25 19:57:50 +02:00
Judy	c81723d31b	🌐 [i18n-KO] Translated `models.md` to Korean (#39518 ) * docs: ko: models.md * feat: nmt draft * fix: manual edits * Resolved _toctree.yaml conflict during merge from main * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Apply suggestions from code review * fix: update toctree * Update docs/source/ko/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-25 09:17:08 -07:00
ivarflakstad	6b5eab70e4	Remove working-dir from collated reports job (#40435 )	2025-08-25 18:14:35 +02:00
Joao Gante	1763ef2951	[docs] remove last references to `transformers` TF classes/methods (#40429 ) * halfway through tasks * complete * Update utils/check_docstrings.py	2025-08-25 16:30:59 +01:00
Olumayowa Akinkuehinmi	eac4f00bdf	Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349 ) (#40408 ) Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:55 +00:00
Joshua Chin	d8f2edcc46	Add `tokenizer_kwargs` argument to the text generation pipeline (#40364 ) * Add `tokenizer_kwargs` arg to text generation pipeline. * chore: re-run CI * Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline * Fix `tokenizer_encode_kwargs` doc string. * Fix note related to `tokenizer _kwargs` in text generation pipeline --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-25 15:21:19 +00:00
ivarflakstad	1a35d07f56	Update collated reports working directory and --path (#40433 )	2025-08-25 15:18:26 +00:00
Cyril Vallez	399cd5c04b	Fix modular for modernbert-decoder (#40431 ) * fix the modular * CI	2025-08-25 16:50:49 +02:00
Manuel de Prada Corral	ea8d9c8f06	🚨 Remove DoLa decoding strategy (#40082 ) * remove dola generation strategy * add fast test	2025-08-25 16:33:27 +02:00
Arthur	6bf6f8490c	[`Mxfp4`] Add a way to save with a quantization method (#40176 ) * add a test * tempdir * fix import issue[ * wow I am tired * properly init * i am not super familiar with quantizer api :| * set to TRUE fro now * full support * push current changes * will clean this later but the imports are a shitshow here * this correctly saves the block and scales but forward seems broken * quanitze was not correct * fix storage * why were bias even included * finally! * style * fix style * remove print * lazy import * up * not sure what happens this works now? * holy molly it was not so far * okay this seems to work! * workings!!! * allow save_pretrained to create PR * Apply suggestions from code review * fixup * add deqyabtze fakse as wek * working new * fix * rm swizzle and unswizzle during saving * rm print * Update src/transformers/modeling_utils.py * fix * style --------- Co-authored-by: Marc Sun <marc@huggingface.co>	2025-08-25 16:27:19 +02:00
Andrew Chauzov	04c2bae3a8	Fix label smoothing incompatibility with multi-label classification (#40296 ) * Fix label smoothing incompatibility with multi-label classification (#40258) * Improve label smoothing multi-label check based on reviewer feedback - Move check from LabelSmoother to Trainer.__init__() for better architecture - Use model.config.problem_type instead of tensor inference for robustness - Warn and disable smoothing instead of raising error for better UX - Update test to verify warning behavior	2025-08-25 14:23:31 +00:00
Raushan Turganbay	3b5b9f6518	Fix processing tests (#40379 ) * fix tests * skip failing test in generation as well * grounding dino was overwritten * one more overwritten code * clear comment	2025-08-25 14:50:54 +02:00
jiqing-feng	a0a37b3250	Gpt oss optim (#40304 ) * enable fast index selecting Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update model Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix gpt-oss tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix check tensor Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-08-25 14:36:33 +02:00
ρrαnαm	d73181b3fc	Fix UnboundLocalError in WER metric computation (#40402 ) Renamed wer metric variable to wer_metric to avoid naming conflict with local variable assignment in compute_metrics function. Co-authored-by: pranam-gf <pranam@goodfin.com>	2025-08-25 12:02:22 +00:00
Prawal Sharma	11e12a715a	Fix typo: 'seperator' to 'separator' in variable names (#40389 ) Fixed 4 instances of the typo "seperator" → "separator" in variable names: - 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py - 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py These typos were in variable names used for parsing path components in weight conversion scripts. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-08-25 11:56:30 +00:00
Cyril Vallez	40299134a8	Fix CI (hunyuan moe does not support fullgraph) (#40423 ) fix flag	2025-08-25 12:01:28 +02:00
Olumayowa Akinkuehinmi	a2b37bfd58	Fix typo: 'casual' -> 'causal' in code and documentation (#40371 ) (#40407 )	2025-08-25 09:32:15 +00:00
Joao Gante	0031c044f8	[docs] flax/jax purge (#40372 ) flax/jax purge	2025-08-25 10:25:00 +01:00
Du Wenjie	14b89fed24	fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194 ) * fix to the typings which are unmatched to FA function signature cumulative_seqlens_q/k -> cu_seq_lens_q/k: - in the FlashAttentionKwargs in modeling_flash_attention_utils - in the TransformersKwargs in generic - in the PagedAttentionArgs in continuous_batching It is BC, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor` * format changes by ruff * Update src/transformers/integrations/flash_paged.py unused function arg in `PagedAttentionCache.update` Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * revert continuous_batching signiture, which is more meaningful --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-08-25 11:00:13 +02:00
Pablo Montalvo	ba095d387d	🧹 🧹 🧹 Get set decoder cleanup (#39509 ) * simplify common get/set * remove some noise * change some 5 years old modeling utils * update examples * fix copies * revert some changes * fixes, gah * format * move to Mixin * remove smolvlm specific require grad * skip * force defaults * remodularise some stuff * remodularise more stuff * add safety for audio models * style * have a correct fallback, you daft donkey * remove this argh * change heuristic for audio models * fixup * revert * this works * this should be explicit * fix Nth ESM exception * tryout decoder * this as well * revert again * 🧠 * aaah ESM has two modelings aaah * broom broom * format * wrong copies * copies * modular cleanups * format * modularities * wrong mergefix * seriously * align with new model * new model	2025-08-25 10:57:56 +02:00
Cyril Vallez	2c55c7fc94	Reactivate a lot of tests skipped for no reason anymore (#40378 ) * reactivate all the tests * some tests still failing	2025-08-25 10:44:43 +02:00
Yih-Dar	4f9b4e62bc	Run FA2 tests in CI (#40397 ) up Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-23 12:30:18 +02:00
Quentin Gallouédec	28ca27cb2b	HF papers in doc (#40381 ) * HF papers * clean * Update src/transformers/models/gemma3n/configuration_gemma3n.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * style --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-22 15:07:08 -07:00
tardc	7d88f57fc6	Update README_zh-hans.md (#40380 ) Fix a typo.	2025-08-22 18:22:26 +00:00
Cyril Vallez	29ddcacea3	Rework the Cache documentation (#40373 ) * start working the doc * remove gemma2 * review	2025-08-22 17:06:28 +02:00
Matt	dab66f15a1	Chat Template Doc Fixes (#40173 ) * draft commit * draft commit * Fixup chat_extras too * Update conversations.md * Update the toctree and titles * Update the writing guide! * Use @zucchini-nlp's suggestion * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/conversations.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-22 15:48:33 +01:00
amd-lalithnc	0a21e870c7	Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352 ) * bug fix - return_lse dynamically set * addressed compatibility with return type - flex_attention_forward * rename variables * revert changes to commits	2025-08-22 13:49:26 +00:00
Abdelrahman Kaseb	894b2d84b6	Add GptOssForTokenClassification for GPT-OSS models (#40190 ) * Add GptOssForTokenClassification for GPT-OSS models * After run make fixup	2025-08-22 15:14:46 +02:00
Fazzie	56d68c6706	Addiing ByteDance Seed Seed-OSS (#40272 ) add seed oss	2025-08-22 14:54:28 +02:00
Yonghye Kwon	8a6908c10d	fix(example): align parameter names with the latest function definition for gdino (#40369 )	2025-08-22 12:27:58 +00:00
Raushan Turganbay	7db228a92a	[configuration] allow to overwrite kwargs from subconfigs (#40241 ) allow to overwrite kwargs from subconfigs	2025-08-22 13:31:25 +02:00
Raushan Turganbay	19ffe0219d	[processor] move commonalities to mixin (#40339 ) * move commonalities to mixin * revert - unrelated * fix copies * fix style * comments	2025-08-22 13:04:43 +02:00
Cyril Vallez	d8f6d3790a	⚠️⚠️ Use `dtype` instead of `torch_dtype` everywhere! (#39782 ) * update everywhere * style * pipelines * switch it everywhere in tests * switch it everywhere in docs * switch in converters everywhere * update in examples * update in model docstrings * style * warnings * style * Update configuration_utils.py * fix * Update configuration_utils.py * fixes and add first test * add pipeline tests * Update test_pipelines_common.py * add config test * Update test_modeling_common.py * add new ones * post rebase * add new * post rebase adds	2025-08-22 12:34:16 +02:00
Joao Gante	9c25820978	[pipelines] add support to `skip_special_tokens` in the main text generation pipelines (#40356 ) * add support to skip_special_tokens in pipelines * add test * rm redundant	2025-08-22 10:12:46 +00:00
Raushan Turganbay	5c40e7a225	Change multimodal data links to HF hub (#40309 ) change multimodal data links to HF hub	2025-08-22 11:50:04 +02:00
Rémi Ouazan	e018b77c89	wav2vec2 fixes (#40341 ) * Changed datasets to avoid a datasets error * Changed back split to test	2025-08-22 11:32:29 +02:00
Isotr0py	d7fe3111ff	Fix idefics3 vision embeddings indices dtype (#40360 ) fix idefics3 vision embeddings Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-22 11:10:45 +02:00
yjc9696	cf487cdf1f	HunYuan opensource (#39606 ) * merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling (#5) * fix attention * use llamamodel * fix code * Fix qk (#6) * fix qk_norm * fix * fix modual * Fix moe (#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (#9) fix moe & gate * Fix gate norm (#10) * add norm_topk_prob * Fix testcase (#11) * fix&skip test * Fix testcase (#12) * skip testcase * Fix norm topk (#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com>	2025-08-22 07:59:58 +00:00
Huzaifa Jawad	8365f70e92	DOCS: Clarification on the use of `label_names` as an argument to TrainingArguments (#40353 ) * Update trainer.md * Update trainer.md Removed the detail about label_names argument usage from the tip/ warning section * Update training_args.py Added the label_names usage clarification in the docstring * Update trainer.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-21 17:19:04 -07:00
Yao Matrix	7c1169e21f	[4/N]more docs to device agnostic (#40355 ) * more docs to device agnostic Signed-off-by: YAO Matrix <matrix.yao@intel.com> * more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 1 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 2 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * Update vitpose.md * Update camembert.md * Update camembert.md --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com>	2025-08-21 10:22:26 -07:00
Joao Gante	9568b506ed	[generate] handle support for cache classes when num enc layers != num dec layers (#40277 ) * handle support for cache classes when num enc layers != num dec layers * handle overwrites * one more corner case * Update src/transformers/generation/utils.py * Update src/transformers/generation/utils.py * Apply suggestions from code review * handle corner case :o	2025-08-21 17:35:18 +01:00
Ákos Hadnagy	7f38068ae0	Qwen2.5-VL test fixes for ROCm (#40308 )	2025-08-21 18:13:07 +02:00
Anton Vlasjuk	cb1df4d26a	[`FA`] Fix some model tests (#40350 ) * fix * cleanup, revert aimv2 fa changes * fix aria * i searched a long time but the cross dependency is for the recent models so... * this was something... evolla * fix modernbert decoder + make fa test more robust * nit	2025-08-21 18:08:21 +02:00
Yuanyuan Chen	f46f29dd7c	Remove more PyTorch 2.2 compatible code (#40337 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-21 15:19:53 +00:00
Aaron Keesing	128f42d370	[detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300 ) fix: use consistent dtype for sine positional embeddings	2025-08-21 15:49:56 +01:00
Joao Gante	2121d09239	[serve] add cors warnings (#40112 ) * add cors warnings * Update src/transformers/commands/serving.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/commands/serving.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review * make fixup --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-08-21 14:32:36 +01:00
Eric Bezzam	b40b834ab1	Clean up XCodec and other codecs (#40348 ) * Clean up xcodec addition. * Clean up config. * Switch to fixtures test. * Small stuff. * Polish XCodec and standardize across codecs. * Update src/transformers/models/xcodec/modeling_xcodec.py Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * Format and fix test. * Update tol. --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-08-21 15:32:00 +02:00
Michele Corazza	75aa7c7252	[ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991 ) * [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification * fix the modular conversion	2025-08-21 15:16:03 +02:00
Pablo Montalvo	04b751f07d	Fix attention vizualizer (#40285 ) * make visualizer rely on create causal mask * format * fixup * fixup * read token * read token, duh * what is up with that token * small tests? * adjust * try with flush * normalize for ANSI * buffer shenanigans	2025-08-21 13:13:35 +00:00
cyn	1e1db12304	(small) fix conditional for input_ids and input_embeds in marian (#40045 ) * (small) fix conditional for input_ids and input_embeds in marian * address comment	2025-08-21 15:13:14 +02:00
Yih-Dar	7f2f53424e	Update `test_spm_converter_bytefallback_warning` (#40284 ) fff Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-21 14:09:28 +02:00
Ákos Hadnagy	11a49dd9e3	T5 test and target device fixes (#40313 ) * Fix cache setup related issues * Fix target-device-related issues * Ruff * Address review comments	2025-08-21 14:07:29 +02:00
Eddie Tsai	c4513a9fe6	Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310 ) * Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository * run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository	2025-08-21 11:42:53 +00:00
Elad Segal	c7e6f9a485	Fix an infinite loop bug in recursive search of relative imports (#40326 ) Fix bug in recursive search of relative imports	2025-08-21 11:39:43 +00:00
wirthual	e95441bdb5	add type hints (#40319 ) * add basic type hints to import module * run make fixup * remove optional * fixes --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-08-21 12:19:59 +01:00
Tom Aarsen	5c88d8fbcc	Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322 ) * Only call Trainer.align_special_tokens if model has "config" attribute * Add efficient test for training a model without model.config * Reformat	2025-08-21 12:06:42 +01:00
Joao Gante	c031f6f994	[docs] remove TF references from `/en/model_doc` (#40344 ) * models up to F * models up to M * all models	2025-08-21 11:53:21 +01:00
Yuanyuan Chen	7b060e5eb7	Add missing arguments to class constructors (#40068 ) * Add missing arguments Signed-off-by: cyy <cyyever@outlook.com> * Fix typos Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-08-21 10:22:38 +00:00
Cyril Vallez	6ad7f29461	Fix deprecation warning version (#40343 ) fix	2025-08-21 12:18:23 +02:00
Abdelrahman Kaseb	adf84aec21	Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200 ) * Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification * After run make fixup	2025-08-21 12:01:33 +02:00
Yuanyuan Chen	1e2e28f3c8	Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066 ) * Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch Signed-off-by: cyy <cyyever@outlook.com> * subclass RMSNorm Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-08-21 11:58:35 +02:00
Yuekai Zhang	022af24fcc	Fix qwen-omni processor text only mode (#40336 ) * Fix qwen-omni processor text only mode * remove try except --------- Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>	2025-08-21 11:57:32 +02:00
Joao Gante	c99ed492c7	[docs] remove flax references from `/en/model_doc` (#40311 ) * 1st commit * all models up to D * all models up to G * all models up to M * all remaining models	2025-08-21 10:52:54 +01:00
Cyril Vallez	c2e3cc24e0	Fix chunked attention mask with left-padding (#40324 ) * add fix * add test * raise proper warning for older versions * fix * fix and add 2nd test * fix for flex and torch 2.5	2025-08-21 10:52:49 +02:00
Cyril Vallez	242bb2cafc	One cache class to rule them all (#40276 ) * remove all classes * fix generate * start replacing everywhere * finish removing everywhere * typo * typo * fix * typo * remove num_layers=1 * CI * fix all docstrings * review * style	2025-08-20 19:36:11 +02:00
ivarflakstad	1054494dd6	Update notification service amd_daily_ci_workflows definition (#40314 )	2025-08-20 17:49:46 +02:00
Eon Kim	139cd91713	Fix: Apply `get_placeholder_mask` in Ovis2 (#40280 ) * Refactor special image mask * Refactor get_placeholder_mask method * Revert "Refactor special image mask" This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85. * Fix * Revert "Refactor get_placeholder_mask method" This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.	2025-08-20 17:12:10 +02:00
Yih-Dar	5d906740d2	Update CI with nightly torch workflow file (#40306 ) * fix nightly ci * Apply suggestions from code review Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>	2025-08-20 16:59:00 +02:00
Arthur	4977ec2ae8	[`GPT OSS`] Refactor the tests as it was not properly checking the outputs (#40288 ) * it was long due! * use the official kernel * more permissive * update the kernel as well * mmm should it be this? * up pu * fixup * Update test_modeling_gpt_oss.py * style * start with 20b	2025-08-20 16:47:41 +02:00
Yih-Dar	3b7230124b	No more `natten` (#40287 ) get rid off natten Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-20 16:10:15 +02:00
Matt	2df0c323cb	byebye torch 2.1 (#40317 ) * Bump minimum torch version to 2.2 * Remove is_torch_greater_or_equal_than_2_2 * update versions table * Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa	2025-08-20 15:03:46 +01:00
Rishub Tamirisa	c50f140be2	Add back `_tp_plan` attribute (#39944 ) * Update modeling_utils.py * make sure we update with the module's plan * use public api * oups * update * fix failing test * Update src/transformers/integrations/tensor_parallel.py * Update src/transformers/integrations/tensor_parallel.py * fix * make the API more friendly! * fix tests * fix styling --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-08-20 15:29:55 +02:00
Ákos Hadnagy	a97213d131	Qwen2.5-Omni test fixes (#40307 ) Updated expectations, and mp tests	2025-08-20 14:48:30 +02:00
Duc-Viet Hoang	ca543f822f	Add support for Florence-2 (#38188 ) * init * add modular * fixup * update configuration * add processing file * update auto files * update * update modular * green setup_and_quality ci * it works * fix some tests * commit florence2 * update test * make test cases done - 16 left * style * fix few test cases * fix some tests * fix init test * update florence2 vision style * hope is green * fix init test * fix init * update modular * refactor vision module * fix: channel attention use dynamic scale * update modular * update * update attention mask * update * fix naming * Update src/transformers/models/florence2/processing_florence2.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * spatial block works * more beautiful * more more beautiful * merge main * merge main and fixup * fix typing hint * update modeling * fix eager matches sdpa * fix style * fix compile test - all green * remove florence2 language * remove Florence2LanguageModel things * fix style * update florence2 model * override prepare encoder_decoder for generation * add weight conversion script * rewrite channel attention to use sdpa * eleminate 1 tranpose op * support fa2 * fix quality check * chore: reformat `test_modeling_florence2.py` * some refactor for processor * some refactor for processor * update naming convention and remove BC * make it pass the test * fix: correct Embedding Cosine * update comments and docstring * support input_embeds * support input embeds ideally * fix style * fix style * fix style again :D * add test prcoessor * refactor processor and add test for processor * reformat test processor * make fixup * fix schema check * remove image_token * ensure image token in tokenizer and fix integration tests * fix processor test * add more integration tests for large model and rename test_processor to test_processing * test_assisted_decoding_sample should pass * update doc and make model work with image text to text pipeline * docs: add sdpa bagde * resolve cyril's comments * fix import torch error * add helper get_placeholder_mask * inherit from llava * florence2 may not _supports_attention_backend because of bart ... * move florence2 model card to multimodal * let base model always return_dict * fix style * tiny update doc * set _checkpoint_conversion_mapping = {} * fix code quality * support flex and compile graph and move external func to internal func * remove condition because it always true * remove window funcs * move post processor config out * fix ci * new intro to trigger test * remove `kernel_size` argument --------- Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-08-20 14:28:06 +02:00
Matt	959239debc	Remove unnecessary contiguous calls for modern torch (#40315 )	2025-08-20 12:24:14 +00:00
Anton Vlasjuk	7d2aa5d6e6	🚨 [`Flash Attention`] Fix sliding window size (#40163 ) * swa fix * add comment, make fix symmetrical * modify fa inference test to force swa correctness check * fixup comment	2025-08-20 14:23:14 +02:00
MilkClouds	3128db6927	chore: fix typo in `find_executable_batch_size` to match new 0.9 ratio (#40206 )	2025-08-20 12:18:06 +00:00
Manny Cortes	ca0aaa8c74	[`fix`] Pass adamw optimizer parameters to StableAdamW (#40184 ) * fix: pass adamw optimizer parameters to StableAdamW * add test for stable_adamw initialization with trainer arguments * address copilot suggestion * fix: update weight_decay handling in stable_adamw kwargs --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-20 11:52:23 +00:00
Isotr0py	a01f38b364	Fix GOT-OCR2 and Cohere2Vision image processor patches caculation (#40312 ) fix got-ocr patches caculation Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-20 13:13:58 +02:00
Anuraag (Rag) Agrawal	a5f0b505a0	Remove OTel SDK dependencies (#40305 )	2025-08-20 12:31:44 +02:00
Eric Bezzam	d0f1a6ec36	Clean up X-Codec. (#40271 ) * Clean up xcodec addition. * Clean up config. * Switch to fixtures test. * Small stuff.	2025-08-20 12:16:28 +02:00
Joao Gante	da9452a592	[docs] delete more TF/Flax docs (#40289 ) * delete some TF docs * update documentation checks to ignore tf/flax * a few more removals * nit * Update utils/check_repo.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-08-20 10:44:14 +01:00
Anton Vlasjuk	a4e1fee44d	[`FA`] Fix dtype in varlen with position ids (#40295 ) fix	2025-08-20 11:15:55 +02:00
Yih-Dar	126bc03b4e	Allow to be able to run `torch.compile` tests with `fullgraph=True` (#40164 ) * fix * address comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-20 10:42:33 +02:00
NielsRogge	1d46091737	Add MetaCLIP 2 (#39826 ) * First draft * Make fixup * Use eos_token_id * Improve tests * Update clip * Make fixup * Fix processor tests * Add conversion script * Update docs * Update tokenization_auto * Make fixup * Use check_model_inputs * Rename to lowercase * Undo CLIP changes * Address comment * Convert all checkpoints * Update auto files * Rename checkpoints	2025-08-20 09:25:43 +02:00
Yao Matrix	0f9c9088d0	[3/3] make docs device agnostic, all en docs for existing models done (#40298 ) docs to device agnostic cont. Signed-off-by: Yao, Matrix <matrix.yao@intel.com>	2025-08-19 21:01:27 -07:00
Yao Matrix	eaa48c81e9	make model docs device agnostic (2) (#40256 ) * doc cont. Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * more models Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mixtral.md --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-19 13:10:03 -07:00
Ákos Hadnagy	42fe769928	SmolVLM test fixes (#40275 ) * Fix SmolVLM tests * Add the proper CUDA expectations as well * Split 'A10 and A100 expectations * Ruff --------- Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>	2025-08-19 21:22:06 +02:00
Ákos Hadnagy	4c017465bd	Adjust ROCm test output expectations (#40279 ) Adjust ROCm output expectations	2025-08-19 21:21:45 +02:00
Nemitha Wijerathna	0f9ce43687	Standardize BertGeneration model card (#40250 ) * Standardize BertGeneration model card: new format, usage examples, quantization * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply reviewer feedback: update code examples * Add missing code example --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-19 11:22:13 -07:00
Quentin Gallouédec	6ceb13fb22	SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121 ) * Ensure pixel values are converted to the correct dtype for fp16/bf16 * add to modular	2025-08-19 10:39:08 -07:00
Ahnjj_DEV	92f40da608	Update model card for gpt neox japanese (#39862 ) * Update GPT-NeoX-Japanese model card * Apply suggestions from code review * Update gpt_neox_japanese.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-19 09:18:46 -07:00
Mehul Aggarwal	3a4b2756cf	docs: Update TrOCR model card to new format (#40240 ) * docs: Update TrOCR model card to new format * Updated Sugegestions	2025-08-19 09:17:45 -07:00
Aayush Shah	46d38546f3	Standardize RAG model card (#40222 ) * Standardize RAG model card Update rag.md to follow the new Hugging Face model card template: - Added friendly overview in plain language - Added pipeline and AutoModel usage examples - Included quantization example with BitsAndBytesConfig - Added notes and resources sections - Removed abstract and FlashAttention badge * Standardize RAG model card Update rag.md to follow the new Hugging Face model card template: - Added friendly overview in plain language - Added AutoModel usage example - Included quantization example with BitsAndBytesConfig	2025-08-19 09:16:10 -07:00
Jin-Ho Lee	bd96e1e1cc	docs(layoutlm): add missing `id=usage` to `<hfoptions>` tag in LayoutLM model card (#40273 ) docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card	2025-08-19 09:14:43 -07:00
Robin Ede	8636b309e6	Fix chat CLI GPU loading and request_id validation issues (#40230 ) (#40232 ) * Fix chat CLI GPU loading and request_id validation issues (#40230) This commit addresses two critical bugs in the transformers chat CLI: 1. GPU Loading Issue: Changed default device from "cpu" to "auto" in ChatArguments - Chat CLI now automatically uses GPU when available instead of defaulting to CPU - Matches the behavior of the underlying serving infrastructure 2. Request ID Validation Error: Added request_id field to TransformersCompletionCreateParamsStreaming schema - Fixes "Unexpected keys in the request: {'request_id'}" error on second message - Allows request_id to be properly sent and validated by the server Both fixes target the exact root causes identified in issue #40230: - Users will now get GPU acceleration by default when available - Chat sessions will no longer break after the second message * Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming	2025-08-19 15:33:44 +00:00
Arthur	bebeccb06a	fix which routing method (#40283 )	2025-08-19 16:35:13 +02:00
Tyler Zhu	249d7c6929	Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252 ) * Update image_processing_perception_lm_fast.py Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute. * Update processing_perception_lm.py to match kwargs vision input type * Update image_processing_perception_lm_fast.py kwargs to signature args	2025-08-19 11:41:27 +00:00
r0	57bb6db6ee	Skipping pytree registration in case fsdp is enabled (#40075 ) * Skipping pytree registration in case fsdp is enabled * Beauty changes * Beauty changes * Moved the is_fsdp_available function to import utils * Moved is_fsdp_available to integrations.fsdp * Skipping pytree registration in case fsdp is enabled * Beauty changes * Beauty changes * Moved the is_fsdp_available function to import utils * Moved is_fsdp_available to integrations.fsdp * Added pytree registration inside dynamic cache class * Making ci/cd lords happy * Adding a check if DynamicCache is already a leaf * Adding try/catch for multiple initializations of DynamicCache in test suites * Moving dynamic cache pytree registration to executorch * Adding try catch back	2025-08-19 11:58:05 +02:00
tic-top	5b3b7ea472	Add Kosmos-2.5 (#31711 ) Add Microsoft Kosmos-2.5 --------- Co-authored-by: kirp@umich.edu <tic-top> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-19 11:56:03 +02:00
nnul	c93594e286	[detection] fix correct `k_proj` weight and bias slicing in D-FINE (#40257 ) Fix: correct k_proj weight and bias conversion in D-FINE	2025-08-19 09:44:37 +00:00
Raushan Turganbay	2f1a8ad4ba	Fix setting attention for multimodal models (#39984 ) * fix * use non-explicit `None` * keep previously set attn if exists	2025-08-19 11:35:11 +02:00
Cyril Vallez	a2e76b908b	🚨🚨 Switch default compilation to fullgraph=False (#40137 ) * switch default * docstring * docstring * rework tests and remove outdated restrictions * simplify * we need a check for static cache * fix * rename var * fix * revert * style * rename test	2025-08-19 11:26:22 +02:00
Jack	2b59207a72	Fix slow static cache export tests (#40261 )	2025-08-19 11:24:07 +02:00
Matteo Destro	56c44213b3	[detection] fix attention mask for RT-DETR-based models (#40269 ) * Fix get_contrastive_denoising_training_group attention * Add bool attention_mask conversion	2025-08-19 09:15:56 +00:00
BakerBunker	5d9a715e30	set inputs_embeds to None while generate to avoid audio encoder forward in generation process (#40248 ) * set inputs_embeds to None while generate to avoid audio encoder forward in generation process * set input_features to none instead --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-08-19 08:45:57 +00:00
ivarflakstad	28746cdc7b	Remove MI300 CI (#40270 ) Remove MI300 CI (in history if we need it back)	2025-08-19 08:23:39 +00:00
Raushan Turganbay	debc92e60a	Skip broken tests (#40157 ) skip these tests	2025-08-19 10:04:08 +02:00
rafakatri	6b5bd11723	docs: Update OLMo model card (#40233 ) * Updated OLMo model card * Update OLMo description Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix typo Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix cli typo Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix cli example Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Add bitsandbytes info Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-18 13:35:39 -07:00
Ákos Hadnagy	e472efb9ac	Fix benchmark workflow (#40254 ) Correct init_db.sql path Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>	2025-08-18 18:14:16 +00:00
Pavlo Fesenko	59862209ca	Correct typo and update notes in docs Readme (#40234 ) * Correct typo and update notes in docs readme * Update docs/README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-18 10:31:12 -07:00
Sahil Kabir	a7eabf1dde	Model card for NLLB (#40074 ) * initializing branch and draft PR * updated model card .md file * minor * minor * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * resolving comments + adding visuals * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * NllbTokenizerFast and NllbTokenizer added * endline * minor * Update nllb.md --------- Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-18 10:05:59 -07:00
akug	01c03bf4ee	fix: Catch correct ConnectionError for additional_chat_templates (#39874 ) * fix: Catch correct ConnectionError for additional_chat_templates * fix: don't catch timeout * fix: formatting	2025-08-18 17:25:47 +01:00
Rémi Ouazan	2bcf9f6c7e	Fixes for EncoderDecoderCache (#40008 ) * Add expectation to t5 for rocm 9.4 * Made EncoderDecoderCache compatible with nn.DataParallel * Fixed t5gemma EncoderDecoderCache * Added todos in autoformer * Ruff * Init is self-contained * Review compliance * Fixed kwargs init of EncoderDecoderCache	2025-08-18 17:51:05 +02:00
Anton Vlasjuk	aa45824919	[`CI`] Fix repo consistency (#40249 ) * fix * doc --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-08-18 17:32:17 +02:00
Joao Gante	d6fad86d23	[serve] guard imports (#39825 ) guard imports	2025-08-18 16:28:10 +01:00
MQY	7a0ba0d7d8	[typing] fix type annotation error in DepthPro model image processor (#40238 ) * fix type annotation error in DepthPro model image processor * fix * run make fix-copies	2025-08-18 15:42:13 +01:00
Thomas Børstad	00b4dfb786	Add `chat_template` (`jinja2`) as an extra dependency (#40128 ) * add jinja2 as a dependency * Make jinja2 a core dependency in install_requires - Add jinja2 to install_requires list in setup.py for automatic installation - Add jinja2 to runtime version checks in dependency_versions_check.py - Resolves issue where pip install transformers doesn't install jinja2 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Make jinja2 a core dependency in install_requires * Make jinja2 an extra dependency instead of adding a core dep --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-08-18 14:31:40 +00:00
Peter St. John	f417a1aad4	remove transpose_for_scores call in ESM-2 (#40210 ) * remove transpose_for_scores call Signed-off-by: Peter St. John <pstjohn@nvidia.com> * fix copied evolla code Signed-off-by: Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by: Peter St. John <pstjohn@nvidia.com>	2025-08-18 14:28:59 +00:00
Manuel de Prada Corral	a36d51e801	🚨 Always return Cache objects in modelings (to align with generate) (#39765 ) * watch the world burn * fix models, pipelines * make the error a warning * remove kwargs and return_legacy_cache * fix reformer	2025-08-18 16:26:35 +02:00
Yuanyuan Chen	57e230cdb2	Fix more pylint warnings (#40204 ) Fix pylint warnings Signed-off-by: cyy <cyyever@outlook.com>	2025-08-18 14:17:16 +00:00
Eon Kim	47938f8f8d	Add Ovis2 model and processor implementation (#37088 ) * Add Ovis2 model and processor implementation * Apply style fixes * Add unit tests for Ovis2 image processing and processor * Refactor image processing functions for clarity and efficiency * Add Ovis2 ImageProcessorFast * Refactor Ovis2 code * Refactor Ovis2 model components and update processor functionality * Fix repo consistency issues for Ovis2: docstring, config cleanup * Update Ovis2 model integration tests * Update Ovis2 configuration and processing classes for improved documentation * Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES * Fix conflict * Fix import order * Update image processor class names * Update Ovis2 model structure * Refactor Ovis2 configuration * Fix typos * Refactor Ovis2 model classes and remove unused code * Fix typos * Refactor Ovis2 model initialization * Fiix typos * Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py * Add license and update type hints * Refactor token function and update docstring handling * Add license * Add Ovis2 model support and update documentation * Refactor Ovis2 model structure and enhance multimodal capabilities * Update Ovis2 weight mapping for consistency and clarity in key patterns * Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently. * Refactor Ovis2 model test structure to include Ovis2Model * Add optional disable_grouping param to Ovis2ImageProcessorFast * Refactor type hints in Ovis2 modules * Add licensing information in Ovis2 modules and tests * Refactor Ovis2 model by removing unused methods * Refactor Ovis2 model tests by renaming test classes and removing skipped tests * Refactor Ovis2 model output classes * Refactor Ovis2 weight conversion and Update model embedding classes * Refactor Ovis2 model imports and remove unused functions * Enhance vision configuration extraction in Ovis2 weight conversion * Refactor Ovis2 model's forward method to remove interpolation option * Update Ovis2 model documentation * Refactor Ovis2 model input handling and tokenizer configuration * Update return type hints in Ovis2 model * Remove commented-out code * fix config for tests and remove key mappings * Update tokenizer configuration to use add_special_tokens method * skip torchscript * Fix image placeholder generation in Ovis2Processor * Refactor Ovis2 model to rename visual_table to visual_embeddings_table * Enhance Ovis2 model by adding vision_feature_select_strategy parameter * Refactor Ovis2 model weights conversion and architecture * Refactor Ovis2 model by removing vision_feature_select_strategy parameter * Update Ovis2 model examples * Refactor Ovis2 model * Update Ovis2 model * Update Ovis2 model configuration * Refactor Ovis2 model test setup * Refactor flash attention support * Refactor * Fix typo * Refactor * Refactor model classes * Update expected output in Ovis2 * Refactor docstrings * Fix * Fix * Fix * Update input in tests * Fix * Fix get_decoder method * Refactor * Refactor Ovis2 * Fix * Fix * Fix test * Add get_placeholder_mask * Refactor Ovis2 model tests * Fix * Refactor * Fix * Fix * Fix Ovis2 test --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-08-18 16:05:49 +02:00
ivarflakstad	2fe43376cd	AMD scheduled CI ref env file (#40243 ) * Reference env-file to be used in docker running the CI * Disable MI300 CI for now	2025-08-18 15:23:27 +02:00
nnul	e4bd2c858d	Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181 ) * fix: Error after calling ESM model with input embeddings not input ids * propagate changes to other models	2025-08-18 13:22:10 +00:00
Yuanyuan Chen	6333eb986a	Fix more typos (#40212 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-18 12:52:12 +00:00
Yoni Gozlan	e5886f9194	[SAM 2] Change checkpoints in docs and tests (#40213 ) * change checkpoints in docs and tests * add notebook	2025-08-18 11:21:34 +02:00
Luo Xiaochuan	eb2f9da096	fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function (#40130 ) * fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * fix similar errer at qwen2_vl and do make fix-copies Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * Apply style fixes --------- Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-08-18 08:59:25 +00:00
Rohit Jena	6ce8f05375	Use correct `model_input_names` for PixtralImageProcessor (#40226 ) add image_sizes to model_input_names	2025-08-18 08:06:52 +00:00
Yih-Dar	2914ceca20	Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for `too long with no output` (#40201 ) * Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)" This reverts commit `31b6e6e1da`. * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-18 08:40:53 +02:00
Jin-Ho Lee	cd22550692	docs: Update LayoutLM model card according to new standardized format (#40129 ) * docs: Update LayoutLM model card with standardized format * Apply suggestions from code review This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Address remaining review comments * Address few more review comments: 1. remove transformer-cli section 2. put resources after notes 3. change API refs to 2nd level header * Update layoutlm.md * Update layoutlm.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-15 09:33:47 -07:00
Daniel Han	05000aefe1	Fix GPT-OSS `swiglu_limit` not passed in for MXFP4 (#40197 ) Add swiglu_limit = 7.0	2025-08-15 17:04:25 +02:00
Manal ML	3f4c85fef0	Add X-Codec model (#38248 ) * add working x-codec * nit * fix styling + copies * fix docstring * fix docstring and config attribute * Update args + config * update convertion script * update docs + cleanup * Ruff fix * fix doctrings	2025-08-15 16:24:12 +02:00
Ákos Hadnagy	29e4e35927	Benchmarking improvements (#39768 ) * Start revamping benchmarking * Start refactoring benchmarking * Use Pandas for CSV * import fix * Remove benchmark files * Remove sample data * Address review comments	2025-08-15 15:59:11 +02:00
Ajeet Verma	de437d0d7a	Update: add type hints to check_tokenizers.py (#40094 ) * Update check_tokenizers.py chore(typing): add type hints to check_tokenizers script - Annotate params/returns for helper functions - Keep tokenizer instances as `Any` to avoid runtime coupling - Make `check_LTR_mark` return `bool` explicitly (no behavior change) * Update check_tokenizers.py chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py - Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params - Covers both PreTrainedTokenizer and PreTrainedTokenizerFast - Exposes required methods (encode, decode, encode_plus, tokenize) - Removes generic Any typing while staying implementation-agnostic	2025-08-15 12:41:28 +00:00
Yuanyuan Chen	28a03fb78a	Fix various Pylint warnings (#40107 ) Tidy code Signed-off-by: cyy <cyyever@outlook.com>	2025-08-15 12:40:12 +00:00
Yuanyuan Chen	ec85d2c44f	Avoid CUDA stream sync (#40060 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-15 12:37:15 +00:00
Yuanyuan Chen	c7afaa5b44	Remove _prepare_flash_attention_from_position_ids (#40069 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-15 12:35:03 +00:00
Yuanyuan Chen	c167faa081	Fix typos (#40175 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-15 12:10:26 +00:00
Cyril Vallez	5068fcd9a8	Add repr to EncoderDecoderCache (#40195 ) * add repr * oups	2025-08-15 12:57:49 +02:00
Cyril Vallez	421175685d	Fix fsdp for generic-task models (#40191 ) * remove abc inheritance * add fast test	2025-08-15 12:28:16 +02:00
Ferdinand Mom	4912d5b490	fix to avoid modifying a view in place (#40162 ) * fix to avoid modifying a view in place * add backward test in tensor parallel * add test to test_modelig_gpt_oss.py * linting	2025-08-15 10:30:49 +02:00
Yao Matrix	cc9997878a	make model doc device agnostic (#40143 ) * make model doc device agnostic Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update align.md * Update aya_vision.md * Update byt5.md * refine Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update granitevision.md * Update src/transformers/pytorch_utils.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add doc Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * 3 more Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-14 23:31:31 -07:00
Christopher Akiki	85fce2e54c	[MINOR:TYPO] Update base.py (#40169 ) * [MINOR:TYPO] Update base.py All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code) Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')` It might be a good idea to allow for uppercase, but that's for another issue. * [MINOR:TYPO] Update __init__.py	2025-08-14 22:53:57 -07:00
Raushan Turganbay	52c6c1bb6e	Update dynamic attnt setter for multimodals (#39908 ) * update * fix the test for DepthPro * PR comments * wait, I didn't delete this in prev commit? * fix * better way --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-08-14 21:46:13 +02:00
Yih-Dar	31b6e6e1da	Pin torch to 2.7.1 on CircleCI for now (#40174 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-14 20:19:35 +02:00
MAHIR DAIYAN	b02f2d8b6a	Add dates to the model docs (#39320 ) * added dates to the models with a single hf papers link * added the dates for models with multiple papers * half of no_papers models done * rest of no_papers models also done, only the exceptions left * added copyright disclaimer to sam_hw, cohere, cohere2 + dates * some more fixes, hf links + typo * some new models + a rough script * the script looks robust, changed all paper links to hf * minor change to handle technical reports along with blogs * ran make fixup to remove the white space * refactor	2025-08-14 10:08:46 -07:00
Eshwanth Karti T R	8a658ac119	Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051 ) * Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)Update bartpho.md * Update bartpho.md Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header). Added code snippets which were suggested * Update bartpho.md Updated with necessary tags * Update bartpho.md * Update bartpho.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-14 09:55:27 -07:00
Yuefeng	2b6cbedeb2	Add GptOssForSequenceClassification for GPT-OSS models (#40043 ) * Add GptOssForSequenceClassification * Tiny fix * make fixup * trigger CI rerun * Check config type instead --------- Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>	2025-08-14 18:32:14 +02:00
Sarah Floris	b834cb8138	build: Add fast image processor tvp (#39529 ) * build: add TvpImageProcessorFast - Introduced TvpImageProcessorFast to enhance image processing capabilities. - Updated image processing auto registration to include the new fast processor. - Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes. * fix: TvpImageProcessorFast with new resize method and update processing logic * build: add TvpImageProcessorFast * refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests - Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py. - Updated import order in test_image_processing_tvp.py for clarity. - Minor adjustments to maintain code readability and consistency. * fix: Enhance TvpFastImageProcessorKwargs and update documentation - Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast. - Updated the documentation in tvp.md to include the new class and its parameters. - Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing. - Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs. * fix: tested now with python 3.9 * fix: remove tvp kwargs from docs * simplify processing * remove import and fix tests --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-08-14 15:48:18 +00:00
Pavel Iakubovskii	6f259bc83e	Fix docs typo (#40167 ) * DINOv3 model * working version * linter revert * linter revert * linter revert * fix init * remove flex and add convert to hf script * DINOv3 convnext * working version of convnext * adding to auto * Dinov3 -> DINOv3 * PR feedback * complete convert checkpoint * fix assertion * bf16 -> fp32 * add fast image processor * fixup * change conversion script * Use Pixtral attention * minor renaming * simplify intermediates capturing * refactor DINOv3ViTPatchEmbeddings * Refactor DINOv3ViTEmbeddings * [WIP] rope: remove unused params * [WIP] rope: rename period -> inv_freq for consistency * [WIP] rope: move augs * change inv_freq init (not persistent anymore) * [WIP] rope: move coords to init * rope - done! * use default LayerScale * conversion: truncate expected outputs * remove commented code * Refactor MLP layers * nit * clean up config params * nit docs * simplify embeddings * simplify compile compat lru_cache * fixup * dynamic patch coords * move augmentation * Fix docs * fixup and type hints * fix output capturing * fix tests * fixup * fix auto mappings * Add draft docs * fix dtype cast issue * add push to hub * add image processor tests * fixup * add modular * update modular * convert and test convnext * update conversion script * update prefix * Update LayerNorm * refactor DINOv3ConvNextLayer * rename * refactor convnext model * fix doc check * fix docs * fix convnext config * tmp fix for check docstring * remove unused arg * fix tests * (nit) change init * standardize gated MLP * clear namings and sat493m * fix tensors on different devices * revert linter * pr * pr feedbak ruff format * missing headers * fix code snippet and collection link in docs * DINOv3 description * fix checkpoints in tests * not doc fixes in configs * output_hidden_states * x -> features * remove sequential --------- Co-authored-by: Cijo Jose <cijose@meta.com>	2025-08-14 17:29:53 +02:00
Zhen	41980ce93e	[bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151 ) * [bugfix] fix flash-attention2 unavailable error for Ascend NPU * remove redundant apply_rotary_emb usage * fix ruff check error * pad_input and unpad_input use same implementation as fa2 * rollback redundant codes * fix ruff check error * optimize fa2 judgement logic	2025-08-14 14:21:39 +02:00
Cyril Vallez	eba1d62091	[FA2] Fix it finally - revert fa kwargs preparation (#40161 ) revert	2025-08-14 13:39:11 +02:00
Quentin Gallouédec	1c5d2f7fb6	Replace `self.tokenizer` by `self.processing_class` (#40119 )	2025-08-14 13:24:55 +02:00
Kashif Rasul	cfe52ff4db	[Continuous Batching] set head_dim when config.head_dim is None (#40159 ) * set head_dim when config.head_dim is None * use model's actual TP setting	2025-08-14 13:23:27 +02:00
Manuel de Prada Corral	c47544b16f	Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160 ) fix ci	2025-08-14 10:53:23 +00:00
StevenBucaille	22e89e5385	[efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141 ) * fix: changed is_causal to be False * fix: Added original cross attention bug * fix: fixed the way border removal is computed * fix: added missing normalization on coarse features * test: fixed integration tests --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-14 10:42:59 +01:00
Raushan Turganbay	252364fd8e	[Cohere2Vision] remove unused arg (#40103 ) * remove unused arg * remove the arg from test as well	2025-08-14 09:10:25 +00:00
Guillaume LEGENDRE	e446372f76	Create self-scheduled-amd-mi355-caller.yml (#40134 )	2025-08-14 01:33:45 +02:00
Sai-Suraj-27	be1ab5103f	Update Dockerfiles to install packages inside a virtual environment (#39098 ) * Removed un-necessary virtual environment creation in Dockerfiles. * Updated Dockerfiles to install packages in a virtual environment. * use venv's python * update * build and trigger * trigger * build and trigger * build and trigger * build and trigger * build and trigger * build and trigger * build and trigger * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-13 23:51:52 +02:00
Yih-Dar	591708d9ce	Add pytest marker: `torch_compile_test` and `torch_export_test` (#39950 ) * new marker * trigger CI * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-13 23:47:15 +02:00
Cyril Vallez	12e49cda32	Fix quantized cache with only cache_implementation in generate (#40144 ) * fix args * comment	2025-08-13 23:21:41 +02:00
임승섭	e651ae0a32	🌐 [i18n-KO] Translated `gemma3.md` to Korean (#39865 ) * docs: ko: gemma3.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> * fix: resolve suggestions --------- Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>	2025-08-13 13:25:20 -07:00
Anil-Red	0f9c2595cd	updated visualBERT modelcard (#40057 ) * updated visualBERT modelcard * fix: Review for VisualBERT card	2025-08-13 12:47:32 -07:00
Cyril Vallez	412c9c3030	Remove an old badly designed test (#40142 ) remove it	2025-08-13 20:47:00 +02:00
Steven Liu	eb5768a86e	[docs] Fix ko toctree (#40138 ) Update _toctree.yml	2025-08-13 11:24:58 -07:00
Sangbum Daniel Choi	68a13cd4a6	Add Segment Anything 2 (SAM2) (#32317 ) * initial comment * test * initial conversion for outline * intermediate commit for configuration * chore:init files for sam2 * adding arbitary undefined config * check * add vision * make style * init sam2 base model * Fix imports * Linting * chore:sam to sam2 classes * Linting * Add sam2 to models.__init__ * chore:match prompt encoder with sam2 code * chore:prepare kwargs for mask decoder * Add image/video predictors * Add CUDA kernel * Add output classes * linting * Add logging info * tmp commit * docs for sam2 * enable image processing * check difference of original SAM2 - difference is the order of ToTensor() - please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize * enable promptencoder of sam2 * fix promprencoder * Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference) * Confirmed that ImageEncoder is exactly same (Be aware the linting of init) * Confirmed that MaskDecoder is exactly same (TO DO: lint variable name) * SamModel is now available (Need more chore for name) * make fix-copies * make style * make CI happy * Refactor VisionEncoder and PostioinEmbedding * TO DO : fix the image_embeddings and sparse_embeddings part * pure image inference done * reusable features fix and make style * styling * refactor memoryattention * tmp * tmp * refactor memoryencoder TO DO : convert and inference the video pipeline * TO DO : fix the image_encoder shape * conversion finish TO DO: need to check video inference * make style * remove video model * lint * change * python utils/check_docstringspy --check_all * python utils/check_config_attributes.py * remove copies for sam2promptencoder due to configuration * change __init__.py * remove tensorflow version * fix that to not use direct comparison * make style * add missing import * fix image_embedding_size * refactor Sam2 Attention * add fully working video inference 
(refactoring todo) * clarify _prepare_memory_conditioned_features * simplify modeling code, remove unused paths * use one model * use auto_docstring * refactor rope embeddings * nit * not using multimask when several points given * add all sam2.1 * add video tmp * add Sam2VideoSessionState + fast image proc + video proc * remove init_states from model * fix batch inference * add image integration tests * uniformize modeling code with other sam models and use modular * pass vision tests an most model tests * All tests passing * add offloading inference state and video to cpu * fix inference from image embedding and existing mask * fix multi_boxes mask inference * Fix batch images + batch boxes inference * improve processing for image inference * add support for mask generation pipeline * add support for get_connected_components post processing in mask generation * add fast image processor sam, image processor tests and use modular for sam2 image processor * fix mistake in sam after #39120 * fix init weights * refactor convert * add integration tests for video + other improvements * add needed missing docstrings * Improve docstrings and * improve inference speed by avoiding cuda sync * add test * skip test for vision_model * minor fix for vision_model * fix vision_model by adding sam2model and change the torch dependencies * remove patch_size * remove image_embedding_size * fix patch_size * fix test * make style * Separate hieradet and vision encoder in sam2 * fixup * review changes part 1 * remove MemoryEncoderConfig and MemoryAttentionConfig * pass q_stride instead of q_pool module * add inference on streamed videos * explicitely process streamed frames * nit * Improve docstrings in Sam2Model * update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel * improve video inference api * change inference_state to inference_session * use modular for Sam2Model * fix convert sam2 hf * modular * Update 
src/transformers/models/sam2/video_processing_sam2.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix minor config * fix attention loading error * update modeling tests to use hub checkpoints * Use CI A10 runner for integration tests values + higher tolerance for video integration tests * PR review part 1 * fix doc * nit improvements * enforce one input format for points, labels and boxes * nit * last few nits from PR review * fix style * fix the input type * fix docs * add sam2 model as conversion script * improve sam2 doc * nit fixes + optimization * split sam2 and sam2_video in two models * PR review part 1 * fix None for default slow processor of sam2 * remove unecessary code path in sam2_video * refactor/simplify RoPE * replace embedding module list with embedding matrix * fix tests * remove kernel * nit * use lru_cache for sine_pos_embeddings * reorder sam2_video methods * simplify sam2_video * PR review part 1 * simplify sam2 video a lot * more simplification * update integration tests with updated conftest * more explicit config for hieradet * do post_processing outside of sam2 video model * Improve Sam2VideoVisionRotaryEmbedding * fix tests * update docs and fix mask2former/oneformer * avoid unnecessary reshapes/permute * fix device concatenating points * small dtype fix * PR review * nit * fix style and finish up doc * fix style * fix docstrings * fix modular --------- Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com> Co-authored-by: Haitham Khedr <haithamkhedr@meta.com> Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-13 14:18:05 -04:00
Cyril Vallez	25ad9c8c92	Fix Janus (#40140 ) fix	2025-08-13 20:12:21 +02:00
Arthur	bec6926696	gpt oss is important (#40139 )	2025-08-13 19:49:54 +02:00
Lee SuJung	ab9108517a	🌐 [i18n-KO] Translated `pipelines.md` to Korean (#39577 ) * docs: ko: pipelines.md * feat: gpt draft * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update _toctree.yml * Update _toctree.yml Fix translated document * Update pipelines.md Fix ToC * Update pipelines.md --------- Co-authored-by: xhaktm <tnwjd318@hs.ac.kr> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-13 10:26:17 -07:00
Yoni Gozlan	20c6b478cd	🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007 ) * use lru_cache for sine pos embeddings maskformer * fix calls to pos embed * change maxsize to 1	2025-08-13 17:05:22 +00:00
HyunSang Jang	6b728f1830	🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861 ) * docs: ko: grounding-dino.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * docs: add AP explanation for better readability --------- Co-authored-by: TaskerJang <bymyself103@naver.com> Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-08-13 10:01:05 -07:00
chaeyoung kim	127e33f759	🌐 [i18n-KO] Translated `optimizers.md` to Korean (#40011 ) * docs: ko: optimizers.md * feat: optimizers draft * fix: manual edits * docs: ko: update optimizers.md * Update docs/source/ko/optimizers.md Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> * Update docs/source/ko/optimizers.md Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> * Update docs/source/ko/optimizers.md Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com> * docs: ko: final updates to optimizers and toctree --------- Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>	2025-08-13 10:00:47 -07:00
Taemin Park	ac52c77a66	🌐 [i18n-KO] Translated `gpt2.md` to Korean (#39808 ) * docs: ko: bamba.md * feat: nmt draft * fix: manual edits * docs: ko: gpt2.md * feat: nmt draft * fix: manual edits * Remove bamba.md from docs/source/ko/model_doc/ * Update _toctree.yml	2025-08-13 10:00:25 -07:00
Joao Gante	5337f3052d	🚨🚨 [generate] ignore `cache_implementation="hybrid"` hub defaults (#40135 ) * working? * fix tests	2025-08-13 17:57:41 +02:00
Minseo Kim	e4223fa915	🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean (#39713 ) * docs: ko: main_classes/optimizer_schedules * feat: nmt draft * fix: improve TOC anchors and expressions in optimizer_schedules - Add TOC anchors to all section headers - Fix terminology and improve Korean expressions * fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된' Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization. * fix: Use more natural Korean inheritance expression Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology. * fix: Use consistent '미세 조정' translation for 'finetuned models' Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.	2025-08-13 08:23:09 -07:00
Jaehyeon Shin	9e21e50241	🌐 [i18n-KO] Translated `jamba.md` to Korean (#39890 ) * docs: ko: jamba.md * feat: nmt draft * fix: manual edits * fix: resolve suggestion Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> --------- Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>	2025-08-13 08:22:28 -07:00
HyunSang Jang	486844579b	🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean (#39519 ) * docs: ko: processors.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> --------- Co-authored-by: TaskerJang <bymyself103@naver.com> Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>	2025-08-13 08:21:38 -07:00
Yoni Gozlan	f445caeb0f	Fix hidden torchvision>=0.15 dependency issue (#39928 ) * use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT * fix min torchvision version * use InterpolationMode directly * remove unused is_torchvision_greater_or_equal, * nit	2025-08-13 15:13:42 +00:00
Joao Gante	11537c3e0c	[trainer] handle case where EOS token is None in `generation_config` (#40127 ) * handle case where EOS token is None in gen config * update eli5 dataset	2025-08-13 15:57:17 +01:00
Shiva Heydari	8ef5cd6579	DOCS: Add missing space in SECURITY.md (#40087 )	2025-08-13 12:57:37 +00:00
ivarflakstad	ebceef343a	Collated reports (#40080 ) * Add initial collated reports script and job definition * provide commit hash for this run. Also use hash in generated artifact name. Json formatting * tidy * Add option to upload collated reports to hf hub * Add glob pattern for test report folders * Fix glob * Use machine_type as path filter instead of glob. Include machine_type in collated report	2025-08-13 14:48:15 +02:00
Manuel de Prada Corral	e78571f5ce	`decoding_method` argument in generate (#40085 ) * factor out expand inputs * callable arg * improve docs, add test * Update docs/source/en/generation_strategies.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-13 12:45:50 +00:00
Joao Gante	8d19231bca	[serve] allow array `content` inputs for LLMs (#39829 ) fix bug; add tests	2025-08-13 11:26:19 +01:00
Manuel de Prada Corral	34a1fc6426	Fix QuantoQuantizedCache import issues (#40109 ) * fix quantoquantized	2025-08-13 10:22:59 +00:00
Nikita	060b86e21d	changed xLSTMRMSNorm to RMSNorm (#40113 ) * changed xLSTMRMS.. to RMS... * fix linter error --------- Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>	2025-08-13 11:10:42 +02:00
Quentin Gallouédec	849c3778c6	[bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975 ) * [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models * to cuda	2025-08-13 09:58:50 +02:00
Ahn Joon Sung	85d536a93b	🌐 [i18n-KO] Translated `tiny_agents.md` to Korean (#39913 ) * docs: ko: tiny_agents.md * feat: nmt draft * fix: manual edits * fix: manual edits	2025-08-12 22:54:16 -07:00
Ferdinand Mom	31ab7168ff	remove sequence parallel in llama4 (#40084 )	2025-08-13 00:12:45 +02:00
Shivamjan	a1a4fcd03e	Add model card for MobileViT (#40033 ) * Add model card for MobileViT * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-12 11:36:59 -07:00
Joao Gante	e5e73e4b95	[docs] Add reference to HF-maintained `custom_generate` collections (#39894 ) decoding -> generation; add collections	2025-08-12 17:38:00 +01:00
LucasChan	0ce24f5a88	Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707 ) Fix the is_causal logic to enable bidirectional attention Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-08-12 16:16:28 +00:00
Joao Gante	83dbebc429	[trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441 ) * tmp commit * add test * make fixup * reset warns/info in test	2025-08-12 16:32:07 +01:00
Anton Vlasjuk	9977cf1739	[`Flash Attention`] Fix flash attention integration (#40002 ) * fix flash attention * i got a stroke reading that comment * change dropout kwarg back to before * rename _fa3... as it's used for multiple variants and should work as fallback instead * simplify imports and support kwargs for fa * style * fix comments order * small fix * skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart * style * allow fullgraph by preloading on init * make globals "private" * ci pls be happy * change skip conditions based on backend flag (indicating missing mask interface) * move globals support to a function to prepare kwargs * style * generalize supported kwargs * small change to doc * fix * add comments * style * revert prep during generate * style * revert weird style changes * add fa kwarg prep during generate with fixes back * how did this even happen * how * add comment	2025-08-12 15:24:10 +00:00
Mohamed Mekkouri	b6ba595543	Default to dequantize if cpu in device_map for mxfp4 (#39993 ) * default to dq if cpu * another check * style * revert some changes	2025-08-12 16:48:52 +02:00
Michał Gallus	a5fac1c394	Fix error on importing unavailable torch.distributed (#40038 ) Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.	2025-08-12 16:30:51 +02:00
Çağrı Tuğrul Canbol	085e02383c	Fix Qwen3 MoE GGUF architecture mismatch (#39976 ) * fix qwen3moe gguf architecture * Fix Qwen3Moe GGUF loading --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>	2025-08-12 13:38:48 +00:00
Cyril Vallez	2ce0dae390	Switch the order of args in StaticCache (for BC and future logic) (#40100 ) * switch order for BC and future logic * in generate as well	2025-08-12 15:30:44 +02:00
Isotr0py	f7cbd5f3ef	Fix regression in mllama vision encoder (#40083 ) fix mllama vision encoder Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-12 15:29:45 +02:00
Quentin Gallouédec	35dc88829c	Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer` (#40091 )	2025-08-12 15:26:47 +02:00
Cyril Vallez	b1b46555cd	Re-apply make style (#40106 ) make style	2025-08-12 15:02:16 +02:00
MilkClouds	a07b5e90f2	feat: add `is_fast` to ImageProcessor (#39603 ) * feat: add `is_fast` to ImageProcessor * Update test_image_processing_common.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * feat: add missing BaseImageProcessorFast import * fix: `issubclass` for discriminating subclass of BaseImageProcessorFast --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-08-12 12:14:57 +00:00
Yuanyuan Chen	952fac100d	Enable SIM rules (#39806 ) * Enable SIM rules Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-08-12 12:14:26 +00:00
Cyril Vallez	41d1717882	New DynamicSlidingWindowLayer & associated Cache (#40039 ) * start adding the layer * style * improve * modular * fix * fix * improve * generate integration * comment * remove old one * remove * fix * fix * fix * fix all recompiles * fix * doc * fix * add text config check * fix encoderdecoder cache * add it for all models with sliding/hybrid support * revert * start fixing * prophetnet * fsmt * fix ddp_data * add test for mistral * improve mistral test and add gemma2 test * docstrings	2025-08-12 14:09:52 +02:00
Malav-P	ab455e0d88	Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743 ) audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock	2025-08-12 12:08:28 +00:00
Quentin Gallouédec	4b3a1a62cc	Causal loss for `ForConditionalGeneration` (#39973 ) * feat: add ForConditionalGeneration loss to LOSS_MAPPING * consistent spelling of "recognized"	2025-08-12 14:03:09 +02:00
Lambert	f6b6e17719	Add glm4.5&&glm4.5V doc (#40095 ) * Docs: GLM-4-MoE & GLM-4V-MoE pages * Docs: polish GLM-4V-MoE intro, remove placeholders; pin image * Docs --------- Co-authored-by: wujiahan <lambert@gmail.com>	2025-08-12 11:44:53 +00:00
Raushan Turganbay	1c5e17c025	Update Glm4V processor and add tests (#39988 ) * update GLm4V and add tests * Update tests/models/glm4v/test_processor_glm4v.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * remove min/max pixels for BC * fix video tests --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-08-12 13:40:54 +02:00
Aritra Roy Gosthipaty	913c0a8c33	[docs] Zero Shot Object Detection Task (#40096 ) * refactor zsod task docs * keeping the image guided od section * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/tasks/zero_shot_object_detection.md Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>	2025-08-12 11:43:38 +01:00
youngrok cha	c6fbfab61b	[fix] batch inference for llava_onevision (#40021 ) * [fix] llava onevision batch inference * style * cannot pass inconsistent list & handle text-only case	2025-08-12 11:01:00 +02:00
Raushan Turganbay	86bb1fcd26	Revert FA2 kwargs construction (#40029 ) * revert * use imports * went way too high in imports level * style	2025-08-12 10:48:35 +02:00
Shuming Hu	3ff2e984d2	Fix PerceptionLM image preprocessing for non-tiled image input. (#40006 ) * Fix PerceptionLM image preprocessing for non-tiled image input. * Add test for single tile vanilla image processing. * ruff format * recover missing test skip * Simplify test. * minor test name fix	2025-08-12 08:40:22 +00:00
ivarflakstad	4668ef1459	Update notification service MI325 (#40078 ) add mi325 to amd_daily_ci_workflows	2025-08-12 10:22:52 +02:00
drbh	1cea763ba4	feat: extract rev in attn_implementation kernels via @ (#40009 ) * feat: extract rev in attn_implementation kernels via @ * fix: adjust for ruff * fix: update regex and add explanatory comment * fix: move attn_implementation kernel doc * fix: remove extra line	2025-08-11 15:14:13 -04:00
Anton Vlasjuk	e29919f993	[`GPT Big Code`] Fix attention scaling (#40041 ) * fix * update integration tests * fmt * add regression test	2025-08-11 19:01:31 +00:00
Shoumik Gandre	eca703026e	chore: standardize DeBERTa model card (#37409 ) * chore: standardize DeBERTa model card * Apply suggestions from code review in docs Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: Update deberta.md with code cleanup suggestions * Update docs/source/en/model_doc/deberta.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/deberta.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update deberta.md * Update deberta.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-11 10:30:37 -07:00
Yih-Dar	43001fd3c6	Fix `time_spent` in `notification_service.py`. (#40081 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-11 18:30:58 +02:00
azhar shaikh	5521c62b89	added Textnet fast image processor (#39884 ) * feat: add fast image processor implementation for TextNet model * chore: override to_dict method to TextNetImageProcessorFast for slow processor compatibility tests * chore: update init method * chore: coding and style checks * chore: fixed code quality issue * chore: override resize to handle size_divisor, move all preprocessing logic to child class * fix: autoImageProcessor issue for textnet * chore: cleanup * simplify resize --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-08-11 11:44:31 -04:00
Raushan Turganbay	6b70d79b61	Fix repo consistency (#40077 ) fix	2025-08-11 15:26:22 +02:00
Wing Lian	7dd82f307b	guard on model.eval when using torch.compile + FSDP2 (#37413 ) guard on model.eval Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-11 13:22:42 +02:00
Cyril Vallez	68eb1a9a63	Remove deprecated cache-related objects (#40035 ) remove them	2025-08-11 10:30:14 +02:00
Dongruixuan Li	480653d271	fix: move super().__init__ after vision_config init in Mistral3Config (#40063 ) fix: move super().__init__ after vision_config init in Mistral3Config (#40062)	2025-08-11 09:21:54 +02:00
Raushan Turganbay	502f253e20	[gemma3] update conversion key mapping (#39778 ) update conversion key mapping	2025-08-11 09:21:13 +02:00
Raushan Turganbay	3124d1b439	[qwen-vl] fix beam search with videos (#39726 ) * fix * fix copies	2025-08-11 09:21:04 +02:00
Tsumugii	1372a5b8c4	fix: resolve triton version check compatibility on windows (#39986 ) * fix: resolve triton version check compatibility on windows * style: remove trailing space * fix: fix typo --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-08-11 08:53:19 +02:00
Yih-Dar	99c747539e	unpin `torchcodec==0.5.0` and use `torch 2.8` on daily CI (#40072 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-10 22:27:39 +02:00
reedrya	b59140b696	Update HuBERT model card according to template (#39742 ) * Update HuBERT model card according to template Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section. * Address review comments and changes requested to hubert.md * Update hubert.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-10 11:32:45 -07:00
ssum21	795ae8f282	docs : 4N3MONE recommandced modified contents	2025-08-09 20:07:42 -07:00
Yih-Dar	f4d57f2f0c	Revert "fix `notification_service.py` about `time_spent`" (#40044 ) Revert "fix `notification_service.py` about `time_spent` (#40037)" This reverts commit `d2ba153b29`.	2025-08-08 22:32:24 +02:00
Yuxuan Zhang	7b20915f4e	GLM-4.5V Model Support (#39805 ) * init * update * uupdate * ruff * t patch is 2 defalut not 1 * draft * back * back1 * update * config update * update using glm-41 format * add self.rope_scaling = config.rope_scaling * update config * update * remove the processor * update * fix tests * update * for test * update * update 2126 * self.rope_scaling is missing in GLM4MOE lets add it * update * update * Update modular_glm4v_moe.py * change config * update apply_multimodal_rotary_pos_emb * format * update * Delete 3-rollout_qas_thinking_answers.py * use right name * update with place holder * update * use right rotary * Update image_processing_glm4v_fast.py * rope_config_validation needs to rewrite the entire config file in modular * update * changed name * update * Update modeling_glm4v_moe.py * _init_weights shoud be add in Glm4vMoePreTrainedModel * remove use_qk_norm * Update modular_glm4v_moe.py * remove use_qk_norm as it is not use * fix style * deprecations are not needed on new models * fix merge issues --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-08-08 17:39:52 +02:00
Yih-Dar	d2ba153b29	fix `notification_service.py` about `time_spent` (#40037 ) temp Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-08 17:11:16 +02:00
Mohamed Mekkouri	f639c0c780	Bnb failling tests (#40026 ) * initial commit * style --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-08-08 16:28:00 +02:00
Cyril Vallez	a96cccd0dd	Tie weights recursively on all submodels (#39996 ) * recursive call * add missing keys * remove bad keys	2025-08-08 16:03:16 +02:00
Cyril Vallez	a78263dbb5	fix	2025-08-08 15:32:23 +02:00
Cyril Vallez	dc11a3cbb2	[core] Refactor the Cache logic to make it simpler and more general (#39797 ) * Simplify the logic quite a bit * Update cache_utils.py * continue work * continue simplifying a lot * style * Update cache_utils.py * offloading much simpler * style * Update cache_utils.py * update inits * Update cache_utils.py * consistemncy * Update cache_utils.py * update generate * style * fix * fix * add early_initialization * fix * fix mamba caches * update * fix * fix * fix * fix tests * fix configs * revert * fix tests * alright * Update modeling_gptj.py * fix the constructors * cache tests * Update test_cache_utils.py * fix * simplify * back to before -> avoid compile bug * doc * mistral test * llama4 test dtype * Update test_modeling_llama4.py * CIs * Finally find a nice impl * Update cache_utils.py * Update cache_utils.py * add lazy methods in autodoc * typo * better doc * Add detailed docstring for lazy init * CIs * style * fix	2025-08-08 14:47:21 +02:00
Laurenz Ruzicka	95510ab018	Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991 ) (#40024 ) * Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991) * Switched definition of optional from | None to Optiona[] (Issue #39991) --------- Co-authored-by: Laurenz Ruzicka <Laurenz.Ruzicka@ait.ac.at>	2025-08-08 10:43:42 +00:00
Cyril Vallez	5c3fb7f731	Harmonize `past_key_value` to `past_key_valueS` everywhere (#39956 ) * all modulars and llama * apply modular * bert and gpt2 copies * fix imports * do it everywhere * fix import * finalize it * fix * oups set it in modular * style * fix * Add 1 version to deprecation cycle * Update modeling_layers.py	2025-08-08 11:52:57 +02:00
Raushan Turganbay	2469cce621	Fix an annoying flaky test (#40000 ) annoying flaky test	2025-08-08 10:32:51 +02:00
Mohamed Mekkouri	fe1bf82159	Higgs modules_to_not_convert standardization (#39989 ) fix higgs	2025-08-08 10:22:59 +02:00
Isotr0py	b374c3d12e	Fix broken image inference for Fuyu model (#39915 ) * fix fuyu Signed-off-by: Isotr0py <2037008807@qq.com> * oops Signed-off-by: Isotr0py <2037008807@qq.com> * run test on GPU Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * clean unused Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * revert Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * add fuyu multimodal test Signed-off-by: Isotr0py <2037008807@qq.com> * fix Signed-off-by: Isotr0py <2037008807@qq.com> --------- Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-08 07:21:49 +00:00
Yih-Dar	4d57c39007	pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI (#40013 ) * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-07 23:05:39 +02:00
Yih-Dar	3e0333fa4a	Update expected output values after #39885 (part 2) (#40015 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-07 22:52:53 +02:00
Mohamed Mekkouri	12f248bced	Raising error when quantizing a quantized model (#39998 ) * error when quantizing a quantized model * style	2025-08-07 20:37:25 +00:00
Minseo Kim	efaf3714dc	docs: fix duplication in 'en/optimizers.md' (#40014 )	2025-08-07 13:28:43 -07:00
Yih-Dar	ca4cbb1e3f	unpin torch<2.8 on circleci (#40012 ) update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-07 21:31:17 +02:00
Raushan Turganbay	78922577e9	FA2 can continue generation from cache (#39843 ) * add fa2 support to continue generation from cache * update q-len	2025-08-07 19:26:23 +02:00
Yuanyuan Chen	9bfbdd2945	Fix default values of getenv (#39867 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-07 17:25:40 +00:00
Duc-Viet Hoang	692d336908	Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips (#39965 ) * fix hgnet docs and image-classification pipeline * use positional argument * fix dit close hfoptions tag * fix alphabet order * fix hgnnet modular docstring * Update hgnet_v2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update hgnet_v2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/hgnet_v2.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix: hgnet reference * change hgnet to en doc --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-07 09:33:29 -07:00
Armaghan Shakir	0659214196	fix: remove CHAT_TEMPLATE import in tests for deepseek-vl (#40003 ) * remove CHAT_TEMPLATE import in tests * update and use prepare_processor_dict	2025-08-07 16:19:36 +00:00
Shuming Hu	27997eeb8d	Fix missing video inputs for PerceptionLM. (#39971 ) * Fix missing video inputs for PerceptionLM. * Minor fix for vanilla input image (only C,H,W, no tiles dim). * Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)." This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.	2025-08-07 15:54:45 +00:00
Yuan Wu	bf1bd6ac1f	Fix int4 quantized model cannot work with cpu (#39724 ) * Fix int4 quantized model cannot work with cpu Signed-off-by: yuanwu <yuan.wu@intel.com> * Update the comments Signed-off-by: yuanwu <yuan.wu@intel.com> * update Signed-off-by: yuanwu <yuan.wu@intel.com> * update Signed-off-by: yuanwu <yuan.wu@intel.com> --------- Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-07 15:24:00 +00:00
Yih-Dar	43d3b1931a	Update expected output values after #39885 (part 1) (#39990 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-07 16:00:28 +02:00
Cyril Vallez	d5a0809707	Fix consistency (#39995 ) * modular * fix	2025-08-07 15:52:40 +02:00
Pavel Iakubovskii	b347e93567	[typing] Fix return typehint for decoder and inv_freq annotation (#39610 ) * fix return typehint for decoder and annotate inv_freq * fix modular * Fix consistency * Move annotation on class level * missing annotations * add comment	2025-08-07 14:10:22 +01:00
dependabot[bot]	7188e2e28c	Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu (#39967 ) Bump transformers in /examples/tensorflow/language-modeling-tpu Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.0 to 4.53.0. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v4.48.0...v4.53.0) --- updated-dependencies: - dependency-name: transformers dependency-version: 4.53.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-08-07 12:13:48 +01:00
Isotr0py	2b19a06692	Fix gemma3n feature extractor's incorrect squeeze (#39919 ) * fix gemma3n squeeze Signed-off-by: Isotr0py <2037008807@qq.com> * add regression test Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> --------- Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-07 18:34:28 +08:00
Raushan Turganbay	555cbf5917	[Idefics] fix device mismatch (#39981 ) fix	2025-08-07 11:12:04 +02:00
Rémi Ouazan	597ed1a11d	Various test fixes for AMD (#39978 ) * Add amd expectation in internvl * Add amd expectation to llama * Added bnb decorator for a llava test that requires bnb * Added amd expectation for mistral3 * Style	2025-08-07 10:57:04 +02:00
Jack	6121e9e46c	Support input_embeds in torch exportable decoders (#39836 ) * Support input_embeds in torch exportable decoders * Hybrid cache update * Manually change some callsites * AI changes the rest of the call sites * Make either input_ids/inputs_embeds mandatory * Clean up * Ruff check --fix * Fix test * pr review * Revert config/generation_config changes * Ruff check	2025-08-07 08:51:31 +00:00
StevenBucaille	cdeaad96b7	[superglue] Fixed the way batch mask was applied to the scores before match assignment computation (#39968 ) fix: mask filling to score was wrong	2025-08-07 09:49:39 +01:00
Rémi Ouazan	2593932f10	Gemma3 fixes (#39960 ) * Fix multiple devices issue * Added expectations for rocm 9.4 * Ruff	2025-08-07 09:57:21 +02:00
Yoni Gozlan	513f76853b	Modular fix: remove the model name in `find_file_type` (#39897 ) * remove the model name in the class name * add comment	2025-08-06 23:31:07 +00:00
Arpon Kapuria	743bb5f52e	chore: update Deformable_Detr model card (#39902 ) * chore: update Deformable_Detr model card * fix: added pipeline, automodel examples and checkpoints link * Update deformable_detr.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-06 12:45:14 -07:00
Zhen	ac0b468465	[bugfix] fix flash_attention_2 unavailable error on Ascend NPU (#39844 )	2025-08-06 17:48:52 +00:00
Manuel de Prada Corral	cf243a1bf8	Fix `fix_and_overwrite` mode of `utils/check_docstring.py` (#39369 ) * bug in fix mode of check_docstring	2025-08-06 19:37:25 +02:00
Marc Sun	6902ffa505	remove `triton_kernels` dep with `kernels` instead (#39926 ) * remove dep * style * rm import * fix * style * simplify * style	2025-08-06 19:31:20 +02:00
ScutterKey	cb2e0df2ec	[image processor] fix glm4v (#39964 ) * fix glm4v image process * Update src/transformers/models/glm4v/image_processing_glm4v.py --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-06 17:46:58 +01:00
Tialo	9ab75fc428	fix typo (#39936 ) * fix typo * fix modular instead * fix --------- Co-authored-by: y.korobko <y.korobko@tbank.ru>	2025-08-06 16:21:24 +00:00
Mikhail Samin	43b3f58875	Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts (#39959 ) * Fix grammatical error: expert_hitted -> expert_hit in MoE implementations * Fix grammatical error: hitted_experts -> hit_experts in MoE implementation	2025-08-06 15:45:19 +00:00
Minseo Kim	dff6185d61	docs: fix typo in 'quantization-aware training' (#39904 )	2025-08-06 14:52:43 +00:00
Matthew Douglas	c7844c7a8e	Enable gpt-oss mxfp4 on older hardware (sm75+) (#39940 ) Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-06 13:39:21 +00:00
Lintch	dd70a8cb9d	Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953 ) * Fix MXFP4 quantizer validation to enable CPU dequantization Move dequantize check before CUDA availability check to allow CPU inference when quantization_config.dequantize is True. This enables users to run MXFP4 models on CPU by automatically converting them to BF16 format. * Add tests for MXFP4 quantizer CPU dequantization validation * fix: format mxfp4 test file with ruff	2025-08-06 15:20:41 +02:00
Joao Gante	82eb67e62a	[docs] ko toc fix (#39927 )	2025-08-06 10:12:34 +00:00
Yih-Dar	9e76a6bb54	circleci: pin torch 2.7.1 until `torchcodec` is updated (#39951 ) circleci torch 2.7.1 Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-06 11:18:00 +02:00
Manuel de Prada Corral	910b319357	Fix CI: Tests failing on CPU due to `torch.device('cpu').index` being None (#39933 ) replace routing_weights.device.index with a	2025-08-06 10:22:43 +02:00
Yih-Dar	369c99d0ce	Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`) (#39918 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-05 21:52:20 +02:00
Joao Gante	b771e476a8	[CI] post-`GptOss` fixes for green CI (#39929 )	2025-08-05 20:04:59 +02:00
Lysandre	eb6e26acf3	Dev version	2025-08-05 18:09:30 +02:00
Lysandre Debut	c54203a32e	gpt_oss last chat template changes (#39925 ) Last chat template changes	2025-08-05 18:08:08 +02:00
Arthur	7c38d8fc23	Add GPT OSS model from OpenAI (#39923 ) * fix * nice * where i am at * Bro this works * Update src/transformers/integrations/tensor_parallel.py * cleanups * yups that was breaking * Update src/transformers/models/openai_moe/modeling_openai_moe.py * gather on experts and not mlp * add changes for latest convert branch * adds options to get output_router_logits from config * bring chat temlate + special tokens back into the script. * initial commmit * update * working with shards * add model.safetensors.index.json * fix * fix * mxfp4 flag * rm print * Fix PAD/EOS/BOS (#18) * fix pad/eos/bos * base model maybe one day * add some doc * special tokens based on harmony. * add in tokenizer config as well. * prepare for rebase with main * Fix for initialize_tensor_parallelism now returning 4-tuple ``` [rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module> [rank0]: model = AutoModelForCausalLM.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained [rank0]: return model_class.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper [rank0]: return func(args, kwargs) [rank0]: ^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained [rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: ValueError: too many values to unpack (expected 3) ``` mxfp4 * mxfp4 draft * fix * fix import * draft * draft impl * finally working ! 
* simplify * add import * working version * consider blocks and scales * device mesh fix * initial commit * add working dequant + quant logic * update * non nan, gibberish output * working EP + quantization finally ! * start cleaning * remove reversing process * style * some cleaning * initial commmit * more cleaning * more cleaning * simplify * more cleaning * rm duplicated function * changing tp_plan * update tp plan check * add loading attribute * dequantizing logic * use subfunctions * import cleaning * update_param_name * adds clamped swiglu * add clamping to training path * simplify dequant logic * update * Bad merge * more simplifications & tests * fix ! * fix registering custom attention * fix order * fixes * some test nits * nits * nit * fix * Clamp sink logits * Clean * Soft-max trick * Clean up * p * fix deepspeed * update both modeling and modular for cleanup * contiguous * update tests * fix top_k router call * revert renaming * test nits * small fixes for EP * fix path for our local tests * update as I should not have broken that! * fix the loss of mixtral * revert part of the changes related to router_scores, kernel probably no ready for that! * deleting a small nit * update arch * fix post processing * update * running version but not expected output * moving to cuda * initial commit * revert * erroring when loading on cpu * updates * del blocks, scales * fix * style * rm comm * comment * add comment * style * remove duplicated lines * Fix minor issue with weight_map conversion script * fix sampling params * rename to final name * upate pre-final version of template * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * fix batched inference * serve fixes * swizzle ! * update final chat template by Matt. * fix responses; pin oai * sinplify * Thanks Matt for his tireless efforts! 
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * fix * Use ROCm kernels from HUB * Make kernel modes explicit * update final chat template by Matt. x2 * Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Fix installation * Update setup.py Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> * allow no content * fix: update message handling in write_tokenizer function * Fix template logic for user message role * last nits for CB and flash_paged! * there was one bad merge * fix CB (hardcode for now, its just using kv groups instead) * fix * better fix for device_map * minor device fix * Fix flash paged * updates * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * update * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * fix merge * fix * Fix line break when custom model indentity * nits testing * to locals first and pass sliding window to flash paged * register modes for MegaBlocksMoeMlp * add integration test in fixtures -> now update the tests to use it! * update integration tests * initial fix * style and update tests * fix * chore(gpt oss): remove mlp_bias from configuration It was just a leftover. 
* stats * Integration tests * whoops * Shouldn't move model * Ensure assistant messages without thinking always go to "final" channel * More checks to ensure expected format * Add pad_token_id to model configuration in write_model function (#51) * Add oai fix fast tests (#59) * Fix some fast tests * Force some updates * Remove unnecessary fixes * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * reasoning -> Reasoning * Add additional integration tests * fixup * Slight fixes * align chat template with harmony * simplify * Add comment * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * Revert fixup * skip 2 test remove todo * merge * padding side should be left for integration tests * fix modular wrt to changes made to modeling * style * isort * fix opies for the loss * mmmm --------- Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: edbeeching <edbeeching@gmail.com> Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com> Co-authored-by: Zhuohan Li <zhuohan@openai.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: 
joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal> Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Akos Hadnagy <akos@ahadnagy.com> Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-05 18:02:18 +02:00
TaeHyeon Jeon	738c1a3899	🌐 [i18n-KO] Translated `cache_explanation.md` to Korean (#39535 ) * update: _toctree.yml * docs: ko: cache_explanation.md * feat: nmt draft * fix: apply yijun-lee's comments * fix: apply 4N3MONE's comments * docs: update cache_position * docs: update cache-storage-implementation * update: add h2 tag in cache-position --------- Co-authored-by: taehyeonjeon <xogus294@gmail.com>	2025-08-05 08:20:13 -07:00
Guang Yang	d2ae766836	Export SmolvLM (#39614 ) Export SmolVLM for ExecuTorch	2025-08-05 16:20:23 +02:00
ppaanngggg	c430047602	[docs] update object detection guide (#39909 ) * Update object_detection.md * Update object_detection.md	2025-08-05 14:07:21 +00:00
Arthur	dedcbd6e3d	run model debugging with forward arg (#39905 ) * run model debugging a lot simpler * fixup * Update src/transformers/utils/generic.py * fixup * mode syle? * guard a bit	2025-08-05 15:46:19 +02:00
Arthur	20ce210ab7	Revert "remove dtensors, not explicit (#39840 )" (#39912 ) * Revert "remove dtensors, not explicit (#39840)" This did not work with generation (lm_head needs extra care!) This reverts commit `6dfd561d9c`. * update * style?	2025-08-05 15:12:14 +02:00
Raushan Turganbay	2589a52c5c	Fix aria tests (#39879 ) * fix aria tests * awful bug * fix copies * fix tests * fix style * revert this	2025-08-05 13:48:47 +02:00
Justin van Heek	6e4a9a5b43	Fix eval thread fork bomb (#39717 )	2025-08-05 10:50:32 +00:00
Yuanyuan Chen	98a3c49135	Replace video_fps with fps in tests (#39898 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-05 10:39:55 +00:00
nnul	1af1071081	Fix misleading WandB error when WANDB_DISABLED is set (#39891 ) When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment, the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb." This was confusing because wandb was actually installed, just disabled via the environment variable. The fix detects this specific case and provides a clear, actionable error message explaining the conflict and how to resolve it.	2025-08-05 10:18:18 +00:00
Yidi Wu	78ef84921b	Avoid aliasing in cond's branches for torch 2.8 (#39488 ) Avoid alaising in cond's branches Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-08-05 11:18:11 +02:00
Yuanyuan Chen	9e676e6a0e	[qwen] remove unnecessary CUDA sync in qwen2_5_vl (#39870 ) Signed-off-by: cyy <cyyever@outlook.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-05 08:54:16 +00:00
Yao Matrix	392be3b282	fix test_working_of_tp failure of accelerate ut (#39828 ) Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-08-05 08:52:57 +00:00
Arthur	cc5de36454	[`Exaone4`] Fixes the attn implementation! (#39906 ) * fix * fix config	2025-08-05 09:29:16 +02:00
Lysandre Debut	00d47757bf	Reorder serving docs (#39634 ) * Slight reorg * LLMs + draft VLMs * Actual VLM examples * Initial responses * Reorder * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/tiny_agents.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/open_webui.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/cursor.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Responses API * Address Pedro's comments --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-08-05 08:43:06 +02:00
Arpon Kapuria	8c4ea670dc	chore: update DETR model card (#39822 ) * Update model card for DETR * fix: applied suggested changes * fix: simplified pipeline and modified notes and resources * Update detr.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 12:25:53 -07:00
Jan Netík	0bd91cc822	Add support for `ModernBertForMultipleChoice` (#39232 ) * implement ModernBertForMultipleChoice * fixup, style, repo consistency * generate modeling_modernbert * add tests + docs * fix test	2025-08-04 20:45:43 +02:00
Yih-Dar	801e869b67	send some feedback when manually building doc via comment (#39889 ) * fix * fix * fix * Update .github/workflows/pr_build_doc_with_comment.yml Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-08-04 18:20:48 +00:00
Yih-Dar	ee7eb2d0b1	Update cohere2 vision test (#39888 ) * fix * fix * fix * fix * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-04 20:08:18 +02:00
rohitthewanderer	3bafa128dc	[DOCS] : Improved mimi model card (#39824 ) * [DOCS] : Improved mimi model card * Removed additional header * Review: addressed feedback * Update mimi.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 10:07:06 -07:00
Pavel Iakubovskii	192acc2d0f	Fix link to models in README (#39880 ) Update README.md	2025-08-04 09:34:41 -07:00
Pavel Iakubovskii	7dca2ff8cf	[typing] better return type hint for `AutoModelForCausalLM` and `AutoModelForImageTextToText` (#39881 ) * Better return type hint for AutoModelForCausalLM and AutoModelForImageTextToText * fix imports * fix	2025-08-04 15:03:53 +00:00
Yih-Dar	3edd14610e	Set `torch.backends.cudnn.allow_tf32 = False` for CI (#39885 ) * fix * fix * [test all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-04 16:55:16 +02:00
Quentin Gallouédec	e3505cd4dc	Replace `Tokenizer` with `PreTrainedTokenizerFast` in `ContinuousBatchProcessor` (#39858 ) Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor	2025-08-04 16:39:19 +02:00
Cyril Vallez	380b2a0317	Rework add-new-model-like with modular and make test filenames coherent (#39612 ) * remove tf/flax * fix * style * Update add_new_model_like.py * work in progress * continue * more cleanup * simplify and first final version * fixes -> it works * add linter checks * Update add_new_model_like.py * fix * add modular conversion at the end * Update add_new_model_like.py * add video processor * Update add_new_model_like.py * Update add_new_model_like.py * Update add_new_model_like.py * fix * Update image_processing_auto.py * Update image_processing_auto.py * fix post rebase * start test filenames replacement * rename all test_processor -> test_processing * fix copied from * add docstrings * Update add_new_model_like.py * fix regex * improve wording * Update add_new_model_like.py * Update add_new_model_like.py * Update add_new_model_like.py * start adding test * fix * fix * proper first test * tests * fix * fix * fix * fix * modular can be used from anywhere * protect import * fix * Update add_new_model_like.py * fix	2025-08-04 14:41:09 +02:00
Marc Sun	5fb5b6cfaf	Fix quant docker for fp-quant (#39641 ) * fix quant docker * Apply style fixes --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-08-04 11:57:08 +00:00
Pavel Iakubovskii	16d6faef9a	[core] Fix attn_implementation setter with missing `sub_configs` (#39855 ) * fix * add sub_configs * remove case for attention setter * fix None * Add test * Fix sub-configs * fix tests_config * fix consistency * fix fsmt * fix	2025-08-04 11:35:09 +01:00
Akib Jawad	2a9febd632	Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494 ) * added code for handling video object ,as dictionary of frames and metadata, in chat template * added new test where videos are passed as objects (dict of frames, metadata) in the chat template * modified hardcoded video_len check that does not match with increased number of tests cases. * Modify hardcoded video_len check that fails with increased number of tests * update documentation of multi-modal chat templating with extra information about including video object in chat template. * add array handling in load_video() * temporary test video inlcuded * skip testing smolvlm with videos that are list of frames * update documentation & make fixup * Address review comments	2025-08-04 11:49:42 +02:00
Yih-Dar	0d511f7a77	Use comment to build doc on PRs (#39846 ) * try * try * try * try * try --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-04 10:24:45 +02:00
Quentin Gallouédec	4819adbbaa	Refactor label name handling for PEFT models in Trainer class (#39265 ) Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-08-04 06:29:57 +00:00
Quentin Gallouédec	166fcad3f8	Improve `is_wandb_available` function to verify WandB installation (#39875 ) Improve `is_wandb_available` function to verify WandB installation by checking for a key attribute	2025-08-04 08:22:52 +02:00
Arthur	6dfd561d9c	remove dtensors, not explicit (#39840 ) * remove dtensors, not explicit Co-authored-by: 3outeille <3outeille@users.noreply.github.com> * style * fix test * update * as we broke saving try to fix * output layouts should exit * nit * devicemesh exists if it was distributed * use _device_mesh of self * update * lol * fix * nit * update * fix! * this??? * grumble grumble * ? * fuck me --------- Co-authored-by: 3outeille <3outeille@users.noreply.github.com>	2025-08-01 22:02:47 +02:00
Quentin Gallouédec	b727c2b20e	Allow `TrackioCallback` to work when pynvml is not installed (#39851 ) Allow TrackioCallback to work when pynvml is not installed	2025-08-01 18:57:25 +02:00
StevenBucaille	1ec0feccdd	[image-processing] deprecate `plot_keypoint_matching`, make `visualize_keypoint_matching` as a standard (#39830 ) * fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models * refactor: added copied from * fix: make style * fix: repo consistency * fix: make style * docs: added missing method in SuperGlue docs	2025-08-01 16:29:57 +00:00
Yoni Gozlan	7b4d9843ba	Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739 ) * add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid * fix after review	2025-08-01 12:20:08 -04:00
Lysandre Debut	88ead3f518	Fix responses add tests (#39848 ) * Quick responses fix * [serve] Fix responses API and add tests * Remove typo * Remove typo * Tests	2025-08-01 18:06:08 +02:00
Arthur	6ea646a03a	Update ux cb (#39845 ) * clenaup * nits * updates * fix logging * push updates? * just passexception * update * nits * fix * add tokencount * style	2025-08-01 16:50:28 +02:00
rziga	3951d4ad5d	Add MM Grounding DINO (#37925 ) * first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>	2025-08-01 15:43:23 +01:00
Yuanyuan Chen	50145474b7	[typecheck] proper export of private symbols (#39729 ) * Export private symbols Signed-off-by: cyy <cyyever@outlook.com> * Update src/transformers/__init__.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/__init__.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Fix format Signed-off-by: cyy <cyyever@outlook.com> * Add a comment for exported symbols Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-01 13:36:47 +01:00
Arthur	c962f1515e	[`attn_implementation`] remove recursive, allows custom kernels with wrappers (#39823 ) * fix? * fixme and style * Update src/transformers/modeling_utils.py * update * update * fix * small fixees * nit * nits * fix init check? * fix * fix default * or fucks me * nits * include a small nit * does this make it hapy? * fixup * fix the remaining ones	2025-08-01 12:18:28 +02:00
Raushan Turganbay	d3b8627b56	[VLMs] split out "get placeholder mask" to helper (#39777 ) * batch upidate all models * update * forgot about llava onevision * update * fix tests * delete file * typo * fix emu3 once and forever * update cohere2 vision as well	2025-08-01 08:01:06 +00:00
Arthur	a115b67392	Fix tp cb (#39838) * fixes * one more	2025-08-01 09:59:04 +02:00
Eric Bezzam	2c0af41ce5	Fix bad markdown links (#39819) Fix bad markdown links.	2025-07-31 09:14:14 -07:00
Tommy Chiang	4fcf455517	Fix broken links (#39809 ) Replace links in the form of `[text]((url))` to `[text](url)`. This is the correct format of a url in the markdown.	2025-07-31 13:23:04 +00:00
Raushan Turganbay	b937d47455	[cohere2 vision] move doc to multimodal section (#39820 ) move doc to multimodal section	2025-07-31 15:13:02 +02:00
Kyle Duffy	6ba8a1ff45	Update documentation for Cohere2Vision models (#39817 ) * Update docs with pipeline example * Add Cohere2Vision to list of vision models * Sort models	2025-07-31 11:58:45 +00:00
Raushan Turganbay	e1688d28d3	[Model] Cohere2 Vision (#39810 ) * Add cohere2_vision to support CohereLabs/command-a-vision-07-2025 * update and add modualr file * update processors and check with orig impl later * delete unused files * image processor reduce LOC and re-use GotOCR2 * update the config to use modular * model tests pass * processor fixes * check model outputs decorator * address one more comment * Update tokens. Temp - need to read from tokenizer' * fix for multi-gpu * Fix image token handling * upadte image token expansion logic * fix a few issues with remote code loading * not related but modular forces us to change all files now * Add overview and code sample to cohere vision docs * add scripts. TMP. * Update inference script * Create script * set dtype in export script * TO revert: modular export fix * Fix scripts * Revert "TO revert: modular export fix" This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c. * Use modular weights * Upload to hub Removed OOD weights ad script * Updated docs * fix import error Update docs Added pipeline test * Updated docs * Run modular script remove modular for config Added patch_size Added docstrings in modular Fix OOM Add docs, fixup integration tests. 8-gpu passing * tiny updates * address comments + fixup * add test for chat template * check model outputs workaround * aya vision fix check model inputs * Revert "add test for chat template" This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8. * reveert more changes * last revert * skip and merge * faulty copy from --------- Co-authored-by: Julian Mack <julian.mack@cohere.com> Co-authored-by: kyle-cohere <kyle@cohere.com>	2025-07-31 10:57:34 +00:00
Joao Gante	6c3f27ba61	[docs] fix korean docs yet again (#39813 ) fix korean docs yet again	2025-07-31 09:13:25 +00:00
Jeff Zhang	cb289ad243	feat(tokenization): add encode_message to tokenize messages one by one (#39507 ) * feat(tokenization): add encode_message to tokenize messages one by one * Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test. * Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes. * The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes. * Docs fix * Revert changes in docstring of pad() * Revert changes in docstring * Update src/transformers/tokenization_utils_base.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change. * Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability. --------- Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-31 10:55:45 +02:00
Joao Gante	4f93cc9174	fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test (#39300 ) * fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous * test cache_position * move test * propagate changes --------- Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>	2025-07-30 17:30:28 +00:00
Bernhard Liebl	9b3203f47b	Add callback to monitor progress in whisper transcription (#37483 ) * Add callback to monitor progress in whisper transcription * Added `` around variables, rewording * Add example of `monitor_progress`. --------- Co-authored-by: Eric B <ebezzam@gmail.com>	2025-07-30 17:40:53 +02:00
Drew Ross	7abb5d3992	Update mT5 model card (#39702 ) * Update mt5 model card * Fix casing of model title * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:35:04 -07:00
Arpon Kapuria	1019b00028	Update model card for Cohere2 (Command R7B) (#39604 ) * Update model card for Cohere2 (Command R7B) * fix: applied suggested changes	2025-07-30 08:34:26 -07:00
Ethan Villarosa	ecbb5ee194	standardized BARThez model card (#39701 ) * standardized barthez model card according to template * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * suggested changes to barthez model card --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:33:13 -07:00
Raushan Turganbay	8e077a3e45	Fix re-compilations for cross attention cache (#39788 ) fix recompilations for cross attn cache	2025-07-30 14:52:03 +02:00
Yuanyuan Chen	1e0665a191	Simplify conditional code (#39781 ) * Use != Signed-off-by: cyy <cyyever@outlook.com> * Use get Signed-off-by: cyy <cyyever@outlook.com> * Format * Simplify bool operations Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-30 12:32:10 +00:00
Yuanyuan Chen	b94929eb49	Fix an invalid condition (#39762 ) Fix an invalid judgement Signed-off-by: cyy <cyyever@outlook.com>	2025-07-30 12:19:17 +00:00
Yao Matrix	bb2ac66453	fix chameleonvision UT failure (#39646 ) * fix chameleonvision UT failure Signed-off-by: matrix.yao@intel.com <Yao Matrix> * fix style Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: matrix.yao@intel.com <Yao Matrix> Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: root <Yao Matrix>	2025-07-30 12:09:26 +00:00
Raushan Turganbay	5348445dfa	Super tiny update (#39727) super tiny update	2025-07-30 12:21:41 +02:00
Yih-Dar	54cbea5615	more info in `model_results.json` (#39783 ) more info Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-30 11:43:10 +02:00
eustlb	01d5f94695	[ASR pipline] fix with datasets 4.0 (#39504 ) * fix * handle edge case * make	2025-07-30 08:13:40 +00:00
jiqing-feng	8ab21be570	enable static cache on vision encoder decoder (#39773 ) Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-07-30 08:10:46 +00:00
Cyril Vallez	67cfe11528	Fix Evolla and xLSTM tests (#39769 ) * fix all evolla * xlstm	2025-07-30 09:51:55 +02:00
Quentin Gallouédec	ec4033457e	Don't set `run_name` when none (#39695 ) * Don't set run_name when none * revert --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-30 01:39:29 +00:00
Yana Mishula	551a89a4a3	Standardize CLAP model card format (#39738 ) * Standardize CLAP model card format * Apply review feedback * Remove Resources section	2025-07-29 14:13:04 -07:00
StevenBucaille	da70b1389a	docs: Update EfficientLoFTR documentation (#39620 ) * docs: Update EfficientLoFTR documentation * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-29 13:54:44 -07:00
Cyril Vallez	ddd2100767	Fix OmDet test after arg deprecation (#39766 ) fix arg name	2025-07-29 22:10:36 +02:00
st81	4abb053b6c	Remove python3.7 reference from doc link (#39706)	2025-07-29 09:17:13 -07:00
Joao Gante	33aa49df9d	[docs] Ko doc fixes after toc update (#39660 ) * update docs * doc builder working * make fixup	2025-07-29 17:05:26 +01:00
Manuel de Prada Corral	c4e2069898	Fix Cache.max_cache_len max value for Hybrid models (#39737 ) * fix gemma * fix min * fix quant init issue * fix gemma 3n * skip quant cache test * fix modular * new test for Gemma * include cyril change --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-29 17:12:50 +02:00
Taihang Hu	075dbbceaa	fix(trainer): Correct loss scaling for incomplete gradient accumulation steps (#39659 ) * Fix issue[#38837]: wrong loss scaled in last step of epoch * chore: trigger CI * Update src/transformers/trainer.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/modeling_flash_attention_utils.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: taihang <taihang@U-2RHYVWX7-2207.local> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-07-29 17:12:31 +02:00
Jaehyeon Shin	1d061536cf	🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean (#39536 ) * docs: ko: how_to_hack_models.md * feat: nmt draft * fix: manual edits	2025-07-29 08:09:16 -07:00
박종범	43fe41c0a8	🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean (#39552 ) * docs: ko: perf_train_gpu_one.md * feat: nmt draft * fix: manual edits * fix: Manually added missing backticks * Update docs/source/ko/perf_train_gpu_one.md fix: remove space between heading and GPU anchor Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix: clarify table headers to indicate training speed boost and memory savings Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix: improve readability Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix : rephrase explanation of data preloading to improve readability Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>	2025-07-29 08:08:57 -07:00
Ahn Joon Sung	9f38763731	🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean (#39520 ) * docs: ko: pipeline_gradio.md * feat: nmt draft * fix: manual edits * docs: ko: pipeline_gradio.md	2025-07-29 08:04:30 -07:00
Lio (임승섭)	f72311796b	🌐 [i18n-KO] Translated `tokenizer.md` to Korean (#39532 ) * docs: ko: tokenizer.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * fix: resolve suggestions Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> --------- Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>	2025-07-29 08:04:14 -07:00
Kim Juwon	d346d46752	🌐 [i18n-KO] Translated `tvp.md` to Korean (#39578 ) * docs: ko: tvp.md * feat: nmt draft * fix: manual edits * fix: manual edits * fix: manual edits * fix: manual edits * fix: manual edits Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> --------- Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>	2025-07-29 08:04:00 -07:00
Ahnjj_DEV	2f59c15b33	🌐 [i18n-KO] Translated albert.md to Korean (#39524 ) * docs: ko: albert.md * feat: nmt draft * fix: manual edits	2025-07-29 08:03:40 -07:00
Minseo Kim	98386dcee9	🌐 [i18n-KO] Translated `main_classes/peft.md` (#39515 ) * docs: ko: main_classes/peft.md * feat: nmt draft * docs: add missing TOC to documentation for `PeftAdapterMixin` section Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin). * fix: Improve naturalness of purpose expression in Korean Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions. * fix: Simplify plural form and make expression more concise Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity. * fix: Replace technical term '주입' with more natural '적용' Changed '주입할 수 없어' to '적용할 수 없어' for better readability. Considered alternatives: '삽입': Too literal translation of 'inject' '입력': Could be misunderstood as data input '통합': Implies merging two systems '추가': Simple but less precise '적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context. * fix: update toctree path for PEFT to lowercase Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links. * docs: update as per reviewer feedback after rebase	2025-07-29 08:03:17 -07:00
Raushan Turganbay	1ad216bd7d	[modenbert] fix regression (#39750 ) * fix regression * add FA2 test	2025-07-29 16:58:59 +02:00
Yih-Dar	379209b603	add `libcst` to `extras["testing"]` in `setup.py` (#39761 ) add Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-29 16:58:51 +02:00
Cyril Vallez	abf101af1f	Fix version issue in modeling_utils.py (#39759 ) fix version issue	2025-07-29 16:15:30 +02:00
jiqing-feng	8db4d79161	Enable xpu allocator on caching_allocator_warmup (#39654 ) * add xpu allocator Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix typo Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix variable name Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * rm useless default value Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-07-29 16:06:52 +02:00
Çağrı Tuğrul Canbol	fb141e2c90	Support loading Qwen3 MoE GGUF (#39638 ) * support loading qwen3 gguf * qwen3moe test cases * fix whitespaces * fix ggml tests	2025-07-29 13:44:44 +00:00
Raushan Turganbay	ccb2e0e03b	Fix GPT2 with cross attention (#39754 ) * fix * use new mask API * style * fix copies and attention tests * fix head pruning tests	2025-07-29 15:40:31 +02:00
Yih-Dar	dfd616e658	Avoid OOM when other tests are failing (#39758 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-29 15:35:44 +02:00
ivarflakstad	65df73aa88	AMD disable torchcodec (#39757 ) Temporarily disable torchcodec installation because of bizarre segfault	2025-07-29 13:07:25 +00:00
Yih-Dar	63b3200779	Use `--gpus all` in workflow files (#39752 ) gpu all Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-29 14:53:33 +02:00
Yuanyuan Chen	95faabf0a6	Apply several ruff SIM rules (#37283 ) * Apply ruff SIM118 fix Signed-off-by: cyy <cyyever@outlook.com> * Apply ruff SIM910 fix Signed-off-by: cyy <cyyever@outlook.com> * Apply ruff SIM101 fix Signed-off-by: cyy <cyyever@outlook.com> * Format code Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-29 11:40:34 +00:00
Manuel de Prada Corral	cf97f6cfd1	Fix mamba regression (#39728 ) * fix mamba regression * fix compile test	2025-07-29 12:44:28 +02:00
ivarflakstad	66984ed4f6	Update IMPORTANT_MODELS list (#39734)	2025-07-29 12:34:57 +02:00
Yih-Dar	de8d0cec30	update `GemmaIntegrationTest::test_model_2b_bf16_dola` again (#39731 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-29 11:42:55 +02:00
Matej Sirovatka	85d5aeb324	Fix: add back base model plan (#39733 ) * Fix: add back base model plan * Fix: typo * fixup * remove unused import --------- Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-07-29 11:37:33 +02:00
hebangwen	2a90193dd8	[Fix] import two missing typos in `models/__init__.py` for typo checking (#39745 ) * [Fix] import lost gemma3n for type checking in vscode * [Fix] import missing qwen2_5_omni typo * [Refactor] sort by ascii order	2025-07-29 11:35:22 +02:00
Arthur	f2aca3eccc	fix cache inheritance (#39748 ) * fix cache inheritance * styule	2025-07-29 11:24:44 +02:00
Yao Matrix	f3598a95c7	extend more trainer test cases to XPU, all pass (#39652 ) extend more trainer test cases to XPU Signed-off-by: Yao, Matrix <matrix.yao@intel.com>	2025-07-29 10:51:00 +02:00
Raushan Turganbay	75794792ad	BLIPs clean-up (#35560 ) * blips clean up * update processor * readability * fix processor length * fix copies * tmp * update and fix copies * why keep these, delete? * fix test fetcher * irrelevant comment * fix tests * fix tests * fix copies	2025-07-29 10:03:06 +02:00
Ramesh	4f8f51be4e	Add Fast Segformer Processor (#37024 ) * Add Fast Segformer Processor * Modified the params according to segformer model * modified test_image_processing_Segformer_fast args - removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class * added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast * fixed code_quality * added recommended fixes and tests to make sure everything processess smoothly * Fixed SegmentationMapsLogic - modified the preprocessing of segmentation maps to use tensors - added batch support * fixed some mismatched files * modified the tolerance for tests * use modular * fix ci --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 19:22:32 +00:00
Avigyan Sinha	c353f2bb5e	Superpoint fast image processor (#37804 ) * feat: superpoint fast image processor * fix: reran fast cli command to generate fast config * feat: updated test cases * fix: removed old model add * fix: format fix * Update src/transformers/models/superpoint/image_processing_superpoint_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * fix: ported to torch and made requested changes * fix: removed changes to init * fix: init fix * fix: init format fix * fixed testcases and ported to torch * fix: format fixes * failed test case fix * fix superpoint fast * fix docstring --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 18:15:06 +00:00
Rémi Ouazan	14adcbd937	Fix AMD dockerfile for audio models (#39669)	2025-07-28 19:05:41 +02:00
Raushan Turganbay	1c6b47451d	Fix cache-related tests (#39676 ) * fix * fix kyutai at last * fix unrelated tests and copies * update musicgen as well * revert tensor * fix old test failures * why it wasn't added?	2025-07-28 17:30:11 +02:00
Cyril Vallez	fc2bd1eac0	Fix Layer device placement in Caches (#39732 ) * fix device placement * style * typo in comment	2025-07-28 16:37:11 +02:00
Eric Bezzam	7623aa3e5f	Fix `Qwen2AudioForConditionalGeneration.forward()` and `test_flash_attn_kernels_inference_equivalence` (#39503 ) * Add missing cache_position argument. * Pass cache_position to language model. * Overwrite prepare_inputs_for_generation. * Set model to half precision for Flash Attention test. * Cast model to bfloat16.	2025-07-28 16:35:08 +02:00
Yih-Dar	28f2619868	skip `Glm4MoeModelTest::test_torch_compile_for_training` (#39670 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-28 16:30:40 +02:00
Yih-Dar	88aed92b59	Update `QAPipelineTests::test_large_model_course` after #39193 (#39666 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-28 16:26:49 +02:00
Ita Zaporozhets	da823fc04e	mllama outputs refactor (#39643 ) * mllama outputs refactor * forgot kwargs * fix output * add can_record_outputs * correct @check_model_inputs placement * ruff and copies * rebase * feedback * only return hidden_states --------- Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal> Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>	2025-07-28 15:59:20 +02:00
Cyril Vallez	686bb3b098	Remove all expired deprecation cycles (#39725 ) * remove all deprecation cycles * style * fix * remove * remove * fix * Update modular_dpt.py * back * typo * typo * final fix * remove all args	2025-07-28 15:43:41 +02:00
Anton Vlasjuk	a0fa500a3d	[`CI`] Add Eric to comment slow ci (#39601 ) add to ci	2025-07-28 13:24:00 +00:00
Matej Sirovatka	4c7da9fedf	PATCH: add back n-dim device-mesh + fix tp trainer saving (#39693 ) * Feat: something * Feat: initial changes * tmp changes to unblock * Refactor * remove todo * Feat: docstring * Fix: saving of distributed model in trainer * Fix: distributed saving with trainer * Feat: add pure tp saving * Only require tp dim if ndim > 1 * Fix: default to None * Fix: better comments/errors * Fix: properly check tp_size attribute * Fix: properly check for None in tp_size --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-28 12:29:58 +00:00
Jitesh Gupta	cbede2969b	Add self-hosted runner scale set workflow for mi325 CI (#39651)	2025-07-28 13:32:25 +02:00
Raushan Turganbay	b56d721397	[configuration] remove redundant `classmethod` (#38812 ) * remove redundant classmethod * warning message, add space between words * fix tests * fix copies	2025-07-28 10:38:48 +00:00
jzhang533	02ea23cbde	update ernie model card (#39657 ) * update ernie model doc Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address ruff format error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address check_repository_consistency error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> --------- Signed-off-by: Zhang Jun <jzhang533@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-28 10:21:18 +00:00
Raushan Turganbay	8b237b8639	[processors] add tests for helper fn (#39629 ) * add tests for helpers * duplicate test for each model * why llava next video has no helper * oops must have been in the commit * fix test after rebase * add copy from	2025-07-28 09:41:58 +00:00
Wang, Yi	6638b3642d	xpu optimization for generation case (#39573 ) * xpu optimization for generation case Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * fix ci failure Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-07-28 11:34:58 +02:00
pjo256	5c15eb55d2	fix(tokenization): check token.content for trie (#39587 ) fix: check token.content for trie	2025-07-28 11:28:56 +02:00
BUI Van Tuan	6a61e16626	Fix missing initialization of `FastSpeech2Conformer` (#39689 ) * fix missing initialization of FastSpeech2Conformer * switch order and reactivate tests --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-28 10:47:39 +02:00
Wing Lian	a6393e7d28	fix missing model._tp_size from ep refactor (#39688 ) * fix missing model._tp_size from ep refactor * restore setting device_mesh too	2025-07-26 12:26:36 +02:00
Cyril Vallez	18a7c29ff8	More robust tied weight test (#39681 ) * Update test_modeling_common.py * remove old ones * Update test_modeling_common.py * Update test_modeling_common.py * add * Update test_modeling_musicgen_melody.py	2025-07-25 22:03:21 +02:00
Arthur	c3401d6fad	dev version 4.55	2025-07-25 21:11:20 +02:00
Garrett Goon	97f8c71f52	Add padding-free to Granite hybrid moe models (#39677 ) * start fixing kwarg handling * fmt * updates padding free tests * docs * add missing kwargs modeling_granitemoe.py * run modular util * rm unrelated changes from modular util	2025-07-25 20:10:50 +02:00
Cyril Vallez	d6e9f71a6e	Fix tied weight test (#39680) Update test_modeling_common.py	2025-07-25 20:09:33 +02:00
ssum21	bdba1f83a8	fix: glossary edits	2025-07-25 11:06:11 -07:00
bigmoyan	5da6ad2731	fix break for ckpt without _tp_plan (#39658 ) * fix break for ckpt without _tp_plan * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_utils.py --------- Co-authored-by: wangzhengtao <wangzhengtao@msh.team> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-25 20:03:48 +02:00
lgai-exaone	c06d4cd6ce	Add EXAONE 4.0 model (#39129 ) * Add EXAONE 4.0 model * Refactor EXAONE 4.0 modeling code * Fix cache slicing on SWA + FA2 * Fix cache slicing on FA2 + HybridCache * Update EXAONE 4.0 modeling code for main branch * Update o_proj for asymmetric projection * Address PR feedback * Add EXAONE 4.0 docs * Update EXAONE 4.0 modeling code for main branch * update * fix updates * updates * fix * fix * fix --------- Co-authored-by: Arthur <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-25 19:58:28 +02:00
Park Woorak	3e4d584a5b	Support `typing.Literal` as type of tool parameters or return value (#39633 ) * support `typing.Literal` as type of tool parameters * validate the `args` of `typing.Literal` roughly * add test to get json schema for `typing.Literal` type hint * fix: add `"type"` attribute to the parsed result of `typing.Literal` * test: add argument `booleanish` to test multi-type literal * style: auto fixup	2025-07-25 17:51:28 +00:00
Arthur	300d42a43e	Add ep (#39501 ) * EP + updates Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com> Co-authored-by: drbh <drbh@users.noreply.github.com> * remove unrelated change * not working yet but let's see where it goes! * update the api a bit * udpate * where I am at for now * fix ep * refactor the API * yups * fix * fixup * clean modeling * just support llama4 for now! * properly avoid * fix * nits * Update src/transformers/models/llama4/modeling_llama4.py * Update src/transformers/integrations/tensor_parallel.py * style * ,,,, * update --------- Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com> Co-authored-by: drbh <drbh@users.noreply.github.com>	2025-07-25 19:46:17 +02:00
Dario Salvati	abaa043d60	bad_words_ids no longer slow on mps (#39556 ) * fix: bad_words_ids no longer slow on mps * fix: SequenceBiasLogitsProcessor slow `_prepare_bias_variables` method * fix: re-adding a deleted comment * fix: bug in no_bad_words_logits * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-07-25 19:45:41 +02:00
Cyril Vallez	6630c5b714	Add xlstm model (#39665 ) * Add xLSTM cleanly with optimizations. * Fix style. * Fix modeling test. * Make xLSTM package optional. * Fix: Update torch version check. * Fix: Bad variable naming in test. * Fix: Import structure cleaning with Ruff. * Fix: Update docstrings. * Fix: Mitigate unused config attr tests by explicit usage. * Fix: Skip tests, if xlstm library is not installed. * Feat: Enable longer context window for inference by chunking. * Fix: Make training test pass by lowering target accuracy. * Chore: Increase test verbosity for failing generation test. * Update docs/source/en/model_doc/xlstm.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix: Make xlstm available even without CUDA. * Chore: Remove unnecessary import. * Fix: Remove BOS insertion. * Chore: Improve xLSTMCache documentation. * Integrate basic xLSTM fallback code. * Chore: Remove unnecessary import. * Chore: Remove duplicate LayerNorm. * chore: update copyright, minor reformatting * fix: refactor mLSTMStateType due to missing torch import * fix: add missing import * Chore: Replace einops. * fix: apply ruff formatting * fix: run `make fix-copies` to re-generate dummy_pt_objects.py * fix: make type hints Python 3.9 compatible * fix: remove obsolete import * fix: remove obsolete method from docs * chore: remove obsolete `force_bos_token_insert` from config * Chore: Remove duplicated xLSTMCache class. * Fix: Formatting of modeling_xlstm.py * Chore: Remove xlstm package requirement from test. Re-add update_rnn_state. * Fix: Update xLSTMCache docstring. * Feat: Add proper initialization of xLSTM. * Chore: Re-format files. * Chore: Adapt format. * Fix: xLSTMCache import restructuring. * Fix: Add __all__ lists to modeling and configuration files. * Chore: Reformat. * Fix: Remove unnecessary update_rnn_state function. * Fix: Undo test accuracy quickfix. * Fix: Update copyright year, remvoe config copy. 
* Chore: Flatten all internal configs to xLSTMConfig. * Fix: Unused config variables check. * Chore: Remove unnecessary imports. * Fix: Unify xlstm cache argument from batch_size to max_batch_size. * Chore: Remove bad default arg value for xLSTMCache. * Chore: Rename core configuration arguments to HF default in xLSTM. * Chore: Fix formatting. * Fix: xLSTM Cache config access. * Fix: Update xlstm tests for config update. * Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B. * Fix: Configuration xLSTM python3.9 syntax. * Fix: Difference to main in test_utils.py assertion. * Fix: Bad syntax in xlstm config for python3.9. * Fix: xLSTMConfig docstring. * Fix: xLSTMConfig docstring. * Fix typing issues in xLSTM and BeiT, Paligemma. * Fix: Exclude xLSTM from test cache utils. * Chore: Fix style. * Chore: Fix format. * Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions. * Chore: Remove asserts and replace with ValueErrors. * Chore: Update __init__.py structure of xLSTM. * Chore: Clean xLSTM initialization of weights. * Fix index names in modeling_xlstm.py * Update xlstm model test typing annotations. * Fix: Remove all asserts. * Revert changes to the main __init__.py * Fix: Move xLSTMCache to modeling_xlstm.py * Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py * Remove xLSTMCache from dummy_pt_objects.py * Fix: Remove extended torchdynamo compilation check integrating cuda graph captures. * Revert test_cache_utils.py xLSTM change. * Fix: Move xLSTM init functions before init call. * Remove xLSTMCache from generation utils. * Fix: Clean xLSTM init functionality for recursive calls. * Fix: Move xLSTMCache before its first call. * Fix formatting. * Add partial docstring for xLSTMModel forward. * Fix xLSTMCache docstring in xLSTMModel. * Remove xLSTMCache from public documentation. Update auto_docstring. 
* Remove all agressive shape comments * style * Fix names * simplify * remove output_hidden_states * Update modeling_xlstm.py * Update modeling_xlstm.py * Update test_modeling_xlstm.py * Update modeling_xlstm.py * Update modeling_xlstm.py * fix * fix * style * style --------- Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com> Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com> Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>	2025-07-25 19:39:17 +02:00
Yoni Gozlan	ed9a96bc6d	Use auto_docstring for perception_lm fast image processor (#39679)	2025-07-25 17:32:48 +00:00
Ryan Mullins	d913b39ef3	fix: HWIO to OIHW (#39200 ) * fix: HWIO to OIHW * Bug in attention type * Conversion script docstring * style --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-07-25 19:23:15 +02:00
Yoni Gozlan	a26f0fabb8	Fix auto_docstring crashing when dependencies are missing (#39564 ) * add try except to not crash auto_docstring when some dependency are missing * safeguard None value in placeholder dict	2025-07-25 19:19:23 +02:00
Armaghan Shakir	69cff312f5	Add support for DeepseekAI's DeepseekVL (#36248 ) * upload initial code * update deepseek-vl adaptor * update hierarchy of vision model classes * udpate aligner model * add text model * Added Image Processor * Added Image Processor * Added Image Processor * apply masks * remove projection; add aligner * remove interpolate_pos_encoding * remove unused params in config * cleaning * Add the __init__ file * added processing deepseek_vl class * modified the deepseek-vl processor * modified the deepseek-vl processor * update __init__ * Update the image processor class name * Added Deepseek to src/transformers/__init__.py file * Added Deepseek to image_processing_auto.py * update the __init__ file * update deepseek_vl image processor * Update Deepseek Processor * upload fast image processor * Revert "upload fast image processor" This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4. * update image processor * flatten heirarchy * remove DeepseekVLModel * major update (complete modeling) * auto modeling and other files * formatting * fix quality * replace torchvision in modeling * set default do_normalize to False * add fast image processor template using tool * update image processors * add fast image processor to other files * update liscense * Added deepseek image testcases * update image test * update processor * write CHAT_TEMPLATE * update model for processor * fix processor * minor fixes and formatting * fix image processing and tests * fix interpolation in sam * fix output_attentions in DeepseekVLModel * upload test_modeling * fix tests because of vocab size * set use_high_res_vision=False in tests * fix all modeling tests * fix styling * remove explicit background_color from image processors * added test_processor * added test_processor * fix processor tests * update docs * update docs * update docs * update conversion script * Fixed typos * minor fixes from review - remove model_id comments in examples - remove from pre-trained 
auto mapping - move to image-text-to-text from vision-to-seq in auto mapping - add image_token_index to __init__ for config - remove outdated temporary config in conversion script - update example to use chat_template in docstring example - update liscense 2021->2025 * fix type in config docstring Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * update get_image_features * fix config * improve DeepseekVLImageProcessor.preprocess * return image_hidden_states * use AutoTokenizer and AutoImageProcessor in Processor * fix model outputs * make num_image_tokens configurable * fix docstring of processor * move system prompt to chat template * fix repo consistency * fix return_dict * replace SamVisionEncoder with SamVisionModel * update to remove deepcopy * 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid) * fix quality checks * add missing hybrid in auto modeling * run make style * update sam_hq * update high_res_size in test * update docs following #36979 * update code with auto_docstring * update conversion scripts * fix style * fix failing test because of tuple * set weights_only=True in conversion script * use safetensors.torch.load_file instead of torch.load in conversion script * make output_dir optional in conversion script * fix code snippets in docs (now the examples work fine) * integration tests for DeepseekVL * update expected texts * make style * integration tests for DeepseekVLHybrid * fix class name * update expected texts for hybrid * run "make style" * update since changes in main * run make-style * nits since changes in main * undo changes in sam * fix tests * fix tests; update with main * update with main: output_attention/output_hidden_states * fix copied part in deepseek_vl * run fix-copies * fix output_hidden_states * sam: fix _init_weigths * use modular for DeepseekVL * make image processor more modular * modular: use JanusPreTrainedModel * janus: provide kwargs in loss * update processors in conversion script * Revert 
"sam: fix _init_weigths" This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27. * run fix-copies --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2025-07-25 19:18:50 +02:00
Cyril Vallez	a98bbc294c	Add missing flag for CacheLayer (#39678) * fix * Update cache_utils.py	2025-07-25 19:12:13 +02:00
Xibin Bayes Zhou	45c7bfb157	Add evolla rebase main (#36232 ) * add evolla * adding protein encoder part * add initial processing test * save processor * add docstring * add evolla processor * add two test * change vision to protein * change resampler to sequence_compressor * change vision to protein * initial update for llama * add initial update for llamaForCausalLM * add `test_processor`, `test_saprot_output`, `test_protein_encoder_output` * change evolla, but still working on it * add test_single_forward * pass test_attention_outputs * pass test_hidden_states_output * pass test_save_load and test_from_pretrained_no_checkpoint * pass test_cpu_offload * skip some tests * update new progress * skip test_model_is_small * pass test_model_weights_reload_no_missing_tied_weights * pass test_model_get_set_embeddings * pass test_cpu_offload * skip test_resize_embeddings * add pipeline_model_mapping * remote old setUp * pass processor save_pretrained and load_pretrained * remove pooling layer * pass test_inputs_embeds_matches_input_ids * pass test_model_is_small * pass test_attention_outputs * pass test_initialization * pass test_model_get_set_embeddings * pass test_single_forward * skip test_disk_offload_bin and test_disk_offload_safetensors * fix most tests * pass test_protein_encoder_output * remove useless code * add EvollaForProteinText2Text * pass test_saprot_output * pass all EvollaModelTest test and remove processor test * add processor test to its own file * skip is_training since esm skipped it and the saprot code causes error when setting is_training True * pass processor tests * solve all except config * pass most cases * change init * add doc to `configuration_evolla.py` * remove image_processing test * remove extra processor test * remove extra modules * remove extra modules * change all configs into one config * pass all evolla test * pass `make fixup` * update short summary * update Evolla-10B-hf * pass check_dummies.py and check_code_quality * fix 
`tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings` * remove dummy codes * change format * fix llava issue * update format * update to solve llama3 access issue * update to make forward right * solve processor save load problem from instructblip solution * remove unexpected file * skip `test_generation_tester_mixin_inheritance` * add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning` * add `modular_evolla.py` * solved issue #36362 * run `make fixup` * update modular * solve float32 training * add fix * solve `utils/check_docstrings.py` * update * update * update * remove other files and replace sequential and einsum * add use case in document * update the models * update model * change some wrong code * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix issues mentioned in PR * update style and rearrange the placement * fix return_dict argument issue * solve SaProtConfig issue * Solve EvollaSaProtRotaryEmbedding issue * solve attention_mask issue * solve almosst all issues * make style * update config * remove unrelated pickle file * delete pickle files * fix config * simplify a lot * remove past k-v from encoder * continue work * style * skip it from init * fix init * fix init * simplify more * fill in docstrings * change test for generation * skip test * fix style --------- Co-authored-by: Chenchen Han <13980209828@163.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-25 19:11:57 +02:00
Yih-Dar	2670da66ce	update expected outputs for whisper after #38778 (#39304 ) * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-25 16:48:10 +00:00
Yih-Dar	4b125e2993	fix `kyutai` tests (#39416 ) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>	2025-07-25 18:42:04 +02:00
Arthur	4f17bf0572	Fixes the BC (#39636 ) * fix * update * Update src/transformers/utils/generic.py Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * fixup * fixes * fix more models * fix fix fix * add embedding to more models * update * update * fix --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>	2025-07-25 18:41:21 +02:00
Cyril Vallez	ddb0546d14	Delete bad rebasing functions (#39672 ) * remove outdated stuff * remove comment * use register * remove finally clause (to allow further check if fallback to sdpa) * general exception * add wrapper * revert check * typo	2025-07-25 18:28:09 +02:00
Anton Vlasjuk	a91653561e	[`Ernie 4.5`] Post merge adaptations (#39664 ) * ernie 4.5 fixes * Apply style fixes * fix --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-07-25 17:36:18 +02:00
Joao Gante	5d0ba3e479	[CI] revert device in `test_export_static_cache` (#39662 ) * revert device * add todo	2025-07-25 15:36:12 +00:00
Pavel Iakubovskii	850bdeaa95	Fix ModernBERT Decoder model (#39671) fix	2025-07-25 16:20:12 +01:00
Yoni Gozlan	17f02102c5	🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor (#39591 ) * init * Force qwen2VL image proc to fast * refactor qwen2 vl fast * fix copies * Update after PR review and update tests to use return_tensors="pt" * fix processor tests * add BC for min pixels/max pixels	2025-07-25 11:11:28 -04:00
Lysandre Debut	f90de364c2	Rename huggingface_cli to hf (#39630) * Rename huggingface_cli to hf * hfh	2025-07-25 14:10:04 +02:00
revanth	3b3f9c0c46	fix(voxtral): correct typo in apply_transcription_request (#39572 ) * fix(voxtral): correct typo in apply_transcription_request * temporary wrapper: apply_transcrition_request * Update processing_voxtral.py * style: sort imports in processing_voxtral.py * docs(voxtral): fix typo in voxtral.md * make style * doc update --------- Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>	2025-07-25 12:09:44 +00:00
Joao Gante	2a82cf06ad	make fixup (#39661)	2025-07-25 11:27:45 +00:00
Joao Gante	e3760501b0	[docs] fix ko cache docs (#39644) fix ko docs	2025-07-25 10:06:03 +01:00
Quentin Lhoest	91f591f7bc	Make pytorch examples UV-compatible (#39635 ) * update release.py * add uv headers in some pytorch examples * rest of pytorch examples * style	2025-07-25 10:46:22 +02:00
Wing Lian	c46c17db57	revert change to cu_seqlen_k and max_k when preparing from position_ids (#39653)	2025-07-25 10:28:22 +02:00
Jeffrey Li	4600c27c4f	Fix: explicit not none check for tensors in flash attention (#39639) fix: explicit not none check for tensors	2025-07-25 10:09:14 +02:00
Raushan Turganbay	c392d47c9b	[attention] fix test for packed padfree masking (#39582 ) * fix most tests * skip a few more tests * address comments * fix chameleon tests * forgot to uncomment * qwen has its own tests with images, rename it as well	2025-07-25 07:44:52 +00:00
lmarshall12	565c035a2e	Add owlv2 fast processor (#39041 ) * add owlv2 fast image processor * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * change references to owlVit to owlv2 in docstrings for post process methods * change type hints from List, Dict, Tuple to list, dict, tuple * remove unused typing imports * add disable grouping argument to group images by shape * run make quality and repo-consistency * use modular * fix auto_docstring --------- Co-authored-by: Lewis Marshall <lewism@elderda.co.uk> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-25 02:40:11 +00:00
ssum21	61eb8b32cc	fix: manual edits	2025-07-24 16:07:28 -07:00
ssum21	4d297c2e8c	feat: nmt draft	2025-07-24 15:57:31 -07:00
ssum21	0dc80fcdad	docs: ko: deepseek_v3.md	2025-07-24 15:54:56 -07:00
Wing Lian	5a81d7e0b3	revert behavior of _prepare_from_posids (#39622 ) * revert behavior of _prepare_from_posids * add back cu_seqlens_k and max_k for inference	2025-07-24 20:31:00 +02:00
eustlb	ad6fd2da0e	[Voxtral] values for A10 runners (#39605 ) * values for A10 runners * make * as for Llava * does not apply to Voxtral	2025-07-24 18:52:35 +02:00
Joao Gante	4741e1f1b7	[timm] new timm pin (#39640)	2025-07-24 16:01:59 +00:00
StevenBucaille	12b612830d	[efficientloftr] fix model_id in tests (#39621) fix: wrong EfficientLoFTR model id in tests	2025-07-24 10:41:06 +01:00
Raushan Turganbay	947a37e8f5	Update recent processors for vLLM backend (#39583 ) * update recent models and make sure it runs withh vLLM * delete!	2025-07-24 10:29:27 +02:00
Matthew Hernandez	7b897fe583	[Docs] Translate audio_classification.md from English to Spanish (#39513 ) * Docs: translate audio_classification to Spanish * Update audio_classification.md * Remove space * Normalize backticks * Update audio_classification.md * Apply corrections recommended by aaronjimv * Update _toctree.yml --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 15:55:13 -07:00
Ethan Villarosa	9b7244f189	standardized YOLOS model card according to template in #36979 (#39528 ) * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections * removed cli section * Update yolos.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 11:00:25 -07:00
JoestarGagan	ec8a09a5fe	Feature/standardize opt model card (#39568 ) * docs: Standardize OPT model card with enhanced details * Remove incorrect link from OPT model card * Address review feedback on OPT model card * Update opt.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 10:57:48 -07:00
Eric Bezzam	c5a80dd6c4	🔴 Fix EnCodec internals and integration tests (#39431 ) * EnCodec fixes and update integration tests. * Apply padding mask when normalize is False. * Update comment of copied function. * Fix padding mask within modeling. * Revert padding function. * Simplify handling of padding_mask. * Address variable codebook size. * Add output for padding for consistency with original model, fix docstrings. * last_frame_pad_length as int * Update example code. * Improve docstring/comments. * Shorten expected output. * Consistent docstring. * Parameterize tests. * Properties for derived variables. * Update expected outputs from GitHub runner. * Consistent outputs with runner GPUs.	2025-07-23 19:39:27 +02:00
Eric Bezzam	7a4e2e7868	Fix DAC integration tests and checkpoint conversion. (#39313 ) * Fix DAC (slow) integration tests. * Fix DAC conversion. * Address comments * Sync with main, uncomment nn.utils.parametrizations.weight_norm. * Update DAC integration tests with expected outputs. * Added info about encoder/decoder error and longer decoder outputs. * Parameterize tests. * Set expected values to GitHub runners.	2025-07-23 19:21:26 +02:00
Eric Bezzam	596a75f6e9	Move openai import (#39613)	2025-07-23 19:05:39 +02:00
Lysandre Debut	a0e5a7d34b	Transformers serve VLM (#39454 ) * Add support for VLMs in Transformers Serve * Raushan comments * Update src/transformers/commands/serving.py Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> * Quick fix * CPU -> Auto * Update src/transformers/commands/serving.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Fixup --------- Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-23 17:03:18 +02:00
Pablo Montalvo	ea56eb6bed	Fix important models CI (#39576 ) * relax test boundaries and fix from config * eager is always supported.	2025-07-23 16:24:29 +02:00
Maxime Grenu	0fe03afeb8	Fix typos and grammar issues in documentation and code (#39598 ) - Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md) - Fix 'meanginful' to 'meaningful' in training documentation - Fix duplicate 'Cohere' reference in modular transformers documentation - Fix duplicate 'the the' in trainer and chat command comments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-07-23 12:43:11 +00:00
Matej Sirovatka	82603b6cc2	Allow `device_mesh` have multiple dim (#38949 ) * Feat: something * Feat: initial changes * tmp changes to unblock * Refactor * remove todo * Feat: docstring --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-23 12:27:36 +00:00
jiqing-feng	10c990f7e2	enable triton backend on awq xpu (#39443 ) * enable triton backend on awq xpu Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_awq.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * fix dtype check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-23 12:10:38 +00:00
Raushan Turganbay	e7e6efcbbd	[idefics3] fix for vLLM (#39470 ) * fix idefics3 for vllm tests * fix copies	2025-07-23 14:00:43 +02:00
llbdyiu66	a62f65a989	fix moe routing_weights (#39581 ) * fix moe routing_weights * fix ernie4_5_moe routing_weights * fix integration test --------- Co-authored-by: llbdyiu66 <llbdyiu66@users.noreply.github.com> Co-authored-by: Vasqu <antonprogamer@gmail.com> Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>	2025-07-23 11:20:23 +00:00
Andrei Panferov	623ab01039	FP-Quant support (#38696 ) * quartet * quartet qat -> quartet * format * bf16 backward * interfaces * forward_method * quartet -> fp_quant * style * List -> list * list typing * fixed format and annotations * test_fp_quant * docstrings and default dtypes * better docstring and removed noop checks * docs * pseudoquantization support to test on non-blackwell * pseudoquant * Pseudoquant docs * Update docs/source/en/quantization/fp_quant.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization/fp_quant.md * Update docs/source/en/quantization/fp_quant.md * Update src/transformers/utils/quantization_config.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * small test fixes * dockerfile update * spec link * removed `_process_model_after_weight_loading` * toctree --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-07-23 11:41:10 +02:00
Raushan Turganbay	eb1a007f7f	Rename `supports_static_cache` to `can_compile_fullgraph` (#39505 ) * update all * Apply suggestions from code review Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * apply suggestions * fix copies --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-23 09:35:18 +00:00
Quentin Gallouédec	b357cbb19d	[Trackio] Allow single-gpu training and monitor power (#39595) Allow not distributed and monitor power	2025-07-23 11:22:50 +02:00
Cyril Vallez	019b74977d	Generic task-specific base classes (#39584 ) * first shot * Update modeling_layers.py * fix mro order * finalize llama * all modular and copied from from llama * fix	2025-07-23 10:49:47 +02:00
Cyril Vallez	5dba4bc7b2	Fix DynamicCache and simplify Cache classes a bit (#39590 ) * fix * use kwargs * simplify * Update cache_utils.py * Update cache_utils.py * Update test_cache_utils.py * fix * style	2025-07-23 10:13:45 +02:00
Sangbum Daniel Choi	d9b35c635e	Mask2former & Maskformer Fast Image Processor (#35685 ) * add maskformerfast * test * revert do_reduce_labels and add testing * make style & fix-copies * add mask2former and make fix-copies TO DO: add test for mask2former * make fix-copies * fill docstring * enable mask2former fast processor * python utils/custom_init_isort.py * make fix-copies * fix PR's comments * modular file update * add license * make style * modular file * make fix-copies * merge * temp commit * finish up maskformer mask2former * remove zero shot examples --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-23 02:47:47 +00:00
Quentin Gallouédec	6e9972962f	🎯 Trackio integration (#38814 ) * First attempt * fix * fix * Enhance TrackioCallback to log GPU memory usage and allocation * Enhance Trackio integration in callbacks and training arguments documentation * re order * remove unused lines * fix torch optional	2025-07-22 14:50:20 -07:00
space_samurai	c6d0500d15	[WIP] Add OneformerFastImageProcessor (#38343 ) * [WIP] OneformerFastImageProcessor * update init * Fully working oneformer image processor fast * change Nearest to Neares exact interpolation where needed * fix doc --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-22 20:41:39 +00:00
Harry Mellor	4884b6bf41	Fix link in "Inference server backends" doc (#39589) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-22 16:44:08 +00:00
Marc Sun	075a65657a	Torchdec RuntimeError catch (#39580 ) * fix * fix * maybe better * style	2025-07-22 18:35:03 +02:00
Kashif Rasul	2936902a76	[Paged-Attention] Handle continuous batching for repetition penalty (#39457 ) * Handle continuous batching for repetition penalty * fix last scores and with token mask creation * add test * Update src/transformers/generation/continuous_batching.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix formatting * remove unneeded cast --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-22 18:13:40 +02:00
Cássia Sampaio	cbcb8e6c1f	updated mistral3 model card (#39531 ) * updated mistral3 model card (#1) * updated mistral3 model card * applying suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * made all changes to mistral3.md * adding space between paragraphs in docs/source/en/model_doc/mistral3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * removing duplicate in mistral3.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * adding 4 backticks to preserve formatting --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 09:01:55 -07:00
Woojun Jung	601260fd96	Update `docs/source/ko/_toctree.yml` (#39516) docs: update `docs/source/ko/_toctree.yml`	2025-07-22 09:00:42 -07:00
Manuel de Prada Corral	c338fd43b0	[cache refactor] Move all the caching logic to a per-layer approach (#39106 ) * Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077) - Introduces CacheLayer and Cache base classes - Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers - Implements method/attr dispatch across layers to reduce boilerplate - Adds CacheProcessor hooks for offloading, quantization, etc. - Updates and passes tests * fix quantized, add tests * remove CacheProcessorList * raushan review, arthur review * joao review: minor things * remove cache configs, make CacheLayer a mixin (joaos review) * back to storage inside Cache() * remove cachebase for decorator * no more __getattr__ * fix tests * joaos review except docs * fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant` More verbose exceptions in `fix_docstring` on docstring formatting issues. * Revert "back to storage inside Cache()" This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913. * cyril review * simplify cache export * fix lfm2 cache * HybridChunked to layer * BC proxy object for cache.key_cache[i]=... * reorder classes * bfff come on LFM2 * better tests for hybrid and hybridChunked * complete coverage for hybrid chunked caches (prefill chunking) * reimplementing HybridChunked * cyril review * fix ci * docs for cache refactor * docs * oopsie * oopsie * fix after merge * cyril review * arthur review * opsie * fix lfm2 * opsie2	2025-07-22 16:10:25 +02:00
Cyril Vallez	b16688e96a	General weight initialization scheme (#39579 ) * general + modulars from llama * all modular models * style and fix musicgen * fix * Update configuration_musicgen.py * Update modeling_utils.py	2025-07-22 16:04:20 +02:00
Ákos Hadnagy	015b62bf3e	Add AMD GPU expectations for LLaVA tests (#39486 ) * Add AMD GPU expectation to llava tests * FMT * Remove debug print * Address review comments	2025-07-22 14:01:54 +00:00
Arthur	efceeaf267	Kernels flash attn (#39474 ) * use partial to wrap around `transformers` utils! * try to refactor? * revert one wrong change * just a nit * push * reverter watever was wrong! * some nits * fixes when there is no attention mask * bring the licence back * some fixes * nit * style * remove prints * correct dtype * fa flags for testing * update * use paged attention if requested! * updates * a clone was needed, not sure why * automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute * simplify and improve? * flash attention is kinda broken on recent cuda version so allow the opportunity to use something else * fix! * protect kernels import * update * properly parse generation config being passed * revert and update * add two tests * some fixes * fix test FA2 * takes comment into account * fixup * revert changes * revert the clone, it is only needed because the metal kernel is not doing it? * [docs] update attention implementation and cache docs (#39547) * update docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applu suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix mps on our side for now * Update src/transformers/integrations/flash_paged.py * no qa --------- Co-authored-by: Vasqu <antonprogamer@gmail.com> Co-authored-by: Raushan Turganbay <raushan@huggingface.co> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:41:06 +02:00
Ákos Hadnagy	b62557e712	Add AMD expectations to Mistral3 tests (#39481 ) Add AMD expectations to mistral3 tests	2025-07-22 15:40:16 +02:00
Raushan Turganbay	1806583390	[docs] Create page on inference servers with transformers backend (#39550 ) * draft docs on inference servers * Update docs/source/en/_toctree.yml Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * update * dic build failed * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update 
docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * apply last suggestions --------- Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:31:10 +02:00
Raushan Turganbay	cd98c1fee3	[docs] update attention implementation and cache docs (#39547 ) * update docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applu suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:06:43 +02:00
Ákos Hadnagy	ef99537f37	Add AMD test expectations to DETR model (#39539 ) * Add AMD test expectations to DETR model * Fix baseline expectation * Address review comments * Make formatting a bit more consistent	2025-07-22 12:07:10 +00:00
Dominik Baran	30567c28e8	[timm_wrapper] add support for gradient checkpointing (#39287 ) * feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification * ruff fix * refactor + add test for not supported model * ruff * Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-07-22 11:07:52 +00:00
Wing Lian	a44dcbe513	Fixes needed for n-d parallelism and TP (#39562 ) Handle non-DTensors cases in TP Layers Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-22 10:24:59 +00:00
Ákos Hadnagy	0cae633ce1	Bump AMD container for 2.7.1 PyTorch (#39458 ) * Bump AMD container for 2.7.1 PyTorch * Forgot to update pinned packages	2025-07-22 12:11:38 +02:00
StevenBucaille	a88ea9cbc8	Add EfficientLoFTR model (#36355 ) * initial commit * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix: various typos, typehints, refactors from suggestions * fix: fine_matching method * Added EfficientLoFTRModel and AutoModelForKeypointMatching class * fix: got rid of compilation breaking instructions * docs: added todo for plot * fix: used correct hub repo * docs: added comments * fix: run modular * doc: added PyTorch badge * fix: model repo typo in config * fix: make modular * fix: removed mask values from outputs * feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor * feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list * fix: reformat * refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride * doc: added q, kv aggregation kernel size and stride doc to config * refactor: converted efficientloftr implementation from modular to copied from mechanism * tests: overwrote batching_equivalence for "keypoints" specific tests * fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils * fix: make fix-copies * fix: make style * fix: update rope function to make meta tests pass * fix: rename plot_keypoint_matching to visualize_output for clarity * refactor: optimize image pair processing by removing redundant target size calculations * feat: add EfficientLoFTRImageProcessor to image processor mapping * refactor: removed logger and updated attention forward * refactor: added auto_docstring and can_return_tuple decorators * refactor: update type imports * refactor: update type hints from List/Dict to list/dict for consistency * refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching * fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict] * fix typing * fix some typing issues * nit * a few more typehint fixes * Remove 
output_attentions and output_hidden_states from modeling code * else -> elif to support efficientloftr * nit * tests: added EfficientLoFTR image processor tests * refactor: reorder functions * chore: update copyright year in EfficientLoFTR test file * Use default rope * Add docs * Update visualization method * fix doc order * remove 2d rope test * Update src/transformers/models/efficientloftr/modeling_efficientloftr.py * fix docs * Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py * update gradient * refactor: removed unused codepath * Add motivation to keep postprocessing in modeling code * refactor: removed unnecessary variable declarations * docs: use load_image from image_utils * refactor: moved stage in and out channels computation to configuration * refactor: set an intermediate_size parameter to be more explicit * refactor: removed all mentions of attention masks as they are not used * refactor: moved position_embeddings to be computed once in the model instead of every layer * refactor: removed unnecessary hidden expansion parameter from config * refactor: removed completely hidden expansions * refactor: removed position embeddings slice function * tests: fixed broken tests because of previous commit * fix is_grayscale typehint * not refactoring * not renaming * move h/w to embeddings class * Precompute embeddings in init * fix: replaced cuda device in convert script to accelerate device * fix: replaced stevenbucaille repo to zju-community * Remove accelerator.device from conversion script * refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module * fix: removed unused attributes in configuration * fix: missing self * fix: refactoring and tests * fix: make style --------- Co-authored-by: steven <steven.bucaille@buawei.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-07-22 10:53:16 +01:00
Raushan Turganbay	3bc726b381	[gemma3] fix bidirectional image mask (#39396 ) * fix gemma3 mask * make compile happy, and use only torch ops * no full attention between images * update tests * fix tests * add a fast test	2025-07-22 10:04:56 +02:00
nlhm	fbeaf96f9e	Update OLMoE model card (#39344 ) * Update OLMoE model card * Checks Test * Add license and code * Update docs/source/en/model_doc/olmoe.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update olmoe.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-21 16:41:01 -07:00
Orion Weller	641aaed7c0	Update modernbertdecoder docs (#39453 ) * update docs with paper and real model * nit * Apply suggestions from code review Thanks to @stevhlui! Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Remove usage examples, add quantization --------- Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-21 16:40:22 -07:00
Anton Vlasjuk	049a674e68	[`CI`] Fix post merge ernie 4.5 (#39561 ) fix repo consistency	2025-07-21 20:56:24 +02:00
Yoni Gozlan	b3ebc761e2	[Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps) (#39489 ) * improve handlike of other image-like inputs in fast image processors * fix issues with _prepare_images_structure * update sam image processor fast * use dict update	2025-07-21 14:12:14 -04:00
Anton Vlasjuk	b4115a426e	[`Ernie 4.5`] Add ernie text models (#39228 ) * init * copied from remote * add proper structure and llama like structure * fixup * revert to state that works * get closer to llama * slow and steady * some removal * masks work * it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm * nice * getting closer * closer to transformers style * let's simplify this, batching works now * simplified * working version with modular * it is indeed the rotation per weights, make it complete llama style * cleanup conversion, next to look at -> tokenizer * remove llama artefacts * fix modeling tests (common ones) * style * integration test + first look into tokenization (will need more work, focussing on modeling other models first) * style * working moe version, based on remote * lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view) * more cleanup * refactor namings and remove addition forXXX classes * our moe won't cut it it seems, correction bias seems to be missing in remote code version * tokenization change (remote) * our moe version works when adding normalization :D * cleanup moe * nits * cleanup modeling -> let's get to modular next * style * modular v1 * minor things + attempt at conversion (which doesn't work) * no conversion follow glm, fixup modular and other nits * modular cleanup * fixes * tests, tests, tests + some moe dtype forcing * simplify modular, fix fatal fa2 bug, remaining tests * fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float * fix sdpa test, load on init dtype only * fixup post merge * style * fix doc links * tokenization cleanup beginnings * simplify tokenizer by a lot as its basically llama * tokenizer is full llama with different defaults + extra special tokens * sync og special tokens of ernie * fix decoding with numbers (also in remote done what a timing), begin of tok tests * align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama) * nits * docs * my daily post merge it is * check * tokenization update with explanations and conversion script * review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio) * post merge fixes * fixup tokenization, llama fast is the way to go * more fixups * check * import fixes * correction bias following the paddle code * fix * fix TP plan, fix correction bias sharding during forward * style * whoops * fix tied weights * docs and last nit * license * flasky tests * move repo id, update when merged on the hub	2025-07-21 19:51:49 +02:00
Pablo Montalvo	69b158260f	Refactor embedding input/output getter/setter (#39339 ) * simplify common get/set * remove some noise * change some 5 years old modeling utils * update examples * fix copies * revert some changes * fixes, gah * format * move to Mixin * remove smolvlm specific require grad * skip * force defaults * remodularise some stuff * remodularise more stuff * add safety for audio models * style * have a correct fallback, you daft donkey * remove this argh * change heuristic for audio models * fixup * revert * this works * revert again * 🧠 * aaah ESM has two modelings aaah * add informative but short comment * add `input_embed_layer` mixin attribute * style * walrus has low precedence * modular fix * this was breaking parser	2025-07-21 18:18:14 +02:00
김민서	2da97f0943	🌐 [i18n-KO] Translated `perf_infer_gpu_multi.md` to Korean (#39441 ) * docs: ko: perf_infer_gpu_many.md * feat: nmt draft * docs: refine KO translation and enhance naturalness * docs: add missing TOC to documentation * Align toctree and filename with original: perf_infer_gpu_multi Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Refine Korean translation * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/perf_infer_gpu_multi.md Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> --------- Co-authored-by: YONGSANG 
<71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>	2025-07-21 09:14:15 -07:00
Yoni Gozlan	82807e56b1	[Fast image processor] refactor fast image processor glm4v (#39490 ) refactor fast image processor glm4v	2025-07-21 11:18:46 -04:00
Wing Lian	4b4f04fcca	fix ndim check of device_mesh for TP (#39538 )	2025-07-21 13:09:33 +00:00
Manuel de Prada Corral	1aa7256f01	Refactor `MambaCache` to `modeling_mamba.py` (#38086 ) * Refactor MambaCache to modeling_mamba.py (parity with Zamba) * ruff * fix dummies * update * update * remove mamba ref in cache tests * remove cache_implementation from tests * update * ruff * ruff * sneaky regression * model consistency * fix test_multi_gpu_data_parallel_forward * fix falcon slow tests * ruff * ruff * add sample false * try to fix slow tests * Revert "fix test_multi_gpu_data_parallel_forward" This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33. * fix tests on nvidia t4, remove dataparallel tests from mamba * ruff * remove DDP tests from mamba and falcon_mamba * add explicit error for MambaCache * mamba2 also needs to init cache in prepare_inputs_for_generation * ruff * ruff * move MambaCache to its own file * ruff * unprotected import fix * another attempt to fix unprotected imports * Revert "another attempt to fix unprotected imports" This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2. * fixing unprotected import, attempt 3 * Update src/transformers/cache_utils.py * ruff's fault * fix arthur review * modular falcon mamba * found a hack * fix config docs * fix docs * add export info * merge modular falcon branch * oopsie * fix fast path failing * new approach * oopsie * fix types * Revert new pragma in modular This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6. * trying another modular workaround * review & fix ci * oopsie * clear prepare_inputs on mamba/mamba2/falcon_mamba	2025-07-21 14:59:36 +02:00
st81	a419a40234	Fix Docstring of BarkProcessor (#39546 ) * Fix Docstring of BarkProcessor * Fix typo * Add type hint of return value for BarkProcessor.__call__	2025-07-21 12:56:44 +00:00
Wang, Yi	9323d0873c	use the enable_gqa param in torch.nn.functional.scaled_dot_product_at… (#39412 ) * use the enable_gqa param in torch.nn.functional.scaled_dot_product_attention Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * ci failure fix Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add check Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * fix ci failure Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * refine code, extend to cuda Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * refine code Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * fix review comments Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * refine the PR Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-07-21 14:46:43 +02:00
BUI Van Tuan	6b3a1f2f51	Fix missing initializations for models created in 2023 (#39239 ) * fix SwiftFormer * fix Kosmos2 * fix Owlv2 * fix Sam * fix Vits * fix Pvt * fix MobileViTV2 * fix PatchTST * fix Bros * fix Informer * fix BridgeTower * fix Mra and Yoso * fix Rwkv * fix EfficientNet * fix NllbMoe * fix Tvp * fix Clap * fix Autoformer * fix SwiftFormer * fix Mgpstr * fix Align * fix VitMatte * fix SpeechT5 * add conditional check for parameters * fix SpeechT5 * fix TimmBackbone and Clvp * fix SwiftFormer * fix SeamlessM4T and SeamlessM4Tv2 * fix Align * fix Owlv2 and OwlViT * add reviewed changes * add reviewed changes * fix typo --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-07-21 14:43:52 +02:00
Sai-Suraj-27	970d9a75ce	Raise `TypeError` instead of ValueError for invalid types (#38660 ) * Raise TypeError instead of ValueError for invalid types. * Removed un-necessary changes. * Resolved conflicts * Code quality * Fix failing tests. * Fix failing tests.	2025-07-21 12:42:00 +00:00
Yuanyuan Chen	822c5e45b2	Fix pylint warnings (#39477 ) * Fix pylint warnings Signed-off-by: cyy <cyyever@outlook.com> * Fix variable names Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-21 12:38:05 +00:00
Cyril Vallez	dc017cd763	Fix Qwen Omni integration test (#39553 ) fix	2025-07-21 14:11:46 +02:00
Krishnan Vignesh	fdc0566e15	🚨🚨🚨 [Trainer] Enable `average_tokens_across_devices` by default in `TrainingArguments` (#39395 ) Enable average_tokens_across_devices by default in TrainingArguments Fixes #39392 This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default. Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-21 12:11:20 +00:00
Raushan Turganbay	8c102e2eb1	Rename `_supports_flash_attn_2` in examples and tests (#39471 ) * delete `_supports_flash_attn_2` from examples and tests * simplify docs	2025-07-21 14:02:57 +02:00
Cyril Vallez	3a152e3a5c	Fix the check in flex test (#39548 ) * fix the check * fix flags * flags	2025-07-21 13:29:44 +02:00
Eric Bezzam	78fb2d2760	Fix bad tensor shape in failing Hubert test. (#39502 ) Fix bad tensor shape in Hubert test.	2025-07-21 12:25:52 +01:00
Yuxuan Zhang	39ba5f3cc2	GLM-4 Update (#39393 ) * one commit with full * Create glm4_moe.md * Update check_config_docstrings.py * Update __init__.py * update * argue * argue: router problem * 1 * Update test_modeling_glm4_moe.py * Update test_modeling_glm4_moe.py * Update test_modeling_glm4_moe.py * Update modular_glm4_moe.py * update * use dsv3 pretrainmodel in modular * update for test * upodate new modular * use LlamaAttention and avoid use CohereAttention cause repeat norm * update the modular * update attn modular * update * Update modular_glm4_moe.py * MTP layer is need to ignore * fix gradient error using with dots_1 method * Update test_modeling_glm4_moe.py * Update test_modeling_glm4_moe.py * Update test_modeling_glm4_moe.py --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-07-21 13:24:34 +02:00
Raushan Turganbay	344012b3a6	[qwen2 vl] fix packing with all attentions (#39447 ) * fix qwen2 vl packing in FA2 * why? delete! * qwen2-5-vl seems to work now * update * fix tests * start by adapting FA2 tests * add similar tests for sdpa/eager * address comments * why is this even in conditional model and not base model?	2025-07-21 12:19:15 +02:00
Raushan Turganbay	e42681b48b	[gemma3] support sequence classification task (#39465 ) * add seq clf class * fix docs and add in auto-map * skip tests * optional pixels	2025-07-21 11:03:20 +02:00
Yoni Gozlan	34133d0a79	Fix placeholders replacement logic in auto_docstring (#39433 ) Fix and simplify placeholders replacement logic	2025-07-18 22:56:23 +00:00
Yoni Gozlan	433d2a23d7	Update SAM/SAM HQ attention implementation + fix Cuda sync issues (#39386 ) * update attention implementation and improve inference speed * modular sam_hq + fix integration tests on A10 * fixup * fix after review * softmax in correct place * return attn_weights in sam/sam_hq	2025-07-18 18:46:27 -04:00
Yoni Gozlan	541bed22d6	Improve @auto_docstring doc and rename `args_doc.py` to `auto_docstring.py` (#39439 ) * rename `args_doc.py` to `auto_docstring.py` and improve doc * modifs after review	2025-07-18 18:00:34 +00:00
Yoni Gozlan	de0dd3139d	Add fast image processor SAM (#39385 ) * add fast image processor sam * nits	2025-07-18 17:27:16 +00:00
Enno Hermann	561a79a2f4	Fix BatchEncoding.to() for nested elements (#38985 )	2025-07-18 14:14:45 +01:00
Mohit Deopujari	f4d076561f	[gemma3] Fix do_convert_rgb in image processors. (#39438 ) * [gemma3] Fix do_convert_rgb in image processors. * [gemma3] Fix do_convert_rgb in image processors.	2025-07-18 12:33:00 +00:00
Raushan Turganbay	bcc0091937	[chat template] return assistant mask in processors (#38545 ) * messed up the git history, squash commits * raise error if slow and refine tests * index was off by one * fix the test	2025-07-18 12:23:20 +00:00
Joao Gante	328ca9cf1d	[dependencies] Update `datasets` pin (#39500 ) * pyarrow pin * make fixup * test? * like this? * like this? * like this? * datasets pin * comment	2025-07-18 12:05:28 +00:00
Ákos Hadnagy	fb58377700	Slack CI bot: set default result for non-existing artifacts (#39499 ) * Set default result for non-existing artifacts * FMT * Address review comments	2025-07-18 11:45:47 +00:00
Cyril Vallez	4ded9a4113	🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423 ) * first try * Update modeling_utils.py * Update modeling_utils.py * big refactor * Update modeling_utils.py * style * docstrings and simplify inner workings of configs * remove all trace of _internal * Update modeling_utils.py * fix logic error * Update modeling_utils.py * recursive on config * Update configuration_utils.py * fix * Update configuration_dpt.py * Update configuration_utils.py * Update configuration_utils.py * Update modeling_idefics.py * Update modeling_utils.py * fix for old models * more old models fixup * Update modeling_utils.py * Update configuration_utils.py * Remove outdated test * remove the deepcopy!! 🥵🥵 * Update test_modeling_gpt_bigcode.py * fix qwen dispatch * restrict to only models supporting it * style * switch name * Update modeling_utils.py * Update modeling_utils.py * add tests! * fix * rypo * remove bad copies * fix * Update modeling_utils.py * additional check * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * fix * skip	2025-07-18 13:41:54 +02:00
Joao Gante	2b819ba4e3	[dependencies] temporary pyarrow pin (#39496 ) * pyarrow pin * make fixup * test? * like this? * like this? * like this?	2025-07-18 10:05:40 +00:00
eustlb	967045082f	Add voxtral (#39429 ) * draft * draft update (conversion working) * mend * draft update * draft update: working generate * refactor * VoxtralProcessor draft * processor update * update convert_tekken_tokenizer * refactor processor * update convert * make style * better handle prefil * make style * add tests * add mistral_common audio loading * processor update * revert changes * audio utils update * add audio to apply chat template mistral update * voxtral processor update * fix * udpate converstion script * make mistral tokenier from pretrain work from local dir * fix udpates * add integration tests * add batched version * processor docstring * make style * revert convert_tekken_tokenizer changes * revert processing_qwen2.5 changes * add multi-turn test * processor improvements * address review changes * Update src/transformers/tokenization_mistral_common.py Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> * update audio utils * nits * integration test update * correct _support * update tests * test update * update integration tests * fix * fix * fix * add test_apply_chat_template_with_audio * add model doc * model doc * nit * doc uptade * nit * processor improvement * ensure default is 3B * nits * make * make * convert modular * update checkpoint * fix test * make * make * autos * make * make * nit * nit * nit --------- Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-18 00:02:04 +00:00
Qizhi Chen	73869f2e81	Fix typing order (#39467 ) * fix type order * change all Union[str, dict] to Union[dict, str] * add hf_parser test && fix test order * add deepspeed dependency * replace deepspeed with accelerator	2025-07-17 15:47:31 +00:00
he pang	bda75b4011	Add unified logits_to_keep support to LLMClass (#39472 ) * add supports for logits_to_keep for qwen25vl and glm4v * Update relevant modular files	2025-07-17 17:07:12 +02:00
Joao Gante	bf6c997685	[serve] Add speech to text (`/v1/audio/transcriptions`) (#39434 ) * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * use openai * validate request, including detecting unused fields * dict indexing * dict var access * tmp commit (tests failing) * add slow * use oai output type in completions * (little rebase errors) * working spec? * guard type hint * type hints. fix state (CB can now load different models) * type hints; fn names; error type * add docstrings * responses + kv cache * metadata support; fix kv cache; error event * add output_index and content_index * docstrings * add test_build_response_event * docs/comments * gate test requirements; terminate cb manager on model switch * nasty type hints * more type hints * disable validation by default; enable force models * todo * experiment: base model from typed dict * audio working * fix bad rebase * load audio with librosa * implement timed models * almost working * make fixup * fix tests * transcription request type * tokenizer -> processor * add example in docs --------- Co-authored-by: Lysandre <hi@lysand.re>	2025-07-17 14:29:57 +00:00
zhaiji0727	8b3de61a65	Update integration_utils.py (#39469 ) * Update integration_utils.py sanitize mlflow upload metric * Update integration_utils.py change import order to pass CI * Update integration_utils.py add comments * Update integration_utils.py Remove whitespace from blank line	2025-07-17 13:57:49 +00:00
Peter Schneider	7fd60047c8	fix: ImageTextToTextPipeline handles user-defined generation_config (#39374 ) fix: ImageTextToTextPipeline handles user-defined generation_config passed to the pipeline Co-authored-by: Raushan Turganbay <raushan@huggingface.co>	2025-07-17 13:23:29 +00:00
Yuanyuan Chen	60b5471da3	Enable some ruff checks for performance and readability (#39383 ) * Fix inefficient sequence tests Signed-off-by: cyy <cyyever@outlook.com> * Enable PERF102 Signed-off-by: cyy <cyyever@outlook.com> * Enable PLC1802 Signed-off-by: cyy <cyyever@outlook.com> * Enable PLC0208 Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-17 13:21:59 +00:00
Stonepia	fc700c2a26	Fix convert_and_export_with_cache failures for GPU models (#38976 ) * Add the `device` option for `generate()` * Add device for default tensors to avoid tensor mismatch * [test] Enable test_static_cache_exportability for torch_device * infer device from the prompt_token_ids * Add device for generated tensor * [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU * fix format * infer device from the model	2025-07-17 13:12:32 +00:00
Yih-Dar	54680d75c9	Update `GemmaIntegrationTest::test_model_2b_bf16_dola` (#39362 ) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-17 14:06:23 +01:00
klimarissa17	322400af58	fix a comment typo in utils.py (#39459 )	2025-07-17 13:06:04 +00:00
Yuanyuan Chen	43f07018cf	Use newer typing notation (#38934 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-07-17 13:05:21 +00:00
Marc Sun	565dd0bad7	Fix tests due to breaking change in accelerate (#39451 ) * update values * fix	2025-07-17 13:51:50 +01:00
Zhongkai Zhao	26fed50460	fix max_length calculating using cu_seq_lens (#39341 )	2025-07-17 10:54:23 +02:00
Yusuf Shihata	cdfe6164b3	fix(pipelines): QA pipeline returns fewer than top_k results in batch mode (#39193 ) * fixing the bug * Try a simpler approach * make fixup --------- Co-authored-by: Matt <rocketknight1@gmail.com>	2025-07-17 10:24:30 +02:00
renet10	b85ed49e0a	Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings (#38822 ) * Correcting PR #38642. The PR removed references to the deprecated method "as_target_processor()" in the __call__ and pad method docstrings, which is correct, but also removed all references to PreTrainedTokenizer, which is incorrect. This commit adds back the reference to PreTrainedTokenizer and also takes the opportunity to enhance the docstrings with the invocation procedure post removal of "as_target_processor()" and adds information on return values. * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update 
src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/wav2vec2/processing_wav2vec2.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: René Tio <tor@Jammer.local> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-16 14:13:07 -07:00
Dhruv Malik	787a0128a9	create ijepa modelcard (ref : PR #36979 ). (#39354 ) * wip: adding first version of the IJEPA model card. * refactor based on the @stevhliu feedbacks * refactor: - revert the accidental removal of the autodoc api description and the image reerece architecture - general context updation. * - changes of model for example quantization. - merging the quantization content.	2025-07-16 12:40:22 -07:00
ridima11	48f2233cdf	Improve grammar and clarity in perf_hardware.md (#39428 )	2025-07-16 12:15:15 -07:00
Yaowei Zheng	e68ebb695f	fix cached file error when repo type is dataset (#36909 ) * fix cached file * Update hub.py	2025-07-16 18:02:26 +02:00
Krishnan Vignesh	35a416c400	Fix indentation bug in SmolVLM image processor causing KeyError (#39452 ) Fix indentation bug in Idefics3 image processor - Fix KeyError when do_image_splitting=False - Move split_images_grouped assignment inside loop - Ensures all image shapes are stored, not just the last one - This fixes the bug in both Idefics3 and generated SmolVLM processors cc @yonigozlan Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>	2025-07-16 11:59:28 -04:00
Luke Friedrichs	2c58705dc2	Updated Megatron conversion script for gpt2 checkpoints (#38969 ) * update script to support new megatron gpt format * fixed quality failures --------- Co-authored-by: Luke Friedrichs <LckyLke>	2025-07-16 15:54:29 +00:00
Anton Vlasjuk	26be7f717e	[`CI`] Fix partially red CI (#39448 ) fix	2025-07-16 15:53:43 +02:00
sebastianvlad1	0a88751940	Fixes #39204 : add fallback if get_base_model missing (#39226 ) * Fixes #39204: add fallback if get_base_model missing * Inline try_get_base_model logic as suggested in PR review * Apply style fixes --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-07-16 15:51:30 +02:00
Wing Lian	ba506f87db	make the loss context manager easier to extend (#39321 )	2025-07-16 15:47:24 +02:00
Arthur	9f1ac6f185	Remove something that should have never been there (#38254 ) * what the hell * update * style * style * typing * fix init issue * fix granite moe hybrid as well	2025-07-16 15:22:44 +02:00
Raushan Turganbay	a7ca5b5d67	Fix processor tests (#39450 ) fix	2025-07-16 15:01:35 +02:00
Kyle Sayers	71818f570b	[Bugfix] [Quantization] Remove unused init arg (#39324 ) remove unused arg from ct config init Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-16 14:57:42 +02:00
Pavel Iakubovskii	cc24b0378e	Better typing for model.config (#39132 ) * Apply to all models config annotation * Update modular to preserve order * Apply modular * fix define docstring * fix dinov2 consistency (docs<->modular) * fix InstructBlipVideoForConditionalGeneration docs<->modular consistency * fixup * remove duplicate code * Delete config_class attribute from the modeling code * Add config_class attribute in base model * Update init sub class * Deprecated models update * Update new models * Fix remote code BC issue * fixup * fixing more corner cases * fix new models * add test * modular docs update * fix comment a bit * fix for py3.9	2025-07-16 14:50:35 +02:00
Eon Kim	4b258454a7	Fix typo in generation configuration for Janus model weight conversion (#39432 ) * Fix typo in generation configuration for Janus model weight conversion * Fix typo * Update Janus model generation configuration * Update Janus model to use generation_kwargs	2025-07-16 14:28:02 +02:00
Lysandre Debut	de5ca373ac	Responses API in `transformers serve` (#39155 ) * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * Responses API (to be merged into #39155) (#39338) * Scaffolding * Explicit content * Naïve Responses API streaming implementation * Cleanup * use openai * validate request, including detecting unused fields * dict indexing * dict var access * tmp commit (tests failing) * add slow * use oai output type in completions * (little rebase errors) * working spec? * guard type hint * type hints. fix state (CB can now load different models) * type hints; fn names; error type * add docstrings * responses + kv cache * metadata support; fix kv cache; error event * add output_index and content_index * docstrings * add test_build_response_event * docs/comments * gate test requirements; terminate cb manager on model switch * nasty type hints * more type hints * disable validation by default; enable force models * todo --------- Co-authored-by: Lysandre <hi@lysand.re> * Slight bugfixes * PR comments from #39338 * make fixup --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Joao Gante <joao@huggingface.co>	2025-07-16 14:16:16 +02:00
Raushan Turganbay	c8524aeb07	[cache] make all classes cache compatible finally (#38635 ) * dump * push other models * fix simple greedy generation * xmod * add fmst and clean up some mentions of old cache format * gpt-bigcode now follows standards * delete tuple cache reference in generation * fix some models * fix some models * fix mambas and support cache in tapas * fix some more tests * fix copies * delete `_reorder_cache` * another fix copies * fix typos and delete unnecessary test * fix rag generate, needs special cache reordering * fix tapas and superglue * reformer create special cache * recurrent gemma `reorder_cache` was a no-op, delete * fix-copies * fix blio and musicgen pipeline tests * fix reformer * fix reformer, again... * delete `_supports_cache_class` * delete `supports_quantized_cache` * fix failing tests * fix copies * some minor clean up * style * style * fix copies * fix tests * fix copies * create causal mask now needs positions? * fixc copies * style * Update tests/test_modeling_common.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * clean-up of non-generative model after merging main * check `is_decoder` for cache * delete transpose for scores * remove tuple cache from docs everywhere * fix tests * fix copies * fix copies once more * properly deprecate `encoder_attention_mask` in Bert-like models * import `deprecate_kwarg` where needed * fix copies again * fix copies * delete `nex_decoder_cache` * fix copies asks to update for PLM * fix copies * rebasing had a few new models, fix them and merge asap! * fix copies once more * fix slow tests * fix tests and updare PLM checkpoint * add read token and revert accidentally removed line * oh com -on, style * just skip it, read token has no access to PLM yet --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-16 14:00:17 +02:00
Ilias Aarab	6cb43defd0	docs: add missing numpy import to minimal example (#39444 ) docs: add numpy import to minimal example	2025-07-16 11:57:13 +00:00
Yuanyuan Chen	61163099f1	Remove runtime conditions for type checking (#37340 ) Remove dynamic conditions for type checking Signed-off-by: cyy <cyyever@outlook.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-16 13:36:48 +02:00
Marc Sun	bfc9ddf5c6	Add StableAdamW Optimizer (#39446 ) * Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour. * Fixed issue with * Added docs for StableAdamW. Also fixed a typo in schedule free optimizers --------- Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>	2025-07-16 13:35:53 +02:00
Pablo Montalvo	b9ee528246	add test scanner (#39419 ) * add test scanner * add doc + license * refactor for only 1 tree traversal * add back test of only one method * document single method scan * format * fixup generate tests * minor fix * fixup * fixup doc	2025-07-16 12:45:46 +02:00
Ákos Hadnagy	79941c61ce	Fix missing definition of diff_file_url in notification service (#39445 ) Fix missing definition of diff_file_url	2025-07-16 12:09:18 +02:00
richardodliu	e048d48bd0	Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870 ) * add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer * Update src/transformers/optimization.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update optimization.py fix the error of the unclosed "(" * Update optimization.py remove whitespace in line 402 in order to pass the quality test * Update src/transformers/optimization.py * Update src/transformers/optimization.py * Apply style fixes --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-07-16 12:01:08 +02:00
Quentin Gallouédec	0cf08e90dd	Change log level from warning to info for scheduled request logging in `ContinuousBatchProcessor` (#39372 ) Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor	2025-07-16 11:54:20 +02:00
Yuanyuan Chen	ae4e306a40	Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358 ) * Defaults to adamw_torch_fused for latest Pytorch Signed-off-by: cyy <cyyever@outlook.com> * Fix test Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-16 09:52:33 +00:00
Jeonghwan Kim	4524a68c66	Fix L270 - hasattr("moe_args") returning False error (#38715 ) * Fix L270 - hasattr("moe_args") returning False error * Update src/transformers/models/llama4/convert_llama4_weights_to_hf.py --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-16 09:45:58 +00:00
Raushan Turganbay	d33a1c389f	[chat template] add a testcase for kwargs (#39415 ) add a testcase	2025-07-16 11:31:35 +02:00
S1quence	99c9763398	Fixed a bug calculating cross entropy loss in `JetMoeForCausalLM` (#37830 ) fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function. Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-16 11:22:00 +02:00
Klaus-Rudolf Kladny	667ad02374	Remove double soft-max in load-balancing loss. Fixes #39055 . (#39056 ) Remove double soft-max in load-balancing loss. Fixes #39055	2025-07-16 09:20:23 +00:00
Kyle Sayers	31d81943c9	[Core] [Offloading] Fix saving offloaded submodules (#39280 ) * fix counting meta tensors, fix onloading meta tensors Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unrelated fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove unrelated change Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add clarifying comment Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add test_save_offloaded_model_with_direct_params Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix merge conflict, add decorators Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-07-16 08:44:40 +00:00
Raushan Turganbay	add43c4d09	[autodocstring] add video and audio inputs (#39420 ) * add video and audio inputs in auto docstring * fix copies	2025-07-16 09:41:50 +02:00
Ákos Hadnagy	0dc2df5dda	CI workflow for performed test regressions (#39198 ) * WIP script to compare test runs for models * Update line normalitzation logic * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-07-16 04:20:02 +02:00
StevenBucaille	1bc9ac5107	docs: update LightGlue docs (#39407 ) * docs: update LightGlue docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-15 12:40:50 -07:00
StevenBucaille	d9574f2fe3	docs: update SuperGlue docs (#39406 ) * docs: update SuperGlue docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-15 12:40:26 -07:00
Raushan Turganbay	9f41f67135	[vlm] fix loading of retrieval VLMs (#39242 ) * fix vlm with retrieval * we can't use AutoModel because new ColQwen was released after refactor * no need for colqwen * tied weight keys are necessary, if using IMageTextToText * need to apply renaming in tied weights, only for ColPali * overwrite tied keys in ColPali * fix copies, modular can't handle if-statements	2025-07-15 17:23:54 +02:00
Wing Lian	b1d14086e4	handle training summary when creating modelcard but offline mode is set (#37095 ) * handle training summary when creating modelcard but offline mode is set * chore: lint	2025-07-15 17:21:15 +02:00
Dario Salvati	67f42928f0	Remove residual quantization attribute from dequantized models (#39373 ) * fix: removing quantization trace attribute from dequantized model Fixes #39295 * add: test `to(dtype=torch.float16)` after dequantization	2025-07-15 17:16:10 +02:00
Wangyi Jiang	30c508dbcb	Remove deprecated audio utils functions (#39330 ) Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-15 14:02:25 +00:00
Hosein Rezaei	d8e05951b8	Fix bugs in pytorch example run_clm when streaming is enabled (#39286 )	2025-07-15 15:37:28 +02:00
Matt	a989bf8d84	Fix bugs from pipeline preprocessor overhaul (#39425 ) * Correct load classes for VideoClassificationPipeline * Correct load classes for the ASR pipeline	2025-07-15 14:28:59 +01:00
Luc Georges	53c9dcd6fd	refactor: remove `set_tracer_provider` and `set_meter_provider` calls (#39422 )	2025-07-15 14:22:12 +02:00
Yuanyuan Chen	f03b384149	Fix invalid property (#39384 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-07-15 12:11:37 +00:00
jiqing-feng	c4d41567fa	set document_question_answering pipeline _load_tokenizer to True (#39411 ) Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-07-15 12:05:49 +00:00
Matt	f56b49f48f	Ignore extra position embeddings weights for ESM (#39063 ) * Ignore extra position embeddings weights * Slight name fix	2025-07-15 11:57:32 +00:00
44670	2b79f14375	support loading qwen3 gguf (#38645 ) * support loading qwen3 gguf * Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion * Add testcase * Fix formatting	2025-07-15 09:53:41 +00:00
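
The message of commit 99c9763398 above (`JetMoeForCausalLM`) describes a double shift: the caller shifted the logits, and the loss function then shifted them again. A minimal sketch of that alignment problem follows. It uses plain Python lists instead of tensors, and `loss_pairs` is a hypothetical stand-in for a loss function that shifts internally, not the library's actual `loss_function`.

```python
def loss_pairs(logits, labels):
    """Toy loss front-end (hypothetical): pair the prediction made at
    position t with the label at position t + 1, as a causal LM loss does."""
    return list(zip(logits[:-1], labels[1:]))


tokens = ["a", "b", "c", "d", "e"]
# Stand-in for logits: entry i is "the prediction made after reading tokens[i]".
predictions = [f"after_{t}" for t in tokens]

# Shifting once, inside the loss: each prediction is scored against the next token.
correct = loss_pairs(predictions, tokens)

# Shifting in the caller as well: the loss shifts the already-shifted inputs,
# so each prediction is scored against the token two positions ahead.
shift_logits = predictions[:-1]
shift_labels = tokens[1:]
buggy = loss_pairs(shift_logits, shift_labels)

print(correct[0])  # ('after_a', 'b')
print(buggy[0])    # ('after_a', 'c')
```

In this toy model the second pairing is the "next next token prediction" the commit message refers to, which is why the fix removes the shift in the caller and leaves the one inside the loss.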

6650 changed files with 146774 additions and 64978 deletions

BIN	._.DS_Store	Normal file	Binary file not shown.
BIN	._.circleci	Normal file	Binary file not shown.
BIN	._.git	Normal file	Binary file not shown.
BIN	._.gitattributes	Normal file	Binary file not shown.
BIN	._.github	Normal file	Binary file not shown.
BIN	._.gitignore	Normal file	Binary file not shown.
BIN	._AGENTS.md	Normal file	Binary file not shown.
BIN	._CITATION.cff	Normal file	Binary file not shown.
BIN	._CODE_OF_CONDUCT.md	Normal file	Binary file not shown.
BIN	._ISSUES.md	Normal file	Binary file not shown.
BIN	._LICENSE	Normal file	Binary file not shown.
BIN	._awesome-transformers.md	Normal file	Binary file not shown.
BIN	._benchmark	Normal file	Binary file not shown.
BIN	._docker	Normal file	Binary file not shown.
BIN	._docs	Normal file	Binary file not shown.
BIN	._examples	Normal file	Binary file not shown.
BIN	._i18n	Normal file	Binary file not shown.
BIN	._notebooks	Normal file	Binary file not shown.
BIN	._scripts	Normal file	Binary file not shown.
BIN	._src	Normal file	Binary file not shown.
BIN	._templates	Normal file	Binary file not shown.
BIN	._tests	Normal file	Binary file not shown.
BIN	._utils	Normal file	Binary file not shown.
BIN	.circleci/._TROUBLESHOOT.md	Normal file	Binary file not shown.
BIN	.circleci/._config.yml	Normal file	Binary file not shown.
BIN	.circleci/._parse_test_outputs.py	Normal file	Binary file not shown.

									
15
.circleci/create_circleci_config.py
@@ -109,7 +109,9 @@ class CircleCIJob:
                 self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev"
             print(f"Using {self.docker_image} docker image")
         if self.install_steps is None:
-            self.install_steps = ["uv venv && uv pip install ."]
+            self.install_steps = ["uv pip install ."]
+        # Use a custom patched pytest to force exit the process at the end, to avoid `Too long with no output (exceeded 10m0s): context deadline exceeded`
+        self.install_steps.append("uv pip install git+https://github.com/ydshieh/pytest.git@8.4.1-ydshieh")
         if self.pytest_options is None:
             self.pytest_options = {}
         if isinstance(self.tests_to_run, str):
@@ -213,7 +215,7 @@ generate_job = CircleCIJob(
     docker_image=[{"image": "huggingface/transformers-torch-light"}],
     # networkx==3.3 (after #36957) cause some issues
     # TODO: remove this once it works directly
-    install_steps=["uv venv && uv pip install ."],
+    install_steps=["uv pip install ."],
     marker="generate",
     parallelism=6,
 )
@@ -250,7 +252,7 @@ examples_torch_job = CircleCIJob(
     additional_env={"OMP_NUM_THREADS": 8},
     docker_image=[{"image":"huggingface/transformers-examples-torch"}],
     # TODO @ArthurZucker remove this once docker is easier to build
-    install_steps=["uv venv && uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
+    install_steps=["uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
     pytest_num_workers=4,
 )
@@ -259,7 +261,7 @@ hub_job = CircleCIJob(
     additional_env={"HUGGINGFACE_CO_STAGING": True},
     docker_image=[{"image":"huggingface/transformers-torch-light"}],
     install_steps=[
-        'uv venv && uv pip install .',
+        'uv pip install .',
         'git config --global user.email "ci@dummy.com"',
         'git config --global user.name "ci"',
     ],
@@ -273,7 +275,6 @@ onnx_job = CircleCIJob(
     "onnx",
     docker_image=[{"image":"huggingface/transformers-torch-tf-light"}],
     install_steps=[
-        "uv venv",
         "uv pip install .[testing,sentencepiece,onnxruntime,vision,rjieba]",
     ],
     pytest_options={"k onnx": None},
@@ -303,7 +304,7 @@ non_model_job = CircleCIJob(
     docker_image=[{"image": "huggingface/transformers-torch-light"}],
     # networkx==3.3 (after #36957) cause some issues
     # TODO: remove this once it works directly
-    install_steps=["uv venv && uv pip install .[serving]"],
+    install_steps=["uv pip install .[serving]"],
     marker="not generate",
     parallelism=6,
 )
@@ -321,7 +322,7 @@ doc_test_job = CircleCIJob(
     additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"},
     install_steps=[
         # Add an empty file to keep the test step running correctly even no file is selected to be tested.
-        "uv venv && pip install .",
+        "uv pip install .",
         "touch dummy.py",
         command,
         "cat pr_documentation_tests_temp.txt",

11
.gitattributes vendored
@@ -1,4 +1,13 @@
 *.py	eol=lf
 *.rst	eol=lf
 *.md	eol=lf
-*.mdx   eol=lf
+*.mdx   eol=lf
+*.model filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
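
The LFS lines in this hunk route files to Git LFS by extension. As a rough illustration of which paths would be affected, the sketch below checks a file name against those patterns. The helper `is_lfs_tracked` is hypothetical, and it simplifies gitattributes matching by comparing only the base name with `fnmatch`-style globbing, which should be adequate for slash-free patterns like these but is not Git's own matcher.

```python
from fnmatch import fnmatchcase

# Patterns copied from the LFS lines in the .gitattributes hunk above.
LFS_PATTERNS = [
    "*.model", "*.png", "*.jpg", "*.jpeg", "*.gif",
    "*.bin", "*.pt", "*.onnx", "*.h5",
]


def is_lfs_tracked(path: str) -> bool:
    """Hypothetical helper: True if the file's base name matches an LFS pattern.

    Assumption: a pattern without a slash applies to the base name at any
    directory depth, so only the last path component is compared.
    """
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


print(is_lfs_tracked("docs/source/en/imgs/diagram.png"))     # True
print(is_lfs_tracked("src/transformers/modeling_utils.py"))  # False
```

To see the attributes Git itself resolves for a path, `git check-attr filter -- <path>` is the authoritative check.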

BIN	.github/._ISSUE_TEMPLATE	vendored	Normal file	Binary file not shown.
BIN	.github/._PULL_REQUEST_TEMPLATE.md	vendored	Normal file	Binary file not shown.
BIN	.github/._conda	vendored	Normal file	Binary file not shown.
BIN	.github/._scripts	vendored	Normal file	Binary file not shown.
BIN	.github/._workflows	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._bug-report.yml	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._config.yml	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._feature-request.yml	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._i18n.md	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._migration.yml	vendored	Normal file	Binary file not shown.
BIN	.github/ISSUE_TEMPLATE/._new-model-addition.yml	vendored	Normal file	Binary file not shown.
BIN	.github/conda/._build.sh	vendored	Normal file	Binary file not shown.
BIN	.github/conda/._meta.yaml	vendored	Normal file	Binary file not shown.
BIN	.github/scripts/._assign_reviewers.py	vendored	Normal file	Binary file not shown.
BIN	.github/scripts/._codeowners_for_review_action	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._TROUBLESHOOT.md	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._add-model-like.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._assign-reviewers.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._build-ci-docker-images.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._build-docker-images.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._build-nightly-ci-docker-images.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._build-past-ci-docker-images.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._check_tiny_models.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._get-pr-info.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._get-pr-number.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._model_jobs_intel_gaudi.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._new_model_pr_merged_notification.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._pr-style-bot.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._push-important-models.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._release-conda.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-nightly-past-ci-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-past-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-push-amd-mi210-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-push-amd-mi250-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-push-amd.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-push-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-scheduled-amd-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-scheduled-amd-mi250-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-scheduled-intel-gaudi.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._self-scheduled-intel-gaudi3-caller.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._ssh-runner.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._stale.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._trufflehog.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._update_metdata.yml	vendored	Normal file	Binary file not shown.
BIN	.github/workflows/._upload_pr_documentation.yml	vendored	Normal file	Binary file not shown.

									
2
.github/workflows/benchmark.yml vendored
@@ -48,7 +48,7 @@ jobs:
       - name: Run database init script
         run: |
-          psql -f benchmark/init_db.sql
+          psql -f benchmark/utils/init_db.sql
         env:
           PGDATABASE: metrics
           PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}

									
5
.github/workflows/check_failed_tests.yml vendored
@@ -21,6 +21,9 @@ on:
       report_repo_id:
         required: true
         type: string
+      commit_sha:
+        required: false
+        type: string
 env:
@@ -87,7 +90,7 @@ jobs:
       - name: Update clone
         working-directory: /transformers
         if: ${{ env.process == 'true' }}
-        run: git fetch && git checkout ${{ github.sha }}
+        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
       - name: Get target commit
         working-directory: /transformers/utils

									
										49

.github/workflows/collated-reports.yml
									
										vendored
									
										Normal file
									
												View File
												
				@@ -0,0 +1,49 @@

				name: CI collated reports

				on:

				  workflow_call:

				    inputs:

				      job:

				        required: true

				        type: string

				      report_repo_id:

				        required: true

				        type: string

				      machine_type:

				        required: true

				        type: string

				      gpu_name:

				        description: Name of the GPU used for the job. Its enough that the value contains the name of the GPU, e.g. "noise-h100-more-noise". Case insensitive.

				        required: true

				        type: string

				jobs:

				  collated_reports:

				    name: Collated reports

				    runs-on: ubuntu-22.04

				    if: always()

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/download-artifact@v4

				      - name: Collated reports

				        shell: bash

				        env:

				          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}

				          CI_SHA: ${{ github.sha }}

				          TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}

				        run: |

				          pip install huggingface_hub

				          python3 utils/collated_reports.py                  \

				            --path .                                         \

				            --machine-type ${{ inputs.machine_type }}        \

				            --commit-hash ${{ env.CI_SHA }}                  \

				            --job ${{ inputs.job }}                          \

				            --report-repo-id ${{ inputs.report_repo_id }}    \

				            --gpu-name ${{ inputs.gpu_name }}

				      - name: Upload collated reports

				        uses: actions/upload-artifact@v4

				        with:

				          name: collated_reports_${{ env.CI_SHA }}.json

				          path: collated_reports_${{ env.CI_SHA }}.json
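The `gpu_name` input is described as a case-insensitive substring match. How `utils/collated_reports.py` implements this is not shown in this diff; the following is only a sketch of that kind of matching, with a hypothetical function `match_gpu` and a made-up list of known names:

```bash
# Hypothetical sketch: the real matching lives in utils/collated_reports.py and may differ.
# Prints the first known GPU name contained in the value, ignoring case.
match_gpu() {
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for known in h100 a10g mi325; do   # assumed list, for illustration only
    case "$value" in
      *"$known"*) printf '%s\n' "$known"; return 0 ;;
    esac
  done
  return 1   # no known GPU name found
}

match_gpu "noise-H100-more-noise"
```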

									
										2

.github/workflows/doctest_job.yml
									
												
				@@ -31,7 +31,7 @@ jobs:

				      group: aws-g5-4xlarge-cache

				    container:

				      image: huggingface/transformers-all-latest-gpu

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    steps:

				      - name: Update clone

				        working-directory: /transformers

									
										2

.github/workflows/doctests.yml
									
												
				@@ -18,7 +18,7 @@ jobs:

				      group: aws-g5-4xlarge-cache

				    container:

				      image: huggingface/transformers-all-latest-gpu

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    outputs:

				      job_splits: ${{ steps.set-matrix.outputs.job_splits }}

				      split_keys: ${{ steps.set-matrix.outputs.split_keys }}

									
										5

.github/workflows/model_jobs.yml
									
												
				@@ -18,6 +18,9 @@ on:

				      docker:

				        required: true

				        type: string

				      commit_sha:

				        required: false

				        type: string

				      report_name_prefix:

				        required: false

				        default: run_models_gpu

				@@ -70,7 +73,7 @@ jobs:

				      - name: Update clone

				        working-directory: /transformers

				        run: git fetch && git checkout ${{ github.sha }}

				        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}

				      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)

				        working-directory: /transformers

									
										134

.github/workflows/pr_build_doc_with_comment.yml
									
												
				@@ -0,0 +1,134 @@

				name: PR - build doc via comment

				on:

				  issue_comment:

				    types:

				      - created

				    branches-ignore:

				      - main

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'build-doc') }}

				  cancel-in-progress: true

				permissions: {}

				jobs:

				  get-pr-number:

				    name: Get PR number

				    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}

				    uses: ./.github/workflows/get-pr-number.yml

				  get-pr-info:

				    name: Get PR commit SHA

				    needs: get-pr-number

				    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}

				    uses: ./.github/workflows/get-pr-info.yml

				    with:

				      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}

				  verity_pr_commit:

				    name: Verity PR commit corresponds to a specific event by comparing timestamps

				    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}

				    runs-on: ubuntu-22.04

				    needs: get-pr-info

				    env:

				      COMMENT_DATE: ${{ github.event.comment.created_at }}

				      PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}

				      PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}

				    steps:

				      - run: |

				          COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")

				          echo "COMMENT_DATE: $COMMENT_DATE"

				          echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"

				          echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"

				          echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"

				          if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then

				            echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";

				            exit -1;

				          fi

				  create_run:

				    name: Create run

				    needs: [get-pr-number, get-pr-info]

				    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}

				    permissions:

				      statuses: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Create Run

				        id: create_run

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          # Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.

				          # See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status

				          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

				        run: |

				          gh api \

				            --method POST \

				            -H "Accept: application/vnd.github+json" \

				            -H "X-GitHub-Api-Version: 2022-11-28" \

				            repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \

				            -f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Custom doc building job" -f "context=custom-doc-build"

				  reply_to_comment:

				    name: Reply to the comment

				    if: ${{ needs.create_run.result == 'success' }}

				    needs: [get-pr-number, create_run]

				    permissions:

				      pull-requests: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Reply to the comment

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

				        run: |

				          gh api \

				            --method POST \

				            -H "Accept: application/vnd.github+json" \

				            -H "X-GitHub-Api-Version: 2022-11-28" \

				            repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \

				            -f "body=[Building docs for all languages...](${{ env.GITHUB_RUN_URL }})"

				  build-doc:

				    name: Build doc

				    needs: [get-pr-number, get-pr-info]

				    if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}

				    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main

				    with:

				      commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}

				      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}

				      package: transformers

				      languages: ar de en es fr hi it ko pt tr zh ja te

				  update_run_status:

				    name: Update Check Run Status

				    needs: [ get-pr-info, create_run, build-doc ]

				    permissions:

				      statuses: write

				    if: ${{ always() && needs.create_run.result == 'success' }}

				    runs-on: ubuntu-22.04

				    env:

				      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				      GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

				      STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.create_run.result) }}

				    steps:

				      - name: Get `build-doc` job status

				        run: |

				          echo "${{ needs.build-doc.result }}"

				          echo $STATUS_OK

				          if [ "$STATUS_OK" = "true" ]; then

				            echo "STATUS=success" >> $GITHUB_ENV

				          else

				            echo "STATUS=failure" >> $GITHUB_ENV

				          fi

				      - name: Update PR commit statuses

				        run: |

				          echo "${{ needs.build-doc.result }}"

				          echo "${{ env.STATUS }}"

				          gh api \

				            --method POST \

				            -H "Accept: application/vnd.github+json" \

				            -H "X-GitHub-Api-Version: 2022-11-28" \

				            repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \

				            -f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Custom doc building job" -f "context=custom-doc-build"
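The `verity_pr_commit` job above guards against a commit being pushed after the triggering comment was posted: it converts the comment's creation date to epoch seconds and aborts unless that value is strictly later than the PR merge commit timestamp. The same comparison as a standalone sketch, assuming GNU `date` (for `-d`) and a hypothetical function name `comment_is_newer`:

```bash
# Hypothetical standalone version of the check; requires GNU date for `-d`.
# Succeeds only when the comment is strictly newer than the commit.
comment_is_newer() {
  comment_ts=$(date -d "$1" +"%s") || return 2   # $1: ISO 8601 comment date
  [ "$comment_ts" -gt "$2" ]                      # $2: commit timestamp (epoch seconds)
}

if comment_is_newer "2025-08-26T12:00:00Z" 1756000000; then
  echo "comment is newer than the commit, safe to run"
else
  echo "commit is newer than (or as old as) the comment, abort"
fi
```

The workflow itself expresses the abort condition with `-le`; the sketch uses the complementary `-gt` so that success means "safe to run".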

									
										22

.github/workflows/pr_run_slow_ci.yml
									
												
				@@ -16,28 +16,6 @@ jobs:

				    with:

				      pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}

				  # We only need to verify the timestamp if the workflow is triggered by `issue_comment`.

				  verity_pr_commit:

				    name: Verity PR commit corresponds to a specific event by comparing timestamps

				    if: ${{ github.event.comment.created_at != '' }}

				    runs-on: ubuntu-22.04

				    needs: get-pr-info

				    env:

				      COMMENT_DATE: ${{ github.event.comment.created_at }}

				      PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}

				      PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}

				    steps:

				      - run: |

				          COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")

				          echo "COMMENT_DATE: $COMMENT_DATE"

				          echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"

				          echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"

				          echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"

				          if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then

				            echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";

				            exit -1;

				          fi

				  get-jobs:

				    name: Get test files to run

				    runs-on: ubuntu-22.04

									
										2

.github/workflows/self-comment-ci.yml
									
												
				@@ -29,7 +29,7 @@ jobs:

				    runs-on: ubuntu-22.04

				    name: Get PR number

				    # For security: only allow team members to run

				    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}

				    if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}

				    outputs:

				      PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}

				    steps:
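The `if:` condition above admits a run only when the issue is open, the commenter is on an allowlist, and the comment starts with one of three trigger prefixes. A shell sketch of the same gate, with a hypothetical function `should_run` and placeholder user names rather than the real list:

```bash
# Hypothetical sketch of the gate; "alice" and "bob" are placeholders, not the real allowlist.
should_run() {
  state="$1"; actor="$2"; comment="$3"
  [ "$state" = "open" ] || return 1
  case "$actor" in
    alice|bob) ;;          # allowed actors
    *) return 1 ;;
  esac
  case "$comment" in
    run-slow*|"run slow"*|run_slow*) return 0 ;;   # accepted trigger prefixes
    *) return 1 ;;
  esac
}

should_run open alice "run-slow: bert" && echo "triggered"
```

This sketch compares case-sensitively, whereas GitHub's `contains` and `startsWith` expression functions are documented as not case sensitive, so it is stricter than the workflow condition.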

									
										59

.github/workflows/self-nightly-caller.yml
									
												
				@@ -1,43 +1,54 @@

				name: Self-hosted runner (nightly-ci)

				name: Nvidia CI with nightly torch

				on:

				  repository_dispatch:

				  schedule:

				    - cron: "17 2 * * *"

				  # triggered when the daily scheduled Nvidia CI is completed.

				  # This way, we can compare the results more easily.

				  workflow_run:

				    workflows: ["Nvidia CI"]

				    branches: ["main"]

				    types: [completed]

				  push:

				    branches:

				      - run_nightly_ci*

				      - run_ci_with_nightly_torch*

				# Used for `push` to easily modify the target workflow runs to compare against

				env:

				    prev_workflow_run_id: ""

				    other_workflow_run_id: ""

				jobs:

				  build_nightly_ci_images:

				    name: Build Nightly CI Docker Images

				    if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci'))

				  build_nightly_torch_ci_images:

				    name: Build CI Docker Images with nightly torch

				    uses: ./.github/workflows/build-nightly-ci-docker-images.yml

				    secrets: inherit

				  setup:

				    name: Setup

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Setup

				        run: |

				          mkdir "setup_values"

				          echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"

				          echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"

				      - name: Upload artifacts

				        uses: actions/upload-artifact@v4

				        with:

				          name: setup_values

				          path: setup_values

				  model-ci:

				    name: Model CI

				    needs: [build_nightly_ci_images]

				    needs: build_nightly_torch_ci_images

				    uses: ./.github/workflows/self-scheduled.yml

				    with:

				      job: run_models_gpu

				      slack_report_channel: "#transformers-ci-past-future"

				      runner: ci

				      docker: huggingface/transformers-all-latest-torch-nightly-gpu

				      ci_event: Nightly CI

				    secrets: inherit

				  deepspeed-ci:

				    name: DeepSpeed CI

				    needs: [build_nightly_ci_images]

				    uses: ./.github/workflows/self-scheduled.yml

				    with:

				      job: run_torch_cuda_extensions_gpu

				      slack_report_channel: "#transformers-ci-past-future"

				      runner: ci

				      # test deepspeed nightly build with the latest release torch

				      docker: huggingface/transformers-pytorch-deepspeed-latest-gpu

				      ci_event: Nightly CI

				      working-directory-prefix: /workspace

				      report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly

				      commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}

				    secrets: inherit

									
										25

.github/workflows/self-push-amd-mi300-caller.yml
									
											
				@@ -1,25 +0,0 @@

				name: Self-hosted runner (AMD mi300 CI caller)

				on:

				  #workflow_run:

				  #  workflows: ["Self-hosted runner (push-caller)"]

				  #  branches: ["main"]

				  #  types: [completed]

				  push:

				    branches:

				      - run_amd_push_ci_caller*

				    paths:

				      - "src/**"

				      - "tests/**"

				      - ".github/**"

				      - "templates/**"

				      - "utils/**"

				jobs:

				  run_amd_ci:

				    name: AMD mi300

				    if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci'))))

				    uses: ./.github/workflows/self-push-amd.yml

				    with:

				      gpu_flavor: mi300

				    secrets: inherit

									
										6

.github/workflows/self-push.yml
									
												
				@@ -36,7 +36,7 @@ jobs:

				      group: '${{ matrix.machine_type }}'

				    container:

				      image: huggingface/transformers-all-latest-gpu-push-ci

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    outputs:

				      matrix: ${{ steps.set-matrix.outputs.matrix }}

				      test_map: ${{ steps.set-matrix.outputs.test_map }}

				@@ -136,7 +136,7 @@ jobs:

				      group: '${{ matrix.machine_type }}'

				    container:

				      image: huggingface/transformers-all-latest-gpu-push-ci

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    env:

				      # For the meaning of these environment variables, see the job `Setup`

				      CI_BRANCH_PUSH: ${{ github.event.ref }}

				@@ -362,7 +362,7 @@ jobs:

				      group: '${{ matrix.machine_type }}'

				    container:

				      image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    env:

				      # For the meaning of these environment variables, see the job `Setup`

				      CI_BRANCH_PUSH: ${{ github.event.ref }}

									
										67

.github/workflows/self-scheduled-amd-mi325-caller.yml
									
												
				@@ -0,0 +1,67 @@

				name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)

				# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml

				# For example, 1gpu scale set: amd-mi325-ci-1gpu

				#              2gpu scale set: amd-mi325-ci-2gpu

				on:

				  workflow_run:

				    workflows: ["Self-hosted runner (AMD scheduled CI caller)"]

				    branches: ["main"]

				    types: [completed]

				  push:

				    branches:

				      - run_amd_scheduled_ci_caller*

				jobs:

				  model-ci:

				    name: Model CI

				    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main

				    with:

				      job: run_models_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi325-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi325

				      report_repo_id: optimum-amd/transformers_daily_ci

				      env_file: /etc/podinfo/gha-gpu-isolation-settings

				    secrets: inherit

				  torch-pipeline:

				    name: Torch pipeline CI

				    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main

				    with:

				      job: run_pipelines_torch_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi325-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi325

				      report_repo_id: optimum-amd/transformers_daily_ci

				      env_file: /etc/podinfo/gha-gpu-isolation-settings

				    secrets: inherit

				  example-ci:

				    name: Example CI

				    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main

				    with:

				      job: run_examples_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi325-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi325

				      report_repo_id: optimum-amd/transformers_daily_ci

				      env_file: /etc/podinfo/gha-gpu-isolation-settings

				    secrets: inherit

				  deepspeed-ci:

				    name: DeepSpeed CI

				    uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main

				    with:

				      job: run_torch_cuda_extensions_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi325-ci

				      docker: huggingface/transformers-pytorch-deepspeed-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi325

				      report_repo_id: optimum-amd/transformers_daily_ci

				      env_file: /etc/podinfo/gha-gpu-isolation-settings

				    secrets: inherit

									
										22

.github/workflows/self-scheduled-amd-mi300-caller.yml → .github/workflows/self-scheduled-amd-mi355-caller.yml
									
												
				@@ -1,8 +1,8 @@

				name: Self-hosted runner scale set (AMD mi300 scheduled CI caller)

				name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)

				# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml

				# For example, 1gpu scale set: amd-mi300-ci-1gpu

				#              2gpu scale set: amd-mi300-ci-2gpu

				# For example, 1gpu : amd-mi355-ci-1gpu

				#              2gpu : amd-mi355-ci-2gpu

				on:

				  workflow_run:

				@@ -20,9 +20,9 @@ jobs:

				    with:

				      job: run_models_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi300-ci

				      runner_scale_set: amd-mi355-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi300

				      ci_event: Scheduled CI (AMD) - mi355

				      report_repo_id: optimum-amd/transformers_daily_ci

				    secrets: inherit

				@@ -32,9 +32,9 @@ jobs:

				    with:

				      job: run_pipelines_torch_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi300-ci

				      runner_scale_set: amd-mi355-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi300

				      ci_event: Scheduled CI (AMD) - mi355

				      report_repo_id: optimum-amd/transformers_daily_ci

				    secrets: inherit

				@@ -44,9 +44,9 @@ jobs:

				    with:

				      job: run_examples_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi300-ci

				      runner_scale_set: amd-mi355-ci

				      docker: huggingface/transformers-pytorch-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi300

				      ci_event: Scheduled CI (AMD) - mi355

				      report_repo_id: optimum-amd/transformers_daily_ci

				    secrets: inherit

				@@ -56,8 +56,8 @@ jobs:

				    with:

				      job: run_torch_cuda_extensions_gpu

				      slack_report_channel: "#amd-hf-ci"

				      runner_scale_set: amd-mi300-ci

				      runner_scale_set: amd-mi355-ci

				      docker: huggingface/transformers-pytorch-deepspeed-amd-gpu

				      ci_event: Scheduled CI (AMD) - mi300

				      ci_event: Scheduled CI (AMD) - mi355

				      report_repo_id: optimum-amd/transformers_daily_ci

				    secrets: inherit

									
										11

.github/workflows/self-scheduled-caller.yml
									
												
				@@ -1,5 +1,4 @@

				name: Self-hosted runner (scheduled)

				name: Nvidia CI

				on:

				  repository_dispatch:

				@@ -7,7 +6,7 @@ on:

				    - cron: "17 2 * * *"

				  push:

				    branches:

				      - run_scheduled_ci*

				      - run_nvidia_ci*

				  workflow_dispatch:

				    inputs:

				      prev_workflow_run_id:

				@@ -54,6 +53,7 @@ jobs:

				      docker: huggingface/transformers-all-latest-gpu

				      ci_event: Daily CI

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

				  torch-pipeline:

				@@ -65,6 +65,7 @@ jobs:

				      docker: huggingface/transformers-pytorch-gpu

				      ci_event: Daily CI

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

				  example-ci:

				@@ -76,6 +77,7 @@ jobs:

				      docker: huggingface/transformers-all-latest-gpu

				      ci_event: Daily CI

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

				  trainer-fsdp-ci:

				@@ -87,6 +89,7 @@ jobs:

				      docker: huggingface/transformers-all-latest-gpu

				      ci_event: Daily CI

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

				  deepspeed-ci:

				@@ -99,6 +102,7 @@ jobs:

				      ci_event: Daily CI

				      working-directory-prefix: /workspace

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

				  quantization-ci:

				@@ -110,4 +114,5 @@ jobs:

				      docker: huggingface/transformers-quantization-latest-gpu

				      ci_event: Daily CI

				      report_repo_id: hf-internal-testing/transformers_daily_ci

				      commit_sha: ${{ github.sha }}

				    secrets: inherit

									
										25

.github/workflows/self-scheduled.yml
									
												
				@@ -1,4 +1,4 @@

				name: Self-hosted runner (scheduled)

				name: Nvidia CI (job definitions)

				# Note that each job's dependencies go into a corresponding docker file.

				#

				@@ -28,6 +28,9 @@ on:

				      report_repo_id:

				        required: true

				        type: string

				      commit_sha:

				        required: false

				        type: string

				env:

				@@ -46,8 +49,8 @@ env:

				jobs:

				  setup:

				    if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)

				    name: Setup

				    if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)

				    strategy:

				      matrix:

				        machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]

				@@ -55,7 +58,7 @@ jobs:

				      group: '${{ matrix.machine_type }}'

				    container:

				      image: huggingface/transformers-all-latest-gpu

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    outputs:

				      folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}

				      slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}

				@@ -119,6 +122,7 @@ jobs:

				      slice_id: ${{ matrix.slice_id }}

				      runner_map: ${{ needs.setup.outputs.runner_map }}

				      docker: ${{ inputs.docker }}

				      commit_sha: ${{ inputs.commit_sha || github.sha }}

				    secrets: inherit

				  run_trainer_and_fsdp_gpu:

				@@ -137,6 +141,7 @@ jobs:

				      slice_id: ${{ matrix.slice_id }}

				      runner_map: ${{ needs.setup.outputs.runner_map }}

				      docker: ${{ inputs.docker }}

				      commit_sha: ${{ inputs.commit_sha || github.sha }}

				      report_name_prefix: run_trainer_and_fsdp_gpu

				    secrets: inherit

				@@ -155,7 +160,7 @@ jobs:

				    steps:

				      - name: Update clone

				        working-directory: /transformers

				        run: git fetch && git checkout ${{ github.sha }}

				        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}

				      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)

				        working-directory: /transformers

				@@ -219,11 +224,11 @@ jobs:

				      group: '${{ matrix.machine_type }}'

				    container:

				      image: huggingface/transformers-all-latest-gpu

				      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

				    steps:

				      - name: Update clone

				        working-directory: /transformers

				        run: git fetch && git checkout ${{ github.sha }}

				        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}

				      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)

				        working-directory: /transformers

				@@ -292,7 +297,7 @@ jobs:

				    steps:

				      - name: Update clone

				        working-directory: ${{ inputs.working-directory-prefix }}/transformers

				        run: git fetch && git checkout ${{ github.sha }}

				        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}

				      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)

				        working-directory: ${{ inputs.working-directory-prefix }}/transformers

				@@ -400,7 +405,7 @@ jobs:

				      - name: Update clone

				        working-directory: /transformers

				        run: git fetch && git checkout ${{ github.sha }}

				        run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}

				      - name: Reinstall transformers in edit mode (remove the one installed during docker image build)

				        working-directory: /transformers

				@@ -464,6 +469,7 @@ jobs:

				        uses: actions/checkout@v4

				        with:

				          fetch-depth: 2

				          ref: ${{ inputs.commit_sha || github.sha }}

				      - name: Install transformers

				        run: pip install transformers

				@@ -518,6 +524,7 @@ jobs:

				      quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}

				      ci_event: ${{ inputs.ci_event }}

				      report_repo_id: ${{ inputs.report_repo_id }}

				      commit_sha: ${{ inputs.commit_sha || github.sha }}

				    secrets: inherit

				@@ -528,7 +535,7 @@ jobs:

				    uses: ./.github/workflows/check_failed_tests.yml

				    with:

				      docker: ${{ inputs.docker }}

				      start_sha: ${{ github.sha }}

				      start_sha: ${{ inputs.commit_sha || github.sha }}

				      job: ${{ inputs.job }}

				      slack_report_channel: ${{ inputs.slack_report_channel }}

				      ci_event: ${{ inputs.ci_event }}

									
										12

.github/workflows/slack-report.yml
									
												
				@@ -24,6 +24,10 @@ on:

				      report_repo_id:

				        required: true

				        type: string

				      commit_sha:

				        required: false

				        type: string

				env:

				  TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}

				@@ -32,7 +36,7 @@ jobs:

				  send_results:

				    name: Send results to webhook

				    runs-on: ubuntu-22.04

				    if: always()

				    if: always() && !cancelled()

				    steps:

				      - name: Preliminary job status

				        shell: bash

				@@ -41,6 +45,10 @@ jobs:

				          echo "Setup status: ${{ inputs.setup_status }}"

				      - uses: actions/checkout@v4

				        with:

				          fetch-depth: 2

				          ref: ${{ inputs.commit_sha || github.sha }}

				      - uses: actions/download-artifact@v4

				      - name: Prepare some setup values

				@@ -67,7 +75,7 @@ jobs:

				          SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}

				          ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}

				          CI_EVENT: ${{ inputs.ci_event }}

				          CI_SHA: ${{ github.sha }}

				          CI_SHA: ${{ inputs.commit_sha || github.sha }}

				          CI_TEST_JOB: ${{ inputs.job }}

				          SETUP_STATUS: ${{ inputs.setup_status }}

				          REPORT_REPO_ID: ${{ inputs.report_repo_id }}

									
6 CONTRIBUTING.md

				@@ -68,8 +68,7 @@ already reported** (use the search bar on GitHub under Issues). Your issue shoul

				Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:

-				* Your **OS type and version** and **Python**, **PyTorch** and

-				  **TensorFlow** versions when applicable.

+				* Your **OS type and version** and **Python**, and **PyTorch** versions when applicable.

				* A short, self-contained, code snippet that allows us to reproduce the bug in

				  less than 30s.

				* The *full* traceback if an exception is raised.

				@@ -165,8 +164,7 @@ You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main

				   mode with the `-e` flag.

				   Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a

-				   failure with this command. If that's the case make sure to install the Deep Learning framework you are working with

-				   (PyTorch, TensorFlow and/or Flax) then do:

+				   failure with this command. If that's the case make sure to install Pytorch then do:

				   ```bash

				   pip install -e ".[quality]"

1 Makefile

@@ -52,6 +52,7 @@ repo-consistency:
 	python utils/check_doctest_list.py
 	python utils/update_metadata.py --check-only
 	python utils/check_docstrings.py
+	python utils/add_dates.py
 # this target runs checks on all files

									
10 README.md

				@@ -44,7 +44,7 @@ limitations under the License.

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> |

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> |

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> |

-				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Рortuguês</a> |

+				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> |

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |

				        <a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |

				@@ -147,7 +147,7 @@ chat = [

				    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}

				]

-				pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")

+				pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")

				response = pipeline(chat, max_new_tokens=512)

				print(response[0]["generated_text"][-1]["content"])

				```
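
The hunk above renames the `torch_dtype` keyword to `dtype` in the `pipeline(...)` call. For call sites that still build their arguments under the old name, a rename shim might look like the sketch below. `rename_dtype_kwarg` is a hypothetical helper written for illustration and is not a Transformers API; the dtype is a plain string here so the snippet runs without PyTorch installed.

```python
from typing import Any, Dict


def rename_dtype_kwarg(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with `torch_dtype` renamed to `dtype`."""
    updated = dict(kwargs)  # leave the caller's dict untouched
    if "torch_dtype" in updated:
        value = updated.pop("torch_dtype")
        # If both names were supplied, keep the explicit `dtype`.
        updated.setdefault("dtype", value)
    return updated


if __name__ == "__main__":
    old_style = {"task": "text-generation", "torch_dtype": "bfloat16", "device_map": "auto"}
    print(rename_dtype_kwarg(old_style))
```

Returning a copy rather than mutating the argument keeps the helper safe to call on shared configuration dictionaries.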

				@@ -242,7 +242,7 @@ pipeline(

				- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.

				- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).

-				- The [example scripts]((https://github.com/huggingface/transformers/tree/main/examples)) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.

+				- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.

				## 100 projects using Transformers

				@@ -280,8 +280,8 @@ Expand each modality below to see a few example models for various use cases.

				- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)

				- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)

				- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)

-				- Keypoint detection with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)

-				- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue)

+				- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)

+				- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)

				- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)

				- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)

				- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)

									
2 SECURITY.md

				@@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re

				models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized

				by the transformers library), as developed specifically to prevent arbitrary code execution on your system.

-				To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.

+				To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
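
The behaviour described here, erroring when no `.safetensors` file is present, can be approximated for a local checkpoint directory before any loading is attempted. The sketch below assumes the weights sit directly inside one directory. `require_safetensors` is a hypothetical helper for illustration only; it is not how the transformers library implements `use_safetensors`.

```python
import tempfile
from pathlib import Path
from typing import List


def require_safetensors(model_dir: str) -> List[str]:
    """Return the .safetensors file names in model_dir, or raise if there are none."""
    files = sorted(p.name for p in Path(model_dir).glob("*.safetensors"))
    if not files:
        # Refuse to continue rather than fall back to a pickle-based format.
        raise FileNotFoundError(f"no .safetensors file found in {model_dir}")
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create an empty placeholder file to stand in for real weights.
        (Path(tmp) / "model.safetensors").touch()
        print(require_safetensors(tmp))
```

Failing early like this mirrors the intent of the paragraph: a missing safe-format file should stop the load instead of silently switching formats.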

				### Remote code

BIN benchmark/._README.md Normal file

Binary file not shown.

BIN benchmark/._init.py Normal file

Binary file not shown.

BIN benchmark/._benchmark.py Normal file

Binary file not shown.

BIN benchmark/._config Normal file

Binary file not shown.

BIN benchmark/._default.yml Normal file

Binary file not shown.

BIN benchmark/._grafana_dashboard.json Normal file

Binary file not shown.

Compare commits

596 Commits v4.53.2-mo ... ko-deepsee

BIN ._.DS_Store Normal file

BIN ._.circleci Normal file

BIN ._.git Normal file

BIN ._.gitattributes Normal file

BIN ._.github Normal file

BIN ._.gitignore Normal file

BIN ._AGENTS.md Normal file

BIN ._CITATION.cff Normal file

BIN ._CODE_OF_CONDUCT.md Normal file

BIN ._ISSUES.md Normal file

BIN ._LICENSE Normal file

BIN ._awesome-transformers.md Normal file

BIN ._benchmark Normal file

BIN ._docker Normal file

BIN ._docs Normal file

BIN ._examples Normal file

BIN ._i18n Normal file

BIN ._notebooks Normal file

BIN ._scripts Normal file

BIN ._src Normal file

BIN ._templates Normal file

BIN ._tests Normal file

BIN ._utils Normal file

BIN .circleci/._TROUBLESHOOT.md Normal file

BIN .circleci/._config.yml Normal file

BIN .circleci/._parse_test_outputs.py Normal file

15 .circleci/create_circleci_config.py

11 .gitattributes vendored

BIN .github/._ISSUE_TEMPLATE vendored Normal file

BIN .github/._PULL_REQUEST_TEMPLATE.md vendored Normal file

BIN .github/._conda vendored Normal file

BIN .github/._scripts vendored Normal file

BIN .github/._workflows vendored Normal file

BIN .github/ISSUE_TEMPLATE/._bug-report.yml vendored Normal file

BIN .github/ISSUE_TEMPLATE/._config.yml vendored Normal file

BIN .github/ISSUE_TEMPLATE/._feature-request.yml vendored Normal file

BIN .github/ISSUE_TEMPLATE/._i18n.md vendored Normal file

BIN .github/ISSUE_TEMPLATE/._migration.yml vendored Normal file

BIN .github/ISSUE_TEMPLATE/._new-model-addition.yml vendored Normal file

BIN .github/conda/._build.sh vendored Normal file

BIN .github/conda/._meta.yaml vendored Normal file

BIN .github/scripts/._assign_reviewers.py vendored Normal file

BIN .github/scripts/._codeowners_for_review_action vendored Normal file

BIN .github/workflows/._TROUBLESHOOT.md vendored Normal file

BIN .github/workflows/._add-model-like.yml vendored Normal file

BIN .github/workflows/._assign-reviewers.yml vendored Normal file

BIN .github/workflows/._build-ci-docker-images.yml vendored Normal file

BIN .github/workflows/._build-docker-images.yml vendored Normal file

BIN .github/workflows/._build-nightly-ci-docker-images.yml vendored Normal file

BIN .github/workflows/._build-past-ci-docker-images.yml vendored Normal file

BIN .github/workflows/._check_tiny_models.yml vendored Normal file

BIN .github/workflows/._get-pr-info.yml vendored Normal file

BIN .github/workflows/._get-pr-number.yml vendored Normal file

BIN .github/workflows/._model_jobs_intel_gaudi.yml vendored Normal file

BIN .github/workflows/._new_model_pr_merged_notification.yml vendored Normal file

BIN .github/workflows/._pr-style-bot.yml vendored Normal file

BIN .github/workflows/._push-important-models.yml vendored Normal file

BIN .github/workflows/._release-conda.yml vendored Normal file

BIN .github/workflows/._self-nightly-past-ci-caller.yml vendored Normal file

BIN .github/workflows/._self-past-caller.yml vendored Normal file

BIN .github/workflows/._self-push-amd-mi210-caller.yml vendored Normal file

BIN .github/workflows/._self-push-amd-mi250-caller.yml vendored Normal file

BIN .github/workflows/._self-push-amd.yml vendored Normal file

BIN .github/workflows/._self-push-caller.yml vendored Normal file

BIN .github/workflows/._self-scheduled-amd-caller.yml vendored Normal file

BIN .github/workflows/._self-scheduled-amd-mi250-caller.yml vendored Normal file

BIN .github/workflows/._self-scheduled-intel-gaudi.yml vendored Normal file

BIN .github/workflows/._self-scheduled-intel-gaudi3-caller.yml vendored Normal file

BIN .github/workflows/._ssh-runner.yml vendored Normal file

BIN .github/workflows/._stale.yml vendored Normal file

BIN .github/workflows/._trufflehog.yml vendored Normal file

BIN .github/workflows/._update_metdata.yml vendored Normal file

BIN .github/workflows/._upload_pr_documentation.yml vendored Normal file

2 .github/workflows/benchmark.yml vendored

5 .github/workflows/check_failed_tests.yml vendored

49 .github/workflows/collated-reports.yml vendored Normal file

2 .github/workflows/doctest_job.yml vendored

2 .github/workflows/doctests.yml vendored


5 .github/workflows/model_jobs.yml vendored