Commit Graph

20222 Commits

Author SHA1 Message Date
ccb1d06ecf Convert binary/image/model files to Git LFS pointers
2026-04-11 01:50:39 +09:00
e4b809e5b2 Add Git LFS tracking for binary/model/image files 2026-04-11 01:45:26 +09:00
ssum21
e52c5890d1 add_toctree.yml 2025-08-30 15:57:15 +09:00
SSUM
b80c173b8f Update docs/source/ko/model_doc/deepseek_v3.md
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-08-27 18:54:00 +09:00
SSUM
15b4988bb7 Update docs/source/ko/model_doc/deepseek_v3.md
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-08-27 18:53:52 +09:00
SSUM
231653db22 Merge branch 'main' into ko-deepseek_v3.md 2025-08-27 13:54:56 +09:00
Yih-Dar
ff8b88a948 Fix nightly torch CI (#40469)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 22:02:15 +02:00
Yih-Dar
74ad608a2b Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467) 2025-08-26 20:53:24 +02:00
SowmiyaNarayanan G
c8c7623f20 Update SegFormer model card (#40417)
* Update SegFormer model card

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the segformer model card

* Remove quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-26 08:27:25 -07:00
StevenBucaille
78f32c3917 [pipeline] Add Keypoint Matching pipeline (#39970)
* feat: keypoint-matcher pipeline

* docs: added keypoint-matcher pipeline in docs

* fix: added missing statements for repo consistency

* docs: updated SuperGlue, LightGlue and EfficientLoFTR docs

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* test: fixed run_pipeline_test

* update pipeline typing and docs

* update tests

* update docs snippets

* Fix import error

* fix: pipeline init

* pt framework

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-26 15:26:57 +01:00
Joao Gante
6451294f6f [RoPE] explicit factor > implicit factor in YaRN (#40320)
explicit factor > implicit factor
2025-08-26 14:58:28 +01:00
audioXD
5a8ba87ecf [fast_image_processor] fix image normalization for resize (#40436) 2025-08-26 13:49:51 +00:00
VED
0ce6709e70 deci gguf support (#38669)
* deci gguf support

* make style

* tests for deci

* try except removed

* style

* try except removed
2025-08-26 13:43:17 +00:00
Matt
263d06fedc Fix extra template loading (#40455)
* Fix extra template loading

* Reformat

* Trigger tests
2025-08-26 14:01:01 +01:00
Pedro Cuenca
58cebc848b flash_paged: s_aux may not exist (#40434)
Some implementations (e.g.,
https://huggingface.co/kernels-community/vllm-flash-attn3) support an
`s_aux` arg for attention sinks, but others
(https://huggingface.co/kernels-community/flash-attn) do not. If s_aux
is present in the kwargs, we forward it, otherwise we don't.

The user will still get an error if they use a model like gpt-oss-20b
with an implementation that does not support `s_aux`, but models that
don't use it won't error out. For example, [this is currently
failing](399cd5c04b/examples/pytorch/continuous_batching.py (L16))
because we are sending `s_aux: None` in the dict.
2025-08-26 13:15:59 +02:00
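The conditional forwarding described in commit 58cebc848b above can be sketched as follows. This is a minimal illustration only, not the actual code from the PR: `attn_with_sinks`, `attn_plain`, and `paged_forward` are hypothetical names, and the kernels are stand-ins that just report what they received.

```python
def attn_with_sinks(q, k, v, s_aux=None):
    """Hypothetical kernel that accepts an `s_aux` argument."""
    return ("with_sinks", s_aux)


def attn_plain(q, k, v):
    """Hypothetical kernel that has no `s_aux` parameter."""
    return ("plain", None)


def paged_forward(kernel, q, k, v, **kwargs):
    # Forward `s_aux` only when the caller actually supplied it, so kernels
    # without that parameter keep working for models that never use it.
    extra = {"s_aux": kwargs["s_aux"]} if "s_aux" in kwargs else {}
    return kernel(q, k, v, **extra)
```

Under these assumptions, `paged_forward(attn_plain, q, k, v)` succeeds because no `s_aux` key is sent, while `paged_forward(attn_plain, q, k, v, s_aux=None)` still raises a `TypeError`, which mirrors the failure mode the commit message describes.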
Rémi Ouazan
34108a2230 Continuous batching refactor (#40426)
* Rework of the CB example

* Further rework of CB example

* Refactor PA cache, slice on tokens, add debug prints -- WIP

* Slice cache -- WIP

* Added a mechanism to check batched outputs in CB script

* Less logging, debug flag for slice, !better reset! -- WIP

* QOL and safety margins

* Refactor and style

* Better saving of cb example

* Fix

* Fixes and QOL

* More information about metrics

* Further logging

* Style

* Licenses

* Removed some comments

* Add a slice input flag

* Fix in example

* Added back some open-telemetry deps

* Removed some aux function

* Added FA2 option to example script

* Fixed math (all of it)

* Added a simple example

* Renamed core to classes

* Made allocation of attention mask optional

* Style
2025-08-26 13:01:42 +02:00
Manuel de Prada Corral
49e168ff08 🚨 Remove Contrastive Search decoding strategy (#40428)
* delete go brrr

* fix tests

* review
2025-08-26 12:31:46 +02:00
Rémi Ouazan
b8184b7ce9 Make cache_config not mandatory (#40316)
* Relaxed assumptions on cache_config

* Review compliance

* Style

* Styyyle

* Removed default and added args

* Rebase mishapfix

* Propagate args to TorchExportableModuleForDecoderOnlyLM

* Fix the test I wanted fixed in this PR

* Added some AMD expectation related to cache tests
2025-08-26 12:06:17 +02:00
Yao Matrix
32fcc24667 rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-26 09:56:35 +00:00
Raushan Turganbay
f690a2a1e0 [video processors] decode only sampled videos -> less RAM and faster processing (#39600)
* draft update two models for now

* batch update all VLMs first

* update some more image processors

* update

* fix a few tests

* just make CI green for now

* fix copies

* update once more

* update

* unskip the test

* fix these two

* fix torchcodec audio loading

* maybe

* yay, i fixed torchcodec installation and now can actually test it

* fix copies deepseek

* make sure the metadata is returned when users request it

* add docs

* update

* fixup

* Update src/transformers/audio_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/glm4v/video_processing_glm4v.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update

* what if we set some metadata attr to `None`

* fix CI

* fix one test

* fix 4 channel test

* fix glm timestamps

* rebase gone wrong

* raise warning once

* fixup

* typo

* fix copies

* fix smolvlm test

* this is why torch's official benchmark was faster, set threads to `0`

* Apply style fixes

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-26 11:38:02 +02:00
Xin Yao
64ae6e6b1d fix qwen25-vl grad acc (#40333)
* fix qwen25-vl grad acc

* fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs

* fix ci

* fix ci

* fix typo

* fix CI
2025-08-26 09:30:06 +00:00
Kashif Rasul
6d2bb1e04d [Trainer] accelerate contextparallel support in trainer (#40205)
* initial context_parallel_size support in trainer

* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens

* use parallelism_config.cp_enabled

* add parallelism_config to trainer state

* warn when auto-enabling FSDP

* fix some reviews

* WIP: somewhat matching loss

* Feat: add back nested_gather

* Feat: cleanup

* Fix: raise on non-sdpa attn

* remove context_parallel_size from TrainingArguments

* if we have parallelism_config, we defer to get_state_dict from accelerate

* fix from review

* Feat: add parallelism config support

* Chore: revert some unwanted formatting changes

* Fix: check None

* Check none 2

* Fix: remove duplicate import

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fin

* require accelerate 1.10.1 and higher

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-26 09:28:48 +00:00
Pavel Iakubovskii
63caaea1fb Refactor ViT-like models (#39816)
* refactor vit

* fix

* fixup

* turn off FX tests

* AST

* deit

* dinov2

* dinov2_with_registers

* dpt

* depth anything (nit)

* depth pro (nit)

* ijepa

* ijepa (modular)

* prompt_depth_anything (nit)

* vilt (nit)

* zoedepth (nit)

* videomae

* vit_mae

* vit_msn

* vivit

* yolos

* eomt

* vitpose

* update auto backbone

* disable `fx` and export tests (dinov2, dpt, ijepa, vit, vitpose)

* fix kwargs for backbone

* fix

* convnext

* fixup

* update convnext layernorm

* fix-copies layer_norm

* convnextv2

* explicit output_hidden_states for models with backbones

* explicit hidden states collection for dinov2

* tests fixed

* fix DPT as well

* fix dinov2 with registers

* add comment
2025-08-26 11:14:06 +02:00
Yih-Dar
922e65b3fc Fix non FA2 tests after FA2 installed in CI docker image (#40430)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 10:36:50 +02:00
ivarflakstad
e68146fbe7 Fix collated reports model name entry (#40441)
2025-08-25 20:36:01 +00:00
Ákos Hadnagy
8ce633cc75 InternVL MI325 test expectations (#40387)
* Adjust ROCm expectations

* MI355

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-25 22:00:35 +02:00
ivarflakstad
7637d298b3 Fix collated reports uploading (#40440) 2025-08-25 21:49:59 +02:00
id01
fa59cf9c9f Fix https://github.com/huggingface/transformers/issues/40292 (#40439)
* Fix https://github.com/huggingface/transformers/issues/40292

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-25 20:12:57 +01:00
ivarflakstad
f0e87b436d Fix collated reports model directory traversal (#40437)
Fix model dir traversal
2025-08-25 18:01:58 +00:00
Ákos Hadnagy
ef406902bf Gemma3 text fixes: Add expectations for MI325 (#40384)
* Add expectations for MI325

* Ruff

* Adjust CUDA expectations as well

* Another attempt for CUDA expectations
2025-08-25 19:57:50 +02:00
Judy
c81723d31b 🌐 [i18n-KO] Translated models.md to Korean (#39518)
* docs: ko: models.md

* feat: nmt draft

* fix: manual edits

* Resolved _toctree.yaml conflict during merge from main

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

* fix: update toctree

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-25 09:17:08 -07:00
ivarflakstad
6b5eab70e4 Remove working-dir from collated reports job (#40435) 2025-08-25 18:14:35 +02:00
Joao Gante
1763ef2951 [docs] remove last references to transformers TF classes/methods (#40429)
* halfway through tasks

* complete

* Update utils/check_docstrings.py
2025-08-25 16:30:59 +01:00
Olumayowa Akinkuehinmi
eac4f00bdf Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349) (#40408)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:55 +00:00
Joshua Chin
d8f2edcc46 Add tokenizer_kwargs argument to the text generation pipeline (#40364)
* Add `tokenizer_kwargs` arg to text generation pipeline.

* chore: re-run CI

* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline

* Fix `tokenizer_encode_kwargs` doc string.

* Fix note related to `tokenizer_kwargs` in text generation pipeline

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:19 +00:00
ivarflakstad
1a35d07f56 Update collated reports working directory and --path (#40433) 2025-08-25 15:18:26 +00:00
Cyril Vallez
399cd5c04b Fix modular for modernbert-decoder (#40431)
* fix the modular

* CI
2025-08-25 16:50:49 +02:00
Manuel de Prada Corral
ea8d9c8f06 🚨 Remove DoLa decoding strategy (#40082)
* remove dola generation strategy

* add fast test
2025-08-25 16:33:27 +02:00
Arthur
6bf6f8490c [Mxfp4] Add a way to save with a quantization method (#40176)
* add a test

* tempdir

* fix import issue

* wow I am tired

* properly init

* i am not super familiar with quantizer api :|

* set to TRUE for now

* full support

* push current changes

* will clean this later but the imports are a shitshow here

* this correctly saves the block and scales but forward seems broken

* quantize was not correct

* fix storage

* why were bias even included

* finally!

* style

* fix style

* remove print

* lazy import

* up

* not sure what happens this works now?

* holy molly it was not so far

* okay this seems to work!

* workings!!!

* allow save_pretrained to create PR

* Apply suggestions from code review

* fixup

* add deqyabtze fakse as wek

* working new

* fix

* rm swizzle and unswizzle during saving

* rm print

* Update src/transformers/modeling_utils.py

* fix

* style

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
2025-08-25 16:27:19 +02:00
Andrew Chauzov
04c2bae3a8 Fix label smoothing incompatibility with multi-label classification (#40296)
* Fix label smoothing incompatibility with multi-label classification (#40258)

* Improve label smoothing multi-label check based on reviewer feedback

- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
2025-08-25 14:23:31 +00:00
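The "warn and disable" behavior listed in commit 04c2bae3a8 above might look roughly like the sketch below. It is an assumption-laden illustration, not the code merged in the PR: `resolve_label_smoothing` is a hypothetical helper, and the config object is a plain `SimpleNamespace` standing in for a model config with a `problem_type` attribute.

```python
import warnings
from types import SimpleNamespace


def resolve_label_smoothing(config, label_smoothing_factor):
    """Return the smoothing factor to use (hypothetical helper).

    Multi-label problems get a warning and a factor of 0.0 instead of an error.
    """
    is_multi_label = getattr(config, "problem_type", None) == "multi_label_classification"
    if label_smoothing_factor != 0 and is_multi_label:
        warnings.warn(
            "Label smoothing is not supported for multi-label classification; disabling it.",
            UserWarning,
        )
        return 0.0
    return label_smoothing_factor


# Stand-in configs for illustration only.
multi_label_config = SimpleNamespace(problem_type="multi_label_classification")
single_label_config = SimpleNamespace(problem_type="single_label_classification")
```

Reading the problem type from the config rather than inferring it from tensor shapes keeps the check in one place at setup time, which is the design choice the commit message calls out.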
Raushan Turganbay
3b5b9f6518 Fix processing tests (#40379)
* fix tests

* skip failing test in generation as well

* grounding dino was overwritten

* one more overwritten code

* clear comment
2025-08-25 14:50:54 +02:00
jiqing-feng
a0a37b3250 Gpt oss optim (#40304)
* enable fast index selecting

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gpt-oss tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check tensor

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-08-25 14:36:33 +02:00
ρrαnαm
d73181b3fc Fix UnboundLocalError in WER metric computation (#40402)
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.

Co-authored-by: pranam-gf <pranam@goodfin.com>
2025-08-25 12:02:22 +00:00
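The naming conflict behind commit d73181b3fc above is a general Python scoping rule, and can be reproduced with the short sketch below. The metric loader here is a hypothetical stand-in that returns a fixed value; only the variable names `wer` and `wer_metric` come from the commit message.

```python
def load_metric():
    """Hypothetical stand-in for a metric loader; returns a callable."""
    return lambda predictions, references: 0.25


wer = load_metric()


def compute_metrics_broken(preds, refs):
    # Assigning to `wer` makes the name local to this function, so the call
    # on the right-hand side reads the local name before it is bound.
    wer = wer(predictions=preds, references=refs)
    return {"wer": wer}


wer_metric = load_metric()


def compute_metrics_fixed(preds, refs):
    # The module-level metric has a different name, so the local `wer`
    # no longer shadows it.
    wer = wer_metric(predictions=preds, references=refs)
    return {"wer": wer}
```

Calling `compute_metrics_broken` raises `UnboundLocalError`, while `compute_metrics_fixed` returns the metric dictionary as intended.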
Prawal Sharma
11e12a715a Fix typo: 'seperator' to 'separator' in variable names (#40389)
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py

These typos were in variable names used for parsing path components in weight conversion scripts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-25 11:56:30 +00:00
Cyril Vallez
40299134a8 Fix CI (hunyuan moe does not support fullgraph) (#40423)
fix flag
2025-08-25 12:01:28 +02:00
Olumayowa Akinkuehinmi
a2b37bfd58 Fix typo: 'casual' -> 'causal' in code and documentation (#40371) (#40407) 2025-08-25 09:32:15 +00:00
Joao Gante
0031c044f8 [docs] flax/jax purge (#40372)
flax/jax purge
2025-08-25 10:25:00 +01:00
Du Wenjie
14b89fed24 fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194)
* fix to the typings which are unmatched to FA function signature

cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching

It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`

* format changes by ruff

* Update src/transformers/integrations/flash_paged.py

unused function arg in `PagedAttentionCache.update`

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* revert continuous_batching signature, which is more meaningful

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-25 11:00:13 +02:00
Pablo Montalvo
ba095d387d 🧹 🧹 🧹 Get set decoder cleanup (#39509)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* this should be explicit

* fix Nth ESM exception

* tryout decoder

* this as well

* revert again

* 🧠

* aaah ESM has two modelings aaah

* broom broom

* format

* wrong copies

* copies

* modular cleanups

* format

* modularities

* wrong mergefix

* seriously

* align with new model

* new model
2025-08-25 10:57:56 +02:00
Cyril Vallez
2c55c7fc94 Reactivate a lot of tests skipped for no reason anymore (#40378)
* reactivate all the tests

* some tests still failing
2025-08-25 10:44:43 +02:00