Commit Graph

5270 Commits

Author SHA1 Message Date
Cyril Vallez
18a7c29ff8 More robust tied weight test (#39681)
* Update test_modeling_common.py

* remove old ones

* Update test_modeling_common.py

* Update test_modeling_common.py

* add

* Update test_modeling_musicgen_melody.py
2025-07-25 22:03:21 +02:00
Garrett Goon
97f8c71f52 Add padding-free to Granite hybrid moe models (#39677)
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util
2025-07-25 20:10:50 +02:00
Cyril Vallez
d6e9f71a6e Fix tied weight test (#39680)
Update test_modeling_common.py
2025-07-25 20:09:33 +02:00
lgai-exaone
c06d4cd6ce Add EXAONE 4.0 model (#39129)
* Add EXAONE 4.0 model

* Refactor EXAONE 4.0 modeling code

* Fix cache slicing on SWA + FA2

* Fix cache slicing on FA2 + HybridCache

* Update EXAONE 4.0 modeling code for main branch

* Update o_proj for asymmetric projection

* Address PR feedback

* Add EXAONE 4.0 docs

* Update EXAONE 4.0 modeling code for main branch

* update

* fix updates

* updates

* fix

* fix

* fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 19:58:28 +02:00
Park Woorak
3e4d584a5b Support typing.Literal as type of tool parameters or return value (#39633)
* support `typing.Literal` as type of tool parameters

* validate the `args` of `typing.Literal` roughly

* add test to get json schema for `typing.Literal` type hint

* fix: add `"type"` attribute to the parsed result of `typing.Literal`

* test: add argument `booleanish` to test multi-type literal

* style: auto fixup
2025-07-25 17:51:28 +00:00
Arthur
300d42a43e Add ep (#39501)
* EP + updates

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>

* remove unrelated change

* not working yet but let's see where it goes!

* update the api a bit

* udpate

* where I am at for now

* fix ep

* refactor the API

* yups

* fix

* fixup

* clean modeling

* just support llama4 for now!

* properly avoid

* fix

* nits

* Update src/transformers/models/llama4/modeling_llama4.py

* Update src/transformers/integrations/tensor_parallel.py

* style

* ,,,,

* update

---------

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
2025-07-25 19:46:17 +02:00
Cyril Vallez
6630c5b714 Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.

* Fix style.

* Fix modeling test.

* Make xLSTM package optional.

* Fix: Update torch version check.

* Fix: Bad variable naming in test.

* Fix: Import structure cleaning with Ruff.

* Fix: Update docstrings.

* Fix: Mitigate unused config attr tests by explicit usage.

* Fix: Skip tests, if xlstm library is not installed.

* Feat: Enable longer context window for inference by chunking.

* Fix: Make training test pass by lowering target accuracy.

* Chore: Increase test verbosity for failing generation test.

* Update docs/source/en/model_doc/xlstm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix: Make xlstm available even without CUDA.

* Chore: Remove unnecessary import.

* Fix: Remove BOS insertion.

* Chore: Improve xLSTMCache documentation.

* Integrate basic xLSTM fallback code.

* Chore: Remove unnecessary import.

* Chore: Remove duplicate LayerNorm.

* chore: update copyright, minor reformatting

* fix: refactor mLSTMStateType due to missing torch import

* fix: add missing import

* Chore: Replace einops.

* fix: apply ruff formatting

* fix: run `make fix-copies` to re-generate dummy_pt_objects.py

* fix: make type hints Python 3.9 compatible

* fix: remove obsolete import

* fix: remove obsolete method from docs

* chore: remove obsolete `force_bos_token_insert` from config

* Chore: Remove duplicated xLSTMCache class.

* Fix: Formatting of modeling_xlstm.py

* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.

* Fix: Update xLSTMCache docstring.

* Feat: Add proper initialization of xLSTM.

* Chore: Re-format files.

* Chore: Adapt format.

* Fix: xLSTMCache import restructuring.

* Fix: Add __all__ lists to modeling and configuration files.

* Chore: Reformat.

* Fix: Remove unnecessary update_rnn_state function.

* Fix: Undo test accuracy quickfix.

* Fix: Update copyright year, remvoe config copy.

* Chore: Flatten all internal configs to xLSTMConfig.

* Fix: Unused config variables check.

* Chore: Remove unnecessary imports.

* Fix: Unify xlstm cache argument from batch_size to max_batch_size.

* Chore: Remove bad default arg value for xLSTMCache.

* Chore: Rename core configuration arguments to HF default in xLSTM.

* Chore: Fix formatting.

* Fix: xLSTM Cache config access.

* Fix: Update xlstm tests for config update.

* Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B.

* Fix: Configuration xLSTM python3.9 syntax.

* Fix: Difference to main in test_utils.py assertion.

* Fix: Bad syntax in xlstm config for python3.9.

* Fix: xLSTMConfig docstring.

* Fix: xLSTMConfig docstring.

* Fix typing issues in xLSTM and BeiT, Paligemma.

* Fix: Exclude xLSTM from test cache utils.

* Chore: Fix style.

* Chore: Fix format.

* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.

* Chore: Remove asserts and replace with ValueErrors.

* Chore: Update __init__.py structure of xLSTM.

* Chore: Clean xLSTM initialization of weights.

* Fix index names in modeling_xlstm.py

* Update xlstm model test typing annotations.

* Fix: Remove all asserts.

* Revert changes to the main __init__.py

* Fix: Move xLSTMCache to modeling_xlstm.py

* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py

* Remove xLSTMCache from dummy_pt_objects.py

* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.

* Revert test_cache_utils.py xLSTM change.

* Fix: Move xLSTM init functions before init call.

* Remove xLSTMCache from generation utils.

* Fix: Clean xLSTM init functionality for recursive calls.

* Fix: Move xLSTMCache before its first call.

* Fix formatting.

* Add partial docstring for xLSTMModel forward.

* Fix xLSTMCache docstring in xLSTMModel.

* Remove xLSTMCache from public documentation. Update auto_docstring.

* Remove all agressive shape comments

* style

* Fix names

* simplify

* remove output_hidden_states

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* Update test_modeling_xlstm.py

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* fix

* fix

* style

* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00
Armaghan Shakir
69cff312f5 Add support for DeepseekAI's DeepseekVL (#36248)
* upload initial code

* update deepseek-vl adaptor

* update hierarchy of vision model classes

* udpate aligner model

* add text model

* Added Image Processor

* Added Image Processor

* Added Image Processor

* apply masks

* remove projection; add aligner

* remove interpolate_pos_encoding

* remove unused params in config

* cleaning

* Add the __init__ file

* added processing deepseek_vl class

* modified the deepseek-vl processor

* modified the deepseek-vl processor

* update __init__

* Update the image processor class name

* Added Deepseek to src/transformers/__init__.py file

* Added Deepseek to image_processing_auto.py

* update the __init__ file

* update deepseek_vl image processor

* Update Deepseek Processor

* upload fast image processor

* Revert "upload fast image processor"

This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.

* update image processor

* flatten heirarchy

* remove DeepseekVLModel

* major update (complete modeling)

* auto modeling and other files

* formatting

* fix quality

* replace torchvision in modeling

* set default do_normalize to False

* add fast image processor template using tool

* update image processors

* add fast image processor to other files

* update liscense

* Added deepseek image testcases

* update image test

* update processor

* write CHAT_TEMPLATE

* update model for processor

* fix processor

* minor fixes and formatting

* fix image processing and tests

* fix interpolation in sam

* fix output_attentions in DeepseekVLModel

* upload test_modeling

* fix tests because of vocab size

* set use_high_res_vision=False in tests

* fix all modeling tests

* fix styling

* remove explicit background_color from image processors

* added test_processor

* added test_processor

* fix processor tests

* update docs

* update docs

* update docs

* update conversion script

* Fixed typos

* minor fixes from review

- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update liscense 2021->2025

* fix type in config docstring

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* update get_image_features

* fix config

* improve DeepseekVLImageProcessor.preprocess

* return image_hidden_states

* use AutoTokenizer and AutoImageProcessor in Processor

* fix model outputs

* make num_image_tokens configurable

* fix docstring of processor

* move system prompt to chat template

* fix repo consistency

* fix return_dict

* replace SamVisionEncoder with SamVisionModel

* update to remove deepcopy

* 🛠️  Major Architectural Changes (Adds DeepseekVLHybrid)

* fix quality checks

* add missing hybrid in auto modeling

* run make style

* update sam_hq

* update high_res_size in test

* update docs following #36979

* update code with auto_docstring

* update conversion scripts

* fix style

* fix failing test because of tuple

* set weights_only=True in conversion script

* use safetensors.torch.load_file instead of torch.load in conversion script

* make output_dir optional in conversion script

* fix code snippets in docs (now the examples work fine)

* integration tests for DeepseekVL

* update expected texts

* make style

* integration tests for DeepseekVLHybrid

* fix class name

* update expected texts for hybrid

* run "make style"

* update since changes in main

* run make-style

* nits since changes in main

* undo changes in sam

* fix tests

* fix tests; update with main

* update with main: output_attention/output_hidden_states

* fix copied part in deepseek_vl

* run fix-copies

* fix output_hidden_states

* sam: fix _init_weigths

* use modular for DeepseekVL

* make image processor more modular

* modular: use JanusPreTrainedModel

* janus: provide kwargs in loss

* update processors in conversion script

* Revert "sam: fix _init_weigths"

This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.

* run fix-copies

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-07-25 19:18:50 +02:00
Xibin Bayes Zhou
45c7bfb157 Add evolla rebase main (#36232)
* add evolla

* adding protein encoder part

* add initial processing test

* save processor

* add docstring

* add evolla processor

* add two test

* change vision to protein

* change resampler to sequence_compressor

* change vision to protein

* initial update for llama

* add initial update for llamaForCausalLM

* add `test_processor`, `test_saprot_output`, `test_protein_encoder_output`

* change evolla, but still working on it

* add test_single_forward

* pass test_attention_outputs

* pass test_hidden_states_output

* pass test_save_load and test_from_pretrained_no_checkpoint

* pass test_cpu_offload

* skip some tests

* update new progress

* skip test_model_is_small

* pass test_model_weights_reload_no_missing_tied_weights

* pass test_model_get_set_embeddings

* pass test_cpu_offload

* skip test_resize_embeddings

* add pipeline_model_mapping

* remote old setUp

* pass processor save_pretrained and load_pretrained

* remove pooling layer

* pass test_inputs_embeds_matches_input_ids

* pass test_model_is_small

* pass test_attention_outputs

* pass test_initialization

* pass test_model_get_set_embeddings

* pass test_single_forward

* skip test_disk_offload_bin and test_disk_offload_safetensors

* fix most tests

* pass test_protein_encoder_output

* remove useless code

* add EvollaForProteinText2Text

* pass test_saprot_output

* pass all EvollaModelTest test and remove processor test

* add processor test to its own file

* skip is_training since esm skipped it and the saprot code causes error when setting is_training True

* pass processor tests

* solve all except config

* pass most cases

* change init

* add doc to `configuration_evolla.py`

* remove image_processing test

* remove extra processor test

* remove extra modules

* remove extra modules

* change all configs into one config

* pass all evolla test

* pass `make fixup`

* update short summary

* update Evolla-10B-hf

* pass check_dummies.py and check_code_quality

* fix  `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings`

* remove dummy codes

* change format

* fix llava issue

* update format

* update to solve llama3 access issue

* update to make forward right

* solve processor save load problem from instructblip solution

* remove unexpected file

* skip `test_generation_tester_mixin_inheritance`

* add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning`

* add `modular_evolla.py`

* solved issue #36362

* run `make fixup`

* update modular

* solve float32 training

* add fix

* solve `utils/check_docstrings.py`

* update

* update

* update

* remove other files and replace sequential and einsum

* add use case in document

* update the models

* update model

* change some wrong code

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix issues mentioned in PR

* update style and rearrange the placement

* fix return_dict argument issue

* solve SaProtConfig issue

* Solve EvollaSaProtRotaryEmbedding issue

* solve attention_mask issue

* solve almosst all issues

* make style

* update config

* remove unrelated pickle file

* delete pickle files

* fix config

* simplify a lot

* remove past k-v from encoder

* continue work

* style

* skip it from init

* fix init

* fix init

* simplify more

* fill in docstrings

* change test for generation

* skip test

* fix style

---------

Co-authored-by: Chenchen Han <13980209828@163.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-25 19:11:57 +02:00
Yih-Dar
2670da66ce update expected outputs for whisper after #38778 (#39304)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-25 16:48:10 +00:00
Yih-Dar
4b125e2993 fix kyutai tests (#39416)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-07-25 18:42:04 +02:00
Anton Vlasjuk
a91653561e [Ernie 4.5] Post merge adaptations (#39664)
* ernie 4.5 fixes

* Apply style fixes

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 17:36:18 +02:00
Joao Gante
5d0ba3e479 [CI] revert device in test_export_static_cache (#39662)
* revert device

* add todo
2025-07-25 15:36:12 +00:00
Yoni Gozlan
17f02102c5 🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor (#39591)
* init

* Force qwen2VL image proc to fast

* refactor qwen2 vl fast

* fix copies

* Update after PR review and update tests to use return_tensors="pt"

* fix processor tests

* add BC for min pixels/max pixels
2025-07-25 11:11:28 -04:00
Lysandre Debut
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
revanth
3b3f9c0c46 fix(voxtral): correct typo in apply_transcription_request (#39572)
* fix(voxtral): correct typo in apply_transcription_request

* temporary wrapper: apply_transcrition_request

* Update processing_voxtral.py

* style: sort imports in processing_voxtral.py

* docs(voxtral): fix typo in voxtral.md

* make style

* doc update

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-07-25 12:09:44 +00:00
Raushan Turganbay
c392d47c9b [attention] fix test for packed padfree masking (#39582)
* fix most tests

* skip a few more tests

* address comments

* fix chameleon tests

* forgot to uncomment

* qwen has its own tests with images, rename it as well
2025-07-25 07:44:52 +00:00
lmarshall12
565c035a2e Add owlv2 fast processor (#39041)
* add owlv2 fast image processor

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* change references to owlVit to owlv2 in docstrings for post process methods

* change type hints from List, Dict, Tuple to list, dict, tuple

* remove unused typing imports

* add disable grouping argument to group images by shape

* run make quality and repo-consistency

* use modular

* fix auto_docstring

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-25 02:40:11 +00:00
eustlb
ad6fd2da0e [Voxtral] values for A10 runners (#39605)
* values for A10 runners

* make

* as for Llava

* does not apply to Voxtral
2025-07-24 18:52:35 +02:00
StevenBucaille
12b612830d [efficientloftr] fix model_id in tests (#39621)
fix: wrong EfficientLoFTR model id in tests
2025-07-24 10:41:06 +01:00
Eric Bezzam
c5a80dd6c4 🔴 Fix EnCodec internals and integration tests (#39431)
* EnCodec fixes and update integration tests.

* Apply padding mask when normalize is False.

* Update comment of copied function.

* Fix padding mask within modeling.

* Revert padding function.

* Simplify handling of padding_mask.

* Address variable codebook size.

* Add output for padding for consistency with original model, fix docstrings.

* last_frame_pad_length as int

* Update example code.

* Improve docstring/comments.

* Shorten expected output.

* Consistent docstring.

* Parameterize tests.

* Properties for derived variables.

* Update expected outputs from GitHub runner.

* Consistent outputs with runner GPUs.
2025-07-23 19:39:27 +02:00
Eric Bezzam
7a4e2e7868 Fix DAC integration tests and checkpoint conversion. (#39313)
* Fix DAC (slow) integration tests.

* Fix DAC conversion.

* Address comments

* Sync with main, uncomment nn.utils.parametrizations.weight_norm.

* Update DAC integration tests with expected outputs.

* Added info about encoder/decoder error and longer decoder outputs.

* Parameterize tests.

* Set expected values to GitHub runners.
2025-07-23 19:21:26 +02:00
Lysandre Debut
a0e5a7d34b Transformers serve VLM (#39454)
* Add support for VLMs in Transformers Serve

* Raushan comments

* Update src/transformers/commands/serving.py

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Quick fix

* CPU -> Auto

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixup

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 17:03:18 +02:00
Pablo Montalvo
ea56eb6bed Fix important models CI (#39576)
* relax test boundaries and fix from config

* eager is always supported.
2025-07-23 16:24:29 +02:00
llbdyiu66
a62f65a989 fix moe routing_weights (#39581)
* fix moe routing_weights

* fix ernie4_5_moe routing_weights

* fix integration test

---------

Co-authored-by: llbdyiu66 <llbdyiu66@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-23 11:20:23 +00:00
Andrei Panferov
623ab01039 FP-Quant support (#38696)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-23 11:41:10 +02:00
Raushan Turganbay
eb1a007f7f Rename supports_static_cache to can_compile_fullgraph (#39505)
* update all

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* apply suggestions

* fix copies

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 09:35:18 +00:00
Cyril Vallez
5dba4bc7b2 Fix DynamicCache and simplify Cache classes a bit (#39590)
* fix

* use kwargs

* simplify

* Update cache_utils.py

* Update cache_utils.py

* Update test_cache_utils.py

* fix

* style
2025-07-23 10:13:45 +02:00
Sangbum Daniel Choi
d9b35c635e Mask2former & Maskformer Fast Image Processor (#35685)
* add maskformerfast

* test

* revert do_reduce_labels and add testing

* make style & fix-copies

* add mask2former and make fix-copies
TO DO:
	add test for mask2former

* make fix-copies

* fill docstring

* enable mask2former fast processor

* python utils/custom_init_isort.py

* make fix-copies

* fix PR's comments

* modular file update

* add license

* make style

* modular file

* make fix-copies

* merge

* temp commit

* finish up maskformer mask2former

* remove zero shot examples

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-23 02:47:47 +00:00
space_samurai
c6d0500d15 [WIP] Add OneformerFastImageProcessor (#38343)
* [WIP] OneformerFastImageProcessor

* update init

* Fully working oneformer image processor fast

* change Nearest to Neares exact interpolation where needed

* fix doc

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-22 20:41:39 +00:00
Kashif Rasul
2936902a76 [Paged-Attention] Handle continuous batching for repetition penalty (#39457)
* Handle continuous batching for repetition penalty

* fix last scores and with token mask creation

* add test

* Update src/transformers/generation/continuous_batching.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* remove unneeded cast

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-22 18:13:40 +02:00
Manuel de Prada Corral
c338fd43b0 [cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2
2025-07-22 16:10:25 +02:00
Ákos Hadnagy
015b62bf3e Add AMD GPU expectations for LLaVA tests (#39486)
* Add AMD GPU expectation to llava tests

* FMT

* Remove debug print

* Address review  comments
2025-07-22 14:01:54 +00:00
Arthur
efceeaf267 Kernels flash attn (#39474)
* use partial to wrap around `transformers` utils!

* try to refactor?

* revert one wrong change

* just a nit

* push

* reverter watever was wrong!

* some nits

* fixes when there is no attention mask

* bring the licence back

* some fixes

* nit

* style

* remove prints

* correct dtype

* fa flags for testing

* update

* use paged attention if requested!

* updates

* a clone was needed, not sure why

* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute

* simplify and improve?

* flash attention is kinda broken on recent cuda version so allow the opportunity to use something else

* fix!

* protect kernels import

* update

* properly parse generation config being passed

* revert and update

* add two tests

* some fixes

* fix test FA2

* takes comment into account

* fixup

* revert changes

* revert the clone, it is only needed because the metal kernel is not doing it?

* [docs] update attention implementation and cache docs (#39547)

* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applu suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix mps on our side for now

* Update src/transformers/integrations/flash_paged.py

* no qa

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:41:06 +02:00
Ákos Hadnagy
b62557e712 Add AMD expectations to Mistral3 tests (#39481)
Add AMD expectations to mistral3 tests
2025-07-22 15:40:16 +02:00
Ákos Hadnagy
ef99537f37 Add AMD test expectations to DETR model (#39539)
* Add AMD test expectations to DETR model

* Fix baseline expectation

* Address review comments

* Make formatting a bit more consistent
2025-07-22 12:07:10 +00:00
Dominik Baran
30567c28e8 [timm_wrapper] add support for gradient checkpointing (#39287)
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification

* ruff fix

* refactor + add test for not supported model

* ruff

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 11:07:52 +00:00
StevenBucaille
a88ea9cbc8 Add EfficientLoFTR model (#36355)
* initial commit

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: various typos, typehints, refactors from suggestions

* fix: fine_matching method

* Added EfficientLoFTRModel and AutoModelForKeypointMatching class

* fix: got rid of compilation breaking instructions

* docs: added todo for plot

* fix: used correct hub repo

* docs: added comments

* fix: run modular

* doc: added PyTorch badge

* fix: model repo typo in config

* fix: make modular

* fix: removed mask values from outputs

* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor

* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list

* fix: reformat

* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride

* doc: added q, kv aggregation kernel size and stride doc to config

* refactor: converted efficientloftr implementation from modular to copied from mechanism

* tests: overwrote batching_equivalence for "keypoints" specific tests

* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils

* fix: make fix-copies

* fix: make style

* fix: update rope function to make meta tests pass

* fix: rename plot_keypoint_matching to visualize_output for clarity

* refactor: optimize image pair processing by removing redundant target size calculations

* feat: add EfficientLoFTRImageProcessor to image processor mapping

* refactor: removed logger and updated attention forward

* refactor: added auto_docstring and can_return_tuple decorators

* refactor: update type imports

* refactor: update type hints from List/Dict to list/dict for consistency

* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching

* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]

* fix typing

* fix some typing issues

* nit

* a few more typehint fixes

* Remove output_attentions and output_hidden_states from modeling code

* else -> elif to support efficientloftr

* nit

* tests: added EfficientLoFTR image processor tests

* refactor: reorder functions

* chore: update copyright year in EfficientLoFTR test file

* Use default rope

* Add docs

* Update visualization method

* fix doc order

* remove 2d rope test

* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py

* fix docs

* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py

* update gradient

* refactor: removed unused codepath

* Add motivation to keep postprocessing in modeling code

* refactor: removed unnecessary variable declarations

* docs: use load_image from image_utils

* refactor: moved stage in and out channels computation to configuration

* refactor: set an intermediate_size parameter to be more explicit

* refactor: removed all mentions of attention masks as they are not used

* refactor: moved position_embeddings to be computed once in the model instead of every layer

* refactor: removed unnecessary hidden expansion parameter from config

* refactor: removed completely hidden expansions

* refactor: removed position embeddings slice function

* tests: fixed broken tests because of previous commit

* fix is_grayscale typehint

* not refactoring

* not renaming

* move h/w to embeddings class

* Precompute embeddings in init

* fix: replaced cuda device in convert script to accelerate device

* fix: replaced stevenbucaille repo to zju-community

* Remove accelerator.device from conversion script

* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module

* fix: removed unused attributes in configuration

* fix: missing self

* fix: refactoring and tests

* fix: make style

---------

Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 10:53:16 +01:00
Raushan Turganbay
3bc726b381 [gemma3] fix bidirectional image mask (#39396)
* fix gemma3 mask

* make compile happy, and use only torch ops

* no full attention between images

* update tests

* fix tests

* add a fast test
2025-07-22 10:04:56 +02:00
Anton Vlasjuk
b4115a426e [Ernie 4.5] Add ernie text models (#39228)
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also in remote done what a timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flasky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
Pablo Montalvo
69b158260f Refactor embedding input/output getter/setter (#39339)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* revert again

* 🧠

* aaah ESM has two modelings aaah

* add informative but short comment

* add `input_embed_layer` mixin attribute

* style

* walrus has low precedence

* modular fix

* this was breaking parser
2025-07-21 18:18:14 +02:00
Manuel de Prada Corral
1aa7256f01 Refactor MambaCache to modeling_mamba.py (#38086)
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba
2025-07-21 14:59:36 +02:00
Wang, Yi
9323d0873c use the enable_gqa param in torch.nn.functional.scaled_dot_product_at… (#39412)
* use the enable_gqa param in torch.nn.functional.scaled_dot_product_attention

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ci failure fix

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add check

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code, extend to cuda

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix review comments

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine the PR

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:46:43 +02:00
BUI Van Tuan
6b3a1f2f51 Fix missing initializations for models created in 2023 (#39239)
* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:43:52 +02:00
Sai-Suraj-27
970d9a75ce Raise TypeError instead of ValueError for invalid types (#38660)
* Raise TypeError instead of ValueError for invalid types.

* Removed un-necessary changes.

* Resolved conflicts

* Code quality

* Fix failing tests.

* Fix failing tests.
2025-07-21 12:42:00 +00:00
Yuanyuan Chen
822c5e45b2 Fix pylint warnings (#39477)
* Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* Fix variable names

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-21 12:38:05 +00:00
Cyril Vallez
dc017cd763 Fix Qwen Omni integration test (#39553)
fix
2025-07-21 14:11:46 +02:00
Raushan Turganbay
8c102e2eb1 Rename _supports_flash_attn_2 in examples and tests (#39471)
* delete `_supports_flash_attn_2` from examples and tests

* simplify docs
2025-07-21 14:02:57 +02:00
Cyril Vallez
3a152e3a5c Fix the check in flex test (#39548)
* fix the check

* fix flags

* flags
2025-07-21 13:29:44 +02:00
Eric Bezzam
78fb2d2760 Fix bad tensor shape in failing Hubert test. (#39502)
Fix bad tensor shape in Hubert test.
2025-07-21 12:25:52 +01:00