Commit Graph

3517 Commits

Author SHA1 Message Date
Ethan Villarosa
ecbb5ee194 standardized BARThez model card (#39701)
* standardized barthez model card according to template

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* suggested changes to barthez model card

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:33:13 -07:00
Yana Mishula
551a89a4a3 Standardize CLAP model card format (#39738)
* Standardize CLAP model card format

* Apply review feedback

* Remove Resources section
2025-07-29 14:13:04 -07:00
StevenBucaille
da70b1389a docs: Update EfficientLoFTR documentation (#39620)
* docs: Update EfficientLoFTR documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-29 13:54:44 -07:00
Joao Gante
33aa49df9d [docs] Ko doc fixes after toc update (#39660)
* update docs

* doc builder working

* make fixup
2025-07-29 17:05:26 +01:00
Jaehyeon Shin
1d061536cf 🌐 [i18n-KO] Translated how_to_hack_models.md to Korean (#39536)
* docs: ko: how_to_hack_models.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:09:16 -07:00
박종범
43fe41c0a8 🌐 [i18n-KO] Translated perf_train_gpu_one.md to Korean (#39552)
* docs: ko: perf_train_gpu_one.md

* feat: nmt draft

* fix: manual edits

* fix: Manually added missing backticks

* Update docs/source/ko/perf_train_gpu_one.md

fix: remove space between heading and GPU anchor

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: clarify table headers to indicate training speed boost and memory savings

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix : rephrase explanation of data preloading to improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-29 08:08:57 -07:00
Ahn Joon Sung
9f38763731 🌐 [i18n-KO] Translated pipeline_gradio.md to Korean (#39520)
* docs: ko: pipeline_gradio.md

* feat: nmt draft

* fix: manual edits

* docs: ko: pipeline_gradio.md
2025-07-29 08:04:30 -07:00
Lio (임승섭)
f72311796b 🌐 [i18n-KO] Translated tokenizer.md to Korean (#39532)
* docs: ko: tokenizer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com>

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-29 08:04:14 -07:00
Kim Juwon
d346d46752 🌐 [i18n-KO] Translated tvp.md to Korean (#39578)
* docs: ko: tvp.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
2025-07-29 08:04:00 -07:00
Ahnjj_DEV
2f59c15b33 🌐 [i18n-KO] Translated albert.md to Korean (#39524)
* docs: ko: albert.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:03:40 -07:00
Minseo Kim
98386dcee9 🌐 [i18n-KO] Translated main_classes/peft.md (#39515)
* docs: ko: main_classes/peft.md

* feat: nmt draft

* docs: add missing TOC to documentation for `PeftAdapterMixin` section

Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).

* fix: Improve naturalness of purpose expression in Korean

Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.

* fix: Simplify plural form and make expression more concise

Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.

* fix: Replace technical term '주입' with more natural '적용'

Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:

'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise

'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.

* fix: update toctree path for PEFT to lowercase

Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.

* docs: update as per reviewer feedback after rebase
2025-07-29 08:03:17 -07:00
Ramesh
4f8f51be4e Add Fast Segformer Processor (#37024)
* Add Fast Segformer Processor

* Modified the params according to segformer model

* modified test_image_processing_Segformer_fast args

- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class

* added segmentation_maps processing logic from the slow segformer processing module with references from beitimageprocessing fast

* fixed code_quality

* added recommended fixes and tests to make sure everything processes smoothly

* Fixed SegmentationMapsLogic

- modified the preprocessing of segmentation maps to use tensors
- added batch support

* fixed some mismatched files

* modified the tolerance for tests

* use modular

* fix ci

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 19:22:32 +00:00
Avigyan Sinha
c353f2bb5e Superpoint fast image processor (#37804)
* feat: superpoint fast image processor

* fix: reran fast cli command to generate fast config

* feat: updated test cases

* fix: removed old model add

* fix: format fix

* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix: ported to torch and made requested changes

* fix: removed changes to init

* fix: init fix

* fix: init format fix

* fixed testcases and ported to torch

* fix: format fixes

* failed test case fix

* fix superpoint fast

* fix docstring

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 18:15:06 +00:00
jzhang533
02ea23cbde update ernie model card (#39657)
* update ernie model doc

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address ruff format error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address check_repository_consistency error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

---------

Signed-off-by: Zhang Jun <jzhang533@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-28 10:21:18 +00:00
Garrett Goon
97f8c71f52 Add padding-free to Granite hybrid moe models (#39677)
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util
2025-07-25 20:10:50 +02:00
lgai-exaone
c06d4cd6ce Add EXAONE 4.0 model (#39129)
* Add EXAONE 4.0 model

* Refactor EXAONE 4.0 modeling code

* Fix cache slicing on SWA + FA2

* Fix cache slicing on FA2 + HybridCache

* Update EXAONE 4.0 modeling code for main branch

* Update o_proj for asymmetric projection

* Address PR feedback

* Add EXAONE 4.0 docs

* Update EXAONE 4.0 modeling code for main branch

* update

* fix updates

* updates

* fix

* fix

* fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 19:58:28 +02:00
Cyril Vallez
6630c5b714 Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.

* Fix style.

* Fix modeling test.

* Make xLSTM package optional.

* Fix: Update torch version check.

* Fix: Bad variable naming in test.

* Fix: Import structure cleaning with Ruff.

* Fix: Update docstrings.

* Fix: Mitigate unused config attr tests by explicit usage.

* Fix: Skip tests, if xlstm library is not installed.

* Feat: Enable longer context window for inference by chunking.

* Fix: Make training test pass by lowering target accuracy.

* Chore: Increase test verbosity for failing generation test.

* Update docs/source/en/model_doc/xlstm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix: Make xlstm available even without CUDA.

* Chore: Remove unnecessary import.

* Fix: Remove BOS insertion.

* Chore: Improve xLSTMCache documentation.

* Integrate basic xLSTM fallback code.

* Chore: Remove unnecessary import.

* Chore: Remove duplicate LayerNorm.

* chore: update copyright, minor reformatting

* fix: refactor mLSTMStateType due to missing torch import

* fix: add missing import

* Chore: Replace einops.

* fix: apply ruff formatting

* fix: run `make fix-copies` to re-generate dummy_pt_objects.py

* fix: make type hints Python 3.9 compatible

* fix: remove obsolete import

* fix: remove obsolete method from docs

* chore: remove obsolete `force_bos_token_insert` from config

* Chore: Remove duplicated xLSTMCache class.

* Fix: Formatting of modeling_xlstm.py

* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.

* Fix: Update xLSTMCache docstring.

* Feat: Add proper initialization of xLSTM.

* Chore: Re-format files.

* Chore: Adapt format.

* Fix: xLSTMCache import restructuring.

* Fix: Add __all__ lists to modeling and configuration files.

* Chore: Reformat.

* Fix: Remove unnecessary update_rnn_state function.

* Fix: Undo test accuracy quickfix.

* Fix: Update copyright year, remove config copy.

* Chore: Flatten all internal configs to xLSTMConfig.

* Fix: Unused config variables check.

* Chore: Remove unnecessary imports.

* Fix: Unify xlstm cache argument from batch_size to max_batch_size.

* Chore: Remove bad default arg value for xLSTMCache.

* Chore: Rename core configuration arguments to HF default in xLSTM.

* Chore: Fix formatting.

* Fix: xLSTM Cache config access.

* Fix: Update xlstm tests for config update.

* Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B.

* Fix: Configuration xLSTM python3.9 syntax.

* Fix: Difference to main in test_utils.py assertion.

* Fix: Bad syntax in xlstm config for python3.9.

* Fix: xLSTMConfig docstring.

* Fix: xLSTMConfig docstring.

* Fix typing issues in xLSTM and BeiT, Paligemma.

* Fix: Exclude xLSTM from test cache utils.

* Chore: Fix style.

* Chore: Fix format.

* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.

* Chore: Remove asserts and replace with ValueErrors.

* Chore: Update __init__.py structure of xLSTM.

* Chore: Clean xLSTM initialization of weights.

* Fix index names in modeling_xlstm.py

* Update xlstm model test typing annotations.

* Fix: Remove all asserts.

* Revert changes to the main __init__.py

* Fix: Move xLSTMCache to modeling_xlstm.py

* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py

* Remove xLSTMCache from dummy_pt_objects.py

* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.

* Revert test_cache_utils.py xLSTM change.

* Fix: Move xLSTM init functions before init call.

* Remove xLSTMCache from generation utils.

* Fix: Clean xLSTM init functionality for recursive calls.

* Fix: Move xLSTMCache before its first call.

* Fix formatting.

* Add partial docstring for xLSTMModel forward.

* Fix xLSTMCache docstring in xLSTMModel.

* Remove xLSTMCache from public documentation. Update auto_docstring.

* Remove all aggressive shape comments

* style

* Fix names

* simplify

* remove output_hidden_states

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* Update test_modeling_xlstm.py

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* fix

* fix

* style

* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00
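One of the squashed changes in the xLSTM commit above is "Remove asserts and replace with ValueErrors". The following is a minimal sketch of that general pattern, not the code from the pull request: the function `num_chunks` and its parameters are hypothetical and only serve to illustrate why an explicit exception is usually preferred over `assert` for input validation.

```python
def num_chunks(sequence_length: int, chunk_size: int) -> int:
    """Return how many chunks of `chunk_size` cover `sequence_length` exactly.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    # Before: `assert sequence_length % chunk_size == 0`
    # An assert is skipped entirely when Python runs with the -O flag,
    # so invalid input would pass through silently.
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}.")
    if sequence_length % chunk_size != 0:
        raise ValueError(
            f"sequence_length ({sequence_length}) must be divisible by "
            f"chunk_size ({chunk_size})."
        )
    return sequence_length // chunk_size
```

Raising `ValueError` keeps the check active in optimized runs and gives the caller a message that names the offending values.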
Armaghan Shakir
69cff312f5 Add support for DeepseekAI's DeepseekVL (#36248)
* upload initial code

* update deepseek-vl adaptor

* update hierarchy of vision model classes

* update aligner model

* add text model

* Added Image Processor

* Added Image Processor

* Added Image Processor

* apply masks

* remove projection; add aligner

* remove interpolate_pos_encoding

* remove unused params in config

* cleaning

* Add the __init__ file

* added processing deepseek_vl class

* modified the deepseek-vl processor

* modified the deepseek-vl processor

* update __init__

* Update the image processor class name

* Added Deepseek to src/transformers/__init__.py file

* Added Deepseek to image_processing_auto.py

* update the __init__ file

* update deepseek_vl image processor

* Update Deepseek Processor

* upload fast image processor

* Revert "upload fast image processor"

This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.

* update image processor

* flatten hierarchy

* remove DeepseekVLModel

* major update (complete modeling)

* auto modeling and other files

* formatting

* fix quality

* replace torchvision in modeling

* set default do_normalize to False

* add fast image processor template using tool

* update image processors

* add fast image processor to other files

* update license

* Added deepseek image testcases

* update image test

* update processor

* write CHAT_TEMPLATE

* update model for processor

* fix processor

* minor fixes and formatting

* fix image processing and tests

* fix interpolation in sam

* fix output_attentions in DeepseekVLModel

* upload test_modeling

* fix tests because of vocab size

* set use_high_res_vision=False in tests

* fix all modeling tests

* fix styling

* remove explicit background_color from image processors

* added test_processor

* added test_processor

* fix processor tests

* update docs

* update docs

* update docs

* update conversion script

* Fixed typos

* minor fixes from review

- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025

* fix type in config docstring

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* update get_image_features

* fix config

* improve DeepseekVLImageProcessor.preprocess

* return image_hidden_states

* use AutoTokenizer and AutoImageProcessor in Processor

* fix model outputs

* make num_image_tokens configurable

* fix docstring of processor

* move system prompt to chat template

* fix repo consistency

* fix return_dict

* replace SamVisionEncoder with SamVisionModel

* update to remove deepcopy

* 🛠️  Major Architectural Changes (Adds DeepseekVLHybrid)

* fix quality checks

* add missing hybrid in auto modeling

* run make style

* update sam_hq

* update high_res_size in test

* update docs following #36979

* update code with auto_docstring

* update conversion scripts

* fix style

* fix failing test because of tuple

* set weights_only=True in conversion script

* use safetensors.torch.load_file instead of torch.load in conversion script

* make output_dir optional in conversion script

* fix code snippets in docs (now the examples work fine)

* integration tests for DeepseekVL

* update expected texts

* make style

* integration tests for DeepseekVLHybrid

* fix class name

* update expected texts for hybrid

* run "make style"

* update since changes in main

* run make-style

* nits since changes in main

* undo changes in sam

* fix tests

* fix tests; update with main

* update with main: output_attention/output_hidden_states

* fix copied part in deepseek_vl

* run fix-copies

* fix output_hidden_states

* sam: fix _init_weigths

* use modular for DeepseekVL

* make image processor more modular

* modular: use JanusPreTrainedModel

* janus: provide kwargs in loss

* update processors in conversion script

* Revert "sam: fix _init_weigths"

This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.

* run fix-copies

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-07-25 19:18:50 +02:00
Xibin Bayes Zhou
45c7bfb157 Add evolla rebase main (#36232)
* add evolla

* adding protein encoder part

* add initial processing test

* save processor

* add docstring

* add evolla processor

* add two test

* change vision to protein

* change resampler to sequence_compressor

* change vision to protein

* initial update for llama

* add initial update for llamaForCausalLM

* add `test_processor`, `test_saprot_output`, `test_protein_encoder_output`

* change evolla, but still working on it

* add test_single_forward

* pass test_attention_outputs

* pass test_hidden_states_output

* pass test_save_load and test_from_pretrained_no_checkpoint

* pass test_cpu_offload

* skip some tests

* update new progress

* skip test_model_is_small

* pass test_model_weights_reload_no_missing_tied_weights

* pass test_model_get_set_embeddings

* pass test_cpu_offload

* skip test_resize_embeddings

* add pipeline_model_mapping

* remove old setUp

* pass processor save_pretrained and load_pretrained

* remove pooling layer

* pass test_inputs_embeds_matches_input_ids

* pass test_model_is_small

* pass test_attention_outputs

* pass test_initialization

* pass test_model_get_set_embeddings

* pass test_single_forward

* skip test_disk_offload_bin and test_disk_offload_safetensors

* fix most tests

* pass test_protein_encoder_output

* remove useless code

* add EvollaForProteinText2Text

* pass test_saprot_output

* pass all EvollaModelTest test and remove processor test

* add processor test to its own file

* skip is_training since esm skipped it and the saprot code causes error when setting is_training True

* pass processor tests

* solve all except config

* pass most cases

* change init

* add doc to `configuration_evolla.py`

* remove image_processing test

* remove extra processor test

* remove extra modules

* remove extra modules

* change all configs into one config

* pass all evolla test

* pass `make fixup`

* update short summary

* update Evolla-10B-hf

* pass check_dummies.py and check_code_quality

* fix  `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings`

* remove dummy codes

* change format

* fix llava issue

* update format

* update to solve llama3 access issue

* update to make forward right

* solve processor save load problem from instructblip solution

* remove unexpected file

* skip `test_generation_tester_mixin_inheritance`

* add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning`

* add `modular_evolla.py`

* solved issue #36362

* run `make fixup`

* update modular

* solve float32 training

* add fix

* solve `utils/check_docstrings.py`

* update

* update

* update

* remove other files and replace sequential and einsum

* add use case in document

* update the models

* update model

* change some wrong code

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix issues mentioned in PR

* update style and rearrange the placement

* fix return_dict argument issue

* solve SaProtConfig issue

* Solve EvollaSaProtRotaryEmbedding issue

* solve attention_mask issue

* solve almost all issues

* make style

* update config

* remove unrelated pickle file

* delete pickle files

* fix config

* simplify a lot

* remove past k-v from encoder

* continue work

* style

* skip it from init

* fix init

* fix init

* simplify more

* fill in docstrings

* change test for generation

* skip test

* fix style

---------

Co-authored-by: Chenchen Han <13980209828@163.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-25 19:11:57 +02:00
Anton Vlasjuk
a91653561e [Ernie 4.5] Post merge adaptations (#39664)
* ernie 4.5 fixes

* Apply style fixes

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 17:36:18 +02:00
Lysandre Debut
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
revanth
3b3f9c0c46 fix(voxtral): correct typo in apply_transcription_request (#39572)
* fix(voxtral): correct typo in apply_transcription_request

* temporary wrapper: apply_transcrition_request

* Update processing_voxtral.py

* style: sort imports in processing_voxtral.py

* docs(voxtral): fix typo in voxtral.md

* make style

* doc update

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-07-25 12:09:44 +00:00
Joao Gante
e3760501b0 [docs] fix ko cache docs (#39644)
fix ko docs
2025-07-25 10:06:03 +01:00
lmarshall12
565c035a2e Add owlv2 fast processor (#39041)
* add owlv2 fast image processor

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* change references to owlVit to owlv2 in docstrings for post process methods

* change type hints from List, Dict, Tuple to list, dict, tuple

* remove unused typing imports

* add disable grouping argument to group images by shape

* run make quality and repo-consistency

* use modular

* fix auto_docstring

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-25 02:40:11 +00:00
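The owlv2 commit above mentions changing type hints from `List`, `Dict`, `Tuple` to `list`, `dict`, `tuple`, and grouping images by shape. As a rough illustration of the built-in generic syntax (available from Python 3.9), here is a small sketch; `group_indices_by_shape` is a hypothetical function written for this example and is not the library's grouping utility.

```python
def group_indices_by_shape(
    shapes: list[tuple[int, int]],
) -> dict[tuple[int, int], list[int]]:
    """Group item indices by their (height, width) shape.

    Uses built-in generics (list, dict, tuple) instead of typing.List,
    typing.Dict and typing.Tuple, so no typing import is needed.
    """
    groups: dict[tuple[int, int], list[int]] = {}
    for index, shape in enumerate(shapes):
        # Items with the same shape end up in the same bucket.
        groups.setdefault(shape, []).append(index)
    return groups
```

With built-in generics the `from typing import List, Dict, Tuple` line becomes unnecessary, which matches the follow-up item "remove unused typing imports".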
Matthew Hernandez
7b897fe583 [Docs] Translate audio_classification.md from English to Spanish (#39513)
* Docs: translate audio_classification to Spanish

* Update audio_classification.md

* Remove space

* Normalize backticks

* Update audio_classification.md

* Apply corrections recommended by aaronjimv

* Update _toctree.yml

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 15:55:13 -07:00
Ethan Villarosa
9b7244f189 standardized YOLOS model card according to template in #36979 (#39528)
* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections

* removed cli section

* Update yolos.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 11:00:25 -07:00
JoestarGagan
ec8a09a5fe Feature/standardize opt model card (#39568)
* docs: Standardize OPT model card with enhanced details

* Remove incorrect link from OPT model card

* Address review feedback on OPT model card

* Update opt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 10:57:48 -07:00
Eric Bezzam
c5a80dd6c4 🔴 Fix EnCodec internals and integration tests (#39431)
* EnCodec fixes and update integration tests.

* Apply padding mask when normalize is False.

* Update comment of copied function.

* Fix padding mask within modeling.

* Revert padding function.

* Simplify handling of padding_mask.

* Address variable codebook size.

* Add output for padding for consistency with original model, fix docstrings.

* last_frame_pad_length as int

* Update example code.

* Improve docstring/comments.

* Shorten expected output.

* Consistent docstring.

* Parameterize tests.

* Properties for derived variables.

* Update expected outputs from GitHub runner.

* Consistent outputs with runner GPUs.
2025-07-23 19:39:27 +02:00
Maxime Grenu
0fe03afeb8 Fix typos and grammar issues in documentation and code (#39598)
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-23 12:43:11 +00:00
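The first fix in the commit above replaces a Cyrillic 'Р' with a Latin 'P', two characters that look identical in most fonts. A sketch of how such look-alike characters could be located with the standard library follows; the function name `find_non_latin_letters` is an assumption made for this example, and the check is deliberately simple (it flags any letter whose Unicode name does not start with "LATIN").

```python
import unicodedata


def find_non_latin_letters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-Latin letter."""
    hits = []
    for index, char in enumerate(text):
        name = unicodedata.name(char, "UNKNOWN")
        # Only letters are checked; digits, spaces and punctuation are ignored.
        if char.isalpha() and not name.startswith("LATIN"):
            hits.append((index, char, name))
    return hits
```

Running this over text that is expected to be written in a Latin-script language surfaces stray homoglyphs. It would also flag legitimate non-Latin text, so it is only suitable as a targeted check.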
Andrei Panferov
623ab01039 FP-Quant support (#38696)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-23 11:41:10 +02:00
Sangbum Daniel Choi
d9b35c635e Mask2former & Maskformer Fast Image Processor (#35685)
* add maskformerfast

* test

* revert do_reduce_labels and add testing

* make style & fix-copies

* add mask2former and make fix-copies
TO DO:
	add test for mask2former

* make fix-copies

* fill docstring

* enable mask2former fast processor

* python utils/custom_init_isort.py

* make fix-copies

* fix PR's comments

* modular file update

* add license

* make style

* modular file

* make fix-copies

* merge

* temp commit

* finish up maskformer mask2former

* remove zero shot examples

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-23 02:47:47 +00:00
Quentin Gallouédec
6e9972962f 🎯 Trackio integration (#38814)
* First attempt

* fix

* fix

* Enhance TrackioCallback to log GPU memory usage and allocation

* Enhance Trackio integration in callbacks and training arguments documentation

* re order

* remove unused lines

* fix torch optional
2025-07-22 14:50:20 -07:00
space_samurai
c6d0500d15 [WIP] Add OneformerFastImageProcessor (#38343)
* [WIP] OneformerFastImageProcessor

* update init

* Fully working oneformer image processor fast

* change Nearest to Nearest exact interpolation where needed

* fix doc

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-22 20:41:39 +00:00
Harry Mellor
4884b6bf41 Fix link in "Inference server backends" doc (#39589)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-22 16:44:08 +00:00
Cássia Sampaio
cbcb8e6c1f updated mistral3 model card (#39531)
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 09:01:55 -07:00
Woojun Jung
601260fd96 Update docs/source/ko/_toctree.yml (#39516)
docs: update `docs/source/ko/_toctree.yml`
2025-07-22 09:00:42 -07:00
Manuel de Prada Corral
c338fd43b0 [cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2
2025-07-22 16:10:25 +02:00
Raushan Turganbay
1806583390 [docs] Create page on inference servers with transformers backend (#39550)
* draft docs on inference servers

* Update docs/source/en/_toctree.yml

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* update

* doc build failed

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply last suggestions

---------

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:31:10 +02:00
Raushan Turganbay
cd98c1fee3 [docs] update attention implementation and cache docs (#39547)
* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:06:43 +02:00
StevenBucaille
a88ea9cbc8 Add EfficientLoFTR model (#36355)
* initial commit

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: various typos, typehints, refactors from suggestions

* fix: fine_matching method

* Added EfficientLoFTRModel and AutoModelForKeypointMatching class

* fix: got rid of compilation breaking instructions

* docs: added todo for plot

* fix: used correct hub repo

* docs: added comments

* fix: run modular

* doc: added PyTorch badge

* fix: model repo typo in config

* fix: make modular

* fix: removed mask values from outputs

* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor

* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list

* fix: reformat

* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride

* doc: added q, kv aggregation kernel size and stride doc to config

* refactor: converted efficientloftr implementation from modular to copied from mechanism

* tests: overwrote batching_equivalence for "keypoints" specific tests

* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils

* fix: make fix-copies

* fix: make style

* fix: update rope function to make meta tests pass

* fix: rename plot_keypoint_matching to visualize_output for clarity

* refactor: optimize image pair processing by removing redundant target size calculations

* feat: add EfficientLoFTRImageProcessor to image processor mapping

* refactor: removed logger and updated attention forward

* refactor: added auto_docstring and can_return_tuple decorators

* refactor: update type imports

* refactor: update type hints from List/Dict to list/dict for consistency

* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching

* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]

* fix typing

* fix some typing issues

* nit

* a few more typehint fixes

* Remove output_attentions and output_hidden_states from modeling code

* else -> elif to support efficientloftr

* nit

* tests: added EfficientLoFTR image processor tests

* refactor: reorder functions

* chore: update copyright year in EfficientLoFTR test file

* Use default rope

* Add docs

* Update visualization method

* fix doc order

* remove 2d rope test

* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py

* fix docs

* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py

* update gradient

* refactor: removed unused codepath

* Add motivation to keep postprocessing in modeling code

* refactor: removed unnecessary variable declarations

* docs: use load_image from image_utils

* refactor: moved stage in and out channels computation to configuration

* refactor: set an intermediate_size parameter to be more explicit

* refactor: removed all mentions of attention masks as they are not used

* refactor: moved position_embeddings to be computed once in the model instead of every layer

* refactor: removed unnecessary hidden expansion parameter from config

* refactor: removed completely hidden expansions

* refactor: removed position embeddings slice function

* tests: fixed broken tests because of previous commit

* fix is_grayscale typehint

* not refactoring

* not renaming

* move h/w to embeddings class

* Precompute embeddings in init

* fix: replaced cuda device in convert script to accelerate device

* fix: replaced stevenbucaille repo to zju-community

* Remove accelerator.device from conversion script

* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module

* fix: removed unused attributes in configuration

* fix: missing self

* fix: refactoring and tests

* fix: make style

---------

Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 10:53:16 +01:00
nlhm
fbeaf96f9e Update OLMoE model card (#39344)
* Update OLMoE model card

* Checks Test

* Add license and code

* Update docs/source/en/model_doc/olmoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update olmoe.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:41:01 -07:00
Orion Weller
641aaed7c0 Update modernbertdecoder docs (#39453)
* update docs with paper and real model

* nit

* Apply suggestions from code review

Thanks to @stevhlui!

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove usage examples, add quantization

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:40:22 -07:00
Anton Vlasjuk
b4115a426e [Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also in remote done what a timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flaky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
김민서
2da97f0943 🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean (#39441)
* docs: ko: perf_infer_gpu_many.md

* feat: nmt draft

* docs: refine KO translation and enhance naturalness

* docs: add missing TOC to documentation

* Align toctree and filename with original: perf_infer_gpu_multi

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Refine Korean translation

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-21 09:14:15 -07:00
Manuel de Prada Corral
1aa7256f01 Refactor MambaCache to modeling_mamba.py (#38086)
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba
2025-07-21 14:59:36 +02:00
Yuxuan Zhang
39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* update new modular

* use LlamaAttention and avoid using CohereAttention because of the repeated norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer needs to be ignored

* fix gradient error using with dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00
Raushan Turganbay
e42681b48b [gemma3] support sequence classification task (#39465)
* add seq clf class

* fix docs and add in auto-map

* skip tests

* optional pixels
2025-07-21 11:03:20 +02:00
Yoni Gozlan
541bed22d6 Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py (#39439)
* rename `args_doc.py` to `auto_docstring.py` and improve doc

* modifs after review
2025-07-18 18:00:34 +00:00
Yoni Gozlan
de0dd3139d Add fast image processor SAM (#39385)
* add fast image processor sam

* nits
2025-07-18 17:27:16 +00:00
Cyril Vallez
4ded9a4113 🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try

* Update modeling_utils.py

* Update modeling_utils.py

* big refactor

* Update modeling_utils.py

* style

* docstrings and simplify inner workings of configs

* remove all trace of _internal

* Update modeling_utils.py

* fix logic error

* Update modeling_utils.py

* recursive on config

* Update configuration_utils.py

* fix

* Update configuration_dpt.py

* Update configuration_utils.py

* Update configuration_utils.py

* Update modeling_idefics.py

* Update modeling_utils.py

* fix for old models

* more old models fixup

* Update modeling_utils.py

* Update configuration_utils.py

* Remove outdated test

* remove the deepcopy!! 🥵🥵

* Update test_modeling_gpt_bigcode.py

* fix qwen dispatch

* restrict to only models supporting it

* style

* switch name

* Update modeling_utils.py

* Update modeling_utils.py

* add tests!

* fix

* typo

* remove bad copies

* fix

* Update modeling_utils.py

* additional check

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* skip
2025-07-18 13:41:54 +02:00