Linnet Cosmos Tuscano
0ef339ff1b
Update OpenAI GPT model card ( #37255 )
...
* Update OpenAI GPT model card
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update OpenAI GPT model card: add usage examples and notes section
* Add API autodoc tags after Notes section for OpenAI GPT model
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Added missing badges
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:25:16 -07:00
Sharareh Younesian
46d73910d5
Updated T5 model card with standardized format ( #37261 )
...
* Updated T5 model card with standardized format
* Updated T5 model card with standardized format, fixed typo
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Apply reviewer suggestions
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:23:09 -07:00
Chathumina Vimukthi
579135a2f6
Updated model card for distilbert ( #37157 )
...
* Updated model card for distilbert
* Updated the distilbert model card
* Updated model card for distilbert
* Updated the distilbert model card
* Addressed code review comments
* Addressed review comments
* fix pipeline
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:22:46 -07:00
Reshan Gomis
8cd57eb731
mobilebert model card update ( #37256 )
...
* mobilebert model card update
* Updates to model card mobilebert
---------
Co-authored-by: Reshan Gomis <reshang@verdentra.com >
2025-04-04 14:28:35 -07:00
Shubham Panchal
531e4fcf0e
Update model card for Depth Anything ( #37065 )
...
[docs] Update model card for Depth Anything
2025-04-04 11:36:05 -07:00
Joao Gante
ad3d157188
[RoPE] abstract dynamic RoPE update under a decorator ✨ ( #37249 )
...
* dynamic rope decorator
* longrope; shorter fwd pass
* propper docstring
* make fixup
2025-04-04 14:27:28 +01:00
Surya Garikipati
8dd0a2b89c
Update model card for electra ( #37063 )
...
* Update ELECTRA model card with new format
* Update ELECTRA model card with new format
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* close hfoption block
---------
Co-authored-by: Wun0 <f20191221@hyderabad.bits-pilani.ac.in >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:45:35 -07:00
Parag Ekbote
15ac2b6ac5
Update Model Card for ModernBERT ( #37052 )
...
* Modify Model Card for ModernBERT.
* Update as per code review.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update model card.
* Update model card.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:14:02 -07:00
Abhishek Ranjan
b552708694
chore: Update model doc for code_llama ( #37115 )
...
* Update code_llama.md
aims to handle https://github.com/huggingface/transformers/issues/36979#issuecomment-2758560598
sub part of https://github.com/huggingface/transformers/issues/36979
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* make changes as per code review
* chore: make the function smaller for attention mask visualizer
* chore[docs]: update code_llama.md with some more suggested changes
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* chore[docs] : Update code_llama.md with indentation changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:09:41 -07:00
Bimal Gajera
2b84831a93
Update model card for Cohere ( #37056 )
...
* Update Cohere model card to follow standard template
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update cohere.md
Update code snippet for AutoModel, quantization, and transformers-cli
* Update cohere.md
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 09:51:40 -07:00
Avigyan Sinha
1b29409d89
feat: updated model card for qwen_2.5_vl ( #37099 )
...
* feat: updated model card for qwen_2.5_vl
* applied suggested change 1
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* applied suggested change 2
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* applied suggested change 3
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* fix: made requested changes for quantization and notes
* suggeested model card change 4
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card wiht suggested change 5
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card wiht suggested change 6
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card wiht suggested change 7
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* feat: applied requested changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 09:13:26 -07:00
Ryan Mullins
3f6af96732
Adding links to ShieldGemma 2 technical report ( #37247 )
2025-04-03 16:26:29 +01:00
Joao Gante
9a1c1fe7ed
[CI] green llama tests ( #37244 )
...
* green llama tests
* use cleanup instead
* better test comment; cleanup upgrade
* better test comment; cleanup upgrade
2025-04-03 14:15:53 +01:00
Raushan Turganbay
98601cc818
[Phi4] add multimodal chat template ( #36996 )
...
* phi4 chat template
* remove from valid kwargs
2025-04-03 09:52:09 +02:00
ARAVINDHAN T
2056287940
Updated model card for Qwen2 ( #37192 )
...
* Update qwen2.md
* Update qwen2.md
* Update qwen2.md
* Update qwen2.md
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update qwen2.md
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/qwen2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-02 18:10:41 -07:00
Ricardo Alanis
3e96a0c32b
Update falcon model card ( #37184 )
...
* feat: updated model card for falcon
* fix:rewrite model description
* fix: add link to conversion script
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* fix: Add suggested changes
* fix: typo in link for quantization
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/falcon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* fix: fix indent and close ticks
* fix: add indent
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-02 17:30:37 -07:00
Purusharth Malik
199d7adf10
Updated the model card for CLIP ( #37040 )
...
* Update clip.md
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Incorporated suggested changes
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-02 14:57:38 -07:00
Bowen Bao
800510c67b
[doc] Fix link for Quark quantization page ( #37179 )
...
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
2025-04-01 20:57:38 +02:00
Armaghan Shakir
0710e9b1e8
Create and Expose SamVisionModel as public for better accessibility ( #36493 )
...
* move encoder below
* auto modeling
* write SamVisionTester
* fix vision attention shape
* fix SamVisionTest
* minor changes to SamVisionTest
* Revert "fix vision attention shape"
This reverts commit d2a4083ae5704716e33351aed03af8f3cc45f3ae.
* fix attention output shape in new tests
* remove encoder examples
* run modular on got_ocr2
* code formatting
* fix got_ocr2
* ruff fixes
* code quality
* add sam_vision in auto modeling and auto configuration
* remove composite test
* updated index.md
* add TFSamVisionEncoder to __init__
* fix public TFSamVisionEncoder
* remove outdated todo comment
* set test_torch_exportable
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* rename: VisionEncoder -> VisionModel
* bring back original SamVisionEncoder
* rename back: VisionEncoderOutput -> VisionModelOutput
* undo changes in SamModelTester
* reuse SamVisionEncoder in SamVisionModel
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
2025-03-31 11:45:07 +02:00
jiqing-feng
286393fbb1
enable tp on CPU ( #36299 )
...
* enable tp on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* get rank from cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* enable TP tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* em print
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix model id
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix conflict
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix index and add doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
2025-03-31 10:55:47 +02:00
Bo Zheng
6acd5aecb3
Adding Qwen3 and Qwen3MoE ( #36878 )
...
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com >
2025-03-31 09:50:49 +02:00
MinJu-Ha
0d6a60fe55
🌐 [i18n-KO] Translated qwen2_vl.md to Korean ( #36750 )
...
* fix: manual edits
* fix: resolve suggestions
* Update toctree.yml
2025-03-30 15:00:27 -07:00
Cyril Vallez
2bea6bf24e
Fix AttentionInterface following feedback ( #37010 )
...
* up
* typo
* update doc
* Update attention_interface.md
2025-03-28 18:00:35 +01:00
Minho Ryu
eca74d1367
[WIP] add deepseek-v3 ( #35926 )
...
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints congifuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use exapnd
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
This reverts commit f264f800d04950390db8413b9efb24cef8186330.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* mrege with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient iddues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal >
2025-03-28 15:56:59 +01:00
Abu Bakr Soliman
49b5ab6a27
Support QuestionAnswering Module for ModernBert based models. ( #35566 )
...
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss
By @Cyrilvallez
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
* Update modeling_modernbert.py
make comparable commit to even it out
* Match whitespace
* whitespace
---------
Co-authored-by: Matt <rocketknight1@gmail.com >
Co-authored-by: Orion Weller <wellerorion@gmail.com >
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
2025-03-26 21:24:18 +01:00
Steven Liu
3a8ec8c467
[docs] Attention mask image ( #36970 )
...
add image
2025-03-26 10:11:34 -07:00
Cyril Vallez
788e1092e9
Allow easy registration of custom attention functions ( #36889 )
...
* Update modeling_utils.py
* style
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* add to init
* Update modeling_utils.py
* style
* update
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Add some doc
* Update _toctree.yml
* readd it for tgi/vllm compat
* CIs
* CIs
2025-03-26 16:15:06 +01:00
Steven Liu
a844297088
[docs] Fix image link ( #36869 )
...
* fix image link
* fix
* update
* fix
2025-03-25 11:34:21 -07:00
Cyril Vallez
4303d88c09
Add Phi4 multimodal ( #36939 )
...
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* not mono channel warning update
* remove auto maps from processor
* kargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com >
2025-03-25 09:55:21 +01:00
Raushan Turganbay
47e5432805
Deprecate #36741 and map Causal to Conditional ( #36917 )
...
* deprecate the prev fix
* reword warning and update docs
* reword warning
* tests
* dont bloat `get_text_config()`
2025-03-25 09:13:56 +01:00
omahs
cbf924b76c
Fix typos ( #36910 )
...
* fix typos
* fix typos
* fix typos
* fix typos
2025-03-24 14:08:29 +00:00
Aritra Roy Gosthipaty
c9d1e5238a
Update installation.md ( #36826 )
...
* Update installation.md
* Update README.md
2025-03-21 16:32:02 -07:00
Steven Liu
d253de6d58
[docs] Model docs ( #36469 )
...
* initial
* fix
* fix
* update
* fix
* fixes
* quantization
* attention mask visualizer
* multimodal
* small changes
* fix code samples
2025-03-21 15:35:22 -07:00
Joao Gante
949cca4061
[CI] doc builder without custom image ( #36862 )
...
* no image
* test
* revert jax version updates
* make fixup
* update autodoc path for model_addition_debugger
* shieldgemma2
* add missing pages to toctree
2025-03-21 09:10:27 +00:00
Pablo Montalvo
1d3f35f30a
Add model visual debugger ( #36798 )
...
* draft of model tracer visualiser
* add context manager in addition to decorator
* add debug utils to init
* move model debugging utils to dedicated file
* add documentation
* protect some imports
* format
* move and protect imports
* format
* doc: improve errors in case of broken dummy imports.
* format
* use automatic torch backend
* update doc
* fix backend
* (TEMP) move to dummies while backend wait
* update documentation
* doc
2025-03-20 17:37:29 +01:00
Haotong LIN
6515c25953
Add Prompt Depth Anything Model ( #35401 )
...
* add prompt depth anything model by modular transformer
* add prompt depth anything docs and imports
* update code style according transformers doc
* update code style: import order issue is fixed by custom_init_isort
* fix depth shape from B,1,H,W to B,H,W which is as the same as Depth Anything
* move prompt depth anything to vision models in _toctree.yml
* update backbone test; there is no need for resnet18 backbone test
* update init file & pass RUN_SLOW tests
* update len(prompt_depth) to prompt_depth.shape[0]
Co-authored-by: Joshua Lochner <admin@xenova.com >
* fix torch_int/model_doc
* fix typo
* update PromptDepthAnythingImageProcessor
* fix typo
* fix typo for prompt depth anything doc
* update promptda overview image link of huggingface repo
* fix some typos in promptda doc
* Update image processing to include pad_image, prompt depth position, and related explanations for better clarity and functionality.
* add copy disclaimer for prompt depth anything image processing
* fix some format typos in image processing and conversion scripts
* fix nn.ReLU(False) to nn.ReLU()
* rename residual layer as it's a sequential layer
* move size compute to a separate line/variable for easier debug in modular prompt depth anything
* fix modular format for prompt depth anything
* update modular prompt depth anything
* fix scale to meter and some internal funcs warp
* fix code style in image_processing_prompt_depth_anything.py
* fix issues in image_processing_prompt_depth_anything.py
* fix issues in image_processing_prompt_depth_anything.py
* fix issues in prompt depth anything
* update converting script similar to mllamma
* update testing for modeling prompt depth anything
* update testing for image_processing_prompt_depth_anything
* fix assertion in image_processing_prompt_depth_anything
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update docs/source/en/model_doc/prompt_depth_anything.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update docs/source/en/model_doc/prompt_depth_anything.md
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* update some testing
* fix testing
* fix
* add return doc for forward of prompt depth anything
* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Update tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* fix prompt depth order
* fix format for testing prompt depth anything
* fix minor issues in prompt depth anything doc
* fix format for modular prompt depth anything
* revert format for modular prompt depth anything
* revert format for modular prompt depth anything
* update format for modular prompt depth anything
* fix parallel testing errors
* fix doc for prompt depth anything
* Add header
* Fix imports
* Licence header
---------
Co-authored-by: Joshua Lochner <admin@xenova.com >
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
2025-03-20 16:12:44 +00:00
Pavel Iakubovskii
66291778dd
Refactor Attention implementation for ViT-based models ( #36545 )
...
* Refactor vit attention
* Refactor ViT-based models
* 🚨 🚨 🚨 Fix prefix for DPT
* Update params order
* trigger tests
* Fix Dinov2 attention
* Fix DPT attention impl propagation for backbone config
* Common test fix: config is modif. inplace - avoid it
* view->reshape
* Fixup
* Fixup
* Enable IJepa FA2
* Add FA2 in corresponding model docs
2025-03-20 15:15:01 +00:00
fxmarty-amd
1a374799ce
Support loading Quark quantized models in Transformers ( #36372 )
...
* add quark quantizer
* add quark doc
* clean up doc
* fix tests
* make style
* more style fixes
* cleanup imports
* cleaning
* precise install
* Update docs/source/en/quantization/quark.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/quark_integration/test_quark.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* remove import guard as suggested
* update copyright headers
* add quark to transformers-quantization-latest-gpu Dockerfile
* make tests pass on transformers main + quark==0.7
* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Bowen Bao <bowenbao@amd.com >
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
2025-03-20 15:40:51 +01:00
Ryan Mullins
487dab1b2b
Shieldgemma2 ( #36678 )
...
* single commit
* correct config
* fixup
* dummy pt
* Use ShieldGemma2Config in conversion script
* Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py
* Adding shieldgemma2 to models.__init__.py
* Adding ShieldGemma2 to main __init__.py
* Update shieldgemma2.md
* Update shieldgemma2.md
* Adding tests. Addressing review feedback.
* Minor docs update
* Fixing code quality feedback from CI
* Fixing empty messages bug reported by ghunkins
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com >
Co-authored-by: Ren Pang <ain-soph@live.com >
2025-03-20 15:14:38 +01:00
Joao Gante
957b05b413
[qwen2 audio] remove redundant code and update docs ( #36282 )
2025-03-20 10:54:51 +00:00
HDCharles
94555437e2
Disable inductor config setter by default ( #36608 )
...
* Disable inductor config setter by default
This is hard to debug and should be off by default
* remove default settings in autoquant too
* Add info to torchao.md about recommended settings
* satisfying Ruff format
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-03-20 11:23:14 +01:00
Matt
9be4728af8
Just import torch AdamW instead ( #36177 )
...
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
2025-03-19 18:29:40 +00:00
Mohamed Mekkouri
258dd9cc69
Add Space to Bitsandbytes doc ( #36834 )
...
* add space
* address review
2025-03-19 18:56:07 +01:00
Driss Guessous
e8d960329e
Add option for ao base configs ( #36526 )
2025-03-19 14:59:47 +01:00
Yoni Gozlan
12f2ebef63
Support custom dosctrings in modular ( #36726 )
...
* Override docstrings in modular if not none
* Update doc
2025-03-18 14:00:54 -04:00
Yoni Gozlan
30580f035b
Fix Mistral3 tests ( #36797 )
...
* fix processor tests
* fix modeling tests
* fix test processor chat template
* revert modeling test changes
2025-03-18 13:08:12 -04:00
Cyril Vallez
e959530b8f
Add Mistral3 ( #36790 )
...
Release - Conda / build_and_package (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
* initial start
* style and dummies
* Create convert_mistral3_weights_to_hf.py
* update
* typo
* typo
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* up
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* update
* update
* Update image_processing_mistral3.py
* Update convert_mistral3_weights_to_hf.py
* fix patch merger
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* up
* update modular to fit
* style
* Update convert_mistral3_weights_to_hf.py
* typo
* Update modular_mistral3.py
* simplify a lot all shape shenanigans
* simplify
* add working test processor
* Add partially working common modeling tests
* All tests working and remove mistral3 image processors
* add docs and fixup
* fix inference with image size >1540
* 🚨 fix test image proc pixtral
* Remove vision_feature_select_strategy
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* clean
* fix test checkpoints
* Update test_modeling_mistral3.py
* Update test_modeling_mistral3.py
* style
* Use Pixtral processor
* up
* finish cleaning processor to use pixtral directly
* Update __init__.py
* Update processing_pixtral.py
* doc
* Update __init__.py
* Update mistral3.md
* Update _toctree.yml
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co >
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com >
2025-03-18 12:04:42 +01:00
Steven Liu
ac1a1b66b9
[docs] Update README ( #36265 )
...
* update
* feedback
* feedback
* update versions
2025-03-17 09:37:19 -07:00
Christopher Akiki
e3af4fec91
[MINOR:TYPO] Update hubert.md ( #36733 )
...
* [MINOR:TYPO] Update hubert.md
- typo fix (wave2vec instead of hubert)
- make code snippet copiable and runnable
* Run tests
2025-03-17 09:07:51 -07:00
MaCAT
25992b493c
🌐 [i18n-KO] Translated codegen.md to Korean ( #36698 )
...
* Initial translation
* Add _toctree.yml
2025-03-14 09:31:18 -07:00