Arthur
10baffb599
Multiple llama4 fixes ( #37353 )
...
* update for fixes
* more fixes
* fix dynamic cache?
* style
* fix both training and generating. Eager seems alright
* dynamic does not work
* fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states)
* should be final fixes
* fix more stuff no cat
* style
* fix
* style
* final style
* quality
* fix
* revert
v4.51.1
2025-04-08 11:15:06 +02:00
Arthur Zucker
4a88ffae40
v4.51.1
2025-04-08 00:27:58 +02:00
salman
f19aec737e
Fixing flex attention for torch=2.6.0 ( #37285 )
...
* adding compile kwarg for torch 2.6
* fixing dynamic
* addressing comment
* typo
* Update src/transformers/integrations/flex_attention.py
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2025-04-08 00:22:21 +02:00
Wing Lian
d8f0695e84
more fixes for post-training llama4 ( #37329 )
...
* more fixes for post-training llama4
* use target_length instead of guarded past_key_values
2025-04-08 00:22:17 +02:00
Cyril Vallez
d27c8c38f4
Remove HQQ from caching allocator warmup ( #37347 )
...
Update modeling_utils.py
2025-04-08 00:22:07 +02:00
Cyril Vallez
04c0cedcdf
fix derived berts _init_weights ( #37341 )
...
* fix derived berts
* more
* roformer
2025-04-08 00:21:44 +02:00
Cyril Vallez
4f536ba0ae
Fix init empty weights without accelerate ( #37337 )
...
* add the integration
* Update accelerate.py
* Update accelerate.py
* add find_tied_params as well
* Update accelerate.py
* add where copied from
* simplify
* add error
2025-04-08 00:21:36 +02:00
Cyril Vallez
6b82af0a5b
Fix deepspeed with quantization ( #37324 )
...
* Update modeling_utils.py
* Update modeling_utils.py
2025-04-08 00:21:32 +02:00
hoshi-hiyouga
2bf3d4aca8
fix llama4 training ( #37319 )
2025-04-08 00:21:28 +02:00
Wing Lian
a79b7abede
fix flex attn when optional args aren't passed ( #37327 )
2025-04-08 00:21:24 +02:00
Lysandre
0720e206c6
Release: v4.51.0
v4.51.0
2025-04-05 22:03:17 +02:00
Arthur
25b7f27234
Add llama4 ( #37307 )
...
* remove one of the last deps
* update fast image processor after refactor
* styling
* more quality of life improvements
* nit
* update
* cleanups
* some cleanups
* vllm updates
* update fake image token
* [convert] Fix typo
* [convert] Strip extraneous bytes from shards
* [convert] Minor fixes
* [convert] Use num_experts
* multi-image fixes in modeling + processor
* fixup size
* 128 experts
* Use default rope
* Unfuse mlp
* simplify a lot inputs embeds merging
* remove .item() 👀
* fix from review
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* set seed
* return aspect ratios and bug fixes
* Moe 128 rebased (#8 )
* 128 experts
* Use default rope
* Unfuse mlp
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* Meta/llama quant compat (#7 )
* add quant compatible model & conversion code for llama4
* fix a few issues
* fix a few issues
* minor type mapping fix
---------
Co-authored-by: Lu Fang <fanglu@fb.com >
* use a new config parameter to determine which model definition to use for MoE
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
Co-authored-by: Lu Fang <fanglu@fb.com >
* un-comment write_tokenizer from converting script
* remove un-used imports
* [llama4] Pop aspect_ratios from image processor output in Llama4Processor
Signed-off-by: Jon Swenson <jmswen@gmail.com >
* Fix parameter_count name
* Update src/transformers/models/llama4/configuration_llama4.py
* nit
* Add changes for no_rope, moe_layers, chunked attention. Just need to test all
* Update src/transformers/models/llama4/image_processing_llama4_fast.py
* nit
* fix post merge with main
* support flex attention
* fixes
* fix
* add layer
* small updates
* rebase and delete llm_compressor
* nit
* [llama4/mm] Add back <|image|> token that delimits global tile
* [llama4/mm] Fix Llama 4 image processing unit tests
* add explicit dtype
Signed-off-by: Jon Swenson <jmswen@gmail.com >
* sdpa works
* comment todo small
* fix model loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* revert
* nits
* small fix for TP on 1 node
* Read new params from config
* Add <|eom|>
* lol don't know how this got here
* adding fp8
* Save processor, fix chat template
* style
* Add boi/eoi tokens
We don't use them.
* fixes for now flex seems to work :)
* updates
* nits
* updates
* missing keys
* add context parallel
* update
* update
* fix
* nits
* add worldsize and make eager attn work for vision
* Ignore new key present in base models
* add tp_plan
* fix nope
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* minor fix
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* Clean up Llama4 vision model
* current updates
* add support for `attn_temperature_tuning`
* add floor scale
* add missing attn scales
* push what works, dirty trick for the device synch
* oups
* Fix pad_token_id
See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.
* fix causallm loading
* rm
* fix tied-weights
* fix sdpa
* push current version
* should work with both short and long
* add compressed_tensors & fix fbgemm tp
* Fix flex impl
* style
* chunking
* try to revert the potentially breaking change
* fix auto factory
* fix shapes in general
* rm processing
* commit cache utils cleanup
* Fix context length
* fix
* allocate
* update tp_plan
* fix SDPA!
* Add support for sparse `Llama4TextMoe` layer from the kernel hub
* cleanup
* better merge
* update
* still broken fixing now
* nits
* revert print
* Write max_position_embeddings and max_model_length
* Update modeling_llama4.py
* Save attention_chunk_size
* Sync eos terminators
* Read initializer_range
* style
* remove `dict`
* fix
* eager should use `chunked_attention_mask`
* revert
* fixup
* fix config
* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
This reverts commit ccda19f050867dd42ea143c5de60f3dec81375f0, reversing
changes made to a515579aed8c0fe9bf529b6c40446a289406d5d6.
* Fix typo and remove warning with compiled flex and chunked prefill
* Fix MoE vs FF (#41 )
* fix
* Use correct no_rope_layers if provided one is empty list
* update tests
* fix
* skipping some tests
* fix fp8 loading
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* fix text generation pipeline
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
* eager needs 4D mask
* fix
* Some cleanup
* fix
* update
* fix
* replace correctly module
* patch
* modulelist
* update
* update
* clean up
* Don't move to `cuda:0` in distributed mode
* restrict to compressed tensors for now
* rm print
* Docs!
* Fixes
* Update docs/source/en/model_doc/llama4.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
* Fixes
* cuda graph fix
* revert some stuff
* fixup
* styling
* Update src/transformers/models/llama4/modeling_llama4.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* fixup
* commit licence, cleanup here and there and style
* more styling changes
* fix dummies
* fix and clean docstrings
* remove comment
* remove warning
* Only fast image processor is supported
* nit
* trigger CI
* fix issue with flex encoder
* fix dynamic cache
* Code quality
* Code quality
* fix more tests for now
* Code quality
* Code quality
* Nuke bunch of failing stuff
* Code quality
* Code quality
* cleanup removal of slow image processor
* ruff fix fast image processor
* fix
* fix styling
* Docs
* Repo consistency
* Repo consistency
* fix sliding window issue
* separate llama cache
* styling
* Repo consistency
* Repo consistency
* push what works
* L4 Repo consistency
* Docs
* fix last
---------
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com >
Co-authored-by: Pedro Cuenca <pedro@huggingface.co >
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com >
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
Co-authored-by: Keyun Tong <tongkeyun@gmail.com >
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com >
Co-authored-by: Jon Swenson <jmswen@gmail.com >
Co-authored-by: jmswen <jmswen@users.noreply.github.com >
Co-authored-by: MekkCyber <mekk.cyber@gmail.com >
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com >
Co-authored-by: Yong Hoon Shin <yhshin@meta.com >
Co-authored-by: Marc Sun <marc@huggingface.co >
Co-authored-by: drisspg <drisspguessous@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
Co-authored-by: Daniël de Kok <me@danieldk.eu >
Co-authored-by: Lysandre <hi@lysand.re >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-04-05 22:02:22 +02:00
Lysandre Debut
aa40fda346
Hf Xet extra ( #37305 )
...
* Hf Xet extra
* Hf Xet extra
2025-04-05 21:06:05 +02:00
Cyril Vallez
e94571580b
Fix deepspeed loading (part 2) ( #37306 )
...
* fix
* Update modeling_utils.py
* Update modeling_utils.py
* oups remove print
2025-04-05 20:41:42 +02:00
Cyril Vallez
84aa13dd85
Fix deepspeed loading ( #37281 )
...
* Update modeling_utils.py
* Update modeling_utils.py
* fix and remove all imports
* Update modeling_utils.py
* Update modeling_utils.py
* style
* Update modeling_utils.py
2025-04-05 17:05:45 +02:00
Linnet Cosmos Tuscano
0ef339ff1b
Update OpenAI GPT model card ( #37255 )
...
* Update OpenAI GPT model card
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update OpenAI GPT model card: add usage examples and notes section
* Add API autodoc tags after Notes section for OpenAI GPT model
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/openai-gpt.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Added missing badges
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:25:16 -07:00
Sharareh Younesian
46d73910d5
Updated T5 model card with standardized format ( #37261 )
...
* Updated T5 model card with standardized format
* Updated T5 model card with standardized format, fixed typo
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Apply reviewer suggestions
* Update docs/source/en/model_doc/t5.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:23:09 -07:00
Chathumina Vimukthi
579135a2f6
Updated model card for distilbert ( #37157 )
...
* Updated model card for distilbert
* Updated the distilbert model card
* Updated model card for distilbert
* Updated the distilbert model card
* Addressed code review comments
* Addressed review comments
* fix pipeline
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-04 15:22:46 -07:00
Reshan Gomis
8cd57eb731
mobilebert model card update ( #37256 )
...
* mobilebert model card update
* Updates to model card mobilebert
---------
Co-authored-by: Reshan Gomis <reshang@verdentra.com >
2025-04-04 14:28:35 -07:00
Rahul Tuli
ebe47ce3e9
Fix: Unexpected Keys, Improve run_compressed, Rename Test Folder ( #37077 )
2025-04-04 21:30:11 +02:00
Shubham Panchal
531e4fcf0e
Update model card for Depth Anything ( #37065 )
...
[docs] Update model card for Depth Anything
2025-04-04 11:36:05 -07:00
byi8220
a4e55fcff8
Disable delay_optimizer_creation in Trainer to support fsdp2 ( #37147 )
...
* github why you do this
* fix
* make fixup
* disable cpu offload test
* fixup
* tmp reworks
* git branch movement
* make fixup
* add require_fsdp_v2_version
* dep issues
* update ruff and fixup
2025-04-04 20:11:37 +02:00
Yao Matrix
878562b68d
fix test device spec relative path importing issue ( #37190 )
...
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
2025-04-04 18:22:55 +02:00
Matt
8ebc435267
Fix llava_onevision tests ( #37280 )
...
* Fix llava_onevision tests
* Trigger tests
2025-04-04 15:03:38 +01:00
Joao Gante
ad3d157188
[RoPE] abstract dynamic RoPE update under a decorator ✨ ( #37249 )
...
* dynamic rope decorator
* longrope; shorter fwd pass
* proper docstring
* make fixup
2025-04-04 14:27:28 +01:00
Lysandre Debut
3d40bda30e
Hugging Face Hub pin to v0.30.0 for Xet ( #37166 )
2025-04-04 14:58:22 +02:00
Joao Gante
acbcb5d07d
[Tests] flaky test_constrained_beam_search_generate_dict_output ( #37276 )
2025-04-04 13:38:42 +01:00
Ryan McConville
4ba0989eab
Clarify error message to ensure min 28x28 image supplied for Qwen 2.5 VL ( #37264 )
...
fix: clarify error message for min 28x28 images
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
2025-04-04 12:53:38 +01:00
Yih-Dar
352ec8ef22
pin specific natten version in docker file ( #37274 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-04-04 13:47:16 +02:00
cyyever
edd345b52e
Fix deprecated PT functions ( #37237 )
...
* Fix deprecated PT functions
Signed-off-by: cyy <cyyever@outlook.com >
* Revert some changes
Signed-off-by: cyy <cyyever@outlook.com >
---------
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-04 12:31:11 +01:00
Yih-Dar
b016de1ae4
Fix utils/check_bad_commit.py ( #37272 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-04-04 12:18:20 +02:00
Nikos Antoniou
f74d7da836
Introduce modular files for speech models ( #35902 )
...
* WAV_2_VEC_2 to WAV2VEC2
* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio
* remove unnecessary definitions in modulars
* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer
* docstring fix for UniSpeechForCTC
* removed unnecessary re-definition of modular classes
* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput
* top-level import of deepspeed in seamless_m4t, speecht5
* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations
* convert modular
* tiny modular typing fixes
* some more modular fixes
* make style
---------
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com >
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com >
2025-04-04 11:46:27 +02:00
Ita Zaporozhets
d130cd0e16
update error msg ( #37207 )
2025-04-04 10:21:30 +02:00
Raushan Turganbay
41b9b92b52
[qwen-vl] fix image processor ( #37258 )
...
* fix
* add test
2025-04-03 19:48:56 +02:00
Surya Garikipati
8dd0a2b89c
Update model card for electra ( #37063 )
...
* Update ELECTRA model card with new format
* Update ELECTRA model card with new format
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/electra.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* close hfoption block
---------
Co-authored-by: Wun0 <f20191221@hyderabad.bits-pilani.ac.in >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:45:35 -07:00
Parag Ekbote
15ac2b6ac5
Update Model Card for ModernBERT ( #37052 )
...
* Modify Model Card for ModernBERT.
* Update as per code review.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update model card.
* Update model card.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:14:02 -07:00
Abhishek Ranjan
b552708694
chore: Update model doc for code_llama ( #37115 )
...
* Update code_llama.md
aims to handle https://github.com/huggingface/transformers/issues/36979#issuecomment-2758560598
sub part of https://github.com/huggingface/transformers/issues/36979
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* make changes as per code review
* chore: make the function smaller for attention mask visualizer
* chore[docs]: update code_llama.md with some more suggested changes
* Update docs/source/en/model_doc/code_llama.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* chore[docs] : Update code_llama.md with indentation changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 10:09:41 -07:00
Bimal Gajera
2b84831a93
Update model card for Cohere ( #37056 )
...
* Update Cohere model card to follow standard template
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update cohere.md
Update code snippet for AutoModel, quantization, and transformers-cli
* Update cohere.md
* Update docs/source/en/model_doc/cohere.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 09:51:40 -07:00
Matt
2d46a08b63
Purge unused ModelTester code ( #37085 )
...
* Purge correctly this time
* Remove more methods from recent PRs
* make fixup
2025-04-03 17:48:35 +01:00
Avigyan Sinha
1b29409d89
feat: updated model card for qwen_2.5_vl ( #37099 )
...
* feat: updated model card for qwen_2.5_vl
* applied suggested change 1
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* applied suggested change 2
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* applied suggested change 3
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* fix: made requested changes for quantization and notes
* suggested model card change 4
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card with suggested change 5
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card with suggested change 6
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated model card with suggested change 7
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* feat: applied requested changes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-04-03 09:13:26 -07:00
cyyever
8a828a747e
Add Optional to types ( #37163 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-03 16:38:01 +01:00
Ryan Mullins
3f6af96732
Adding links to ShieldGemma 2 technical report ( #37247 )
2025-04-03 16:26:29 +01:00
Joao Gante
9a1c1fe7ed
[CI] green llama tests ( #37244 )
...
* green llama tests
* use cleanup instead
* better test comment; cleanup upgrade
* better test comment; cleanup upgrade
2025-04-03 14:15:53 +01:00
Matt
782d7d945d
Allow flexible generation params arg when checking pipeline specs ( #37211 )
...
* Allow flexible generation params arg
* Trigger tests
* Add docstring and rename js_generate to hub_generate
2025-04-03 13:29:36 +01:00
Jaime Fraustro
afafb84b59
Add support for fast image processing in image-pretraining example ( #37021 )
...
* Add support for fast image processing in image-pretraining example
Fix typo: correct tuple formatting in IMAGE_PROCESSOR_MAPPING_NAMES
Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com >
* Use fast image processor by default
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com >
---------
Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com >
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
2025-04-03 13:26:46 +01:00
Matt
34ccfebf32
Fix AST parsing when looking for remote code imports ( #37245 )
...
* Not all Call.func nodes have id because they can be methods
* Trigger tests
* Trigger tests
2025-04-03 13:00:51 +01:00
Yao Matrix
f697b3f824
enable 2 types of case on XPU ( #37198 )
...
enable 2 types of case on XPU 1. test_resize_tokens_embeddings_with_deepspeed_multi_gpu 2. test_resize_embeddings_untied_with_deepspeed_multi_gpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
2025-04-03 11:37:55 +02:00
Joao Gante
2099287a59
[CI] lazy loading external datasets ( #37218 )
2025-04-03 09:57:45 +01:00
Fanli Lin
a0803a9555
[tests] fix mamba integration simple inference precision issue ( #37193 )
...
* fix precision issue
* use float32
2025-04-03 10:38:03 +02:00
Cyril Vallez
6ce238fe7a
Fix test ( #37213 )
...
* Update test_modeling_common.py
* style
2025-04-03 10:24:34 +02:00