Lukas Geiger
e508965df7
Cleanup BatchFeature and BatchEncoding ( #38459 )
...
* Use dict comprehension to create dict
* Fix type annotation
Union[Any] doesn't really make any sense
* Remove methods that are already implemented in the `UserDict` parent
class
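As an illustration of the three cleanups listed above, here is a minimal sketch using only the standard library. `Batch` is a hypothetical stand-in, not the actual `BatchFeature` or `BatchEncoding` class, and the field names are made up for the example.

```python
from collections import UserDict
from typing import Any, Union


class Batch(UserDict):
    """Hypothetical UserDict-based container (assumption, not the real class)."""

    def __init__(self, data=None):
        super().__init__(data)

    # No keys()/values()/items() overrides here: UserDict already inherits
    # them from MutableMapping, so redefining them would be redundant.


raw = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}

# A dict comprehension builds the new dict in a single expression instead of
# creating an empty dict and filling it inside a loop.
scaled = {key: [v * 2 for v in values] for key, values in raw.items()}

batch = Batch(raw)
inherited_keys = list(batch.keys())

# A Union with a single member collapses to that member, so Union[Any]
# says nothing more than Any does.
same_type = Union[Any] is Any
```

The annotation point is the easiest to check in isolation: since a one-member `Union` is just that member, writing `Any` directly is the clearer spelling.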
2025-05-29 14:13:43 +00:00
Rahul
8e5cefcb1e
Fix TypeError in save_pretrained error handling ( fixes #38422 ) ( #38449 )
2025-05-29 13:58:16 +00:00
Raushan Turganbay
ad9dd3d17b
🔴 [VLM] modeling updates ( #38317 )
...
* updates
* fixup
* fix tests
* fix test
* fix
* let it be here for now, till monday
* two more fixes
* persimmon
* fixup
* fix
* fixup
* make sure fuyu runs now that LM has new attn API
* fixup + tests
* qwen vl uses new mask interface as well
* qwen image features format
* update
* remove image_sizes
* address comments
* i am dumb...
2025-05-29 11:08:23 +00:00
Yaswanth Gali
a6f7acb603
[Tests] Clean up test cases for few models ( #38315 )
...
* Update tests
* revert aria change
* too slow hence revert
2025-05-29 08:21:28 +00:00
Luc Georges
8010f3cf61
feat: add cache retention for requests ( #38446 )
...
* feat: add cache retention for requests
* fix: propagate `manual_eviction` param & refactor `finish_request`
`finish_request` now only takes `request_id: str` as an input rather
than the full `RequestState`, which was not needed and simplifies
calling from `ContinuousBatchingManager::evict_request_from_cache`
* refactor: pop req from `active_requests`
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 18:15:10 +00:00
Yih-Dar
66da700145
Fix GLM4 checkpoints ( #38412 )
...
* fix
* fix
* fix
* fix
* fix
* fix
* test style bot
* Apply style fixes
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 16:40:08 +00:00
Avasam
2872e8bac5
Merge type hints from microsoft/python-type-stubs (post dropping support for Python 3.8) ( #38335 )
...
* Merge type hints from microsoft/python-type-stubs (post Python 3.8)
* Remove mention of pylance
* Resolved conflict
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Avasam <samuel.06@hotmail.com >
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
2025-05-28 16:21:40 +00:00
Yuanzhou Cai
942c60956f
Model card for mobilenet v1 and v2 ( #37948 )
...
* doc: #36979
* doc: update hfoptions
* add model checkpoints links
* add model checkpoints links
* update example output
* update style #36979
* add pipeline tags
* improve comments
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* apply suggested changes
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-28 09:20:19 -07:00
Jiwook Han
9a8510572b
Updated the model card for ViTMAE ( #38302 )
...
* Update vit_mae.md
* badge float:right
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/vit_mae.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update model_doc/vit_mae.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-28 09:19:43 -07:00
Vanshu
c9fcbd5bf9
Updated the Model docs - for the ALIGN model ( #38072 )
...
* Updated the Model docs - for the ALIGN model
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Updated align.md
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/align.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update align.md
* fix
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-28 09:19:09 -07:00
Yoni Gozlan
cba94e9272
Fix handling of slow/fast image processors in image_processing_auto.py ( #38161 )
...
Fix wrong error when torchvision is not installed
2025-05-28 16:00:23 +00:00
Yoni Gozlan
21b10d9aa4
Fix from_args_and_dict ProcessorMixin ( #38296 )
...
* fix-from-args-and-dict-processormixin
* change used_kwargs to valid_kwargs
* remove manual valid_kwargs
* fix copies
* fix modular aria
2025-05-28 11:46:33 -04:00
Matt
f844733568
Fix MoE gradient test ( #38438 )
2025-05-28 16:44:20 +01:00
Matt
0ed6f7e6b4
Remove redundant test_sdpa_equivalence test ( #38436 )
...
* Remove redundant test
* make fixup
2025-05-28 17:22:25 +02:00
Yih-Dar
51e0fac29f
Trigger doc-builder job after style bot ( #38398 )
...
* update
* update
* update
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-05-28 17:15:34 +02:00
Yoni Gozlan
c24d18bbae
Fix convert weights for InternVL ( #38233 )
...
Fix internvl convert weights
2025-05-28 11:14:56 -04:00
Matthew Ngan
8850427242
Fix typo in tokenization_utils_base.py docstring ( #38418 )
...
Fix typo in tokenization_utils_base.py
2025-05-28 14:52:10 +00:00
Peter St. John
bab40c6838
[core] support tensor-valued _extra_state values in from_pretrained ( #38155 )
...
Support tensor-valued _extra_state values
TransformerEngine uses the PyTorch get/set_extra_state API to store FP8
layer config information as a bytes Tensor in the _extra_state entry of
the state dict. Recent changes to from_pretrained broke this, and loading
a model that uses this API does not appear to work. This PR fixes the
save/load pretrained functions for extra state entries that use a PyTorch
tensor, and adds a (currently x-failing) test for a dictionary extra state.
Signed-off-by: Peter St. John <pstjohn@nvidia.com >
2025-05-28 15:38:42 +02:00
Anton Vlasjuk
badc71b9f6
🔴 [Attention] Attention refactor for Whisper-based models ( #38235 )
...
* start refactoring whisper
* revert for now
* first step
* carry over attn fixes
* check if this works
* whisper has an off by one somewhere - cutting mask in any interface
* make it based on interface
* remove some tests that were skipped but now work
* some fixes for whisper tests
* interface changes
* change the order of fix
* some attention adjustments for eager + TP
* fix scaling
* mask changes
* why does whisper contain those extra seq lens?
* fix from config for fa2 as input_ids is invalid
* fix another test
* another fix
* disable flex attn due to compile issues
* copies and refactor for qwen audio since it somewhat relies on whisper
* fix scaling and smaller things
* retrigger
* new new interface version + more fixups
* adjust qwen
* add comment
* forgot this one
* change copies as whisper cuts on the mask
* add guard
* add flex attention
* switch to new mask function + add skips for torchscript
* remove old api with cache position
* last changes?
* trigger ci
2025-05-28 13:32:38 +02:00
JJJYmmm
565a0052ed
make Llama4TextMoe forward more readable ( #37529 )
...
* update forward of Llama4TextMoe
* remove redundant transpose
* fix formatting
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2025-05-28 11:54:45 +02:00
Yih-Dar
defeb04299
Fix CircleCI not triggered when PR is opened from a branch of huggingface/transformers ( #38413 )
...
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-05-28 11:25:43 +02:00
Cyril Vallez
593276fe1e
Update error when using additional and/or masks ( #38429 )
...
update error
2025-05-28 11:08:49 +02:00
ivarflakstad
3aab6e95cb
Disable mi210 scheduled CI ( #38411 )
2025-05-28 10:35:41 +02:00
Yao Matrix
fb82a98717
enable large_gpu and torchao cases on XPU ( #38355 )
...
* cohere2 done
Signed-off-by: Matrix Yao <matrix.yao@intel.com >
* enable torchao cases on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* rename
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
* fix comments
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com >
Signed-off-by: Matrix YAO <matrix.yao@intel.com >
2025-05-28 10:30:16 +02:00
Yih-Dar
cea254c909
Update CsmForConditionalGenerationIntegrationTest ( #38424 )
...
* require_read_token
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-05-28 10:20:43 +02:00
Raushan Turganbay
baddbdd24b
[qwen-vl] Look for vocab size in text config ( #38372 )
...
fix qwen
2025-05-28 09:32:26 +02:00
Koki Ryu
a974e3b4e1
Fix an error in verify_tp_plan for keys without '.' ( #38420 )
2025-05-28 09:30:43 +02:00
ivarflakstad
b1eae943a2
Change slack channel for mi250 CI ( #38410 )
2025-05-28 09:20:34 +02:00
ivarflakstad
5f49e180a6
Add mi300 to amd daily ci workflows definition ( #38415 )
2025-05-28 09:17:41 +02:00
Andy Vu
3b3ebcec40
Updated model card for OLMo2 ( #38394 )
...
* Updated OLMo2 model card
* added command line
* Add suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Added suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Indented code block as per suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 16:24:36 -07:00
Yoni Gozlan
f5307272f5
Falcon-H1 - Fix auto_docstring and add can_return_tuple decorator ( #38260 )
...
Fix auto_docstring and add can_return_tuple
2025-05-27 16:18:05 -04:00
Tanuj Rai
a092f6babf
Update granite.md ( #37791 )
...
* Update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* update granite.md
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/granite.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* minor fixes
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 12:55:15 -07:00
RogerSinghChugh
be7aa3210b
New bart model card ( #37858 )
...
* Modified BART documentation with respect to issue #36979.
* fixed a typo.
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* blank commit.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 11:51:41 -07:00
RogerSinghChugh
587c1b0ed1
Updated BERTweet model card. ( #37981 )
...
* Updated BERTweet model card.
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/bertweet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* updated toctree (EN).
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 11:51:22 -07:00
RogerSinghChugh
b73faef52f
Updated BigBird Model card as per #36979 . ( #37959 )
...
* Updated BigBird Model card as per #36979 .
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/model_doc/big_bird.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 11:24:28 -07:00
Madhav Kumar
538e847c06
Updated Zoedepth model card ( #37898 )
...
* Edited zoedepth model card according to specifications.
* Edited Zoedepth model file
* made suggested changes.
2025-05-27 10:06:53 -07:00
Parag Ekbote
4f7b0ff8d1
Update Model Card for Mamba-2 ( #37951 )
...
* update model page.
* update model page.
* Update docs/source/en/model_doc/mamba2.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* update the model page.
* update.
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Apply the suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* add a quantization example and update the toctree.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* remove the additional comma
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-05-27 10:06:39 -07:00
Cory Cornelius
9c50576860
[mllama] Allow pixel_values with inputs_embeds ( #38334 )
...
* Allow pixel_values and inputs_embeds at the same time
* remove unnecessary overwritten tests
2025-05-27 16:33:56 +00:00
Joao Gante
0f5a8243c4
[tests] remove overload for deleted test (test_offloaded_cache_implementation) ( #37896 )
...
* remove overload for deleted tests
* make fixup
2025-05-27 16:45:15 +01:00
Joao Gante
f85fd90407
[cleanup] delete deprecated kwargs in qwen2_audio 🧹 ( #38404 )
...
delete deprecated
2025-05-27 16:08:53 +01:00
eustlb
b9f8f863d9
[CSM] update model id ( #38211 )
...
* update model id
* codec_model eval
* add processor img
* use ungated repo for processor tests
2025-05-27 17:03:55 +02:00
ivarflakstad
07dd6b2495
Add report_repo_id to mi300 workflow ( #38401 )
2025-05-27 16:35:07 +02:00
eustlb
3142bd8592
[CSM] infer codec model with no_grad + audio eos label ( #38215 )
...
* infer codec model with no_grad
* codec_model eval
* training labels: add audio eos token
2025-05-27 14:10:17 +00:00
Ye Liu
10ae443ec0
Fix Qwen2.5-VL Video Processor ( #38366 )
...
* Update processing_qwen2_5_vl.py
* Update processing_qwen2_5_vl.py
* Update modular_qwen2_5_vl.py
* Fix CI
* Update modular_qwen2_5_vl.py
* Update processing_qwen2_5_vl.py
* Update video_processing_utils.py
2025-05-27 13:46:37 +02:00
Joao Gante
80902ae9b1
[chat] use the checkpoint's generation_config.json as base parameterization ( #38330 )
...
* use model gen config
* unwanted diff
2025-05-27 10:35:33 +00:00
hoshi-hiyouga
008e0d87c5
Fix convert to original state dict for VLMs ( #38385 )
...
* fix convert to original state dict
* fix
* lint
* Update modeling_utils.py
2025-05-27 10:27:59 +00:00
Joao Gante
c769483188
[chat] improvements for thinking models and reduce default verbosity ( #38322 )
...
misc improvements
2025-05-27 10:20:58 +00:00
Marc Sun
55f2333366
guard size mismatch check to only quantized models ( #38397 )
...
fix
2025-05-27 11:45:03 +02:00
Raushan Turganbay
1a5be2f5c0
[aya vision] fix processor for vLLM ( #38371 )
...
accidentally merged two PRs in one (;-_-)
2025-05-27 09:43:53 +00:00
Raushan Turganbay
19fdb75cf0
[video utils] group and reorder by number of frames ( #38374 )
...
fix
2025-05-27 11:32:33 +02:00