youngrok cha
acded47fe7
[llava] one pixel is missing from padding when length is odd ( #37819 )
...
* [fix] one pixel should be added when length is odd
* [fix] add vision_aspect_ratio args & typo
* [fix] style
* [fix] do not fix fast file directly
* [fix] convert using modular
* remove duplicate codes
* match unpad logic with pad logic
* test odd-sized images for llava & aria
* test unpad odd-sized padding for llava family
* fix style
* add kwarg to onevision modular
* move vision_aspect_ratio from image_processor to processor
(llava_onevision)
2025-05-06 13:11:26 +02:00
Joao Gante
1b222903c3
[tests] Test all cache implementations ( #37873 )
2025-04-30 15:37:00 +01:00
Yuanyuan Chen
da4ff2a5f5
Add Optional to remaining types ( #37808 )
...
More Optional typing
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-28 14:20:45 +01:00
Raushan Turganbay
79d4bc761d
[causal mask] fix preparation with multi-gpu ( #37612 )
...
* fix multi-gpu
* forgot non-copied models
* fixup
2025-04-25 09:34:18 +02:00
Pavel Iakubovskii
9167fadab9
Introduce GradientCheckpointingLayer ( #37223 )
...
* GradientCheckpointingLayer
* trigger
* Move GC layer to a separate file
* Update import
* Expose and document GC layer
* Fix dummy
* Apply to llama-based models
* Update modulars
* Update a few more models for consistency
* Update glm4
* Update Janus
2025-04-22 11:33:31 +01:00
Raushan Turganbay
2ba6b92a6f
[VLMs] use only xxx_token_id for multimodal tokens ( #37573 )
...
* use only `xxx_token_id` for multimodal tokens
* update modeling files as well
* fixup
* why fixup doesn't fix modular docstring first?
* janus, need to update configs in the hub still
* last fixup
2025-04-18 17:03:39 +02:00
Raushan Turganbay
32eca7197a
[vlm] adjust max length for special tokens ( #37342 )
...
* update
* apply suggestion
* fix tests for main branch
* remove unused logger
* add special tokens in tests
* nit
* fix more tests
* fix test
* pg also
2025-04-16 20:49:20 +02:00
Cyril Vallez
8ab296501a
Remove deprecation warning for num_logits_to_keep ( #37149 )
...
* remove everything
* style
2025-04-14 19:08:45 +02:00
Rupesh K Srivastava
1efcfa9ca4
Fix mask handling for flex attention in llama/gemma2/mistral/qwen2 ( #37381 )
...
* fix BlockMask handling when using flex_attention for llama/mistral/gemma2
* fix attention_mask types
* revert type hints and fixup
* remove unnecessary assertion
2025-04-14 15:53:27 +01:00
Cyril Vallez
4e53840920
Detect and fix most _init_weights() issues - make it work for composite models ( #37070 )
...
* Update test_modeling_common.py
* Fix Llama and its modular children
* Update test_modeling_common.py
* qwen3
* first try at prioritizing models
* Update test_modeling_common.py
* Update test_modeling_common.py
* Update test_modeling_common.py
* test
* fix
* fix
* more models
* more
* more
* more
* smarter init for composite models!
* fix post rebase
* smol
* fix missing args
* more
* typo
* Super elegant and efficient init for submodels
* Update modeling_utils.py
* style
* last fixes
* cleanup
* finalize cleanup
* CIs
* improve docstring
* Update modeling_utils.py
* llama4
* style
* CIs
* style
* add dpt
* granite speech
* qwen 2.5 omni
* better fix
* Parse the config file instead
* CIs
2025-04-14 16:19:04 +02:00
Raushan Turganbay
a563999a02
[processor] clean up multimodal tests ( #37362 )
...
* clean up multimodal processor tests
* fixup
* fix tests
* fix one last test
* forgot
2025-04-11 13:32:19 +02:00
Mohamed Mekkouri
3c39c07939
Remove triton mlp kernel, not compiling for some models ( #37449 )
...
* remove mlp for now
* disable on docker
2025-04-11 12:47:13 +02:00
duanjunwen
7ff896c0f2
[Feat] Support npu in modeling models ( #37369 )
2025-04-10 19:00:58 +02:00
Mohamed Mekkouri
0ea1151222
Llama Kernel integration ( #37092 )
...
* initial commit
* style
* update
* change approach attention
* clean up
* fix import
* update
* update
* fix style
* change method
* attention
* add mlp back
* change name
* update name
* fix copies
* fix config
* fix
2025-04-10 17:13:25 +02:00
Joao Gante
ad3d157188
[RoPE] abstract dynamic RoPE update under a decorator ✨ ( #37249 )
...
* dynamic rope decorator
* longrope; shorter fwd pass
* proper docstring
* make fixup
2025-04-04 14:27:28 +01:00
cyyever
8a828a747e
Add Optional to types ( #37163 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-03 16:38:01 +01:00
cyyever
764ab0d46a
Merge tensor operations with device transfer operations ( #37097 )
...
* Merge operations with to
Signed-off-by: cyy <cyyever@outlook.com >
* Use dtype
Signed-off-by: cyy <cyyever@outlook.com >
---------
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-02 14:15:23 +01:00
Pavel Iakubovskii
a1e389e637
Refactor return_dict logic to remove complicated if/else paths ( #36794 )
...
* SAM
* CLIP
* SigLIP
* GOT-OCR2 (depends on SAM)
* SigLIP2 (depends on SigLIP)
* trigger tests
* Fix SAM
* Fix missed indexing, use named attributes
* Llama
* Aria
* Bamba
* Update llama: missed outputs return type
* (fixup) Aria
* DiffLlama
* Emu3
* Gemma
* Gemma2
* Paligemma
* Fix paligemma
* Gemma3
* GLM
* Helium
* JetMoe
* Jamba
* Mistral
* Mistral
* Mixtral
* Nemotron
* Olmo
* Olmo2
* Persimmon
* Phi
* Phi3
* PhiMoe
* Qwen2
* Qwen2_moe
* StableLM
* Starcoder2
* Add return_dict decorator
* SAM
* Update decorator: compile, export, trace - friendly
* Llama (decorator)
* SAM (decorator)
* Add decorator `can_return_tuple`
* Llama
* Update to decorator
* Update CLIP
* Update decorator to store `_is_top_level_module` in self
* Update decorator to correctly handle compile/export
* Remove is_torchdynamo_compiling constraint, all work fine with self attribute assignment
* Typing
* GPT NeoX
* Fixup
* Fix attribute Granite
* Fix return type mixtral
* Update Gemma3
* Fix Cohere and Cohere2
* Fixup
* Fix corner case for Phi4, when activation is shared
* (fix-copies) deepseekv3, phi4
* Fixup
* Apply to qwen3/qwen3_moe
* Fix
2025-03-31 16:23:37 +01:00
efsotr
2b4734bd49
Support passing flash_attn_kwargs when gradient_checkpointing is enabled ( #37037 )
...
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
2025-03-31 10:53:02 +02:00
Perry Gibson
348f3285c5
fix: Fully remove legacy cache from Llama ( #36958 )
...
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply `check_modular_conversion.py` fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
2025-03-27 17:22:44 +00:00
Afanti
44715225e3
fix typos in the code comments and error messages ( #36993 )
...
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
* chore: enhance code comments
2025-03-26 16:09:48 +00:00
Arthur
fef8b7f8e9
Add attention visualization tool ( #36630 )
...
* add utils file
* style
* nits
* nits
* update
* updates
* update
* fix init issues
* big updates
* nits
* nits?
* small updates
* nits
* there were still some models left
* style
* fixes
* updates
* nits _ fixes
* push changes
* update
* update
* update
* Apply suggestions from code review
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
* style
* styling and return a string for testing
* small updates
* always bidirectional for now
* update
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
2025-03-19 13:58:46 +01:00
ivarflakstad
b1a51ea464
Fix AriaForConditionalGeneration flex attn test ( #36604 )
...
AriaForConditionalGeneration depends on idefics3 vision transformer which does not support flex attn
2025-03-11 11:05:49 +01:00
Arthur
d126f35427
Proper_flex ( #36643 )
...
* proper performant flex attention implementation
* wrapper for flex attention to compile only when triggered
* wrapper for flex attention to compile only when triggered
* attention mask type detection
* Update src/transformers/integrations/flex_attention.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* nit
* nit
* nit
* nit
* gemma2 support
* add citation for torchtune
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* Update flex_attention.py
* nit
* nit
* nit
* reset gemma2 modifications
* nit
* nit
* nit
* licencing
* apply changes to other models
* safe import
---------
Co-authored-by: Sung Ching Liu <sunny19981005@outlook.com >
Co-authored-by: Sung Ching Liu <22844540+bursteratom@users.noreply.github.com >
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
2025-03-11 10:24:12 +01:00
Afanti
af9b2eaa54
chore: fix typos in language models ( #36586 )
...
* chore: fix typos in language models
* chore: fix typos in mistral model
* chore: fix model copy from issue
* chore: fix model copy from issue
* chore: fix model copy from issue
* chore: fix model copy from issue
* chore: fix model copy from issue
2025-03-10 15:54:49 +00:00
Arthur
1603018e7a
Update from pretrained to make TP a first class citizen ( #36335 )
...
* clean code
* oups
* fix merge
* yups
* fix if
* now you can play
* fix shape issue
* try non blocking
* fix
* updates
* up
* updates
* fix most of the tests
* update
* update
* small updates
* up
* fix the remaining bug?
* update
* rename when you read from the file
* buffer issues
* current status
* cleanup
* properly allocate dumb memory
* update a small bug
* fix colwise rep issue
* fix keep in float 32 that was keeping everything in float 32
* typo
* more fixes with keep_in_fp32_modules as we used to search on it
* fix ROPE dtype for TP
* remove what's breaking the tests
* updates
* update and fixes
* small cleanup after merging
* allocate 2x to be safe
* style, auto
* update
* yup nit
* fix
* remove slow as fuck torch api :(
* work
* fixup
* update
* bring the fix back
* fix and update
* fixes
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* updates because some suggestions were wrong 👀
* update?
* fuck this bloated function
* typo
* fix the dumb prefix thing once and for all
* fixes here and there
* updates
* remove prints
* fix strict cases
* style
* properly fix keys on load!
* update
* fix base model prefix issue
* style
* update
* fix all?
* remove 1 print
* fix the final tests
* fixup
* last nits
* fix the detach issue which cause a 2x slowdown
* fixup
* small fixes
* ultra nit
* fix
* fix
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-02-26 20:12:38 +01:00
Cyril Vallez
da4ab2a1b6
Fix doc formatting in forward passes & modular ( #36243 )
...
* fix indentation issues + modular without magic keyword
* style
* Update doc.py
* style
* Fix all decorators indentation
* all models
* style
* style
* Update doc.py
* fix
* general fix
* style
2025-02-25 11:09:01 +01:00
Dmitry Rogozhkin
2440512723
multi-gpu: fix tensor device placements for various models ( #35763 )
...
* multi-gpu: fix inputs_embeds + position_embeds
Fixing the following errors in a few models:
```
> hidden_states = inputs_embeds + pos_embeds
E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:2 and xpu:3!
```
Fixes : #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com >
* multi-gpu: fix tensor device placements for various models
Fixes : #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com >
* Apply make fix-copies
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com >
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com >
2025-02-12 15:28:18 +01:00
Harry Mellor
f5fff672db
Add pipeline parallel plan to PretrainedConfig and PreTrainedModel ( #36091 )
...
* Add `base_model_pp_plan` to `PretrainedConfig`
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add `_pp_plan` to `PreTrainedModel`
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add both to Llama for testing
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Fix type error
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Update to suggested schema
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* `_pp_plan` keys are not patterns
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Simplify schema
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Fix typing error
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Update input name for Llama
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Aria
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Bamba
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Cohere 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to diffllama and emu3
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Gemma 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to GLM and GPT NeoX
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Granite and Helium
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Mistral and Mixtral
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to OLMo 1 & 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan to Phi and Phi 3
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan for Qwen 2, 2 MoE, 2 VL and 2.5 VL
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add pp plan for Starcoder 2
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Add enum for accessing inputs and outputs
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Update type hints to use tuples
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
* Change outer list to tuple
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-12 10:51:48 +01:00
Raushan Turganbay
eebd2c972c
Chat template: update for processor ( #35953 )
...
* update
* we need batched nested input to always process correctly
* update a bit
* fix copies
2025-02-10 09:52:19 +01:00
Liangliang Ma
315a9f494e
Add XPU type for work-around -inf mask causing sdpa NaN issue in modeling files ( #35647 )
...
* add xpu for unmask
* change modular for generated matching
* add latest modeling for helium
2025-02-05 13:28:31 +01:00
Yoni Gozlan
d7188ba600
Add support for nested images to LLava and VipLLava ( #35558 )
...
* move make_flat_list_of_images and make_batched_videos to image_utils
* remove unnecessary is_vision_available
* move make_nested_list_of_images to image_utils
* fix fast pixtral image processor
* fix import mllama
* fix make_nested_list_of_images
* add tests
* convert 4d arrays/tensors to list
* add test_make_batched_videos
* add support nested batch of videos
* fix image processing qwen2vl
2025-01-30 16:49:20 -05:00
Joao Gante
ece8c42488
Test: generate with torch.compile(model.forward) as a fast test ( #34544 )
2025-01-28 14:10:38 +00:00
Cyril Vallez
d3af76df58
[Backend support] Allow num_logits_to_keep as Tensor + add flag ( #35757 )
...
* support
* Update modeling_utils.py
* style
* most models
* Other models
* fix-copies
* tests + generation utils
2025-01-23 09:47:54 +01:00
Raushan Turganbay
09d5f76274
Clean-up composite configs ( #34603 )
...
* remove manual assignment tie-word-embeddings
* remove another unused attribute
* fix tests
* fix tests
* remove unnecessary overwrites
* fix
* decoder=True
* clean pix2struct
* run-all
* forgot `_tied_weights_keys` when adding Emu3
* also Aria + fix-copies
* and clean aria
2025-01-15 10:04:07 +01:00
Cyril Vallez
cd44bdb4b8
Fix device in rope module when using dynamic updates ( #35608 )
...
fix rope device
2025-01-13 10:11:17 +01:00
Arthur
e97d7a5be5
add _supports_flex_attn = True for models that do support it ( #35598 )
...
* add `_supports_flex_attn = True`
* fix repo consistency
2025-01-09 20:03:33 +01:00
Pablo Montalvo
395b114bd1
Small fix rope kwargs ( #35589 )
...
* don't know why this keeps popping up?
* remove unused rope_kwargs
2025-01-09 15:40:36 +01:00
Cyril Vallez
965a2fb320
More model refactoring! ( #35359 )
...
* cohere
* style
* phi3
* style
* small fix
* small fix
* phi3 longrope
* oups
* Update rope (only for phi3 still)
* Update test_modeling_rope_utils.py
* Update modeling_phi3.py
* fix
* fix copies
* style
* Fix copied from bad renaming
2025-01-09 11:09:09 +01:00
Wing Lian
5e7aedebeb
make LlamaModel._update_causal_mask torch compilable ( #35187 )
...
* make LlamaModel._update_causal_mask torch compilable
* chore: lint (make fix-copies)
* fix-copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com >
2024-12-23 13:10:00 +01:00
Arthur
2c47618c1a
🚨 All attention refactor 🚨 ( #35235 )
...
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modular and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oups
* oups2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* style
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oupsi
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
2024-12-18 16:53:39 +01:00
Cyril Vallez
33c12e4d80
Fix CI ( #35208 )
...
fix aria
2024-12-11 14:24:52 +01:00
Aymeric Roucher
9ad4c93536
Add Aria ( #34157 )
...
* Add Aria
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2024-12-06 12:17:34 +01:00