* add tests for helpers
* duplicate test for each model
* why llava next video has no helper
* oops must have been in the commit
* fix test after rebase
* add copy from
* support `typing.Literal` as type of tool parameters
* validate the `args` of `typing.Literal` roughly
* add test to get json schema for `typing.Literal` type hint
* fix: add `"type"` attribute to the parsed result of `typing.Literal`
* test: add argument `booleanish` to test multi-type literal
* style: auto fixup
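The `typing.Literal` support above can be illustrated with a small sketch. The helper name `literal_to_json_schema` and the output shape (a `"type"` entry next to an `"enum"` list, with a list of types for multi-type literals) are assumptions for illustration, not the library's actual parser; only `typing.get_origin` and `typing.get_args` are real standard-library calls.

```python
from typing import Any, Literal, get_args, get_origin

# Hypothetical mapping from Python value types to JSON schema type names.
_JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string", type(None): "null"}


def literal_to_json_schema(hint: Any) -> dict:
    """Hypothetical helper: turn a typing.Literal hint into a JSON schema fragment."""
    if get_origin(hint) is not Literal:
        raise TypeError(f"expected a typing.Literal hint, got {hint!r}")
    values = list(get_args(hint))
    types = []
    for value in values:
        # Rough validation of the Literal args: only JSON-compatible scalars are accepted.
        json_type = _JSON_TYPES.get(type(value))
        if json_type is None:
            raise TypeError(f"unsupported Literal value {value!r}")
        if json_type not in types:
            types.append(json_type)
    # Always emit a "type" entry; multi-type literals get a list of type names.
    return {"type": types[0] if len(types) == 1 else types, "enum": values}
```

For a "booleanish" parameter, `literal_to_json_schema(Literal["yes", "no", True])` returns `{"type": ["string", "boolean"], "enum": ["yes", "no", True]}`.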
* EP + updates
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* update
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update
---------
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
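The expert-parallel (EP) work above shards MoE experts across ranks. A minimal sketch of one common scheme, assuming experts are split into equal contiguous blocks per rank; the function names are hypothetical and this is not the `tensor_parallel.py` implementation.

```python
from collections import defaultdict


def experts_per_rank(num_experts: int, world_size: int) -> int:
    """Number of experts each rank holds, assuming an even split."""
    if num_experts % world_size != 0:
        raise ValueError("num_experts must be divisible by world_size in this sketch")
    return num_experts // world_size


def local_experts(num_experts: int, world_size: int, rank: int) -> range:
    """Contiguous block of global expert ids owned by `rank`."""
    per_rank = experts_per_rank(num_experts, world_size)
    return range(rank * per_rank, (rank + 1) * per_rank)


def route_tokens(expert_ids: list, num_experts: int, world_size: int) -> dict:
    """Group token indices by the rank that owns each token's routed expert."""
    per_rank = experts_per_rank(num_experts, world_size)
    buckets = defaultdict(list)
    for token_idx, expert_id in enumerate(expert_ids):
        buckets[expert_id // per_rank].append(token_idx)
    return dict(buckets)
```

Each rank then only runs its local experts on the tokens in its bucket; contiguous blocks keep the owner lookup a single integer division.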
* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* update aligner model
* add text model
* Added Image Processor
* Added Image Processor
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor"
This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.
* update image processor
* flatten hierarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update license
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* added test_processor
* fix processor tests
* update docs
* update docs
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025
* fix typo in config docstring
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following #36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weigths
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weigths"
This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.
* run fix-copies
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
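One of the changes above makes `num_image_tokens` configurable. Processors for models of this kind typically expand each image placeholder in the prompt into one token per image embedding, so that a mask over those positions tells the model where to insert vision features. A minimal sketch, assuming a plain-string prompt, a hypothetical `<image>` placeholder, and hypothetical helper names; the real DeepseekVL processor's token strings and defaults may differ.

```python
def expand_image_tokens(prompt: str, image_token: str, num_image_tokens: int) -> str:
    """Replace every image placeholder with `num_image_tokens` copies of itself."""
    if num_image_tokens < 1:
        raise ValueError("num_image_tokens must be positive")
    return prompt.replace(image_token, image_token * num_image_tokens)


def image_token_mask(token_ids: list, image_token_id: int) -> list:
    """Boolean mask marking the positions where image features should be placed."""
    return [tok == image_token_id for tok in token_ids]
```

Keeping the count configurable means the text side always reserves exactly as many slots as the vision encoder produces features.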
* init
* Force qwen2VL image proc to fast
* refactor qwen2 vl fast
* fix copies
* Update after PR review and update tests to use return_tensors="pt"
* fix processor tests
* add BC for min pixels/max pixels
* fix most tests
* skip a few more tests
* address comments
* fix chameleon tests
* forgot to uncomment
* qwen has its own tests with images, rename it as well
* add owlv2 fast image processor
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* change references to owlVit to owlv2 in docstrings for post process methods
* change type hints from List, Dict, Tuple to list, dict, tuple
* remove unused typing imports
* add disable grouping argument to group images by shape
* run make quality and repo-consistency
* use modular
* fix auto_docstring
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
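The `disable_grouping` argument above controls whether images of the same shape are batched together before processing. A minimal sketch of the idea, assuming images are represented only by their shape tuples; `group_by_shape` and `restore_order` are hypothetical names and do not reproduce the signatures of the library's own helpers.

```python
from collections import defaultdict
from typing import Sequence


def group_by_shape(shapes: Sequence[tuple], disable_grouping: bool = False):
    """Bucket image indices by shape so each bucket can be processed as one batch.

    Returns (groups, index_map): groups maps a key to a list of image indices,
    index_map maps each image index to (key, position inside its group).
    """
    groups = defaultdict(list)
    index_map = {}
    for i, shape in enumerate(shapes):
        # With grouping disabled, every image gets its own singleton group.
        key = i if disable_grouping else tuple(shape)
        index_map[i] = (key, len(groups[key]))
        groups[key].append(i)
    return dict(groups), index_map


def restore_order(processed_groups: dict, index_map: dict) -> list:
    """Undo the grouping: pick each image's result back out of its group."""
    return [processed_groups[key][pos] for _, (key, pos) in sorted(index_map.items())]
```

Grouping lets same-sized images share one batched operation; disabling it trades that speed for per-image processing, which can help when almost every image has a different shape.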
* use partial to wrap around `transformers` utils!
* try to refactor?
* revert one wrong change
* just a nit
* push
* revert whatever was wrong!
* some nits
* fixes when there is no attention mask
* bring the licence back
* some fixes
* nit
* style
* remove prints
* correct dtype
* fa flags for testing
* update
* use paged attention if requested!
* updates
* a clone was needed, not sure why
* automatically create cu seq lens when input is flash; this at least makes sure layers don't recompute them
* simplify and improve?
* flash attention is kinda broken on recent CUDA versions, so allow using something else
* fix!
* protect kernels import
* update
* properly parse generation config being passed
* revert and update
* add two tests
* some fixes
* fix test FA2
* take comments into account
* fixup
* revert changes
* revert the clone, it is only needed because the metal kernel is not doing it?
* [docs] update attention implementation and cache docs (#39547)
* update docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix mps on our side for now
* Update src/transformers/integrations/flash_paged.py
* no qa
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
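Varlen flash-attention kernels (for example `flash_attn_varlen_func` in the `flash-attn` package) take cumulative sequence lengths (`cu_seqlens_q`, `cu_seqlens_k`) plus a maximum length instead of a padded attention mask. Building them once per forward pass, as described above, avoids every layer recomputing them. A minimal pure-Python sketch, assuming either a 0/1 padding mask or packed `position_ids` that restart at 0 for each sequence; the helper names are hypothetical, and real kernels expect int32 tensors rather than lists.

```python
from itertools import accumulate


def cu_seqlens_from_mask(attention_mask: list) -> tuple:
    """Return (cu_seqlens, max_seqlen) for a batch described by a 0/1 padding mask.

    cu_seqlens has batch_size + 1 entries: cu_seqlens[i]:cu_seqlens[i + 1] is the
    slice of the packed (unpadded) token stream that belongs to sequence i.
    """
    lengths = [sum(row) for row in attention_mask]
    return [0, *accumulate(lengths)], max(lengths, default=0)


def cu_seqlens_from_position_ids(position_ids: list) -> list:
    """For packed inputs, each new sequence starts wherever position_ids resets to 0."""
    starts = [i for i, p in enumerate(position_ids) if p == 0]
    return starts + [len(position_ids)]
```

Both routes describe the same packed layout: a padded batch with lengths 3, 2 and 4 and the packed positions `[0, 1, 2, 0, 1, 0, 1, 2, 3]` both give `[0, 3, 5, 9]`.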
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification
* ruff fix
* refactor + add test for not supported model
* ruff
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* initial commit
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: various typos, typehints, refactors from suggestions
* fix: fine_matching method
* Added EfficientLoFTRModel and AutoModelForKeypointMatching class
* fix: got rid of compilation breaking instructions
* docs: added todo for plot
* fix: used correct hub repo
* docs: added comments
* fix: run modular
* doc: added PyTorch badge
* fix: model repo typo in config
* fix: make modular
* fix: removed mask values from outputs
* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor
* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list
* fix: reformat
* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride
* doc: added q, kv aggregation kernel size and stride doc to config
* refactor: converted efficientloftr implementation from modular to copied from mechanism
* tests: overwrote batching_equivalence for "keypoints" specific tests
* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils
* fix: make fix-copies
* fix: make style
* fix: update rope function to make meta tests pass
* fix: rename plot_keypoint_matching to visualize_output for clarity
* refactor: optimize image pair processing by removing redundant target size calculations
* feat: add EfficientLoFTRImageProcessor to image processor mapping
* refactor: removed logger and updated attention forward
* refactor: added auto_docstring and can_return_tuple decorators
* refactor: update type imports
* refactor: update type hints from List/Dict to list/dict for consistency
* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching
* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]
* fix typing
* fix some typing issues
* nit
* a few more typehint fixes
* Remove output_attentions and output_hidden_states from modeling code
* else -> elif to support efficientloftr
* nit
* tests: added EfficientLoFTR image processor tests
* refactor: reorder functions
* chore: update copyright year in EfficientLoFTR test file
* Use default rope
* Add docs
* Update visualization method
* fix doc order
* remove 2d rope test
* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py
* fix docs
* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py
* update gradient
* refactor: removed unused codepath
* Add motivation to keep postprocessing in modeling code
* refactor: removed unnecessary variable declarations
* docs: use load_image from image_utils
* refactor: moved stage in and out channels computation to configuration
* refactor: set an intermediate_size parameter to be more explicit
* refactor: removed all mentions of attention masks as they are not used
* refactor: moved position_embeddings to be computed once in the model instead of every layer
* refactor: removed unnecessary hidden expansion parameter from config
* refactor: completely removed hidden expansions
* refactor: removed position embeddings slice function
* tests: fixed broken tests because of previous commit
* fix is_grayscale typehint
* not refactoring
* not renaming
* move h/w to embeddings class
* Precompute embeddings in init
* fix: replaced cuda device in convert script to accelerate device
* fix: replaced stevenbucaille repo to zju-community
* Remove accelerator.device from conversion script
* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module
* fix: removed unused attributes in configuration
* fix: missing self
* fix: refactoring and tests
* fix: make style
---------
Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove additional forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as its basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also just done in remote, what timing), begin tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flaky tests
* move repo id, update when merged on the hub
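Several commits above deal with the RoPE implementation ("rotation per weights", "transformers style rope (complex view)"). A common cause of such mismatches is that some codebases rotate adjacent pairs `(x[2i], x[2i+1])` as complex numbers, while Llama-style code rotates `(x[i], x[i + d/2])` with `rotate_half`; the two agree once the head dimensions are permuted, which is why conversion scripts often permute q/k projection weights. The pure-Python sketch below checks that equivalence for one vector. Whether the ERNIE remote code uses the interleaved layout is an assumption here, and all function names are hypothetical.

```python
import cmath
import math


def rope_freqs(dim: int, base: float = 10000.0) -> list:
    """Per-pair rotation frequencies base^(-2i/dim), i = 0 .. dim/2 - 1."""
    if dim % 2:
        raise ValueError("head dimension must be even")
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_interleaved(x: list, pos: int, base: float = 10000.0) -> list:
    """Complex view: (x[2i], x[2i+1]) is multiplied by exp(1j * pos * freq_i)."""
    out = []
    for i, f in enumerate(rope_freqs(len(x), base)):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * pos * f)
        out += [z.real, z.imag]
    return out


def rope_rotate_half(x: list, pos: int, base: float = 10000.0) -> list:
    """Llama style: x * cos + rotate_half(x) * sin, pairing x[i] with x[i + d/2]."""
    half = len(x) // 2
    freqs = rope_freqs(len(x), base)
    cos = [math.cos(pos * f) for f in freqs] * 2
    sin = [math.sin(pos * f) for f in freqs] * 2
    rotated = [-v for v in x[half:]] + x[:half]  # rotate_half: cat(-x2, x1)
    return [a * c + r * s for a, r, c, s in zip(x, rotated, cos, sin)]


def interleaved_to_half(x: list) -> list:
    """Permutation [x0, x1, x2, x3, ...] -> [x0, x2, ..., x1, x3, ...]."""
    return x[0::2] + x[1::2]
```

Applying `rope_rotate_half` to permuted inputs gives the permuted output of `rope_interleaved`, so a checkpoint trained with one layout can run with the other after permuting the q/k weight rows once at conversion time.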
* simplify common get/set
* remove some noise
* change some 5-year-old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* revert again
* 🧠
* aaah ESM has two modelings aaah
* add informative but short comment
* add `input_embed_layer` mixin attribute
* style
* walrus has low precedence
* modular fix
* this was breaking the parser
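The last commits move the get/set input-embedding logic into a shared mixin driven by an `input_embed_layer` attribute, and fix a walrus-operator precedence issue. A minimal sketch of that pattern, assuming a hypothetical `EmbeddingAccessMixin` whose `_input_embed_layer` attribute names the embedding submodule and a `model` attribute for wrapped base models; this is not the library's actual code. The comment in `get_input_embeddings` shows why the parentheses around `:=` matter: without them, `layer` would be bound to the boolean result of `is not None`.

```python
class EmbeddingAccessMixin:
    """Hypothetical mixin: shared get/set for a model's input embedding layer."""

    # Name of the attribute holding the embedding layer; subclasses may override it.
    _input_embed_layer = "embed_tokens"

    def get_input_embeddings(self):
        # `:=` binds more loosely than `is not`, so the parentheses are required:
        # `if layer := getattr(...) is not None` would bind `layer` to True/False.
        if (layer := getattr(self, self._input_embed_layer, None)) is not None:
            return layer
        # Otherwise delegate to a wrapped base model (e.g. a causal-LM head around a decoder).
        if (base := getattr(self, "model", None)) is not None:
            return base.get_input_embeddings()
        raise NotImplementedError(f"{type(self).__name__} has no input embedding layer")

    def set_input_embeddings(self, value):
        if getattr(self, self._input_embed_layer, None) is not None:
            setattr(self, self._input_embed_layer, value)
        elif (base := getattr(self, "model", None)) is not None:
            base.set_input_embeddings(value)
        else:
            raise NotImplementedError(f"{type(self).__name__} has no input embedding layer")


class ToyDecoder(EmbeddingAccessMixin):
    def __init__(self):
        self.embed_tokens = "token-embeddings"


class ToyEncoder(EmbeddingAccessMixin):
    _input_embed_layer = "word_embeddings"  # non-default attribute name

    def __init__(self):
        self.word_embeddings = "word-embeddings"


class ToyForCausalLM(EmbeddingAccessMixin):
    def __init__(self):
        self.model = ToyDecoder()
```

Models with a non-standard layer name only override one class attribute instead of re-implementing both methods.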