* Add Fast Segformer Processor
* Modified the params according to segformer model
* modified test_image_processing_Segformer_fast args
- removed redundant params like do_center_crop, center_crop which aren't present in the original segformer class
* added segmentation_maps processing logic from the slow segformer processing module with references from beitimageprocessing fast
* fixed code_quality
* added recommended fixes and tests to make sure everything processes smoothly
* Fixed SegmentationMapsLogic
- modified the preprocessing of segmentation maps to use tensors
- added batch support
* fixed some mismatched files
* modified the tolerance for tests
* use modular
* fix ci
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* feat: superpoint fast image processor
* fix: reran fast cli command to generate fast config
* feat: updated test cases
* fix: removed old model add
* fix: format fix
* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix: ported to torch and made requested changes
* fix: removed changes to init
* fix: init fix
* fix: init format fix
* fixed testcases and ported to torch
* fix: format fixes
* failed test case fix
* fix superpoint fast
* fix docstring
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Add missing cache_position argument.
* Pass cache_position to language model.
* Overwrite prepare_inputs_for_generation.
* Set model to half precision for Flash Attention test.
* Cast model to bfloat16.
* add tests for helpers
* duplicate test for each model
* why llava next video has no helper
* oops must have been in the commit
* fix test after rebase
* add copy from
* support `typing.Literal` as type of tool parameters
* validate the `args` of `typing.Literal` roughly
* add test to get json schema for `typing.Literal` type hint
* fix: add `"type"` attribute to the parsed result of `typing.Literal`
* test: add argument `booleanish` to test multi-type literal
* style: auto fixup
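
The `typing.Literal` commits above can be illustrated with a minimal sketch. `literal_to_schema` and the `_JSON_TYPES` mapping are hypothetical names made up for this example, not the helper used in the library; the sketch only assumes that the parsed result carries a `"type"` attribute alongside the allowed values, as the commit titles describe.

```python
from typing import Literal, get_args, get_origin

# Assumed mapping from Python value types to JSON schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    type(None): "null",
}


def literal_to_schema(hint):
    """Hypothetical helper: turn a typing.Literal hint into a JSON-schema-like dict."""
    if get_origin(hint) is not Literal:
        raise TypeError(f"expected typing.Literal, got {hint!r}")
    values = list(get_args(hint))
    # Rough validation: every literal value must have a JSON-representable type.
    for value in values:
        if type(value) not in _JSON_TYPES:
            raise TypeError(f"unsupported literal value: {value!r}")
    # Collect the distinct JSON type names in first-seen order.
    types = []
    for value in values:
        name = _JSON_TYPES[type(value)]
        if name not in types:
            types.append(name)
    # A single type is emitted as a string, several types as a list.
    return {"type": types[0] if len(types) == 1 else types, "enum": values}
```

A multi-type hint such as `Literal[True, False, "true", "false"]` (the `booleanish` case mentioned above) yields a list under `"type"`, while a hint with values of one type yields a single type name.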
* EP + updates
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* update
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update
---------
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* update aligner model
* add text model
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor"
This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.
* update image processor
* flatten hierarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update license
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* fix processor tests
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025
* fix type in config docstring
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following #36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weights
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weights"
This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.
* run fix-copies
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* init
* Force qwen2VL image proc to fast
* refactor qwen2 vl fast
* fix copies
* Update after PR review and update tests to use return_tensors="pt"
* fix processor tests
* add BC for min pixels/max pixels
* fix most tests
* skip a few more tests
* address comments
* fix chameleon tests
* forgot to uncomment
* qwen has its own tests with images, rename it as well
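
The "add BC for min pixels/max pixels" commit above suggests that legacy pixel-budget arguments are still accepted next to a `size` dict. A rough sketch of that kind of shim follows; `resolve_size`, the key names `shortest_edge`/`longest_edge`, and the default values are all assumptions chosen for illustration, not taken from the commits.

```python
def resolve_size(size=None, min_pixels=None, max_pixels=None):
    """Hypothetical helper: fold legacy min_pixels/max_pixels into a size dict."""
    # Assumed defaults, for illustration only.
    resolved = {"shortest_edge": 56 * 56, "longest_edge": 28 * 28 * 1280}
    if size is not None:
        if "shortest_edge" not in size or "longest_edge" not in size:
            raise ValueError("size must contain 'shortest_edge' and 'longest_edge'")
        # Copy so the caller's dict is never mutated.
        resolved = dict(size)
    # Backward compatibility: the legacy arguments take precedence when given.
    if min_pixels is not None:
        resolved["shortest_edge"] = min_pixels
    if max_pixels is not None:
        resolved["longest_edge"] = max_pixels
    return resolved
```

Letting the legacy arguments override `size` keeps older call sites behaving as before, at the cost of two ways to express the same setting.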
* add owlv2 fast image processor
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* change references to owlVit to owlv2 in docstrings for post process methods
* change type hints from List, Dict, Tuple to list, dict, tuple
* remove unused typing imports
* add disable grouping argument to group images by shape
* run make quality and repo-consistency
* use modular
* fix auto_docstring
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
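
The "disable grouping argument to group images by shape" commit above can be pictured with a small standard-library sketch. `group_by_shape` and `reorder` are hypothetical stand-ins written for this note, not the library functions; images are represented by arbitrary objects plus a `shape_of` callable so the example runs without torch.

```python
from collections import defaultdict


def group_by_shape(images, shape_of, disable_grouping=False):
    """Hypothetical stand-in: bucket images by shape, remembering original positions."""
    groups = defaultdict(list)
    index = {}  # original position -> (group key, position inside the group)
    for i, image in enumerate(images):
        # With grouping disabled, every image gets its own single-item group.
        key = (i,) if disable_grouping else shape_of(image)
        index[i] = (key, len(groups[key]))
        groups[key].append(image)
    return dict(groups), index


def reorder(groups, index):
    """Restore the original order after per-group processing."""
    return [groups[index[i][0]][index[i][1]] for i in range(len(index))]
```

Grouping lets same-shaped images be processed as one batch, while disabling it trades that batching for one image per group, which may be preferable when few images share a shape.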