* added code for handling video objects, as a dictionary of frames and metadata, in the chat template
* added new test where videos are passed as objects (dict of frames, metadata) in the chat template
* modified hardcoded video_len check that does not match the increased number of test cases.
* Modify hardcoded video_len check that fails with increased number of tests
* update documentation of multi-modal chat templating with extra information about including a video object in the chat template.
* add array handling in load_video()
* temporary test video included
* skip testing smolvlm with videos that are list of frames
* update documentation & make fixup
* Address review comments
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models
* refactor: added copied from
* fix: make style
* fix: repo consistency
* fix: make style
* docs: added missing method in SuperGlue docs
* first commit
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.
TODO: Some tests are failing so that needs to be fixed.
* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model
* cleaned up a hack in the conversion script
* Fixed the expected values in integration tests
Cross-attention masking and cpu-gpu consistency tests are still failing, however.
* changes for make style and quality
* add documentation
* clean up contrastive embedding
* add mm grounding dino to loss mapping
* add model link to config docstring
* hack fix for mm grounding dino consistency tests
* add special cases for unused config attr check
* add all models and update docs
* update model doc to the new style
* Use super_kwargs for modular config
* Move init to the _init_weights function
* Add copied from for tests
* fixup
* update typehints
* Fix-copies for tests
* fix-copies
* Fix init test
* fix snippets in docs
* fix consistency
* fix consistency
* update conversion script
* fix nits in readme and remove old comments from conversion script
* add license
* remove unused config args
* remove unnecessary if/else in model init
* fix quality
* Update references
* fix test
* fixup
---------
Co-authored-by: qubvel <qubvel@gmail.com>
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025
* update and add modular file
* update processors and check with orig impl later
* delete unused files
* image processor reduce LOC and re-use GotOCR2
* update the config to use modular
* model tests pass
* processor fixes
* check model outputs decorator
* address one more comment
* Update tokens. Temp - need to read from tokenizer
* fix for multi-gpu
* Fix image token handling
* update image token expansion logic
* fix a few issues with remote code loading
* not related but modular forces us to change all files now
* Add overview and code sample to cohere vision docs
* add scripts. TMP.
* Update inference script
* Create script
* set dtype in export script
* TO revert: modular export fix
* Fix scripts
* Revert "TO revert: modular export fix"
This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.
* Use modular weights
* Upload to hub
Removed OOD weights and script
* Updated docs
* fix import error
Update docs
Added pipeline test
* Updated docs
* Run modular script
remove modular for config
Added patch_size
Added docstrings in modular
Fix OOM
Add docs, fixup integration tests. 8-gpu passing
* tiny updates
* address comments + fixup
* add test for chat template
* check model outputs workaround
* aya vision fix check model inputs
* Revert "add test for chat template"
This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.
* revert more changes
* last revert
* skip and merge
* faulty copy from
---------
Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
* docs: ko: main_classes/peft.md
* feat: nmt draft
* docs: add missing TOC to documentation for `PeftAdapterMixin` section
Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).
* fix: Improve naturalness of purpose expression in Korean
Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.
* fix: Simplify plural form and make expression more concise
Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.
* fix: Replace technical term '주입' with more natural '적용'
Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:
'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise
'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.
* fix: update toctree path for PEFT to lowercase
Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.
* docs: update as per reviewer feedback after rebase
* Add Fast Segformer Processor
* Modified the params according to segformer model
* modified test_image_processing_Segformer_fast args
- removed redundant params like do_center_crop, center_crop which aren't present in the original segformer class
* added segmentation_maps processing logic from the slow segformer processing module with references from beitimageprocessing fast
* fixed code_quality
* added recommended fixes and tests to make sure everything processes smoothly
* Fixed SegmentationMapsLogic
- modified the preprocessing of segmentation maps to use tensors
- added batch support
* fixed some mismatched files
* modified the tolerance for tests
* use modular
* fix ci
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* feat: superpoint fast image processor
* fix: reran fast cli command to generate fast config
* feat: updated test cases
* fix: removed old model add
* fix: format fix
* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix: ported to torch and made requested changes
* fix: removed changes to init
* fix: init fix
* fix: init format fix
* fixed testcases and ported to torch
* fix: format fixes
* failed test case fix
* fix superpoint fast
* fix docstring
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* update aligner model
* add text model
* Added Image Processor
* Added Image Processor
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor"
This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.
* update image processor
* flatten hierarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update license
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* added test_processor
* fix processor tests
* update docs
* update docs
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025
* fix type in config docstring
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following #36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weights
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weights"
This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.
* run fix-copies
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* add owlv2 fast image processor
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* change references to owlVit to owlv2 in docstrings for post process methods
* change type hints from List, Dict, Tuple to list, dict, tuple
* remove unused typing imports
* add disable grouping argument to group images by shape
* run make quality and repo-consistency
* use modular
* fix auto_docstring
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* docs: Standardize OPT model card with enhanced details
* Remove incorrect link from OPT model card
* Address review feedback on OPT model card
* Update opt.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
* First attempt
* fix
* fix
* Enhance TrackioCallback to log GPU memory usage and allocation
* Enhance Trackio integration in callbacks and training arguments documentation
* re order
* remove unused lines
* fix torch optional