* working locally; need to style and test
* added docs and initial tests; need to debug and flesh out
* fixed tests
* working long context; batches
* working fa2 and eager
* update tests
* add missing configs
* remove default autoset
* fix spacing
* fix most tests
* fixed tests
* fix to init
* refactor to match new transformers updates
* remove static cache option
* fa2 fix
* fix docs
* in progress
* working on tests
* fixed issue with attn outputs
* remove debug
* fix local config attr
* update doc string
* fix docstring
* add docs to toc
* correct typo in toc
* add new updates from main w.r.t. ModernBERT RoPE
* fix local param
---------
Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
* Update modeling_qwen2_5_vl.py
### 🐛 Bug Description
When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42), the model crashes because of a type mismatch in attention mask handling: `torch.finfo()` is applied to an integer attention mask.
---
### 🔥 Error Traceback
* Fix dtype compatibility in attention mask processing
Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:
Problem: Line 1292 assumes a floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
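A minimal sketch of the dtype check described above, assuming a PyTorch tensor mask. `mask_min_value` is a hypothetical helper name, not the function in `modeling_qwen2_5_vl.py`:

```python
import torch


def mask_min_value(mask: torch.Tensor):
    """Return the smallest representable value for the mask's dtype.

    torch.finfo() only accepts floating-point dtypes and raises a TypeError
    for integer dtypes, so integer masks go through torch.iinfo() instead.
    """
    if mask.is_floating_point():
        return torch.finfo(mask.dtype).min
    return torch.iinfo(mask.dtype).min


int_mask = torch.ones(1, 4, dtype=torch.int64)
float_mask = torch.zeros(1, 4, dtype=torch.float16)
print(mask_min_value(int_mask))    # -9223372036854775808
print(mask_min_value(float_mask))  # -65504.0
```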
* Update modeling_qwen2_5_vl.py
* Update modeling_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* Fix: Use appropriate function based on dtype
* Update modular_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* Fix: Use appropriate function based on dtype
* Fix: Use appropriate function based on dtype
* Update modeling_glm4v.py
* Only apply conversion for floating point tensors (inverted masks)
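The bullet above restricts the conversion to floating-point masks. A sketch of that idea, assuming the inverted (additive) convention where `0.0` marks attended positions and `torch.finfo(dtype).min` marks masked ones; `to_binary_mask` is a hypothetical name, not the code in the modeling files:

```python
import torch


def to_binary_mask(mask: torch.Tensor) -> torch.Tensor:
    """Convert an inverted float mask to 1/0 ints; leave integer masks untouched."""
    if mask.is_floating_point():
        # After this division, 0.0 (attend) stays 0 and finfo.min (masked) becomes 1.0.
        mask = mask / torch.finfo(mask.dtype).min
        # Flip so attended positions become 1 and masked positions 0.
        mask = (1.0 - mask).int()
    return mask


neg = torch.finfo(torch.float32).min
print(to_binary_mask(torch.tensor([[0.0, 0.0, neg]])))  # tensor([[1, 1, 0]], dtype=torch.int32)
print(to_binary_mask(torch.tensor([[1, 1, 0]])))        # tensor([[1, 1, 0]])
```

Integer masks skip the `torch.finfo()` call entirely, which is what avoids the `TypeError`.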
* corrected the format issue
reformatted modeling_glm4v.py
* Fix: Cast to float before applying torch.finfo
Corrected the format issue
* Fix torch.finfo() for integer attention mask
#39333
* Run make fix-copies and make style for CI compliance
- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC
* Fix torch.finfo() TypeError for integer attention_mask_tensor #39333
* Fix torch.finfo() TypeError for integer
* Updated CamemBERT model card to new standardized format
* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution
* Updated CamemBERT usage examples, quantization, badges, and format
* Updated CamemBERT badges
* Fixed CLI Section
* fix ast deprecations for Python 3.14: replace `node.n` with `node.value` and use `ast.Constant`
More verbose exceptions in `fix_docstring` on docstring formatting issues.
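A minimal stdlib-only sketch of the `ast.Constant` / `node.value` pattern; `numeric_constants` is a hypothetical helper, not the repository's code. `ast.Num` and the `.n` attribute are deprecated aliases that Python 3.14 removes, whereas `ast.Constant` with `.value` works across current Python versions:

```python
import ast


def numeric_constants(source: str) -> list:
    """Collect numeric literals from source code without deprecated AST APIs."""
    values = []
    for node in ast.walk(ast.parse(source)):
        # Match ast.Constant (not ast.Num) and read .value (not .n).
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, complex)):
            # bool is a subclass of int; skip True/False literals.
            if not isinstance(node.value, bool):
                values.append(node.value)
    return values


print(numeric_constants("x = 1 + 2.5\ny = 'text'\nz = True"))  # [1, 2.5]
```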
* plm template
* A working plm with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify conversion.
* First version that works with video input.
* simplified image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embedding
* Testing with timm PE
* check in missed conversion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching if video and image examples mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.
* Fix for testing_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* integration test GT update after rebasing; probably due to video preprocessing
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring updates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig use AutoModel(TimmWrapper) instead
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency. Improve docstring for .
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* dummy size arg to fix processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Updated Switch Transformers model card with standardized format (Issue #36979)
* Apply reviewer suggestions to the new standardised Switch Transformer's model card
* Update switch_transformers.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint
* fix test
* create a new processor for video_test
* fix test
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
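The two options from the note above, sketched in Python. This assumes the environment variable is read when `torch._dynamo` is first imported, so it is set before importing torch; either setting on its own should be enough:

```python
import os

# Option 1: environment variable, set before torch (and torch._dynamo) is imported.
os.environ["TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS"] = "1"

import torch
import torch._dynamo

# Option 2: set the Dynamo config flag directly at runtime.
torch._dynamo.config.capture_scalar_outputs = True
```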
* ensure the query is updated during training
avoid unused parameters, which DDP does not tolerate by default
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
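A hypothetical sketch of tolerating a `padding` keyword that a trainer forwards automatically. `extract_features` and its always-pad behaviour are illustrative assumptions, not the actual feature extractor:

```python
def extract_features(batch, padding_value=0.0, **kwargs):
    """Hypothetical extractor that always right-pads clips to the longest one."""
    # Trainers may forward padding=True automatically. Padding is unconditional
    # here, so the flag is accepted and discarded instead of raising.
    kwargs.pop("padding", None)
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    max_len = max(len(clip) for clip in batch)
    return [list(clip) + [padding_value] * (max_len - len(clip)) for clip in batch]


print(extract_features([[0.1, 0.2], [0.3]], padding=True))  # [[0.1, 0.2], [0.3, 0.0]]
```

Other unknown keywords still raise, so only the trainer-supplied flag is silently accepted.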
* minor
* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors have a `sampling_rate` property
* speedup relative position embeddings
* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fix model weight names that are changed by adding an adapter.
* minor
* minor
* minor
* fix a crash when peft is not active
* add todo to replace einsum
* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.
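Items 1 and 2 above can be sketched with standard PyTorch APIs. The class name, `context_size`, `max_pos_emb`, and the clamped distance formula are assumptions for illustration, not the Granite Speech code. The idea is that a tensor registered with `register_buffer` moves with `module.to(device)` instead of being copied to the GPU on every call, and `pad_sequence` pads a whole batch in one call:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence


class RelativeDistance(nn.Module):
    """Hypothetical module holding a precomputed relative-distance table."""

    def __init__(self, context_size: int, max_pos_emb: int):
        super().__init__()
        pos = torch.arange(context_size)
        dist = pos.view(-1, 1) - pos.view(1, -1)
        dist = dist.clamp(-max_pos_emb, max_pos_emb) + max_pos_emb
        # Registered once; .to(device) moves it along with the module.
        # persistent=False keeps it out of the state_dict.
        self.register_buffer("attention_dist", dist, persistent=False)

    def forward(self) -> torch.Tensor:
        return self.attention_dist


clips = [torch.ones(3), torch.ones(1), torch.ones(2)]
# One call replaces per-sample padding followed by concatenation.
batch = pad_sequence(clips, batch_first=True, padding_value=0.0)
print(batch.shape)  # torch.Size([3, 3])
```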
* support audio.shape=(1,L)