* Add `tokenizer_kwargs` arg to text generation pipeline.
* chore: re-run CI
* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline
* Fix `tokenizer_encode_kwargs` doc string.
* Fix note related to `tokenizer_kwargs` in text generation pipeline
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add a test
* tempdir
* fix import issue
* wow I am tired
* properly init
* i am not super familiar with quantizer api :|
* set to TRUE for now
* full support
* push current changes
* will clean this later but the imports are a shitshow here
* this correctly saves the block and scales but forward seems broken
* quantize was not correct
* fix storage
* why were bias even included
* finally!
* style
* fix style
* remove print
* lazy import
* up
* not sure what happens this works now?
* holy moly it was not so far
* okay this seems to work!
* workings!!!
* allow save_pretrained to create PR
* Apply suggestions from code review
* fixup
* add dequantize false as well
* working new
* fix
* rm swizzle and unswizzle during saving
* rm print
* Update src/transformers/modeling_utils.py
* fix
* style
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
* Fix label smoothing incompatibility with multi-label classification (#40258)
* Improve label smoothing multi-label check based on reviewer feedback
- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
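A minimal sketch of the warn-and-disable behavior described above, assuming a hypothetical helper `resolve_label_smoothing` (not the actual `Trainer.__init__()` code) and a stand-in config object that only carries `problem_type`:

```python
import warnings
from types import SimpleNamespace


def resolve_label_smoothing(config, label_smoothing_factor):
    """Hypothetical helper: return the smoothing factor that should be used.

    Reads `config.problem_type` rather than inspecting label tensors. For
    multi-label classification it warns and disables smoothing instead of
    raising an error.
    """
    is_multi_label = getattr(config, "problem_type", None) == "multi_label_classification"
    if label_smoothing_factor != 0 and is_multi_label:
        warnings.warn(
            "Label smoothing is not supported for multi-label classification; disabling it.",
            UserWarning,
        )
        return 0.0
    return label_smoothing_factor


# Stand-in configs for illustration only.
multi_label_config = SimpleNamespace(problem_type="multi_label_classification")
single_label_config = SimpleNamespace(problem_type="single_label_classification")

print(resolve_label_smoothing(single_label_config, 0.1))
```

With the multi-label config, the same call emits a `UserWarning` and returns `0.0`, so training can continue without smoothing.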
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.
Co-authored-by: pranam-gf <pranam@goodfin.com>
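The naming conflict behind the `wer_metric` rename can be reproduced with a small sketch. `ToyMetric` is a hypothetical stand-in for a loaded metric object (it is not a real WER implementation), and both `compute_metrics_*` functions are illustrative rather than the example script's actual code:

```python
class ToyMetric:
    """Stand-in for a loaded metric object; returns the fraction of mismatched items."""

    def compute(self, predictions, references):
        wrong = sum(p != r for p, r in zip(predictions, references))
        return wrong / len(references)


wer = ToyMetric()


def compute_metrics_broken(predictions, references):
    # Assigning to `wer` makes the name local to this function, so the
    # right-hand side reads an unbound local instead of the module-level metric.
    wer = 100 * wer.compute(predictions=predictions, references=references)
    return {"wer": wer}


wer_metric = ToyMetric()


def compute_metrics_fixed(predictions, references):
    # The metric object has its own name, so the local `wer` no longer shadows it.
    wer = 100 * wer_metric.compute(predictions=predictions, references=references)
    return {"wer": wer}


try:
    compute_metrics_broken(["a", "b"], ["a", "c"])
except UnboundLocalError as err:
    print("broken version failed:", type(err).__name__)

print(compute_metrics_fixed(["a", "b"], ["a", "c"]))
```

Calling the broken version raises `UnboundLocalError`, while the renamed version runs and returns the score.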
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py
These typos were in variable names used for parsing path components in weight conversion scripts.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
* fix typings that do not match the FA function signature
cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching
It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`
* format changes by ruff
* Update src/transformers/integrations/flash_paged.py
unused function arg in `PagedAttentionCache.update`
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* revert continuous_batching signature, which is more meaningful
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* this should be explicit
* fix Nth ESM exception
* tryout decoder
* this as well
* revert again
* 🧠
* aaah ESM has two modelings aaah
* broom broom
* format
* wrong copies
* copies
* modular cleanups
* format
* modularities
* wrong mergefix
* seriously
* align with new model
* new model
* update everywhere
* style
* pipelines
* switch it everywhere in tests
* switch it everywhere in docs
* switch in converters everywhere
* update in examples
* update in model docstrings
* style
* warnings
* style
* Update configuration_utils.py
* fix
* Update configuration_utils.py
* fixes and add first test
* add pipeline tests
* Update test_pipelines_common.py
* add config test
* Update test_modeling_common.py
* add new ones
* post rebase
* add new
* post rebase adds
* Update trainer.md
* Update trainer.md
Removed the detail about label_names argument usage from the tip/warning section
* Update training_args.py
Added the label_names usage clarification in the docstring
* Update trainer.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* handle support for cache classes when num enc layers != num dec layers
* handle overwrites
* one more corner case
* Update src/transformers/generation/utils.py
* Update src/transformers/generation/utils.py
* Apply suggestions from code review
* handle corner case :o
* fix
* cleanup, revert aimv2 fa changes
* fix aria
* i searched a long time but the cross dependency is for the recent models so...
* this was something... evolla
* fix modernbert decoder + make fa test more robust
* nit