Matt
9be4728af8
Just import torch AdamW instead ( #36177 )
...
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
2025-03-19 18:29:40 +00:00
Michael Feil
51bd0ceb9e
Update configuration_qwen2.py ( #36735 )
...
* Update configuration_qwen2_moe.py
* Update modeling_qwen2_moe.py
* ruff fmt
* docstring add qkv_bias
2025-03-19 18:15:54 +00:00
JJJYmmm
107fedc1e2
quick fix fast_image_processor register error ( #36716 )
...
* fix fast_image_processor register error
* update error message
* remove redundant import
* fix format
2025-03-19 18:05:45 +00:00
Mohamed Mekkouri
258dd9cc69
Add Space to Bitsandbytes doc ( #36834 )
...
* add space
* address review
2025-03-19 18:56:07 +01:00
Tugsbayasgalan Manlaibaatar
f39f4960f3
Support traceable dynamicKVcache ( #36311 )
...
* Support traceable dynamicKVcache
* Fix lint
* More fine grained test
* Lint
* Update
* Update
* Fix up
* Apply suggestions from code review
* Update src/transformers/cache_utils.py
* Update tests/utils/test_cache_utils.py
* Apply suggestions from code review
* Update
* Change error message
* Rename
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
---------
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com >
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2025-03-19 16:52:30 +00:00
Matt
63c3116530
One more fix for reviewer assignment ( #36829 )
...
* one more fix
* one more fix
* Trigger tests
2025-03-19 16:25:24 +00:00
Joao Gante
7c233980f4
[gemma 3] multimodal checkpoints + AutoModelForCausalLM ( #36741 )
2025-03-19 15:04:19 +00:00
Yao Matrix
b11050d6a2
enable OffloadedCache on XPU from PyTorch 2.7 ( #36654 )
...
* fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
* follow Marc's suggestion to use _tie_weights to fix
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* enable OffloadedCache on XPU since PyTorch 2.7
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* fix style
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* don't change bart
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com >
* make code more concise per review comments
Signed-off-by: N <matrix.yao@intel.com >
* fix review comments
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com >
* Revert "fix review comments"
This reverts commit acf1484b86c7cc58b2dee69e7008c0eeb4c97b1b.
* fix review comments
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com >
* fix style
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com >
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com >
Signed-off-by: N <matrix.yao@intel.com >
Co-authored-by: root <root@a4bf01945cfe.jf.intel.com >
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-03-19 15:15:52 +01:00
Driss Guessous
e8d960329e
Add option for ao base configs ( #36526 )
2025-03-19 14:59:47 +01:00
Arthur
fef8b7f8e9
Add attention visualization tool ( #36630 )
...
* add utils file
* style
* nits
* nits
* update
* updates
* update
* fix init issues
* big updates
* nits
* nits?
* small updates
* nits
* there were still some models left
* style
* fixes
* updates
* nits _ fixes
* push changes
* update
* update
* update
* Apply suggestions from code review
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
* style
* styling and return a string for testing
* small updates
* always bidirectional for now
* update
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
2025-03-19 13:58:46 +01:00
Joao Gante
0fe0bae0a8
[Generation] remove leftover code from end-to-end compilation ( #36685 )
2025-03-19 11:28:33 +00:00
Mohamed Mekkouri
a861db01e5
Fix Device map for bitsandbytes tests ( #36800 )
...
fix
2025-03-19 11:57:13 +01:00
Yih-Dar
b9374a0763
Remove "dist": "loadfile" for pytest in CircleCI jobs ( #36811 )
...
* fasterrrrr
* avoid crash in example jobs
* avoid crash in TF example jobs
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-03-19 11:15:09 +01:00
Yao Matrix
4fa91b1be5
fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model ( #36572 )
...
* fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model
* follow Marc's suggestion to use _tie_weights to fix
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* fix review comments.
Signed-off-by: N <matrix.yao@intel.com >
* fix quality
Signed-off-by: N <matrix.yao@intel.com >
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
Signed-off-by: N <matrix.yao@intel.com >
2025-03-19 10:48:47 +01:00
ivarflakstad
706703bba6
Expectations test utils ( #36569 )
...
* Add expectation classes + tests
* Use typing Union instead of |
* Use bits to track score in properties cmp method
* Add exceptions and tests + comments
* Remove compute cap minor as it is not needed currently
* Simplify. Remove Properties class
* Add example Exceptions usage
* Expectations as dict subclass
* Update example Exceptions usage
* Refactor. Improve type name. Document score fn.
* Rename to DeviceProperties.
2025-03-18 23:39:50 +01:00
Joao Gante
179d02ffb8
[generate] ✨ vectorized beam search ✨ ( #35802 )
2025-03-18 18:39:36 +00:00
Yoni Gozlan
12f2ebef63
Support custom docstrings in modular ( #36726 )
...
* Override docstrings in modular if not none
* Update doc
2025-03-18 14:00:54 -04:00
Gar
00915d3041
Fix chameleon's TypeError because inputs_embeds may be None ( #36673 )
...
* fix chameleon TypeError when inputs_embeds is None
* reformat
* hotfix
2025-03-18 18:59:30 +01:00
Marc Sun
14b597f518
Fix casting dtype for quantization ( #36799 )
...
* fix
* remove print
2025-03-18 18:46:03 +01:00
Yoni Gozlan
30580f035b
Fix Mistral3 tests ( #36797 )
...
* fix processor tests
* fix modeling tests
* fix test processor chat template
* revert modeling test changes
2025-03-18 13:08:12 -04:00
Cyril Vallez
db1d4c5a0b
Loading optimizations ( #36742 )
...
* improvements
* Update modeling_utils.py
* add some doc about loading
* Update modeling_utils.py
2025-03-18 16:38:44 +01:00
Yih-Dar
7baf00089a
Update SHA for tj-actions/changed-files ( #36795 )
...
* trigger
* trigger
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-03-18 16:19:39 +01:00
Marc Sun
3017536ebf
fix hqq due to recent modeling changes ( #36771 )
...
* fix-hqq
* style
* test
2025-03-18 12:20:27 +01:00
Cyril Vallez
e959530b8f
Add Mistral3 ( #36790 )
...
* initial start
* style and dummies
* Create convert_mistral3_weights_to_hf.py
* update
* typo
* typo
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* up
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* update
* update
* Update image_processing_mistral3.py
* Update convert_mistral3_weights_to_hf.py
* fix patch merger
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* up
* update modular to fit
* style
* Update convert_mistral3_weights_to_hf.py
* typo
* Update modular_mistral3.py
* simplify a lot all shape shenanigans
* simplify
* add working test processor
* Add partially working common modeling tests
* All tests working and remove mistral3 image processors
* add docs and fixup
* fix inference with image size >1540
* 🚨 fix test image proc pixtral
* Remove vision_feature_select_strategy
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* Update convert_mistral3_weights_to_hf.py
* clean
* fix test checkpoints
* Update test_modeling_mistral3.py
* Update test_modeling_mistral3.py
* style
* Use Pixtral processor
* up
* finish cleaning processor to use pixtral directly
* Update __init__.py
* Update processing_pixtral.py
* doc
* Update __init__.py
* Update mistral3.md
* Update _toctree.yml
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co >
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com >
v4.49.0-Mistral-3
2025-03-18 12:04:42 +01:00
Lysandre Debut
bd92073692
Fix gemma3_text tokenizer in mapping ( #36793 )
2025-03-18 11:50:22 +01:00
Zebin
7426d02ea8
Fixing typo in gemma3 image_processor_fast and adding a small test ( #36776 )
...
Co-authored-by: zebz13 <zeb@fedora>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
2025-03-18 11:35:06 +01:00
Afanti
19b9d8ae13
chore: fix typos in tests directory ( #36785 )
...
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
* chore: fix typos in tests directory
2025-03-18 10:31:13 +01:00
Afanti
7f5077e536
fix typos in the tests directory ( #36717 )
2025-03-17 17:45:57 +00:00
Daniel Kleine
cbfb8d7b27
doc: Clarify is_decoder usage in PretrainedConfig documentation ( #36724 )
...
* fix: clarify decoder usage in PretrainedConfig documentation
* Apply suggestions from code review
updated doc
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-03-17 09:40:25 -07:00
Steven Liu
ac1a1b66b9
[docs] Update README ( #36265 )
...
* update
* feedback
* feedback
* update versions
2025-03-17 09:37:19 -07:00
Joao Gante
cff4caa0c1
[CI] remove redundant checks in test_eager_matches_sdpa_inference ( #36740 )
2025-03-17 16:29:18 +00:00
Christopher Akiki
e3af4fec91
[MINOR:TYPO] Update hubert.md ( #36733 )
...
* [MINOR:TYPO] Update hubert.md
- typo fix (wave2vec instead of hubert)
- make code snippet copiable and runnable
* Run tests
2025-03-17 09:07:51 -07:00
Petr Kuderov
c8a2b25f91
Fix TrainingArguments.torch_empty_cache_steps post_init check ( #36734 )
...
Mistaken use of De Morgan's law. Fixed "not (X or Y)"
to correct "not (X and Y)" check to raise a ValueError.
Added corresponding test to check "positive int or None" condition.
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
2025-03-17 16:09:46 +01:00
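The De Morgan mix-up described in #36734 above can be illustrated with a minimal sketch. The function names here (`check_buggy`, `check_fixed`, `accepts`) are hypothetical stand-ins and not the actual `TrainingArguments` code; only the two boolean forms quoted in the commit message are taken from the source.

```python
from typing import Any, Callable


def check_buggy(steps: Any) -> None:
    # Faulty form: not (X or Y) is true only when BOTH X and Y are false,
    # so an int that is not positive (X true, Y false) slips through.
    if steps is not None:
        if not (isinstance(steps, int) or steps > 0):
            raise ValueError(f"expected a positive int or None, got {steps!r}")


def check_fixed(steps: Any) -> None:
    # Corrected form: not (X and Y) is true when EITHER X or Y is false,
    # which matches the intended "positive int or None" condition.
    if steps is not None:
        if not (isinstance(steps, int) and steps > 0):
            raise ValueError(f"expected a positive int or None, got {steps!r}")


def accepts(check: Callable[[Any], None], steps: Any) -> bool:
    # Hypothetical helper: True if the check lets the value through.
    try:
        check(steps)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for value in (None, 5, 0, -1, 0.5):
        print(repr(value), accepts(check_buggy, value), accepts(check_fixed, value))
```

In the corrected form the `and` also short-circuits, so the `> 0` comparison is never evaluated for a value that is not an int.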
Sambhav Dixit
8e67230860
Fix test isolation for clear_import_cache utility ( #36345 )
...
* test fixup
* test fixup
* fixing tests for unused imports
* style fixes
* fix
* style fixes
* style fix
* remove isolated module cache
* rm custom subprocess definition
* run using existing fn
* style fixup
* make fixup
* remove redundant comments
* rm redundant skipif + style changes
2025-03-17 16:09:09 +01:00
jiqing-feng
27361bd218
fix xpu tests ( #36656 )
...
* fix awq xpu tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix llava next video bnb tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-03-17 15:57:49 +01:00
Fredrik Norén
da7d64f4ff
Allow ray datasets to be used with trainer ( #36699 )
...
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-03-17 15:44:47 +01:00
jiqing-feng
2256875a77
fix can_generate ( #36570 )
...
* fix can_generate
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix can generate for speecht5 and blip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix speecht5 tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* fix
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com >
2025-03-17 14:56:18 +01:00
Marc Sun
9e94801146
enable/disable compile for quants methods ( #36519 )
...
* disable compile for most quants methods
* fix
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com >
* Update tests/quantization/bnb/test_mixed_int8.py
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com >
* Update src/transformers/generation/configuration_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* changes from joao suggestions
---------
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com >
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
2025-03-17 11:38:21 +01:00
Armaghan Shakir
c53d53da89
🚨 🚨 🚨 Fix sdpa in SAM and refactor relative position embeddings ( #36422 )
...
* fall back to eager if output_attentions
* improve relative position embeddings
* run modular on got_ocr2
* run-slow: sam
* fix run-length encoding
* fix tf processor errors
* update tf_sam
* fix compile error
* re-run tests
2025-03-17 09:39:52 +00:00
Joao Gante
fc8764c9a6
[Generation, Gemma 3] When passing a custom generation_config, overwrite default values with the model's base generation_config ( #36684 )
2025-03-15 12:40:09 +00:00
Guillaume LEGENDRE
f263e88dcf
Update self-push-caller.yml
2025-03-15 11:32:04 +01:00
Ilyas Moutawwakil
6f3e0b68e0
Fix grad accum arbitrary value ( #36691 )
2025-03-14 22:03:01 +01:00
Cyril Vallez
2c2495cc7b
Fix post_init() code duplication ( #36727 )
...
* Update modeling_utils.py
* CIs
2025-03-14 17:36:02 +01:00
MaCAT
25992b493c
🌐 [i18n-KO] Translated codegen.md to Korean ( #36698 )
...
* Initial translation
* Add _toctree.yml
2025-03-14 09:31:18 -07:00
Joao Gante
42ebb6c23e
[tests] Parameterized test_eager_matches_sdpa_inference ( #36650 )
2025-03-14 14:41:27 +00:00
Matt
9215cc62d4
Try working around the processor registration bugs ( #36184 )
...
* Try working around the processor registration bugs
* oops
* Update error message
* Clarify error
* Docstring docstring docstring
* The extra content is indexed by config class, so let's grab some values out of there
* Commit my confusion as a TODO
* Resolve my confusion
* Cleanup and mostly revert to the original
* Better autoclass fallback
* Don't nest f-strings you lunatic
* Clearer error message
* Less getattr()
* Revert a lot of changes to try a different approach!
* Try the global registry
* Check the dynamic list as well as the transformers root
* Move the dynamic list somewhere safer
* Move the dynamic list somewhere even safer
* More import cleanup
* Simplify all the register_for_auto_class methods
* Set _auto_class in the register() methods
* Stop setting the cls attribute in register()
* Restore specifying the model class for Model derivatives only
* Fix accidentally taking the .__class__ of a class
* Revert register_for_auto_class changes
* Fix get_possibly_dynamic_module
* No more ALL_CUSTOM_CLASSES
* Fix up get_possibly_dynamic_module as well
* Revert unnecessary formatting changes
* Trigger tests
2025-03-14 13:56:21 +00:00
Sean (Seok-Won) Yi
691d1b52c3
Fix/best model checkpoint fix ( #35885 )
...
* Set best_model_checkpoint only when ckpt exists.
Previously it was set explicitly without checking whether the checkpoint directory even exists; the setting logic now lives inside _save_checkpoint and only sets it if the directory exists.
* Added best_global_step to TrainerState.
* Added tests for best_model_checkpoint.
* Fixed hard-coded values in test to prevent fail.
* Added helper func and removed hard-coded best_step.
* Added side effect patch generator for _eval.
* Added evaluate side effect func.
* Removed erroneous patching.
* Fixed minor bug.
* Applied Ruff.
* Fixed Ruff problem in make style.
* Used Trainer.set_initial_training_values.
2025-03-14 14:24:53 +01:00
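The idea behind #35885 above, recording the best checkpoint only once its directory is actually on disk, can be sketched as follows. This is an illustration under stated assumptions, not the Trainer implementation: `SketchState`, `record_best_checkpoint`, and `demo` are hypothetical names, and the `checkpoint-<step>` directory layout is assumed for the example.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchState:
    # Hypothetical stand-in for the two fields named in the commit message.
    best_model_checkpoint: Optional[str] = None
    best_global_step: int = 0


def record_best_checkpoint(state: SketchState, output_dir: str, global_step: int) -> bool:
    # Assumed layout: one directory per saved step, named "checkpoint-<step>".
    candidate = os.path.join(output_dir, f"checkpoint-{global_step}")
    if not os.path.isdir(candidate):
        # Nothing was saved for this step, so leave the state untouched.
        return False
    state.best_model_checkpoint = candidate
    state.best_global_step = global_step
    return True


def demo():
    state = SketchState()
    with tempfile.TemporaryDirectory() as output_dir:
        # Step 10 was never saved: the state must stay empty.
        missing = record_best_checkpoint(state, output_dir, 10)
        after_missing = state.best_model_checkpoint
        # Step 20 exists on disk: the state is updated.
        os.makedirs(os.path.join(output_dir, "checkpoint-20"))
        saved = record_best_checkpoint(state, output_dir, 20)
    return missing, after_missing, saved, state
```

Guarding on the directory keeps the recorded path from pointing at a checkpoint that was never written, which is the failure mode the commit message describes.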
Joao Gante
3bd1a0ddf1
[model loading] don't gc.collect() if only 1 shard is used ( #36721 )
...
* don't gc collect if 1 shard is used
* delete state dict anyways
2025-03-14 12:56:56 +00:00
Matt
8cb522b419
Cleanup the regex used for doc preprocessing ( #36648 )
...
* Cleanup the regex used for doc preprocessing
* Run tests
2025-03-14 12:18:49 +00:00
Matt
72861e11eb
Make the flaky list a little more general ( #36704 )
...
* Make the flaky list a little more general
* Trigger tests
* Make the flaky list a little more general
2025-03-14 12:15:32 +00:00