HuggingFace_transformer

Author	SHA1	Message	Date
Tanuj Rai	8d40ca5749	Update phi4_multimodal.md (#38830 ) * Update phi4_multimodal.md * Update docs/source/en/model_doc/phi4_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/phi4_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/phi4_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/phi4_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/phi4_multimodal.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update phi4_multimodal.md * Update phi4_multimodal.md * Update phi4_multimodal.md * Update phi4_multimodal.md * Update phi4_multimodal.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-14 10:35:17 -07:00
MilkClouds	3635415af2	[Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic (#39391 ) Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic	2025-07-14 09:25:06 -07:00
Parag Ekbote	5c30f7e390	Update Model Card for Encoder Decoder Model (#39272 ) * update model card. * add back the model contributors for mamba and mamba2. * update the model card. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update batches with correct alignment. * update examples and remove quantization example. * update the examples. * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update example. * correct the example. --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-11 11:23:08 -07:00
Xiang Chendong	0d7efe3e4b	fix gpt2 usage doc (#39351 ) fix typo of gpt2 doc usage	2025-07-11 10:59:41 -07:00
Muhammad Shaheer Malik	a646fd55fd	Updated CamemBERT model card to new standardized format (#39227 ) * Updated CamemBERT model card to new standardized format * Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution * Updated CamemBERT usage examples, quantization, badges, and format * Updated CamemBERT badges * Fixed CLI Section	2025-07-11 10:59:09 -07:00
Julien Denize	70e57e4710	Add mistral common support (#38906 ) * wip: correct docstrings * Add mistral-common support. * quality * wip: add requested methods * wip: fix tests * wip: add internally some methods not being supported in mistral-common * wip * wip: add opencv dependency and update test list * wip: add mistral-common to testing dependencies * wip: revert some test changes * wip: ci * wip: ci * clean * check * check * check * wip: add hf image format to apply_chat_template and return pixel_values * wip: make mistral-common non-installed safe * wip: clean zip * fix: from_pretrained * fix: path and base64 * fix: path and import root * wip: add docs * clean * clean * revert --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>	2025-07-11 16:26:58 +00:00
Shuming Hu	bf607f6d3b	PerceptionLM (#37878 ) * plm template * A working plm with fixed image features * hacked processor * First version that reproduced PLM output using PE from timm. * Simplify and fix tie_word_embeddings * Use PIL resize. Simplify converstion. * First version that works with video input. * simplifed image preprocessing (not batched) * Minor fixes after rebasing on main. * Video processor based on new API. * Revert to use _preprocess for image processor. * refactor with modular * fix tie_word_embedding * Testing with timm PE * check in missed converstion from modular to model.py * First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences. * address review comments * Fixed batching if video and image examples mixed. * Simplify PE configuration. * Enable AutoModel for PerceptionEncoder. * Update PE config style. * update all headers * Minor fixes. * Move lm_head to PerceptionLMForConditionalGeneration. Fix vit_G model specification. * Fix for testing_modeling_perception_lm.py * Image processing refactoring to use more common parts. * Fix processor test. * update tests to use model from hub * More test fixes. * integration test GT update after rebasing; probably due to video preprocessing * update test media path to hub * Stop tracking local scripts * address some review comments * refactor image processing. * small fixes * update documentation and minor fixes * remove scripts * Minor fix for CI * Fix image processing * CI and doc fix * CI formatting fix * ruff fix * ruff formatting * ran utils/sort_auto_mappings.py * update docstring * more docstring udpates * add vision_input_type default fallback for image processing * more verbose variable naming * test update * Remove PE and PEConfig use AutoModel(TimmWrapper) instead * Minor cleanup. * Minor Fix: remove any ref to PE. Ruff format and check. * fix docstring * Fix modular/model consistency.Improvex docstringfor . * Fix PerceptionLMForConditionalGenerationModelTest * ruff fix * fix for check_repo * minor formatting * dummy size arg to fix for processor test. * Update docstring for PerceptionLMConfig * Minor fixes from review feedback. * Revert some minor changes per reviewer feedback. * update base_model_prefix * address reviewer feedback * fix comment in modeling file * address reviewer feedback * ruff format * Pre-merge test update. * reapply modular and fix checkpoint name * processor test path * use modular a bit more * remove dead code * add token decorator --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-11 11:07:32 +02:00
Giuseppe Coccia	4b47b2b8ea	Updated Switch Transformers model card with standardized format (Issue #36979 ) (#39305 ) * Updated Switch Transformers model card with standardized format (Issue #36979) * Apply reviewer suggestions to the new standardised Switch Transformer's model card * Update switch_transformers.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-10 15:34:10 -07:00
Paul Pak	9682d07f92	LFM2 (#39340 ) * [modeling][lfm2] LFM2 model on 4.53.0 interface * [configuration] hook in LFM2 keys * [modeling][lfm2] update modeling interface for 4.53.1 * [modeling][lfm2] apply mask to hidden conv states * [misc] ruff format/lint * [modeling][lfm2] minor: NotImplemented legacy cache conversion * Create lfm2.md * create nice modular * style * Update modeling_auto.py * clean and start adding tests * style * Update test_modeling_lfm2.py * Update __init__.py * small test model size * config * small fix * fix * remove useless config attrs -> block_dim and conv_dim are hiden_size * fix prepare inputs * fix config * test * typo * skip tests accordingly * config docstrings * add doc to .md * skip config docstring check --------- Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-10 16:07:33 +02:00
Raushan Turganbay	bc161d5d06	Delete deprecated stuff (#38838 ) * delete deprecated stuff * fix copies * remove unused tests * fix modernbert and fuyu * Update src/transformers/cache_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * bye bye `seen_tokens` * address comments * update typings * ecnoder decoder models follow same pattern as whisper * fix copies * why is it set to False? * fix switch transformers * fix encoder decoder models shared weight * fix copies and RAG * remove `next_cache` * fix gptj/git * fix copies * fix copies * style... * another forgotten docsrting --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-07-10 05:18:44 +00:00
Tom Aarsen	5111c8ea2f	Fix typo: langauge -> language (#39317 )	2025-07-09 12:06:46 -07:00
Priya aka Priyamvadha Balakrishnan	2781ad092d	docs: update LLaVA-NeXT model card (#38894 ) * docs: update LLaVA-NeXT model card * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/llava_next.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * [docs] Updated llava_next model card * Update docs/source/en/model_doc/llava_next.md remove image sources Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * [fix] Change Flash Attention to SDPA badge * [doc] fixed quantization example * docs: updated contribution details and badges * Update llava_next.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-09 11:32:40 -07:00
Eman Risha	d61c0d087c	Updated the Model docs - for the MARIAN model (#39138 ) * Update marian.md This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include: - Added a clear description of MarianMT, its architecture, and how it differs from other models. - Provided usage examples for Pipeline and AutoModel. - Added a quantization example for optimizing model inference. - Included instructions and examples for multilingual translation with language codes. - Added an Attention Mask Visualizer example. - Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation. - Fixed formatting issues in the code blocks for correct rendering. This update improves the readability, usability, and consistency of the Marian model documentation for users. * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update marian.md * Update marian.md * Update marian.md * Update marian.md * Update docs/source/en/model_doc/marian.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update marian.md * Update marian.md * Update marian.md * Update marian.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-09 10:23:03 -07:00
MaCAT	4652677c89	🌐 [i18n-KO] Translated quark.md to Korean (#39268 ) * initial translation * removed english parts * maintain consistency * Update docs/source/ko/quantization/quark.md Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/quantization/quark.md Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/quantization/quark.md Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/quantization/quark.md Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * add toctree * fixed indentation --------- Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>	2025-07-09 09:29:51 -07:00
Vladislav Bronzov	c980904204	Add DeepSeek V2 Model into Transformers (#36400 ) * add initial structure * doc fixes, add model base logic * update init files * some fixes to config and modular * some improvements for attention * format * remove unused attn * some fixes for moe layer and for decoder * adapt _compute_yarn_parameters for deepseek * format * small fix * fix for decoder forward * add tests, small refactoring * fix dummies * fix init * fix doc * fix config docs * add sequce doc, fix init for gate * fix issues in tests * fix config doc * remove unused args * some fixes and refactoring after review * fix doc for config * small fixes for config args * revert config refactoring * small refactoring * minor fixes after rebase * small fix after merge * fix modular * remove rotaryembd from public init * small test fix * some rotary pos calculation improvement * fix format * some improvements and fixes * fix config * some refactoring * adjust some unit tests * skip test * small fixes and tests adjustment * reapply modular * fix all tests except Integration * fix integration testzs * cleanup BC stuff * rope * fix integrations tests based on a10 * style --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-09 17:04:28 +02:00
Biao Zhang	7ef592c96c	Update T5gemma (#39210 ) * bug fix: add vocab_size to t5gemmaconfig for pipeline. * Update checkpoint placeholder * minor change * minor change * minor change: update example. * fix: add vocab_size as an explict arg. * buf fix: remove vocab_size verification; instead, re-set encoder/decoder vocab size. Note, in t5gemma, vocab size of encoder/decoder shoud be always the same. * add `add_generation_prompt` for message preprocessing.	2025-07-08 19:08:48 +02:00
Quentin Lhoest	1ecd52e50a	Add torchcodec in docstrings/tests for `datasets` 4.0 (#39156 ) * fix dataset run_object_detection * bump version * keep same dataset actually * torchcodec in docstrings and testing utils * torchcodec in dockerfiles and requirements * remove duplicate * add torchocodec to all the remaining docker files * fix tests * support torchcodec in audio classification and ASR * [commit to revert] build ci-dev images * [commit to revert] trigger circleci * [commit to revert] build ci-dev images * fix * fix modeling_hubert * backward compatible run_object_detection * revert ci trigger commits * fix mono conversion and support torch tensor as input * revert map_to_array docs + fix it * revert mono * nit in docstring * style * fix modular --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-07-08 17:06:12 +02:00
Joao Gante	6f1a43896c	[CI] fix docs (#39273 ) * fix docs * add ko gloassary file to toctree	2025-07-08 11:31:03 +01:00
Yaswanth Gali	fbdaa7b099	Add Aimv2 model (#36625 ) * Model skelton * changes * temp push * changes * Added support for aimv2-native * More changes * More changes * Stupid mistake correction * Added config and refactor * Added vison model * update * Refactor for lit variant * Added Text Model * Minor fixes * nits * update * Preliminary tests * More fixes * Updated tests 🤗 * Refactor * Updated testcase * Updated config * make fixup * more fixes * Bug fix and updates * deadcode * Fixes * nit * up * Happy CI ✅ * Reduce LOC * nit * nit * make style * return_dict refactor * bug fix * fix * doc update * nit * make fixup * Minor update * _init_weigths modifcation * update tests * Minor fixes post review * Update w.r.t GradientCheckpointingLayer * docs update * update * nit * Use more Modular 😉 * Change name from AIMv2 to Aimv2 * Nit * make style * Add model doc pointer * make style * Update model doc section * updates * Modify attn mask and interface * update test * Final change * Utilize flash and flex attn * keep attn mask * camelcase model name in test file * Fix docstring * Fix config warning finally and create_causal_mask * disable torchscript * remove unused arg * remove from tests * balance model size for tests * fix device * tests * tests * flaky test * fix import --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-08 11:53:21 +02:00
Jingze Shi	d8590b4b0c	Add Doge model (#35891 ) * Add Doge Model * Fix code quality * Rollback an error commit * Fix config for open-source weights * Revert "Fix config for open-source weights" This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76. * Add modular_doge * Update Doge inherits from Llama * Fix import bug * [docs] Add usage of doge model * Fix Doge import pretrainedconfig from modeling_utils to configuration_utils * [docs] remove trust remote code from doge * Fix dynamo bug in doge model * Update docstrings * Import apply_rotary_pos_emb and repeat_kv from Llama * Fix all nits * Fix code quality * Fix some bugs * Fix code quality * Remove inherited `_update_causal_mask` from Llama This leads to incorrect weight initialization. * Fix the wrong tensor orderings in DogeCDMoE * Fix attention mask bug We have to provide attention_mask for dynamic mask computation * Modify most implementations to inherit from Llama But there are two problems: 1. `flex_attention_forward` is not updated properly 2. `Example` error in the forward method of DogeForCausalLM * Modify CDMoE for batch efficient implementation * Uniform MoE configuration names, just like QwenMoE * Fix code quality * Fix code quality * Fix code quality * Add tp plan of CDMoE Module * Hybird DMA with sliding window * Update valid tokens greater than window size * Fix code quality * Add `convert_doge_weights_to_hf` * Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py * Fix nits in modular_doge * Fix code quality * Fix all nits * Fix all nits * Make sure the attention function is updated inside the class * Fix code quality issues in the Doge model and add a test for it * Fix `test_generate` * Fix code quality * Fix nits fllowing suggestions * Fix code quality * Fix code quality issues * Fix nits * Fix code quality nits * Fix the missing parameters in the configuration. * Fix the missing parameters in the configuration. * Fix nits * Add initialization of attention * Fix last nits * Simplify dynamic mask generation logic * Rename router_logits to gate_logits for matching latest changes of MixtralModel * Rename typings for matching latest changes of MixtralModel * Fixes typo in comment * Update src/transformers/models/doge/modular_doge.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fix code quality issues to match other modular * Fix code quality issues to match other modular * Fix the static compilation errors * Update model weights link * Fix code quality issues to match other modular * reapply modular and support for new outputs * style * simplify a lot * fix import location * reapply modular * fix * fix integration test --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-08 11:44:29 +02:00
gudwls215	ea3c2c0277	Fix license text, duplicate assignment, and typo in constant names (#39250 ) - Complete Apache License text in Italian documentation - Remove duplicate variable assignment in Perceiver converter - Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant	2025-07-08 10:20:52 +02:00
Yuxuan Zhang	17b3c96c00	Glm 4 doc (#39247 ) * update the glm4 model readme * update test * update GLM-4.1V model * update as format * update * fix some tests * fix the rest * fix on a10, not t4 * nit: dummy import --------- Co-authored-by: raushan <raushan@huggingface.co>	2025-07-08 08:22:04 +02:00
Drew Ross	bbca9782ca	Update LED model card (#39233 ) * Update LED model card * Remove extra arguments * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-07 15:56:57 -07:00
Mikhail Moskovchenko	3993ee1e98	Add `segmentation_maps` support to MobileNetV2ImageProcessor (#37312 ) * Add `segmentation_maps` support to mobilenet_v2 image processor and `reduce_labels` to mobilevit * Changed mobilenetv2 tests to support fastimageprocessor * added `segmentation_maps` support to fast image processor * reverted to upstream/main * Add optional * Use autodocstring * Changed docs * Docs fix * Changed fp to match beit fp * Change typing imports * Fixed repo inconsistency * Added fast-slow equivalence tests * Removed unnecessary call * Add `reduce_labels` to Mobilevit fast processor --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-07 13:34:59 -04:00
Joosun Hwang	9698052560	Add Korean translation for glossary.md (#38804 ) * Add Korean translation for glossary.md * Update docs/source/ko/glossary.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/glossary.md Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: Joosun40 <77312900+Joosun40@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>	2025-07-07 09:12:55 -07:00
Lucain	bf203aa9da	Update tiny-agents example (#39245 )	2025-07-07 15:58:36 +02:00
jiqing-feng	14cba7ad33	enable xpu on kv-cache and hqq doc (#39246 ) Signed-off-by: jiqing-feng <jiqing.feng@intel.com>	2025-07-07 13:12:02 +00:00
Daniel van Strien	b8f397e456	fix typo in Gemma3n notes (#39196 )	2025-07-07 14:41:33 +02:00
Joao Gante	85d93cc6e3	[serve] Cursor support, move docs into separate page, add more examples (#39133 ) * jan docs * rm * [cursor] tmp commit * Cursor working :D * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/commands/serving.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * cursor docs * try to fix agents/tools docs? * try to fix agents/tools docs? * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * add transformers chat example with transformers serve --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-07-03 17:04:16 +01:00
Anton Vlasjuk	b31e9d19a6	[`Dia`] Change ckpt path in docs (#39181 ) fix ckpt path	2025-07-03 10:02:58 +00:00
Steven Liu	df12d87d18	[docs] ViTPose (#38630 ) * vitpose * fix? * fix? * feedback * fix * feedback * feedback * update sample image	2025-07-02 07:56:29 -07:00
Yaswanth Gali	b61023a1b7	🚨🚨🚨 [eomt] make EoMT compatible with pipeline (#39122 ) * Make EoMT compatible with pipeline * Implicit patch offsets * remove patch offsets from arg * Modify tests * Update example * fix proc testcase * Add few more args * add pipeline test suite * fix * docstring fixes * add pipeline test * changes w.r.t review * 🙈 MB * should fix device mismatch * debug * Fixes device mismatch * use decorator * we can split mlp * expected values update --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>	2025-07-02 12:25:26 +01:00
Chong You	e8e0c76162	Add activation sparsity reference in gemma3n doc (#39160 ) Add activation sparsity reference in the description of gemma3n	2025-07-02 04:11:03 +02:00
Drew Ross	fe35eca7bd	Update BigBirdPegasus model card (#39104 ) * Update igbird_pegasus.md * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-30 10:42:56 -07:00
Yao Matrix	29a3f5ed8c	switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8 (#39024 ) * switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * Update docs/source/en/perf_infer_gpu_multi.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update perf_infer_gpu_multi.md * Update perf_infer_gpu_multi.md * Update perf_infer_gpu_multi.md --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-30 08:54:05 -07:00
jiqing-feng	03db2700ab	Enable XPU doc (#38929 ) * fix example with dataset Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update torchao doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update torchao doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device type Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert torchao change Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix torchao doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert torchao change Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update xpu torchao doc Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update chat_templating_multimodal.md Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * use full name for int8 Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert int8 title Signed-off-by: jiqing-feng <jiqing.feng@intel.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-06-30 07:56:55 -07:00
Lysandre Debut	e8f90b5397	Split `transformers chat` and `transformers serve` (#38443 ) * Next token * Split chat and serve * Support both generation methods * Style * Generation Config * temp * temp * Finalize serving.py Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com> * Finalize chat.py * Update src/transformers/commands/serving.py Co-authored-by: célina <hanouticelina@gmail.com> * Lucain's comments Co-authored-by: Lucain <lucain@huggingface.co> * Update * Last comments on PR * Better error handling * Better error handling * CI errors * CI errors * Add tests * Fix tests * Fix tests * [chat] Split chat/serve (built on top of lysandre's PR) (#39031) * Next token * Split chat and serve * Support both generation methods * Style * Generation Config * temp * temp * Finalize serving.py Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com> * Finalize chat.py * Update src/transformers/commands/serving.py Co-authored-by: célina <hanouticelina@gmail.com> * Lucain's comments Co-authored-by: Lucain <lucain@huggingface.co> * Update * Last comments on PR * Better error handling * Better error handling * CI errors * CI errors * Add tests * Fix tests * Fix tests * streaming tool call * abstract tool state; set tool start as eos * todos * server working on models without tools * rm chat's deprecated flags * chat defaults * kv cache persists across calls * add server docs * link * Update src/transformers/commands/serving.py * Apply suggestions from code review * i love merge conflicts * solve multi turn with tiny-agents * On the fly switching of the models * Remove required positional arg --------- Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com> Co-authored-by: Lucain <lucain@huggingface.co> * Protect names * Fix tests --------- Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com> Co-authored-by: Lucain <lucain@huggingface.co> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2025-06-30 15:10:53 +02:00
Ryan Mullins	ed9f252608	docs: Gemma 3n audio encoder (#39087 ) Updating Gemma 3n docs and docstrings to clarify the relationship between the newly trained audio encoder used in Gemma 3n and the USM model from the original paper.	2025-06-30 14:10:51 +02:00
Sandeep Yadav	18143c76bf	Sandeepyadav1478/2025 06 19 deberta v2 model card update (#38895 ) * [docs]: update deberta-v2.md model card * chore: req updates * chore: address code review feedback and update docs * chore: review feedback and updates * chore: model selection updates * chores: quantizations review updates	2025-06-27 10:35:30 -07:00
farrosalferro	dd7dc4a4a2	Add Fast Image Processor for Chameleon (#37140 ) * Add Fast Image Processor for Chameleon * add warning to resize and move blend_rgba to convert_to_rgb * Remove unrelated files * Update image_processing_chameleon_fast to use auto_docstring * fix equivalence test --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-06-27 15:26:57 +00:00
MinJu-Ha	49d9fd49bd	Add Fast Image Processor for mobileViT (#37143 ) * Add image_processing_mobilevit_fast.py * Fix copies * update _preprocess for channel_flip * Update for batched image processing * Resolve merge conflicts with main * Fix import order and remove trailing whitespace (ruff clean-up) * Fix copy inconsistencies * Add NotImplementedError for post_process_semantic_segmentation to satisfy repo checks * Add auto_docstring * Adjust style * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Delete not used function * test: add missing tests for and * Add post_process_semantic_segmentation to mobilevit_fast.py * Add preprocess function to image_processing_mobilebit_fast.py * ruff check for formatting * fix: modify preprocess method to handle BatchFeature correctly * Remove logic for default value assignment Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Remove normalization adn RGB conversion logic not used in slow processor Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Simplify return_tensors logic using one-liner conditional expression Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Remove unused normalization and format parameters Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * add *kwargs and remove default values in _preprocess add slow_fast equivalence tests for segmentation * style: autoformat code with ruff * Fix slow_fast equivalence test * merge + remove skipped test --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-06-27 14:40:24 +00:00
Nahieli	4336ecd1ea	add fast image processor nougat (#37661 ) * add fast image processor nougat * test fixes * docstring white space * last fixes * docstring_type * tolerance unit test * fix tolerance * fix rtol * remove traling white space * remove white space * note for tolerance unit test * fix tests * remove print --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-06-27 14:39:43 +00:00
Yaswanth Gali	1750c518dd	✨ Add EoMT Model \|\| 🚨 Fix Mask2Former loss calculation (#37610 ) * Initial Commit * up * More changes * up * Only mask_logits mismatch * close enough logits debug later * fixes * format * Add dummy loss * Close enough processing for semantic seg * nit * Added panoptic postprocessor * refactor * refactor * finally fixed panoptic postprocessor * temp update * Refactor ForUniversalSegmentation class * nits and config update * Few fixes and inference matches * change mapping * Added training support but loss slightly off 🥲 * Loss is matching 😀 * update * Initial tests skelton * changes * tests update * more modular * initial tests * updates * better docstrings * changes * proc tests passing :) * Image processor update * tiny change * QOL changes * Update test w.r.t latest attn refactor * repo-consistency fixes * up * Image proc fix and integration tests :) * docs update * integration tests * fix * docs update 🥰 * minor fix * Happy CI * fix * obvious refactoring * refactoring w.r.t review * Add fask image proc skelton * Fast Image proc and cleanups * Use more modular * tests update * Add more tests * Nit * QOL updates * change init_weights to torch default * add eager func coz of make style * up * changes * typo fix * Updates * More deterministic tests * More modular * go more modular 🚀 * up * dump * add supprot for giant ckpts * overhaul * modular * refactor * instace seg is ready * cleanup * forgot this * docs cleanup * minor changes * EoMT - > Eomt * Happy CI * remove redundant comment * Change model references * final change * check annealing per block * My other PR changes 😂 --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-06-27 14:18:18 +02:00
Steven Liu	a52478253b	[docs] Tensor parallelism (#38241 ) * updates * feedback * badges * fix? * fix? * fix? * fix?	2025-06-26 14:40:45 -07:00
Steven Liu	84e8696cae	[docs] @auto_docstring (#39011 ) * refactor * feedback	2025-06-26 14:21:54 -07:00
Drew Ross	018855de63	Update PEGASUS-X model card (#38971 ) * Update PEGASUS-X model card * Add cache_implementation argument in quantization code example * Update CLI example * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Remove TensorFlow and Flax badges --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-26 13:54:48 -07:00
Steven Liu	757c26fb40	[docs] Model contribution (#38995 ) improve	2025-06-26 12:25:14 -07:00
StevenBucaille	f171e7e884	Update SuperPoint model card (#38896 ) * docs: first draft to more standard SuperPoint documentation * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * docs: reverted changes on Auto classes * docs: addressed the rest of the comments * docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation * Update superpoint.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-06-26 10:13:06 -07:00
Ryan Mullins	c63cfd6a83	Gemma 3n (#39059 ) * Gemma 3n * initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3p5RMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3) * Adding gemma3p5 text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (#3) * Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. * Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * regenerating modeling file after syncing to HEAD * Use torch.std(..., unbiased=False) for activation sparsity (#8) * Refactoring to a single QVK Norm (#13) * AltUp: support scale_corrected_output (#14) * Converts einsums to nn.Linear (#7) * Converts einsums to nn.Linear * Removing unused variables * Aligning SharedKVCache with HybridCache (#11) * Alinging SharedKVStore with HybridCache * Remove KVStore. Refactor apply_rotary_pos_emb for sharing * Addressing review comments * Supporting split modality embeddings in Gemma3n (#10) * Adding the Embedder class * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation * Apply suggestions from code review Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, prop drilling audio and vision configs to the text config * Removing TODO's that have been addressed * Simplify Embedder init and add audio embeddings * Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder * Refactoring vision and audio embeddings into ConditionalGeneration model --------- Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating attention mask for Gemma 3.5 (#15) * xxx_token_index to xxx_token_id * remvoing deprecated last_cache_position * Removing references to SigLIP * Always init per-layer inputs * Using torch.finfo().min for epsilon_tensor * Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas * fix modular GEMMA3N_INPUTS_DOCSTRING * Gemma3nAttention inherits from Gemma3Attention * Modular inheritance fixes * CausalLM conversion script for 4B model (#16) * Add Gemma3n Audio Encoder (#6) * initial commit of Gemma 3.5 scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3n overall and text config with vision and audio config placeholders (#3) * Adding gemma3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3.5 (#3) * Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. * Adding KV Cache Sharing * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right Gemma 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in initializers * Removing sl.Emit * Implementing sl.ExpandDims in Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * Fixes for reviewer comments * CausalLM conversion script for 4B model * inv_timescales to non-persistent buffer * Addressing audio encoder Attention feedback * Addressing Gemma3nAudioSSCPConvBlock feedback * Addressing Gemma3nAudioConformerAttention feedback * Addressing padding feedback * Weights conversion loads audio state dict * Always use vision_config so saving works * Token id updates for configs * Stubs for interleaving audio embs * Addressing reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Fixing cache access error * Removing duplicate code from a bad merge * Gemma 3n Text + Vision Part 1 (#17) * testing utilities for numerics comparisons * Corrected einsum to nn.Linear weights conversion * Inherit scaled word embs from Gemma3 not Bart * Fixing transposes for collapsed linears * More transpose fixes * numpy api fix * RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True * Force AltUp to float32 * Updating debugging script for AudioEncoder debugging * Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs * Correcting attention einsum conversions * RMSNorm in type of x * Fixing douplicate laurel norm/gating * KV sharing using the right previous indices * Refactor kv shared index computation. Correct frac_shared_layers * Use num_shared_layers instead of inferring from a fraction * fixing a bug for logging * Fix shared data_ptrs in altup inits * rope: adjust proj -> norm -> rope to preserve computation (#20) * rope: adjust proj -> norm -> rope to preserve computation * Removing some breaking language model fluff in ConditionalGeneration * Consolidate query_states transforms --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Vectorize the loops in AltUp (#19) * Vectorize the loops in AltUp * fix typo * Expanding to support batched inputs * remove extra debug script * Fix AltUp.forward --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel * Convert norm to 1/sqrt (#21) * Convert norm to 1/sqrt * Scale shift change per Phil's rec * Adding default activation sparsity * Fixing 2B config in weights conversion script * Fixing RMSNorm parameters - adding scale_shift and with_scale * Correcting query pre-attention scaling * Adding query_rescale_scalar to text config * Adding layer_idx to MLP * Permafix for input_layernorm * Use 1/sqrt instead of rsqrt in DecoderLayer * Fix o_proj conversion * Conversion script update for vision encoder * Removing logging for debugging timm model * Fixing bugs in Gemma3nForConditionalGeneration for text generation * Generating the modeling_gemma3n.py file * Removing the addition of an erroneous line in the modeling file * Adding gemma3n text model to modeling_auto * Bugfix: Updating the interleaving of inputs_embeds and vision_embeds * Updating the modeling file with the latest bugfix changes * Updating models/auto for Gemma 3n * using AutoTokenizer in forward test * Adding processing_gemma3n.py * Gemma 3n configured for AutoModel. Conversion script updated. * Removing errant merge artifacts --------- Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Removing errant debugging statements from Gemma 3 * Gemma3n audio model (#18) * testing utilities for numerics comparisons * Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock * Add audio version of forward script based on RyanMullins' implementation * Updating to match encoder tests. WIP: config question needs resolving * Updates to audio classes to enable end-to-end running * Removing vestigial classes, cleaning up print statements * Adding SiLU / Swish to audio conformer feed forward block * Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio * Adding outputs to audio test * Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model * Update forward test to load from local weights * Update conversion to process / output audio layers * Update __all__ to export audio encoder * AutoModel registration for Gemma 3n Audio * Use AutoModel for ConditionalGeneration.audio_tower * Fixing input_proj_linear transpose * Fixing Gemma3NanoAudioConformerAttention.post conversion * Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion * Correcting indentation issue on Gemma3p5RMSNorm --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Text + Vision Part 2 (#23) * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3p5.py * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Updating configs for the 2B variant in the conversion script * Using final generation config in conversion script --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Audio Integration (#12) * initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma 3n overall and text config with vision and audio config placeholders (#3) * Adding Gemma 3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (#3) * Initial Gemma3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. * Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3n * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in initializers * Removing sl.Emit * Implementing sl.ExpandDims in Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * Fixes for reviewer comments * Converting sl.Frontend to FeatureExtractor * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3n.py * Update modular Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Draft of audio data in chat template * Removing image processing. Using SigLIP instead. * Audio input going end-to-end * Fixing dtype issues in audio encoder * x-lib formatting consistency * Adding example data * Save preprocessor_config.json from conversion script * Instrumentaiton for debugging * Additional instrumentation for preprocessing debugging * Updates to preprocessor, padding; produces correct end-to-end results on sample * Tackling configuraiton TODOs * Start of feature extractor refatcor * Adds Numpy version of USM extractor, removes Torch version and dependencies * Fixing AltUp.correct coef permute * Supporting batches of single audio segment inputs * Docstrings updates for config * In-lining audio feature extraction * Adjustments to conversion script and smoke test script --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: pculliton <phillipculliton@gmail.com> * Gemma 3n renaming * Removing test data and utilities * Renaming test files * Gemma 3n refactor * Fix tokenizer config in conversion script * Address reviewer feedback * FeatureExtractor returns float32 by default * Adding basic tests for audio, and input name for audio encoder * Audio integration test, updates to model_id for other integration tests * Use scales for q and k norms (#26) * Update audio integration test to use HF dataset * Reviewer feedback * Expand embedding table to full vocab size in weights conversion * Mix-n-match MatFormers for Gemma 3n (#25) * Remove in-place operations (#30) * chore: removing inplace ops * remove [tensor] * n pattern * chore: reviewer feedback in AudioEncoder and AltUp * More grad clipping * Dynamo compatibility * fix: cache slicing error * chore: simplify shared kv cache slicing * chore: vision encoder rename in timm * fix: image processor do_normalize=False * fixup: style * chore: model_doc * fix: docs for code quality * chore: repo consistency * fix: RMSNorm in float as in prior Gemmas * fix: per_layer_inputs = None * chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint * chore: repo consistency * Add initial unit tests for Gemma3nAudioFeatureExtractor (#27) * Add initial unit tests for Gemma3nAudioFeatureExtractor * Add basic unit tests for Gemma3nProcessor (#28) Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * parameterize tests --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * chore: code style * fix: test cases * style and consistency * fix config in the test to be coherent with layer cache sharing * fix hidden states in tests and code * inits and mappings * fix modality prefixes * test order and prefixes * fix test exception * fix class order and reduce model size for faster tests * restore _checkpoint_conversion_mapping to load Caual from Conditional * fix config mapping! * fix: reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: pculliton <phillipculliton@gmail.com> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix import test * add model args * auto_docstring * replace test path * consistency * skip tests for now * fix docstring for doc builder * skip unused attr --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: pculliton <phillipculliton@gmail.com> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-06-26 17:55:47 +02:00
Jaeyong Sung	583db52bc6	Add Dia model (#38405 ) * add dia model * add tokenizer files * cleanup some stuff * brut copy paste code * rough cleanup of the modeling code * nuke some stuff * more nuking * more cleanups * updates * add mulitLayerEmbedding vectorization * nits * more modeling simplifications * updates * update rope * update rope * just fixup * update configuration files * more cleanup! * default config values * update * forgotten comma * another comma! * update, more cleanups * just more nits * more config cleanups * time for the encoder * fix * sa=mall nit * nits * n * refacto a bit * cleanup * update cv scipt * fix last issues * fix last nits * styling * small fixes * just run 1 generation * fixes * nits * fix conversion * fix * more fixes * full generate * ouf! * fixes! * updates * fix * fix cvrt * fixup * nits * delete wrong test * update * update * test tokenization * let's start changing things bit by bit - fix encoder step * removing custom generation, moving to GenerationMixin * add encoder decoder attention masks for generation * mask changes, correctness checked against ad29837 in dia repo * refactor a bit already --> next cache * too important not to push :) * minimal cleanup + more todos * make main overwrite modeling utils * add cfg filter & eos filter * add eos countdown & delay pattern * update eos countdown * add max step eos countdown * fix tests * fix some things * fix generation with testing * move cfg & eos stuff to logits processor * make RepetitionPenaltyLogitsProcessor flexible - can accept 3D scores like (batch_size, channel, vocab) * fix input_ids concatenation dimension in GenerationMixin for flexibility * Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility. * Add stopping criteria * refactor * move delay pattern from processor to modeling like musicgen. - add docs - change eos countdown to eos delay pattern * fix processor & fix tests * refactor types * refactor imports * format code * fix docstring to pass ci * add docstring to DiaConfig & add DiaModel to test * fix docstring * add docstring * fix some bugs * check * porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first * experimental testing of left padding for first channel * whoops * Fix merge to make generation work * fix cfg filter * add position ids * add todos, break things * revert changes to generation --> we will force 2d but go 3d on custom stuff * refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos * some first fixes to get to 10. in generation * some more generation fixes / adjustment * style + rope fixes * move cfg out, simplify a few things, more todos * nit * start working on custom logit processors * nit * quick fixes * cfg top k * more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar * lets keep changes to core code minimal, only eos scaling is questionable atm * simpler eos delay logits processor * that was for debugging :D * proof of concept rope * small fix on device mismatch * cfg fixes + delay logits max len * transformers rope * modular dia * more cleanup * keep modeling consistently 3D, generate handles 2D internally * decoder starts with bos if nothing * post processing prototype * style * lol * force sample / greedy + fixes on padding * style * fixup tokenization * nits * revert * start working on dia tests * fix a lot of tests * more test fixes * nit * more test fixes + some features to simplify code more * more cleanup * forgot that one * autodocs * small consistency fixes * fix regression * small fixes * dia feature extraction * docs * wip processor * fix processor order * processing goes brrr * transpose before * small fix * fix major bug but needs now a closer look into the custom processors esp cfg * small thing on logits * nits * simplify indices and shifts * add simpler version of padding tests back (temporarily) * add logit processor tests * starting tests on processor * fix mask application during generation * some fixes on the weights conversion * style + fixup logits order * simplify conversion * nit * remove padding tests * nits on modeling * hmm * fix tests * trigger * probably gonna be reverted, just a quick design around audio tokenizer * fixup typing * post merge + more typing * initial design for audio tokenizer * more design changes * nit * more processor tests and style related things * add to init * protect import * not sure why tbh * add another protect * more fixes * wow * it aint stopping :D * another missed type issue * ... * change design around audio tokenizer to prioritize init and go for auto - in regards to the review * change to new causal mask function + docstrings * change ternary * docs * remove todo, i dont think its essential tbh * remove pipeline as current pipelines do not fit in the current scheme, same as csm * closer to wrapping up the processor * text to audio, just for demo purposes (will likely be reverted) * check if it's this * save audio function * ensure no grad * fixes on prefixed audio, hop length is used via preprocess dac, device fixes * integration tests (tested locally on a100) + some processor utils / fixes * style * nits * another round of smaller things * docs + some fixes (generate one might be big) * msytery solved * small fix on conversion * add abstract audio tokenizer, change init check to abstract class * nits * update docs + fix some processing :D * change inheritance scheme for audio tokenizer * delete dead / unnecessary code in copied generate loop * last nits on new pipeline behavior (+ todo on tests) + style * trigger --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Vasqu <antonprogamer@gmail.com>	2025-06-26 11:04:23 +00:00

1 2 3 4 5 ...

3405 Commits