HuggingFace_transformer

Author	SHA1	Message	Date
Joao Gante	daab2db33f	[CI] post-`GptOss` fixes for green CI (#39929)	2025-08-07 16:27:00 +02:00
Arthur	7c38d8fc23	Add GPT OSS model from OpenAI (#39923 ) * fix * nice * where i am at * Bro this works * Update src/transformers/integrations/tensor_parallel.py * cleanups * yups that was breaking * Update src/transformers/models/openai_moe/modeling_openai_moe.py * gather on experts and not mlp * add changes for latest convert branch * adds options to get output_router_logits from config * bring chat temlate + special tokens back into the script. * initial commmit * update * working with shards * add model.safetensors.index.json * fix * fix * mxfp4 flag * rm print * Fix PAD/EOS/BOS (#18) * fix pad/eos/bos * base model maybe one day * add some doc * special tokens based on harmony. * add in tokenizer config as well. * prepare for rebase with main * Fix for initialize_tensor_parallelism now returning 4-tuple ``` [rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module> [rank0]: model = AutoModelForCausalLM.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained [rank0]: return model_class.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper [rank0]: return func(args, kwargs) [rank0]: ^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained [rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: ValueError: too many values to unpack (expected 3) ``` mxfp4 * mxfp4 draft * fix * fix import * draft * draft impl * finally working ! 
* simplify * add import * working version * consider blocks and scales * device mesh fix * initial commit * add working dequant + quant logic * update * non nan, gibberish output * working EP + quantization finally ! * start cleaning * remove reversing process * style * some cleaning * initial commmit * more cleaning * more cleaning * simplify * more cleaning * rm duplicated function * changing tp_plan * update tp plan check * add loading attribute * dequantizing logic * use subfunctions * import cleaning * update_param_name * adds clamped swiglu * add clamping to training path * simplify dequant logic * update * Bad merge * more simplifications & tests * fix ! * fix registering custom attention * fix order * fixes * some test nits * nits * nit * fix * Clamp sink logits * Clean * Soft-max trick * Clean up * p * fix deepspeed * update both modeling and modular for cleanup * contiguous * update tests * fix top_k router call * revert renaming * test nits * small fixes for EP * fix path for our local tests * update as I should not have broken that! * fix the loss of mixtral * revert part of the changes related to router_scores, kernel probably no ready for that! * deleting a small nit * update arch * fix post processing * update * running version but not expected output * moving to cuda * initial commit * revert * erroring when loading on cpu * updates * del blocks, scales * fix * style * rm comm * comment * add comment * style * remove duplicated lines * Fix minor issue with weight_map conversion script * fix sampling params * rename to final name * upate pre-final version of template * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * fix batched inference * serve fixes * swizzle ! * update final chat template by Matt. * fix responses; pin oai * sinplify * Thanks Matt for his tireless efforts! 
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * fix * Use ROCm kernels from HUB * Make kernel modes explicit * update final chat template by Matt. x2 * Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Fix installation * Update setup.py Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> * allow no content * fix: update message handling in write_tokenizer function * Fix template logic for user message role * last nits for CB and flash_paged! * there was one bad merge * fix CB (hardcode for now, its just using kv groups instead) * fix * better fix for device_map * minor device fix * Fix flash paged * updates * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * update * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * fix merge * fix * Fix line break when custom model indentity * nits testing * to locals first and pass sliding window to flash paged * register modes for MegaBlocksMoeMlp * add integration test in fixtures -> now update the tests to use it! * update integration tests * initial fix * style and update tests * fix * chore(gpt oss): remove mlp_bias from configuration It was just a leftover. 
* stats * Integration tests * whoops * Shouldn't move model * Ensure assistant messages without thinking always go to "final" channel * More checks to ensure expected format * Add pad_token_id to model configuration in write_model function (#51) * Add oai fix fast tests (#59) * Fix some fast tests * Force some updates * Remove unnecessary fixes * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * reasoning -> Reasoning * Add additional integration tests * fixup * Slight fixes * align chat template with harmony * simplify * Add comment * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * Revert fixup * skip 2 test remove todo * merge * padding side should be left for integration tests * fix modular wrt to changes made to modeling * style * isort * fix opies for the loss * mmmm --------- Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: edbeeching <edbeeching@gmail.com> Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com> Co-authored-by: Zhuohan Li <zhuohan@openai.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: 
joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal> Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Akos Hadnagy <akos@ahadnagy.com> Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-05 18:02:18 +02:00
TaeHyeon Jeon	738c1a3899	🌐 [i18n-KO] Translated `cache_explanation.md` to Korean (#39535) * update: _toctree.yml * docs: ko: cache_explanation.md * feat: nmt draft * fix: apply yijun-lee's comments * fix: apply 4N3MONE's comments * docs: update cache_position * docs: update cache-storage-implementation * update: add h2 tag in cache-position --------- Co-authored-by: taehyeonjeon <xogus294@gmail.com>	2025-08-05 08:20:13 -07:00
ppaanngggg	c430047602	[docs] update object detection guide (#39909) * Update object_detection.md * Update object_detection.md	2025-08-05 14:07:21 +00:00
Lysandre Debut	00d47757bf	Reorder serving docs (#39634) * Slight reorg * LLMs + draft VLMs * Actual VLM examples * Initial responses * Reorder * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/tiny_agents.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/open_webui.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/cursor.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Responses API * Address Pedro's comments --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-08-05 08:43:06 +02:00
Arpon Kapuria	8c4ea670dc	chore: update DETR model card (#39822) * Update model card for DETR * fix: applied suggested changes * fix: simplified pipeline and modified notes and resources * Update detr.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 12:25:53 -07:00
Jan Netík	0bd91cc822	Add support for `ModernBertForMultipleChoice` (#39232) * implement ModernBertForMultipleChoice * fixup, style, repo consistency * generate modeling_modernbert * add tests + docs * fix test	2025-08-04 20:45:43 +02:00
rohitthewanderer	3bafa128dc	[DOCS] : Improved mimi model card (#39824) * [DOCS] : Improved mimi model card * Removed additional header * Review: addressed feedback * Update mimi.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 10:07:06 -07:00
Akib Jawad	2a9febd632	Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494) * added code for handling video object ,as dictionary of frames and metadata, in chat template * added new test where videos are passed as objects (dict of frames, metadata) in the chat template * modified hardcoded video_len check that does not match with increased number of tests cases. * Modify hardcoded video_len check that fails with increased number of tests * update documentation of multi-modal chat templating with extra information about including video object in chat template. * add array handling in load_video() * temporary test video inlcuded * skip testing smolvlm with videos that are list of frames * update documentation & make fixup * Address review comments	2025-08-04 11:49:42 +02:00
StevenBucaille	1ec0feccdd	[image-processing] deprecate `plot_keypoint_matching`, make `visualize_keypoint_matching` as a standard (#39830) * fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models * refactor: added copied from * fix: make style * fix: repo consistency * fix: make style * docs: added missing method in SuperGlue docs	2025-08-01 16:29:57 +00:00
Yoni Gozlan	7b4d9843ba	Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739) * add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid * fix after review	2025-08-01 12:20:08 -04:00
rziga	3951d4ad5d	Add MM Grounding DINO (#37925 ) * first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>	2025-08-01 15:43:23 +01:00
Eric Bezzam	2c0af41ce5	Fix bad markdown links (#39819) Fix bad markdown links.	2025-07-31 09:14:14 -07:00
Tommy Chiang	4fcf455517	Fix broken links (#39809) Replace links in the form of `[text]((url))` to `[text](url)`. This is the correct format of a url in the markdown.	2025-07-31 13:23:04 +00:00
Raushan Turganbay	b937d47455	[cohere2 vision] move doc to multimodal section (#39820) move doc to multimodal section	2025-07-31 15:13:02 +02:00
Kyle Duffy	6ba8a1ff45	Update documentation for Cohere2Vision models (#39817) * Update docs with pipeline example * Add Cohere2Vision to list of vision models * Sort models	2025-07-31 11:58:45 +00:00
Raushan Turganbay	e1688d28d3	[Model] Cohere2 Vision (#39810 ) * Add cohere2_vision to support CohereLabs/command-a-vision-07-2025 * update and add modualr file * update processors and check with orig impl later * delete unused files * image processor reduce LOC and re-use GotOCR2 * update the config to use modular * model tests pass * processor fixes * check model outputs decorator * address one more comment * Update tokens. Temp - need to read from tokenizer' * fix for multi-gpu * Fix image token handling * upadte image token expansion logic * fix a few issues with remote code loading * not related but modular forces us to change all files now * Add overview and code sample to cohere vision docs * add scripts. TMP. * Update inference script * Create script * set dtype in export script * TO revert: modular export fix * Fix scripts * Revert "TO revert: modular export fix" This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c. * Use modular weights * Upload to hub Removed OOD weights ad script * Updated docs * fix import error Update docs Added pipeline test * Updated docs * Run modular script remove modular for config Added patch_size Added docstrings in modular Fix OOM Add docs, fixup integration tests. 8-gpu passing * tiny updates * address comments + fixup * add test for chat template * check model outputs workaround * aya vision fix check model inputs * Revert "add test for chat template" This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8. * reveert more changes * last revert * skip and merge * faulty copy from --------- Co-authored-by: Julian Mack <julian.mack@cohere.com> Co-authored-by: kyle-cohere <kyle@cohere.com>	2025-07-31 10:57:34 +00:00
Joao Gante	6c3f27ba61	[docs] fix korean docs yet again (#39813) fix korean docs yet again	2025-07-31 09:13:25 +00:00
Drew Ross	7abb5d3992	Update mT5 model card (#39702) * Update mt5 model card * Fix casing of model title * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:35:04 -07:00
Arpon Kapuria	1019b00028	Update model card for Cohere2 (Command R7B) (#39604) * Update model card for Cohere2 (Command R7B) * fix: applied suggested changes	2025-07-30 08:34:26 -07:00
Ethan Villarosa	ecbb5ee194	standardized BARThez model card (#39701 ) * standardized barthez model card according to template * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * suggested changes to barthez model card --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:33:13 -07:00
Yana Mishula	551a89a4a3	Standardize CLAP model card format (#39738) * Standardize CLAP model card format * Apply review feedback * Remove Resources section	2025-07-29 14:13:04 -07:00
StevenBucaille	da70b1389a	docs: Update EfficientLoFTR documentation (#39620) * docs: Update EfficientLoFTR documentation * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-29 13:54:44 -07:00
Joao Gante	33aa49df9d	[docs] Ko doc fixes after toc update (#39660) * update docs * doc builder working * make fixup	2025-07-29 17:05:26 +01:00
Jaehyeon Shin	1d061536cf	🌐 [i18n-KO] Translated `how_to_hack_models.md` to Korean (#39536) * docs: ko: how_to_hack_models.md * feat: nmt draft * fix: manual edits	2025-07-29 08:09:16 -07:00
박종범	43fe41c0a8	🌐 [i18n-KO] Translated `perf_train_gpu_one.md` to Korean (#39552 ) * docs: ko: perf_train_gpu_one.md * feat: nmt draft * fix: manual edits * fix: Manually added missing backticks * Update docs/source/ko/perf_train_gpu_one.md fix: remove space between heading and GPU anchor Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix: clarify table headers to indicate training speed boost and memory savings Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix: improve readability Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> * Update docs/source/ko/perf_train_gpu_one.md fix : rephrase explanation of data preloading to improve readability Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com> --------- Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com> Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>	2025-07-29 08:08:57 -07:00
Ahn Joon Sung	9f38763731	🌐 [i18n-KO] Translated `pipeline_gradio.md` to Korean (#39520) * docs: ko: pipeline_gradio.md * feat: nmt draft * fix: manual edits * docs: ko: pipeline_gradio.md	2025-07-29 08:04:30 -07:00
Lio (임승섭)	f72311796b	🌐 [i18n-KO] Translated `tokenizer.md` to Korean (#39532) * docs: ko: tokenizer.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * fix: resolve suggestions Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> --------- Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>	2025-07-29 08:04:14 -07:00
Kim Juwon	d346d46752	🌐 [i18n-KO] Translated `tvp.md` to Korean (#39578) * docs: ko: tvp.md * feat: nmt draft * fix: manual edits * fix: manual edits * fix: manual edits * fix: manual edits * fix: manual edits Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com> --------- Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>	2025-07-29 08:04:00 -07:00
Ahnjj_DEV	2f59c15b33	🌐 [i18n-KO] Translated albert.md to Korean (#39524) * docs: ko: albert.md * feat: nmt draft * fix: manual edits	2025-07-29 08:03:40 -07:00
Minseo Kim	98386dcee9	🌐 [i18n-KO] Translated `main_classes/peft.md` (#39515 ) * docs: ko: main_classes/peft.md * feat: nmt draft * docs: add missing TOC to documentation for `PeftAdapterMixin` section Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin). * fix: Improve naturalness of purpose expression in Korean Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions. * fix: Simplify plural form and make expression more concise Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity. * fix: Replace technical term '주입' with more natural '적용' Changed '주입할 수 없어' to '적용할 수 없어' for better readability. Considered alternatives: '삽입': Too literal translation of 'inject' '입력': Could be misunderstood as data input '통합': Implies merging two systems '추가': Simple but less precise '적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context. * fix: update toctree path for PEFT to lowercase Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links. * docs: update as per reviewer feedback after rebase	2025-07-29 08:03:17 -07:00
Ramesh	4f8f51be4e	Add Fast Segformer Processor (#37024 ) * Add Fast Segformer Processor * Modified the params according to segformer model * modified test_image_processing_Segformer_fast args - removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class * added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast * fixed code_quality * added recommended fixes and tests to make sure everything processess smoothly * Fixed SegmentationMapsLogic - modified the preprocessing of segmentation maps to use tensors - added batch support * fixed some mismatched files * modified the tolerance for tests * use modular * fix ci --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 19:22:32 +00:00
Avigyan Sinha	c353f2bb5e	Superpoint fast image processor (#37804 ) * feat: superpoint fast image processor * fix: reran fast cli command to generate fast config * feat: updated test cases * fix: removed old model add * fix: format fix * Update src/transformers/models/superpoint/image_processing_superpoint_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * fix: ported to torch and made requested changes * fix: removed changes to init * fix: init fix * fix: init format fix * fixed testcases and ported to torch * fix: format fixes * failed test case fix * fix superpoint fast * fix docstring --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 18:15:06 +00:00
jzhang533	02ea23cbde	update ernie model card (#39657) * update ernie model doc Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address ruff format error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address check_repository_consistency error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> --------- Signed-off-by: Zhang Jun <jzhang533@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-28 10:21:18 +00:00
Garrett Goon	97f8c71f52	Add padding-free to Granite hybrid moe models (#39677) * start fixing kwarg handling * fmt * updates padding free tests * docs * add missing kwargs modeling_granitemoe.py * run modular util * rm unrelated changes from modular util	2025-07-25 20:10:50 +02:00
lgai-exaone	c06d4cd6ce	Add EXAONE 4.0 model (#39129) * Add EXAONE 4.0 model * Refactor EXAONE 4.0 modeling code * Fix cache slicing on SWA + FA2 * Fix cache slicing on FA2 + HybridCache * Update EXAONE 4.0 modeling code for main branch * Update o_proj for asymmetric projection * Address PR feedback * Add EXAONE 4.0 docs * Update EXAONE 4.0 modeling code for main branch * update * fix updates * updates * fix * fix * fix --------- Co-authored-by: Arthur <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-25 19:58:28 +02:00
Cyril Vallez	6630c5b714	Add xlstm model (#39665 ) * Add xLSTM cleanly with optimizations. * Fix style. * Fix modeling test. * Make xLSTM package optional. * Fix: Update torch version check. * Fix: Bad variable naming in test. * Fix: Import structure cleaning with Ruff. * Fix: Update docstrings. * Fix: Mitigate unused config attr tests by explicit usage. * Fix: Skip tests, if xlstm library is not installed. * Feat: Enable longer context window for inference by chunking. * Fix: Make training test pass by lowering target accuracy. * Chore: Increase test verbosity for failing generation test. * Update docs/source/en/model_doc/xlstm.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix: Make xlstm available even without CUDA. * Chore: Remove unnecessary import. * Fix: Remove BOS insertion. * Chore: Improve xLSTMCache documentation. * Integrate basic xLSTM fallback code. * Chore: Remove unnecessary import. * Chore: Remove duplicate LayerNorm. * chore: update copyright, minor reformatting * fix: refactor mLSTMStateType due to missing torch import * fix: add missing import * Chore: Replace einops. * fix: apply ruff formatting * fix: run `make fix-copies` to re-generate dummy_pt_objects.py * fix: make type hints Python 3.9 compatible * fix: remove obsolete import * fix: remove obsolete method from docs * chore: remove obsolete `force_bos_token_insert` from config * Chore: Remove duplicated xLSTMCache class. * Fix: Formatting of modeling_xlstm.py * Chore: Remove xlstm package requirement from test. Re-add update_rnn_state. * Fix: Update xLSTMCache docstring. * Feat: Add proper initialization of xLSTM. * Chore: Re-format files. * Chore: Adapt format. * Fix: xLSTMCache import restructuring. * Fix: Add __all__ lists to modeling and configuration files. * Chore: Reformat. * Fix: Remove unnecessary update_rnn_state function. * Fix: Undo test accuracy quickfix. * Fix: Update copyright year, remvoe config copy. 
* Chore: Flatten all internal configs to xLSTMConfig. * Fix: Unused config variables check. * Chore: Remove unnecessary imports. * Fix: Unify xlstm cache argument from batch_size to max_batch_size. * Chore: Remove bad default arg value for xLSTMCache. * Chore: Rename core configuration arguments to HF default in xLSTM. * Chore: Fix formatting. * Fix: xLSTM Cache config access. * Fix: Update xlstm tests for config update. * Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B. * Fix: Configuration xLSTM python3.9 syntax. * Fix: Difference to main in test_utils.py assertion. * Fix: Bad syntax in xlstm config for python3.9. * Fix: xLSTMConfig docstring. * Fix: xLSTMConfig docstring. * Fix typing issues in xLSTM and BeiT, Paligemma. * Fix: Exclude xLSTM from test cache utils. * Chore: Fix style. * Chore: Fix format. * Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions. * Chore: Remove asserts and replace with ValueErrors. * Chore: Update __init__.py structure of xLSTM. * Chore: Clean xLSTM initialization of weights. * Fix index names in modeling_xlstm.py * Update xlstm model test typing annotations. * Fix: Remove all asserts. * Revert changes to the main __init__.py * Fix: Move xLSTMCache to modeling_xlstm.py * Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py * Remove xLSTMCache from dummy_pt_objects.py * Fix: Remove extended torchdynamo compilation check integrating cuda graph captures. * Revert test_cache_utils.py xLSTM change. * Fix: Move xLSTM init functions before init call. * Remove xLSTMCache from generation utils. * Fix: Clean xLSTM init functionality for recursive calls. * Fix: Move xLSTMCache before its first call. * Fix formatting. * Add partial docstring for xLSTMModel forward. * Fix xLSTMCache docstring in xLSTMModel. * Remove xLSTMCache from public documentation. Update auto_docstring. 
* Remove all agressive shape comments * style * Fix names * simplify * remove output_hidden_states * Update modeling_xlstm.py * Update modeling_xlstm.py * Update test_modeling_xlstm.py * Update modeling_xlstm.py * Update modeling_xlstm.py * fix * fix * style * style --------- Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com> Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com> Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>	2025-07-25 19:39:17 +02:00
Armaghan Shakir	69cff312f5	Add support for DeepseekAI's DeepseekVL (#36248 ) * upload initial code * update deepseek-vl adaptor * update hierarchy of vision model classes * udpate aligner model * add text model * Added Image Processor * Added Image Processor * Added Image Processor * apply masks * remove projection; add aligner * remove interpolate_pos_encoding * remove unused params in config * cleaning * Add the __init__ file * added processing deepseek_vl class * modified the deepseek-vl processor * modified the deepseek-vl processor * update __init__ * Update the image processor class name * Added Deepseek to src/transformers/__init__.py file * Added Deepseek to image_processing_auto.py * update the __init__ file * update deepseek_vl image processor * Update Deepseek Processor * upload fast image processor * Revert "upload fast image processor" This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4. * update image processor * flatten heirarchy * remove DeepseekVLModel * major update (complete modeling) * auto modeling and other files * formatting * fix quality * replace torchvision in modeling * set default do_normalize to False * add fast image processor template using tool * update image processors * add fast image processor to other files * update liscense * Added deepseek image testcases * update image test * update processor * write CHAT_TEMPLATE * update model for processor * fix processor * minor fixes and formatting * fix image processing and tests * fix interpolation in sam * fix output_attentions in DeepseekVLModel * upload test_modeling * fix tests because of vocab size * set use_high_res_vision=False in tests * fix all modeling tests * fix styling * remove explicit background_color from image processors * added test_processor * added test_processor * fix processor tests * update docs * update docs * update docs * update conversion script * Fixed typos * minor fixes from review - remove model_id comments in examples - remove from pre-trained 
auto mapping - move to image-text-to-text from vision-to-seq in auto mapping - add image_token_index to __init__ for config - remove outdated temporary config in conversion script - update example to use chat_template in docstring example - update liscense 2021->2025 * fix type in config docstring Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * update get_image_features * fix config * improve DeepseekVLImageProcessor.preprocess * return image_hidden_states * use AutoTokenizer and AutoImageProcessor in Processor * fix model outputs * make num_image_tokens configurable * fix docstring of processor * move system prompt to chat template * fix repo consistency * fix return_dict * replace SamVisionEncoder with SamVisionModel * update to remove deepcopy * 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid) * fix quality checks * add missing hybrid in auto modeling * run make style * update sam_hq * update high_res_size in test * update docs following #36979 * update code with auto_docstring * update conversion scripts * fix style * fix failing test because of tuple * set weights_only=True in conversion script * use safetensors.torch.load_file instead of torch.load in conversion script * make output_dir optional in conversion script * fix code snippets in docs (now the examples work fine) * integration tests for DeepseekVL * update expected texts * make style * integration tests for DeepseekVLHybrid * fix class name * update expected texts for hybrid * run "make style" * update since changes in main * run make-style * nits since changes in main * undo changes in sam * fix tests * fix tests; update with main * update with main: output_attention/output_hidden_states * fix copied part in deepseek_vl * run fix-copies * fix output_hidden_states * sam: fix _init_weigths * use modular for DeepseekVL * make image processor more modular * modular: use JanusPreTrainedModel * janus: provide kwargs in loss * update processors in conversion script * Revert 
"sam: fix _init_weigths" This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27. * run fix-copies --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2025-07-25 19:18:50 +02:00
Xibin Bayes Zhou	45c7bfb157	Add evolla rebase main (#36232 ) * add evolla * adding protein encoder part * add initial processing test * save processor * add docstring * add evolla processor * add two test * change vision to protein * change resampler to sequence_compressor * change vision to protein * initial update for llama * add initial update for llamaForCausalLM * add `test_processor`, `test_saprot_output`, `test_protein_encoder_output` * change evolla, but still working on it * add test_single_forward * pass test_attention_outputs * pass test_hidden_states_output * pass test_save_load and test_from_pretrained_no_checkpoint * pass test_cpu_offload * skip some tests * update new progress * skip test_model_is_small * pass test_model_weights_reload_no_missing_tied_weights * pass test_model_get_set_embeddings * pass test_cpu_offload * skip test_resize_embeddings * add pipeline_model_mapping * remote old setUp * pass processor save_pretrained and load_pretrained * remove pooling layer * pass test_inputs_embeds_matches_input_ids * pass test_model_is_small * pass test_attention_outputs * pass test_initialization * pass test_model_get_set_embeddings * pass test_single_forward * skip test_disk_offload_bin and test_disk_offload_safetensors * fix most tests * pass test_protein_encoder_output * remove useless code * add EvollaForProteinText2Text * pass test_saprot_output * pass all EvollaModelTest test and remove processor test * add processor test to its own file * skip is_training since esm skipped it and the saprot code causes error when setting is_training True * pass processor tests * solve all except config * pass most cases * change init * add doc to `configuration_evolla.py` * remove image_processing test * remove extra processor test * remove extra modules * remove extra modules * change all configs into one config * pass all evolla test * pass `make fixup` * update short summary * update Evolla-10B-hf * pass check_dummies.py and check_code_quality * fix 
`tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings` * remove dummy codes * change format * fix llava issue * update format * update to solve llama3 access issue * update to make forward right * solve processor save load problem from instructblip solution * remove unexpected file * skip `test_generation_tester_mixin_inheritance` * add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning` * add `modular_evolla.py` * solved issue #36362 * run `make fixup` * update modular * solve float32 training * add fix * solve `utils/check_docstrings.py` * update * update * update * remove other files and replace sequential and einsum * add use case in document * update the models * update model * change some wrong code * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix issues mentioned in PR * update style and rearrange the placement * fix return_dict argument issue * solve SaProtConfig issue * Solve EvollaSaProtRotaryEmbedding issue * solve attention_mask issue * solve almosst all issues * make style * update config * remove unrelated pickle file * delete pickle files * fix config * simplify a lot * remove past k-v from encoder * continue work * style * skip it from init * fix init * fix init * simplify more * fill in docstrings * change test for generation * skip test * fix style --------- Co-authored-by: Chenchen Han <13980209828@163.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-25 19:11:57 +02:00
Anton Vlasjuk	a91653561e	[`Ernie 4.5`] Post merge adaptations (#39664 ) * ernie 4.5 fixes * Apply style fixes * fix --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-07-25 17:36:18 +02:00
Lysandre Debut	f90de364c2	Rename huggingface_cli to hf (#39630 ) * Rename huggingface_cli to hf * hfh	2025-07-25 14:10:04 +02:00
revanth	3b3f9c0c46	fix(voxtral): correct typo in apply_transcription_request (#39572 ) * fix(voxtral): correct typo in apply_transcription_request * temporary wrapper: apply_transcrition_request * Update processing_voxtral.py * style: sort imports in processing_voxtral.py * docs(voxtral): fix typo in voxtral.md * make style * doc update --------- Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>	2025-07-25 12:09:44 +00:00
Joao Gante	e3760501b0	[docs] fix ko cache docs (#39644 ) fix ko docs	2025-07-25 10:06:03 +01:00
lmarshall12	565c035a2e	Add owlv2 fast processor (#39041 ) * add owlv2 fast image processor * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * change references to owlVit to owlv2 in docstrings for post process methods * change type hints from List, Dict, Tuple to list, dict, tuple * remove unused typing imports * add disable grouping argument to group images by shape * run make quality and repo-consistency * use modular * fix auto_docstring --------- Co-authored-by: Lewis Marshall <lewism@elderda.co.uk> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-25 02:40:11 +00:00
Matthew Hernandez	7b897fe583	[Docs] Translate audio_classification.md from English to Spanish (#39513 ) * Docs: translate audio_classification to Spanish * Update audio_classification.md * Remove space * Normalize backticks * Update audio_classification.md * Apply corrections recommended by aaronjimv * Update _toctree.yml --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 15:55:13 -07:00
Ethan Villarosa	9b7244f189	standardized YOLOS model card according to template in #36979 (#39528 ) * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections * removed cli section * Update yolos.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 11:00:25 -07:00
JoestarGagan	ec8a09a5fe	Feature/standardize opt model card (#39568 ) * docs: Standardize OPT model card with enhanced details * Remove incorrect link from OPT model card * Address review feedback on OPT model card * Update opt.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 10:57:48 -07:00
Eric Bezzam	c5a80dd6c4	🔴 Fix EnCodec internals and integration tests (#39431 ) * EnCodec fixes and update integration tests. * Apply padding mask when normalize is False. * Update comment of copied function. * Fix padding mask within modeling. * Revert padding function. * Simplify handling of padding_mask. * Address variable codebook size. * Add output for padding for consistency with original model, fix docstrings. * last_frame_pad_length as int * Update example code. * Improve docstring/comments. * Shorten expected output. * Consistent docstring. * Parameterize tests. * Properties for derived variables. * Update expected outputs from GitHub runner. * Consistent outputs with runner GPUs.	2025-07-23 19:39:27 +02:00
Maxime Grenu	0fe03afeb8	Fix typos and grammar issues in documentation and code (#39598 ) - Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md) - Fix 'meanginful' to 'meaningful' in training documentation - Fix duplicate 'Cohere' reference in modular transformers documentation - Fix duplicate 'the the' in trainer and chat command comments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-07-23 12:43:11 +00:00
Andrei Panferov	623ab01039	FP-Quant support (#38696 ) * quartet * quartet qat -> quartet * format * bf16 backward * interfaces * forward_method * quartet -> fp_quant * style * List -> list * list typing * fixed format and annotations * test_fp_quant * docstrings and default dtypes * better docstring and removed noop checks * docs * pseudoquantization support to test on non-blackwell * pseudoquant * Pseudoquant docs * Update docs/source/en/quantization/fp_quant.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization/fp_quant.md * Update docs/source/en/quantization/fp_quant.md * Update src/transformers/utils/quantization_config.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * small test fixes * dockerfile update * spec link * removed `_process_model_after_weight_loading` * toctree --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-07-23 11:41:10 +02:00

3537 Commits