HuggingFace_transformer

Author	SHA1	Message	Date
Joao Gante	daab2db33f	[CI] post-`GptOss` fixes for green CI (#39929 )	2025-08-07 16:27:00 +02:00
Arthur	7c38d8fc23	Add GPT OSS model from OpenAI (#39923 ) * fix * nice * where i am at * Bro this works * Update src/transformers/integrations/tensor_parallel.py * cleanups * yups that was breaking * Update src/transformers/models/openai_moe/modeling_openai_moe.py * gather on experts and not mlp * add changes for latest convert branch * adds options to get output_router_logits from config * bring chat temlate + special tokens back into the script. * initial commmit * update * working with shards * add model.safetensors.index.json * fix * fix * mxfp4 flag * rm print * Fix PAD/EOS/BOS (#18) * fix pad/eos/bos * base model maybe one day * add some doc * special tokens based on harmony. * add in tokenizer config as well. * prepare for rebase with main * Fix for initialize_tensor_parallelism now returning 4-tuple ``` [rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module> [rank0]: model = AutoModelForCausalLM.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained [rank0]: return model_class.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper [rank0]: return func(args, kwargs) [rank0]: ^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained [rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: ValueError: too many values to unpack (expected 3) ``` mxfp4 * mxfp4 draft * fix * fix import * draft * draft impl * finally working ! * simplify * add import * working version * consider blocks and scales * device mesh fix * initial commit * add working dequant + quant logic * update * non nan, gibberish output * working EP + quantization finally ! * start cleaning * remove reversing process * style * some cleaning * initial commmit * more cleaning * more cleaning * simplify * more cleaning * rm duplicated function * changing tp_plan * update tp plan check * add loading attribute * dequantizing logic * use subfunctions * import cleaning * update_param_name * adds clamped swiglu * add clamping to training path * simplify dequant logic * update * Bad merge * more simplifications & tests * fix ! * fix registering custom attention * fix order * fixes * some test nits * nits * nit * fix * Clamp sink logits * Clean * Soft-max trick * Clean up * p * fix deepspeed * update both modeling and modular for cleanup * contiguous * update tests * fix top_k router call * revert renaming * test nits * small fixes for EP * fix path for our local tests * update as I should not have broken that! * fix the loss of mixtral * revert part of the changes related to router_scores, kernel probably no ready for that! * deleting a small nit * update arch * fix post processing * update * running version but not expected output * moving to cuda * initial commit * revert * erroring when loading on cpu * updates * del blocks, scales * fix * style * rm comm * comment * add comment * style * remove duplicated lines * Fix minor issue with weight_map conversion script * fix sampling params * rename to final name * upate pre-final version of template * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * fix batched inference * serve fixes * swizzle ! * update final chat template by Matt. * fix responses; pin oai * sinplify * Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * fix * Use ROCm kernels from HUB * Make kernel modes explicit * update final chat template by Matt. x2 * Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Fix installation * Update setup.py Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> * allow no content * fix: update message handling in write_tokenizer function * Fix template logic for user message role * last nits for CB and flash_paged! * there was one bad merge * fix CB (hardcode for now, its just using kv groups instead) * fix * better fix for device_map * minor device fix * Fix flash paged * updates * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * update * Revert "remove dtensors, not explicit (#39840)" This reverts commit `6dfd561d9c`. * fix merge * fix * Fix line break when custom model indentity * nits testing * to locals first and pass sliding window to flash paged * register modes for MegaBlocksMoeMlp * add integration test in fixtures -> now update the tests to use it! * update integration tests * initial fix * style and update tests * fix * chore(gpt oss): remove mlp_bias from configuration It was just a leftover. * stats * Integration tests * whoops * Shouldn't move model * Ensure assistant messages without thinking always go to "final" channel * More checks to ensure expected format * Add pad_token_id to model configuration in write_model function (#51) * Add oai fix fast tests (#59) * Fix some fast tests * Force some updates * Remove unnecessary fixes * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * reasoning -> Reasoning * Add additional integration tests * fixup * Slight fixes * align chat template with harmony * simplify * Add comment * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * Revert fixup * skip 2 test remove todo * merge * padding side should be left for integration tests * fix modular wrt to changes made to modeling * style * isort * fix opies for the loss * mmmm --------- Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: edbeeching <edbeeching@gmail.com> Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com> Co-authored-by: Zhuohan Li <zhuohan@openai.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal> Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Akos Hadnagy <akos@ahadnagy.com> Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-05 18:02:18 +02:00
ppaanngggg	c430047602	[docs] update object detection guide (#39909 ) * Update object_detection.md * Update object_detection.md	2025-08-05 14:07:21 +00:00
Lysandre Debut	00d47757bf	Reorder serving docs (#39634 ) * Slight reorg * LLMs + draft VLMs * Actual VLM examples * Initial responses * Reorder * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/tiny_agents.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/open_webui.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/cursor.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/serving.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Responses API * Address Pedro's comments --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2025-08-05 08:43:06 +02:00
Arpon Kapuria	8c4ea670dc	chore: update DETR model card (#39822 ) * Update model card for DETR * fix: applied suggested changes * fix: simplified pipeline and modified notes and resources * Update detr.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 12:25:53 -07:00
Jan Netík	0bd91cc822	Add support for `ModernBertForMultipleChoice` (#39232 ) * implement ModernBertForMultipleChoice * fixup, style, repo consistency * generate modeling_modernbert * add tests + docs * fix test	2025-08-04 20:45:43 +02:00
rohitthewanderer	3bafa128dc	[DOCS] : Improved mimi model card (#39824 ) * [DOCS] : Improved mimi model card * Removed additional header * Review: addressed feedback * Update mimi.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-04 10:07:06 -07:00
Akib Jawad	2a9febd632	Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494 ) * added code for handling video object ,as dictionary of frames and metadata, in chat template * added new test where videos are passed as objects (dict of frames, metadata) in the chat template * modified hardcoded video_len check that does not match with increased number of tests cases. * Modify hardcoded video_len check that fails with increased number of tests * update documentation of multi-modal chat templating with extra information about including video object in chat template. * add array handling in load_video() * temporary test video inlcuded * skip testing smolvlm with videos that are list of frames * update documentation & make fixup * Address review comments	2025-08-04 11:49:42 +02:00
StevenBucaille	1ec0feccdd	[image-processing] deprecate `plot_keypoint_matching`, make `visualize_keypoint_matching` as a standard (#39830 ) * fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models * refactor: added copied from * fix: make style * fix: repo consistency * fix: make style * docs: added missing method in SuperGlue docs	2025-08-01 16:29:57 +00:00
Yoni Gozlan	7b4d9843ba	Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739 ) * add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid * fix after review	2025-08-01 12:20:08 -04:00
rziga	3951d4ad5d	Add MM Grounding DINO (#37925 ) * first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>	2025-08-01 15:43:23 +01:00
Eric Bezzam	2c0af41ce5	Fix bad markdown links (#39819 ) Fix bad markdown links.	2025-07-31 09:14:14 -07:00
Tommy Chiang	4fcf455517	Fix broken links (#39809 ) Replace links in the form of `[text]((url))` to `[text](url)`. This is the correct format of a url in the markdown.	2025-07-31 13:23:04 +00:00
Raushan Turganbay	b937d47455	[cohere2 vision] move doc to multimodal section (#39820 ) move doc to multimodal section	2025-07-31 15:13:02 +02:00
Kyle Duffy	6ba8a1ff45	Update documentation for Cohere2Vision models (#39817 ) * Update docs with pipeline example * Add Cohere2Vision to list of vision models * Sort models	2025-07-31 11:58:45 +00:00
Raushan Turganbay	e1688d28d3	[Model] Cohere2 Vision (#39810 ) * Add cohere2_vision to support CohereLabs/command-a-vision-07-2025 * update and add modualr file * update processors and check with orig impl later * delete unused files * image processor reduce LOC and re-use GotOCR2 * update the config to use modular * model tests pass * processor fixes * check model outputs decorator * address one more comment * Update tokens. Temp - need to read from tokenizer' * fix for multi-gpu * Fix image token handling * upadte image token expansion logic * fix a few issues with remote code loading * not related but modular forces us to change all files now * Add overview and code sample to cohere vision docs * add scripts. TMP. * Update inference script * Create script * set dtype in export script * TO revert: modular export fix * Fix scripts * Revert "TO revert: modular export fix" This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c. * Use modular weights * Upload to hub Removed OOD weights ad script * Updated docs * fix import error Update docs Added pipeline test * Updated docs * Run modular script remove modular for config Added patch_size Added docstrings in modular Fix OOM Add docs, fixup integration tests. 8-gpu passing * tiny updates * address comments + fixup * add test for chat template * check model outputs workaround * aya vision fix check model inputs * Revert "add test for chat template" This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8. * reveert more changes * last revert * skip and merge * faulty copy from --------- Co-authored-by: Julian Mack <julian.mack@cohere.com> Co-authored-by: kyle-cohere <kyle@cohere.com>	2025-07-31 10:57:34 +00:00
Drew Ross	7abb5d3992	Update mT5 model card (#39702 ) * Update mt5 model card * Fix casing of model title * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:35:04 -07:00
Arpon Kapuria	1019b00028	Update model card for Cohere2 (Command R7B) (#39604 ) * Update model card for Cohere2 (Command R7B) * fix: applied suggested changes	2025-07-30 08:34:26 -07:00
Ethan Villarosa	ecbb5ee194	standardized BARThez model card (#39701 ) * standardized barthez model card according to template * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/barthez.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * suggested changes to barthez model card --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-30 08:33:13 -07:00
Yana Mishula	551a89a4a3	Standardize CLAP model card format (#39738 ) * Standardize CLAP model card format * Apply review feedback * Remove Resources section	2025-07-29 14:13:04 -07:00
StevenBucaille	da70b1389a	docs: Update EfficientLoFTR documentation (#39620 ) * docs: Update EfficientLoFTR documentation * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-29 13:54:44 -07:00
Ramesh	4f8f51be4e	Add Fast Segformer Processor (#37024 ) * Add Fast Segformer Processor * Modified the params according to segformer model * modified test_image_processing_Segformer_fast args - removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class * added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast * fixed code_quality * added recommended fixes and tests to make sure everything processess smoothly * Fixed SegmentationMapsLogic - modified the preprocessing of segmentation maps to use tensors - added batch support * fixed some mismatched files * modified the tolerance for tests * use modular * fix ci --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 19:22:32 +00:00
Avigyan Sinha	c353f2bb5e	Superpoint fast image processor (#37804 ) * feat: superpoint fast image processor * fix: reran fast cli command to generate fast config * feat: updated test cases * fix: removed old model add * fix: format fix * Update src/transformers/models/superpoint/image_processing_superpoint_fast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * fix: ported to torch and made requested changes * fix: removed changes to init * fix: init fix * fix: init format fix * fixed testcases and ported to torch * fix: format fixes * failed test case fix * fix superpoint fast * fix docstring --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-28 18:15:06 +00:00
jzhang533	02ea23cbde	update ernie model card (#39657 ) * update ernie model doc Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address ruff format error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> * address check_repository_consistency error reported by ci Signed-off-by: Zhang Jun <jzhang533@gmail.com> --------- Signed-off-by: Zhang Jun <jzhang533@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-28 10:21:18 +00:00
Garrett Goon	97f8c71f52	Add padding-free to Granite hybrid moe models (#39677 ) * start fixing kwarg handling * fmt * updates padding free tests * docs * add missing kwargs modeling_granitemoe.py * run modular util * rm unrelated changes from modular util	2025-07-25 20:10:50 +02:00
lgai-exaone	c06d4cd6ce	Add EXAONE 4.0 model (#39129 ) * Add EXAONE 4.0 model * Refactor EXAONE 4.0 modeling code * Fix cache slicing on SWA + FA2 * Fix cache slicing on FA2 + HybridCache * Update EXAONE 4.0 modeling code for main branch * Update o_proj for asymmetric projection * Address PR feedback * Add EXAONE 4.0 docs * Update EXAONE 4.0 modeling code for main branch * update * fix updates * updates * fix * fix * fix --------- Co-authored-by: Arthur <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2025-07-25 19:58:28 +02:00
Cyril Vallez	6630c5b714	Add xlstm model (#39665 ) * Add xLSTM cleanly with optimizations. * Fix style. * Fix modeling test. * Make xLSTM package optional. * Fix: Update torch version check. * Fix: Bad variable naming in test. * Fix: Import structure cleaning with Ruff. * Fix: Update docstrings. * Fix: Mitigate unused config attr tests by explicit usage. * Fix: Skip tests, if xlstm library is not installed. * Feat: Enable longer context window for inference by chunking. * Fix: Make training test pass by lowering target accuracy. * Chore: Increase test verbosity for failing generation test. * Update docs/source/en/model_doc/xlstm.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix: Make xlstm available even without CUDA. * Chore: Remove unnecessary import. * Fix: Remove BOS insertion. * Chore: Improve xLSTMCache documentation. * Integrate basic xLSTM fallback code. * Chore: Remove unnecessary import. * Chore: Remove duplicate LayerNorm. * chore: update copyright, minor reformatting * fix: refactor mLSTMStateType due to missing torch import * fix: add missing import * Chore: Replace einops. * fix: apply ruff formatting * fix: run `make fix-copies` to re-generate dummy_pt_objects.py * fix: make type hints Python 3.9 compatible * fix: remove obsolete import * fix: remove obsolete method from docs * chore: remove obsolete `force_bos_token_insert` from config * Chore: Remove duplicated xLSTMCache class. * Fix: Formatting of modeling_xlstm.py * Chore: Remove xlstm package requirement from test. Re-add update_rnn_state. * Fix: Update xLSTMCache docstring. * Feat: Add proper initialization of xLSTM. * Chore: Re-format files. * Chore: Adapt format. * Fix: xLSTMCache import restructuring. * Fix: Add __all__ lists to modeling and configuration files. * Chore: Reformat. * Fix: Remove unnecessary update_rnn_state function. * Fix: Undo test accuracy quickfix. * Fix: Update copyright year, remvoe config copy. * Chore: Flatten all internal configs to xLSTMConfig. * Fix: Unused config variables check. * Chore: Remove unnecessary imports. * Fix: Unify xlstm cache argument from batch_size to max_batch_size. * Chore: Remove bad default arg value for xLSTMCache. * Chore: Rename core configuration arguments to HF default in xLSTM. * Chore: Fix formatting. * Fix: xLSTM Cache config access. * Fix: Update xlstm tests for config update. * Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B. * Fix: Configuration xLSTM python3.9 syntax. * Fix: Difference to main in test_utils.py assertion. * Fix: Bad syntax in xlstm config for python3.9. * Fix: xLSTMConfig docstring. * Fix: xLSTMConfig docstring. * Fix typing issues in xLSTM and BeiT, Paligemma. * Fix: Exclude xLSTM from test cache utils. * Chore: Fix style. * Chore: Fix format. * Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions. * Chore: Remove asserts and replace with ValueErrors. * Chore: Update __init__.py structure of xLSTM. * Chore: Clean xLSTM initialization of weights. * Fix index names in modeling_xlstm.py * Update xlstm model test typing annotations. * Fix: Remove all asserts. * Revert changes to the main __init__.py * Fix: Move xLSTMCache to modeling_xlstm.py * Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py * Remove xLSTMCache from dummy_pt_objects.py * Fix: Remove extended torchdynamo compilation check integrating cuda graph captures. * Revert test_cache_utils.py xLSTM change. * Fix: Move xLSTM init functions before init call. * Remove xLSTMCache from generation utils. * Fix: Clean xLSTM init functionality for recursive calls. * Fix: Move xLSTMCache before its first call. * Fix formatting. * Add partial docstring for xLSTMModel forward. * Fix xLSTMCache docstring in xLSTMModel. * Remove xLSTMCache from public documentation. Update auto_docstring. * Remove all agressive shape comments * style * Fix names * simplify * remove output_hidden_states * Update modeling_xlstm.py * Update modeling_xlstm.py * Update test_modeling_xlstm.py * Update modeling_xlstm.py * Update modeling_xlstm.py * fix * fix * style * style --------- Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com> Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com> Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>	2025-07-25 19:39:17 +02:00
Armaghan Shakir	69cff312f5	Add support for DeepseekAI's DeepseekVL (#36248 ) * upload initial code * update deepseek-vl adaptor * update hierarchy of vision model classes * udpate aligner model * add text model * Added Image Processor * Added Image Processor * Added Image Processor * apply masks * remove projection; add aligner * remove interpolate_pos_encoding * remove unused params in config * cleaning * Add the __init__ file * added processing deepseek_vl class * modified the deepseek-vl processor * modified the deepseek-vl processor * update __init__ * Update the image processor class name * Added Deepseek to src/transformers/__init__.py file * Added Deepseek to image_processing_auto.py * update the __init__ file * update deepseek_vl image processor * Update Deepseek Processor * upload fast image processor * Revert "upload fast image processor" This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4. * update image processor * flatten heirarchy * remove DeepseekVLModel * major update (complete modeling) * auto modeling and other files * formatting * fix quality * replace torchvision in modeling * set default do_normalize to False * add fast image processor template using tool * update image processors * add fast image processor to other files * update liscense * Added deepseek image testcases * update image test * update processor * write CHAT_TEMPLATE * update model for processor * fix processor * minor fixes and formatting * fix image processing and tests * fix interpolation in sam * fix output_attentions in DeepseekVLModel * upload test_modeling * fix tests because of vocab size * set use_high_res_vision=False in tests * fix all modeling tests * fix styling * remove explicit background_color from image processors * added test_processor * added test_processor * fix processor tests * update docs * update docs * update docs * update conversion script * Fixed typos * minor fixes from review - remove model_id comments in examples - remove from pre-trained auto mapping - move to image-text-to-text from vision-to-seq in auto mapping - add image_token_index to __init__ for config - remove outdated temporary config in conversion script - update example to use chat_template in docstring example - update liscense 2021->2025 * fix type in config docstring Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> * update get_image_features * fix config * improve DeepseekVLImageProcessor.preprocess * return image_hidden_states * use AutoTokenizer and AutoImageProcessor in Processor * fix model outputs * make num_image_tokens configurable * fix docstring of processor * move system prompt to chat template * fix repo consistency * fix return_dict * replace SamVisionEncoder with SamVisionModel * update to remove deepcopy * 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid) * fix quality checks * add missing hybrid in auto modeling * run make style * update sam_hq * update high_res_size in test * update docs following #36979 * update code with auto_docstring * update conversion scripts * fix style * fix failing test because of tuple * set weights_only=True in conversion script * use safetensors.torch.load_file instead of torch.load in conversion script * make output_dir optional in conversion script * fix code snippets in docs (now the examples work fine) * integration tests for DeepseekVL * update expected texts * make style * integration tests for DeepseekVLHybrid * fix class name * update expected texts for hybrid * run "make style" * update since changes in main * run make-style * nits since changes in main * undo changes in sam * fix tests * fix tests; update with main * update with main: output_attention/output_hidden_states * fix copied part in deepseek_vl * run fix-copies * fix output_hidden_states * sam: fix _init_weigths * use modular for DeepseekVL * make image processor more modular * modular: use JanusPreTrainedModel * janus: provide kwargs in loss * update processors in conversion script * Revert "sam: fix _init_weigths" This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27. * run fix-copies --------- Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2025-07-25 19:18:50 +02:00
Xibin Bayes Zhou	45c7bfb157	Add evolla rebase main (#36232 ) * add evolla * adding protein encoder part * add initial processing test * save processor * add docstring * add evolla processor * add two test * change vision to protein * change resampler to sequence_compressor * change vision to protein * initial update for llama * add initial update for llamaForCausalLM * add `test_processor`, `test_saprot_output`, `test_protein_encoder_output` * change evolla, but still working on it * add test_single_forward * pass test_attention_outputs * pass test_hidden_states_output * pass test_save_load and test_from_pretrained_no_checkpoint * pass test_cpu_offload * skip some tests * update new progress * skip test_model_is_small * pass test_model_weights_reload_no_missing_tied_weights * pass test_model_get_set_embeddings * pass test_cpu_offload * skip test_resize_embeddings * add pipeline_model_mapping * remote old setUp * pass processor save_pretrained and load_pretrained * remove pooling layer * pass test_inputs_embeds_matches_input_ids * pass test_model_is_small * pass test_attention_outputs * pass test_initialization * pass test_model_get_set_embeddings * pass test_single_forward * skip test_disk_offload_bin and test_disk_offload_safetensors * fix most tests * pass test_protein_encoder_output * remove useless code * add EvollaForProteinText2Text * pass test_saprot_output * pass all EvollaModelTest test and remove processor test * add processor test to its own file * skip is_training since esm skipped it and the saprot code causes error when setting is_training True * pass processor tests * solve all except config * pass most cases * change init * add doc to `configuration_evolla.py` * remove image_processing test * remove extra processor test * remove extra modules * remove extra modules * change all configs into one config * pass all evolla test * pass `make fixup` * update short summary * update Evolla-10B-hf * pass check_dummies.py and check_code_quality * fix `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings` * remove dummy codes * change format * fix llava issue * update format * update to solve llama3 access issue * update to make forward right * solve processor save load problem from instructblip solution * remove unexpected file * skip `test_generation_tester_mixin_inheritance` * add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning` * add `modular_evolla.py` * solved issue #36362 * run `make fixup` * update modular * solve float32 training * add fix * solve `utils/check_docstrings.py` * update * update * update * remove other files and replace sequential and einsum * add use case in document * update the models * update model * change some wrong code * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update src/transformers/models/evolla/modular_evolla.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix issues mentioned in PR * update style and rearrange the placement * fix return_dict argument issue * solve SaProtConfig issue * Solve EvollaSaProtRotaryEmbedding issue * solve attention_mask issue * solve almosst all issues * make style * update config * remove unrelated pickle file * delete pickle files * fix config * simplify a lot * remove past k-v from encoder * continue work * style * skip it from init * fix init * fix init * simplify more * fill in docstrings * change test for generation * skip test * fix style --------- Co-authored-by: Chenchen Han <13980209828@163.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-07-25 19:11:57 +02:00
Anton Vlasjuk	a91653561e	[`Ernie 4.5`] Post merge adaptations (#39664 ) * ernie 4.5 fixes * Apply style fixes * fix --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-07-25 17:36:18 +02:00
Lysandre Debut	f90de364c2	Rename huggingface_cli to hf (#39630 ) * Rename huggingface_cli to hf * hfh	2025-07-25 14:10:04 +02:00
revanth	3b3f9c0c46	fix(voxtral): correct typo in apply_transcription_request (#39572 ) * fix(voxtral): correct typo in apply_transcription_request * temporary wrapper: apply_transcrition_request * Update processing_voxtral.py * style: sort imports in processing_voxtral.py * docs(voxtral): fix typo in voxtral.md * make style * doc update --------- Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>	2025-07-25 12:09:44 +00:00
lmarshall12	565c035a2e	Add owlv2 fast processor (#39041 ) * add owlv2 fast image processor * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class * change references to owlVit to owlv2 in docstrings for post process methods * change type hints from List, Dict, Tuple to list, dict, tuple * remove unused typing imports * add disable grouping argument to group images by shape * run make quality and repo-consistency * use modular * fix auto_docstring --------- Co-authored-by: Lewis Marshall <lewism@elderda.co.uk> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>	2025-07-25 02:40:11 +00:00
Ethan Villarosa	9b7244f189	standardized YOLOS model card according to template in #36979 (#39528 ) * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * standardized YOLOS model card according to template in #36979 * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/yolos.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections * removed cli section * Update yolos.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 11:00:25 -07:00
JoestarGagan	ec8a09a5fe	Feature/standardize opt model card (#39568 ) * docs: Standardize OPT model card with enhanced details * Remove incorrect link from OPT model card * Address review feedback on OPT model card * Update opt.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-23 10:57:48 -07:00
Eric Bezzam	c5a80dd6c4	🔴 Fix EnCodec internals and integration tests (#39431 ) * EnCodec fixes and update integration tests. * Apply padding mask when normalize is False. * Update comment of copied function. * Fix padding mask within modeling. * Revert padding function. * Simplify handling of padding_mask. * Address variable codebook size. * Add output for padding for consistency with original model, fix docstrings. * last_frame_pad_length as int * Update example code. * Improve docstring/comments. * Shorten expected output. * Consistent docstring. * Parameterize tests. * Properties for derived variables. * Update expected outputs from GitHub runner. * Consistent outputs with runner GPUs.	2025-07-23 19:39:27 +02:00
Maxime Grenu	0fe03afeb8	Fix typos and grammar issues in documentation and code (#39598 ) - Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md) - Fix 'meanginful' to 'meaningful' in training documentation - Fix duplicate 'Cohere' reference in modular transformers documentation - Fix duplicate 'the the' in trainer and chat command comments 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-07-23 12:43:11 +00:00
Andrei Panferov	623ab01039	FP-Quant support (#38696 ) * quartet * quartet qat -> quartet * format * bf16 backward * interfaces * forward_method * quartet -> fp_quant * style * List -> list * list typing * fixed format and annotations * test_fp_quant * docstrings and default dtypes * better docstring and removed noop checks * docs * pseudoquantization support to test on non-blackwell * pseudoquant * Pseudoquant docs * Update docs/source/en/quantization/fp_quant.md Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update docs/source/en/quantization/fp_quant.md * Update docs/source/en/quantization/fp_quant.md * Update src/transformers/utils/quantization_config.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Update tests/quantization/fp_quant_integration/test_fp_quant.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * small test fixes * dockerfile update * spec link * removed `_process_model_after_weight_loading` * toctree --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-07-23 11:41:10 +02:00
Sangbum Daniel Choi	d9b35c635e	Mask2former & Maskformer Fast Image Processor (#35685 ) * add maskformerfast * test * revert do_reduce_labels and add testing * make style & fix-copies * add mask2former and make fix-copies TO DO: add test for mask2former * make fix-copies * fill docstring * enable mask2former fast processor * python utils/custom_init_isort.py * make fix-copies * fix PR's comments * modular file update * add license * make style * modular file * make fix-copies * merge * temp commit * finish up maskformer mask2former * remove zero shot examples --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-23 02:47:47 +00:00
Quentin Gallouédec	6e9972962f	🎯 Trackio integration (#38814 ) * First attempt * fix * fix * Enhance TrackioCallback to log GPU memory usage and allocation * Enhance Trackio integration in callbacks and training arguments documentation * re order * remove unused lines * fix torch optional	2025-07-22 14:50:20 -07:00
space_samurai	c6d0500d15	[WIP] Add OneformerFastImageProcessor (#38343 ) * [WIP] OneformerFastImageProcessor * update init * Fully working oneformer image processor fast * change Nearest to Neares exact interpolation where needed * fix doc --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>	2025-07-22 20:41:39 +00:00
Harry Mellor	4884b6bf41	Fix link in "Inference server backends" doc (#39589 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-22 16:44:08 +00:00
Cássia Sampaio	cbcb8e6c1f	updated mistral3 model card (#39531 ) * updated mistral3 model card (#1) * updated mistral3 model card * applying suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * made all changes to mistral3.md * adding space between paragraphs in docs/source/en/model_doc/mistral3.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * removing duplicate in mistral3.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * adding 4 backticks to preserve formatting --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 09:01:55 -07:00
Manuel de Prada Corral	c338fd43b0	[cache refactor] Move all the caching logic to a per-layer approach (#39106 ) * Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077) - Introduces CacheLayer and Cache base classes - Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers - Implements method/attr dispatch across layers to reduce boilerplate - Adds CacheProcessor hooks for offloading, quantization, etc. - Updates and passes tests * fix quantized, add tests * remove CacheProcessorList * raushan review, arthur review * joao review: minor things * remove cache configs, make CacheLayer a mixin (joaos review) * back to storage inside Cache() * remove cachebase for decorator * no more __getattr__ * fix tests * joaos review except docs * fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant` More verbose exceptions in `fix_docstring` on docstring formatting issues. * Revert "back to storage inside Cache()" This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913. * cyril review * simplify cache export * fix lfm2 cache * HybridChunked to layer * BC proxy object for cache.key_cache[i]=... * reorder classes * bfff come on LFM2 * better tests for hybrid and hybridChunked * complete coverage for hybrid chunked caches (prefill chunking) * reimplementing HybridChunked * cyril review * fix ci * docs for cache refactor * docs * oopsie * oopsie * fix after merge * cyril review * arthur review * opsie * fix lfm2 * opsie2	2025-07-22 16:10:25 +02:00
Raushan Turganbay	1806583390	[docs] Create page on inference servers with transformers backend (#39550 ) * draft docs on inference servers * Update docs/source/en/_toctree.yml Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * update * dic build failed * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/transformers_as_backend.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * apply last suggestions --------- Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:31:10 +02:00
Raushan Turganbay	cd98c1fee3	[docs] update attention implementation and cache docs (#39547 ) * update docs * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * applu suggestions --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-22 15:06:43 +02:00
StevenBucaille	a88ea9cbc8	Add EfficientLoFTR model (#36355 ) * initial commit * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix: various typos, typehints, refactors from suggestions * fix: fine_matching method * Added EfficientLoFTRModel and AutoModelForKeypointMatching class * fix: got rid of compilation breaking instructions * docs: added todo for plot * fix: used correct hub repo * docs: added comments * fix: run modular * doc: added PyTorch badge * fix: model repo typo in config * fix: make modular * fix: removed mask values from outputs * feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor * feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list * fix: reformat * refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride * doc: added q, kv aggregation kernel size and stride doc to config * refactor: converted efficientloftr implementation from modular to copied from mechanism * tests: overwrote batching_equivalence for "keypoints" specific tests * fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils * fix: make fix-copies * fix: make style * fix: update rope function to make meta tests pass * fix: rename plot_keypoint_matching to visualize_output for clarity * refactor: optimize image pair processing by removing redundant target size calculations * feat: add EfficientLoFTRImageProcessor to image processor mapping * refactor: removed logger and updated attention forward * refactor: added auto_docstring and can_return_tuple decorators * refactor: update type imports * refactor: update type hints from List/Dict to list/dict for consistency * refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching * fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict] * fix typing * fix some typing issues * nit * a few more typehint fixes * Remove output_attentions and output_hidden_states from modeling code * else -> elif to support efficientloftr * nit * tests: added EfficientLoFTR image processor tests * refactor: reorder functions * chore: update copyright year in EfficientLoFTR test file * Use default rope * Add docs * Update visualization method * fix doc order * remove 2d rope test * Update src/transformers/models/efficientloftr/modeling_efficientloftr.py * fix docs * Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py * update gradient * refactor: removed unused codepath * Add motivation to keep postprocessing in modeling code * refactor: removed unnecessary variable declarations * docs: use load_image from image_utils * refactor: moved stage in and out channels computation to configuration * refactor: set an intermediate_size parameter to be more explicit * refactor: removed all mentions of attention masks as they are not used * refactor: moved position_embeddings to be computed once in the model instead of every layer * refactor: removed unnecessary hidden expansion parameter from config * refactor: removed completely hidden expansions * refactor: removed position embeddings slice function * tests: fixed broken tests because of previous commit * fix is_grayscale typehint * not refactoring * not renaming * move h/w to embeddings class * Precompute embeddings in init * fix: replaced cuda device in convert script to accelerate device * fix: replaced stevenbucaille repo to zju-community * Remove accelerator.device from conversion script * refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module * fix: removed unused attributes in configuration * fix: missing self * fix: refactoring and tests * fix: make style --------- Co-authored-by: steven <steven.bucaille@buawei.com> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-07-22 10:53:16 +01:00
nlhm	fbeaf96f9e	Update OLMoE model card (#39344 ) * Update OLMoE model card * Checks Test * Add license and code * Update docs/source/en/model_doc/olmoe.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update olmoe.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-21 16:41:01 -07:00
Orion Weller	641aaed7c0	Update modernbertdecoder docs (#39453 ) * update docs with paper and real model * nit * Apply suggestions from code review Thanks to @stevhlui! Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Remove usage examples, add quantization --------- Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-07-21 16:40:22 -07:00
Anton Vlasjuk	b4115a426e	[`Ernie 4.5`] Add ernie text models (#39228 ) Some checks failed Release - Conda / build_and_package (push) Has been cancelled Details Secret Leaks / trufflehog (push) Has been cancelled Details * init * copied from remote * add proper structure and llama like structure * fixup * revert to state that works * get closer to llama * slow and steady * some removal * masks work * it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm * nice * getting closer * closer to transformers style * let's simplify this, batching works now * simplified * working version with modular * it is indeed the rotation per weights, make it complete llama style * cleanup conversion, next to look at -> tokenizer * remove llama artefacts * fix modeling tests (common ones) * style * integration test + first look into tokenization (will need more work, focussing on modeling other models first) * style * working moe version, based on remote * lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view) * more cleanup * refactor namings and remove addition forXXX classes * our moe won't cut it it seems, correction bias seems to be missing in remote code version * tokenization change (remote) * our moe version works when adding normalization :D * cleanup moe * nits * cleanup modeling -> let's get to modular next * style * modular v1 * minor things + attempt at conversion (which doesn't work) * no conversion follow glm, fixup modular and other nits * modular cleanup * fixes * tests, tests, tests + some moe dtype forcing * simplify modular, fix fatal fa2 bug, remaining tests * fix import issue? * some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float * fix sdpa test, load on init dtype only * fixup post merge * style * fix doc links * tokenization cleanup beginnings * simplify tokenizer by a lot as its basically llama * tokenizer is full llama with different defaults + extra special tokens * sync og special tokens of ernie * fix decoding with numbers (also in remote done what a timing), begin of tok tests * align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama) * nits * docs * my daily post merge it is * check * tokenization update with explanations and conversion script * review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio) * post merge fixes * fixup tokenization, llama fast is the way to go * more fixups * check * import fixes * correction bias following the paddle code * fix * fix TP plan, fix correction bias sharding during forward * style * whoops * fix tied weights * docs and last nit * license * flasky tests * move repo id, update when merged on the hub	2025-07-21 19:51:49 +02:00

1 2 3 4 5 ...

1890 Commits