* Fix MXFP4 quantizer validation to enable CPU dequantization
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
* Add tests for MXFP4 quantizer CPU dequantization validation
* fix: format mxfp4 test file with ruff
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat temlate + special tokens back into the script.
* initial commmit
* update
* working with shards
* add model.safetensors.index.json
* fix
* fix
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony.
* add in tokenizer config as well.
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning 4-tuple
```
[rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]: model = AutoModelForCausalLM.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```
* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working !
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non nan, gibberish output
* working EP + quantization finally !
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commmit
* more cleaning
* more cleaning
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix !
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores, kernel probably no ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* upate pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle !
* update final chat template by Matt.
* fix responses; pin oai
* sinplify
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt. x2
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Fix installation
* Update setup.py
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now, its just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9c.
* update
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9c.
* fix merge
* fix
* Fix line break when custom model indentity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration
It was just a leftover.
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (#51)
* Add oai fix fast tests (#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* Revert fixup
* skip 2 test remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt to changes made to modeling
* style
* isort
* fix opies for the loss
* mmmm
---------
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9c.
* update
* style?
* added code for handling video object ,as dictionary of frames and metadata, in chat template
* added new test where videos are passed as objects (dict of frames, metadata) in the chat template
* modified hardcoded video_len check that does not match with increased number of tests cases.
* Modify hardcoded video_len check that fails with increased number of tests
* update documentation of multi-modal chat templating with extra information about including video object in chat template.
* add array handling in load_video()
* temporary test video inlcuded
* skip testing smolvlm with videos that are list of frames
* update documentation & make fixup
* Address review comments
* first commit
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.
TODO: Some tests are failing so that needs to be fixed.
* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model
* cleaned up a hack in the conversion script
* Fixed the expected values in integration tests
Cross att masking and cpu-gpu consistency tests are still failing however.
* changes for make style and quality
* add documentation
* clean up contrastive embedding
* add mm grounding dino to loss mapping
* add model link to config docstring
* hack fix for mm grounding dino consistency tests
* add special cases for unused config attr check
* add all models and update docs
* update model doc to the new style
* Use super_kwargs for modular config
* Move init to the _init_weights function
* Add copied from for tests
* fixup
* update typehints
* Fix-copies for tests
* fix-copies
* Fix init test
* fix snippets in docs
* fix consistency
* fix consistency
* update conversion script
* fix nits in readme and remove old comments from conversion script
* add license
* remove unused config args
* remove unnecessary if/else in model init
* fix quality
* Update references
* fix test
* fixup
---------
Co-authored-by: qubvel <qubvel@gmail.com>
* fix?
* fixme and style
* Update src/transformers/modeling_utils.py
* update
* update
* fix
* small fixees
* nit
* nits
* fix init check?
* fix
* fix default
* or fucks me
* nits
* include a small nit
* does this make it hapy?
* fixup
* fix the remaining ones
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025
* update and add modualr file
* update processors and check with orig impl later
* delete unused files
* image processor reduce LOC and re-use GotOCR2
* update the config to use modular
* model tests pass
* processor fixes
* check model outputs decorator
* address one more comment
* Update tokens. Temp - need to read from tokenizer'
* fix for multi-gpu
* Fix image token handling
* upadte image token expansion logic
* fix a few issues with remote code loading
* not related but modular forces us to change all files now
* Add overview and code sample to cohere vision docs
* add scripts. TMP.
* Update inference script
* Create script
* set dtype in export script
* TO revert: modular export fix
* Fix scripts
* Revert "TO revert: modular export fix"
This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.
* Use modular weights
* Upload to hub
Removed OOD weights ad script
* Updated docs
* fix import error
Update docs
Added pipeline test
* Updated docs
* Run modular script
remove modular for config
Added patch_size
Added docstrings in modular
Fix OOM
Add docs, fixup integration tests. 8-gpu passing
* tiny updates
* address comments + fixup
* add test for chat template
* check model outputs workaround
* aya vision fix check model inputs
* Revert "add test for chat template"
This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.
* reveert more changes
* last revert
* skip and merge
* faulty copy from
---------
Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
* feat(tokenization): add encode_message to tokenize messages one by one
* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.
* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.
* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.
* Docs fix
* Revert changes in docstring of pad()
* Revert changes in docstring
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.
* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.
---------
Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous
* test cache_position
* move test
* propagate changes
---------
Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
* Add Fast Segformer Processor
* Modified the params according to segformer model
* modified test_image_processing_Segformer_fast args
- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class
* added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast
* fixed code_quality
* added recommended fixes and tests to make sure everything processess smoothly
* Fixed SegmentationMapsLogic
- modified the preprocessing of segmentation maps to use tensors
- added batch support
* fixed some mismatched files
* modified the tolerance for tests
* use modular
* fix ci
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* feat: superpoint fast image processor
* fix: reran fast cli command to generate fast config
* feat: updated test cases
* fix: removed old model add
* fix: format fix
* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix: ported to torch and made requested changes
* fix: removed changes to init
* fix: init fix
* fix: init format fix
* fixed testcases and ported to torch
* fix: format fixes
* failed
test case fix
* fix superpoint fast
* fix docstring
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Add missing cache_position argument.
* Pass cache_position to language model.
* Overwrite prepare_inputs_for_generation.
* Set model to half precision for Flash Attention test.
* Cast model to bfloat16.
* add tests for helpers
* duplicate test for each model
* why llava next video has no helper
* oops must have been in the commit
* fix test after rebase
* add copy from
* support `typing.Literal` as type of tool parameters
* validate the `args` of `typing.Literal` roughly
* add test to get json schema for `typing.Literal` type hint
* fix: add `"type"` attribute to the parsed result of `typing.Literal`
* test: add argument `booleanish` to test multi-type literal
* style: auto fixup