Compare commits

..

262 Commits

Author SHA1 Message Date
ssum21
d88745c6d7 fix: manual edits
Some checks failed
Secret Leaks / trufflehog (push) Has been cancelled
2025-07-20 19:06:32 -07:00
ssum21
3fb5d6ceda feat: nmt draft 2025-07-20 19:01:42 -07:00
Yoni Gozlan
34133d0a79 Fix placeholders replacement logic in auto_docstring (#39433)
Fix and simplify placeholders replacement logic
2025-07-18 22:56:23 +00:00
Yoni Gozlan
433d2a23d7 Update SAM/SAM HQ attention implementation + fix Cuda sync issues (#39386)
* update attention implementation and improve inference speed

* modular sam_hq + fix integration tests on A10

* fixup

* fix after review

* softmax in correct place

* return attn_weights in sam/sam_hq
2025-07-18 18:46:27 -04:00
Yoni Gozlan
541bed22d6 Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py (#39439)
* rename `args_doc.py` to `auto_docstring.py` and improve doc

* modifs after review
2025-07-18 18:00:34 +00:00
Yoni Gozlan
de0dd3139d Add fast image processor SAM (#39385)
* add fast image processor sam

* nits
2025-07-18 17:27:16 +00:00
Enno Hermann
561a79a2f4 Fix BatchEncoding.to() for nested elements (#38985) 2025-07-18 14:14:45 +01:00
Mohit Deopujari
f4d076561f [gemma3] Fix do_convert_rgb in image processors. (#39438)
* [gemma3] Fix do_convert_rgb in image processors.

* [gemma3] Fix do_convert_rgb in image processors.
2025-07-18 12:33:00 +00:00
Raushan Turganbay
bcc0091937 [chat template] return assistant mask in processors (#38545)
* messed up the git history, squash commits

* raise error if slow and refine tests

* index was off by one

* fix the test
2025-07-18 12:23:20 +00:00
Joao Gante
328ca9cf1d [dependencies] Update datasets pin (#39500)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?

* datasets pin

* comment
2025-07-18 12:05:28 +00:00
Ákos Hadnagy
fb58377700 Slack CI bot: set default result for non-existing artifacts (#39499)
* Set default result for non-existing artifacts

* FMT

* Address review comments
2025-07-18 11:45:47 +00:00
Cyril Vallez
4ded9a4113 🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try

* Update modeling_utils.py

* Update modeling_utils.py

* big refactor

* Update modeling_utils.py

* style

* docstrings and simplify inner workings of configs

* remove all trace of _internal

* Update modeling_utils.py

* fix logic error

* Update modeling_utils.py

* recursive on config

* Update configuration_utils.py

* fix

* Update configuration_dpt.py

* Update configuration_utils.py

* Update configuration_utils.py

* Update modeling_idefics.py

* Update modeling_utils.py

* fix for old models

* more old models fixup

* Update modeling_utils.py

* Update configuration_utils.py

* Remove outdated test

* remove the deepcopy!! 🥵🥵

* Update test_modeling_gpt_bigcode.py

* fix qwen dispatch

* restrict to only models supporting it

* style

* switch name

* Update modeling_utils.py

* Update modeling_utils.py

* add tests!

* fix

* rypo

* remove bad copies

* fix

* Update modeling_utils.py

* additional check

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* skip
2025-07-18 13:41:54 +02:00
Joao Gante
2b819ba4e3 [dependencies] temporary pyarrow pin (#39496)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?
2025-07-18 10:05:40 +00:00
eustlb
967045082f Add voxtral (#39429)
* draft

* draft update (conversion working)

* mend

* draft update

* draft update: working generate

* refactor

* VoxtralProcessor draft

* processor update

* update convert_tekken_tokenizer

* refactor processor

* update convert

* make style

* better handle prefil

* make style

* add tests

* add mistral_common audio loading

* processor update

* revert changes

* audio utils update

* add audio to apply chat template mistral update

* voxtral processor update

* fix

* udpate converstion script

* make mistral tokenier from pretrain work from local dir

* fix udpates

* add integration tests

* add batched version

* processor docstring

* make style

* revert convert_tekken_tokenizer changes

* revert processing_qwen2.5 changes

* add multi-turn test

* processor improvements

* address review changes

* Update src/transformers/tokenization_mistral_common.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* update audio utils

* nits

* integration test update

* correct _support

* update tests

* test update

* update integration tests

* fix

* fix

* fix

* add test_apply_chat_template_with_audio

* add model doc

* model doc

* nit

* doc uptade

* nit

* processor improvement

* ensure default is 3B

* nits

* make

* make

* convert modular

* update checkpoint

* fix test

* make

* make

* autos

* make

* make

* nit

* nit

* nit

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-18 00:02:04 +00:00
Qizhi Chen
73869f2e81 Fix typing order (#39467)
* fix type order

* change all Union[str, dict] to Union[dict, str]

* add hf_parser test && fix test order

* add deepspeed dependency

* replace deepspeed with accelerator
2025-07-17 15:47:31 +00:00
he pang
bda75b4011 Add unified logits_to_keep support to LLMClass (#39472)
* add supports for logits_to_keep for qwen25vl and glm4v

* Update relevant modular files
2025-07-17 17:07:12 +02:00
Joao Gante
bf6c997685 [serve] Add speech to text (/v1/audio/transcriptions) (#39434)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

* experiment: base model from typed dict

* audio working

* fix bad rebase

* load audio with librosa

* implement timed models

* almost working

* make fixup

* fix tests

* transcription request type

* tokenizer -> processor

* add example in docs

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-07-17 14:29:57 +00:00
zhaiji0727
8b3de61a65 Update integration_utils.py (#39469)
* Update integration_utils.py

sanitize mlflow upload metric

* Update integration_utils.py

change import order to pass CI

* Update integration_utils.py

add comments

* Update integration_utils.py

Remove whitespace from blank line
2025-07-17 13:57:49 +00:00
Peter Schneider
7fd60047c8 fix: ImageTextToTextPipeline handles user-defined generation_config (#39374)
fix: ImageTextToTextPipeline handles user-defined generation_config passed to the pipeline

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-07-17 13:23:29 +00:00
Yuanyuan Chen
60b5471da3 Enable some ruff checks for performance and readability (#39383)
* Fix inefficient sequence tests

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PERF102

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC1802

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC0208

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:21:59 +00:00
Stonepia
fc700c2a26 Fix convert_and_export_with_cache failures for GPU models (#38976)
* Add the `device` option for `generate()`

* Add device for default tensors to avoid tensor mismatch

* [test] Enable test_static_cache_exportability for torch_device

* infer device from the prompt_token_ids

* Add device for generated tensor

* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU

* fix format

* infer device from the model
2025-07-17 13:12:32 +00:00
Yih-Dar
54680d75c9 Update GemmaIntegrationTest::test_model_2b_bf16_dola (#39362)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-17 14:06:23 +01:00
klimarissa17
322400af58 fix a comment typo in utils.py (#39459) 2025-07-17 13:06:04 +00:00
Yuanyuan Chen
43f07018cf Use newer typing notation (#38934)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:05:21 +00:00
Marc Sun
565dd0bad7 Fix tests due to breaking change in accelerate (#39451)
* update values

* fix
2025-07-17 13:51:50 +01:00
Zhongkai Zhao
26fed50460 fix max_length calculating using cu_seq_lens (#39341) 2025-07-17 10:54:23 +02:00
Yusuf Shihata
cdfe6164b3 fix(pipelines): QA pipeline returns fewer than top_k results in batch mode (#39193)
* fixing the bug

* Try a simpler approach

* make fixup

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-07-17 10:24:30 +02:00
renet10
b85ed49e0a Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings (#38822)
* Correcting PR #38642.  The PR removed references to the deprecated method "as_target_processor()" in the
__call__ and pad method docstrings, which is correct, but also removed all references to PreTrainedTokenizer,
which is incorrect.  This commit adds back the reference to PreTrainedTokenizer and also takes the
opportunity to enhance the docstrings with the invocation procedure post removal of "as_target_processor()"
and adds information on return values.

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: René Tio <tor@Jammer.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-16 14:13:07 -07:00
Dhruv Malik
787a0128a9 create ijepa modelcard (ref : PR #36979 ). (#39354)
* wip: adding first version of the IJEPA model card.

* refactor based on the @stevhliu feedbacks

* refactor:
- revert the accidental removal of the autodoc api description and the image reerece architecture

- general context updation.

* - changes of model for example quantization.
- merging the  quantization content.
2025-07-16 12:40:22 -07:00
ridima11
48f2233cdf Improve grammar and clarity in perf_hardware.md (#39428) 2025-07-16 12:15:15 -07:00
Yaowei Zheng
e68ebb695f fix cached file error when repo type is dataset (#36909)
* fix cached file

* Update hub.py
2025-07-16 18:02:26 +02:00
Krishnan Vignesh
35a416c400 Fix indentation bug in SmolVLM image processor causing KeyError (#39452)
Fix indentation bug in Idefics3 image processor

- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors

cc @yonigozlan

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
2025-07-16 11:59:28 -04:00
Luke Friedrichs
2c58705dc2 Updated Megatron conversion script for gpt2 checkpoints (#38969)
* update script to support new megatron gpt format

* fixed quality failures

---------

Co-authored-by: Luke Friedrichs <LckyLke>
2025-07-16 15:54:29 +00:00
Anton Vlasjuk
26be7f717e [CI] Fix partially red CI (#39448)
fix
2025-07-16 15:53:43 +02:00
sebastianvlad1
0a88751940 Fixes #39204: add fallback if get_base_model missing (#39226)
* Fixes #39204: add fallback if get_base_model missing

* Inline try_get_base_model logic as suggested in PR review

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-16 15:51:30 +02:00
Wing Lian
ba506f87db make the loss context manager easier to extend (#39321) 2025-07-16 15:47:24 +02:00
Arthur
9f1ac6f185 Remove something that should have never been there (#38254)
* what the hell

* update

* style

* style

* typing

* fix init issue

* fix granite moe hybrid as well
2025-07-16 15:22:44 +02:00
Raushan Turganbay
a7ca5b5d67 Fix processor tests (#39450)
fix
2025-07-16 15:01:35 +02:00
Kyle Sayers
71818f570b [Bugfix] [Quantization] Remove unused init arg (#39324)
remove unused arg from ct config init

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 14:57:42 +02:00
Pavel Iakubovskii
cc24b0378e Better typing for model.config (#39132)
* Apply to all models config annotation

* Update modular to preserve order

* Apply modular

* fix define docstring

* fix dinov2 consistency (docs<->modular)

* fix InstructBlipVideoForConditionalGeneration docs<->modular consistency

* fixup

* remove duplicate code

* Delete config_class attribute from the modeling code

* Add config_class attribute in base model

* Update init sub class

* Deprecated models update

* Update new models

* Fix remote code BC issue

* fixup

* fixing more corner cases

* fix new models

* add test

* modular docs update

* fix comment a bit

* fix for py3.9
2025-07-16 14:50:35 +02:00
Eon Kim
4b258454a7 Fix typo in generation configuration for Janus model weight conversion (#39432)
* Fix typo in generation configuration for Janus model weight conversion

* Fix typo

* Update Janus model generation configuration

* Update Janus model to use generation_kwargs
2025-07-16 14:28:02 +02:00
Lysandre Debut
de5ca373ac Responses API in transformers serve (#39155)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Responses API (to be merged into #39155) (#39338)

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

---------

Co-authored-by: Lysandre <hi@lysand.re>

* Slight bugfixes

* PR comments from #39338

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
Raushan Turganbay
c8524aeb07 [cache] make all classes cache compatible finally (#38635)
* dump

* push other models

* fix simple greedy generation

* xmod

* add fmst and clean up some mentions of old cache format

* gpt-bigcode now follows standards

* delete tuple cache reference in generation

* fix some models

* fix some models

* fix mambas and support cache in tapas

* fix some more tests

* fix copies

* delete `_reorder_cache`

* another fix copies

* fix typos and delete unnecessary test

* fix rag generate, needs special cache reordering

* fix tapas and superglue

* reformer create special cache

* recurrent gemma `reorder_cache` was a no-op, delete

* fix-copies

* fix blio and musicgen pipeline tests

* fix reformer

* fix reformer, again...

* delete `_supports_cache_class`

* delete `supports_quantized_cache`

* fix failing tests

* fix copies

* some minor clean up

* style

* style

* fix copies

* fix tests

* fix copies

* create causal mask now needs positions?

* fixc copies

* style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* clean-up of non-generative model after merging main

* check `is_decoder` for cache

* delete transpose for scores

* remove tuple cache from docs everywhere

* fix tests

* fix copies

* fix copies once more

* properly deprecate `encoder_attention_mask` in Bert-like models

* import `deprecate_kwarg` where needed

* fix copies again

* fix copies

* delete `nex_decoder_cache`

* fix copies asks to update for PLM

* fix copies

* rebasing had a few new models, fix them and merge asap!

* fix copies once more

* fix slow tests

* fix tests and updare PLM checkpoint

* add read token and revert accidentally removed line

* oh com -on, style

* just skip it, read token has no access to PLM yet

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-16 14:00:17 +02:00
Ilias Aarab
6cb43defd0 docs: add missing numpy import to minimal example (#39444)
docs: add numpy import to minimal example
2025-07-16 11:57:13 +00:00
Yuanyuan Chen
61163099f1 Remove runtime conditions for type checking (#37340)
Remove dynamic conditions for type checking

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 13:36:48 +02:00
Marc Sun
bfc9ddf5c6 Add StableAdamW Optimizer (#39446)
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.

* Fixed issue with

* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers

---------

Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
2025-07-16 13:35:53 +02:00
Pablo Montalvo
b9ee528246 add test scanner (#39419)
* add test scanner

* add doc + license

* refactor for only 1 tree traversal

* add back test of only one method

* document single method scan

* format

* fixup generate tests

* minor fix

* fixup

* fixup doc
2025-07-16 12:45:46 +02:00
Ákos Hadnagy
79941c61ce Fix missing definition of diff_file_url in notification service (#39445)
Fix missing definition of diff_file_url
2025-07-16 12:09:18 +02:00
richardodliu
e048d48bd0 Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870)
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer

* Update src/transformers/optimization.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update optimization.py

fix the error of the unclosed "("

* Update optimization.py

remove whitespace in line 402 in order to pass the quality test

* Update src/transformers/optimization.py

* Update src/transformers/optimization.py

* Apply style fixes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-16 12:01:08 +02:00
Quentin Gallouédec
0cf08e90dd Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor (#39372)
Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor
2025-07-16 11:54:20 +02:00
Yuanyuan Chen
ae4e306a40 Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358)
* Defaults to adamw_torch_fused for latest Pytorch

Signed-off-by: cyy <cyyever@outlook.com>

* Fix test

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-16 09:52:33 +00:00
Jeonghwan Kim
4524a68c66 Fix L270 - hasattr("moe_args") returning False error (#38715)
* Fix L270 - hasattr("moe_args") returning False error

* Update src/transformers/models/llama4/convert_llama4_weights_to_hf.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 09:45:58 +00:00
Raushan Turganbay
d33a1c389f [chat template] add a testcase for kwargs (#39415)
add a testcase
2025-07-16 11:31:35 +02:00
S1quence
99c9763398 Fixed a bug calculating cross entropy loss in JetMoeForCausalLM (#37830)
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 11:22:00 +02:00
Klaus-Rudolf Kladny
667ad02374 Remove double soft-max in load-balancing loss. Fixes #39055 . (#39056)
Remove double soft-max in load-balancing loss. Fixes #39055
2025-07-16 09:20:23 +00:00
Kyle Sayers
31d81943c9 [Core] [Offloading] Fix saving offloaded submodules (#39280)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add clarifying comment

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test_save_offloaded_model_with_direct_params

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix merge conflict, add decorators

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 08:44:40 +00:00
Raushan Turganbay
add43c4d09 [autodocstring] add video and audio inputs (#39420)
* add  video and audio inputs in auto docstring

* fix copies
2025-07-16 09:41:50 +02:00
Ákos Hadnagy
0dc2df5dda CI workflow for performed test regressions (#39198)
* WIP script to compare test runs for models

* Update line normalitzation logic

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-16 04:20:02 +02:00
StevenBucaille
1bc9ac5107 docs: update LightGlue docs (#39407)
* docs: update LightGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:50 -07:00
StevenBucaille
d9574f2fe3 docs: update SuperGlue docs (#39406)
* docs: update SuperGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:26 -07:00
Raushan Turganbay
9f41f67135 [vlm] fix loading of retrieval VLMs (#39242)
* fix vlm with retrieval

* we can't use AutoModel because new ColQwen was released after refactor

* no need for colqwen

* tied weight keys are necessary, if using IMageTextToText

* need to apply renaming in tied weights, only for ColPali

* overwrite tied keys in ColPali

* fix copies, modular can't handle if-statements
2025-07-15 17:23:54 +02:00
Wing Lian
b1d14086e4 handle training summary when creating modelcard but offline mode is set (#37095)
* handle training summary when creating modelcard but offline mode is set

* chore: lint
2025-07-15 17:21:15 +02:00
Dario Salvati
67f42928f0 Remove residual quantization attribute from dequantized models (#39373)
* fix: removing quantization trace attribute from dequantized model

Fixes #39295

* add: test `to(dtype=torch.float16)` after dequantization
2025-07-15 17:16:10 +02:00
Wangyi Jiang
30c508dbcb Remove deprecated audio utils functions (#39330)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-15 14:02:25 +00:00
Hosein Rezaei
d8e05951b8 Fix bugs in pytorch example run_clm when streaming is enabled (#39286) 2025-07-15 15:37:28 +02:00
Matt
a989bf8d84 Fix bugs from pipeline preprocessor overhaul (#39425)
* Correct load classes for VideoClassificationPipeline

* Correct load classes for the ASR pipeline
2025-07-15 14:28:59 +01:00
Luc Georges
53c9dcd6fd refactor: remove set_tracer_provider and set_meter_provider calls (#39422) 2025-07-15 14:22:12 +02:00
Yuanyuan Chen
f03b384149 Fix invalid property (#39384)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-15 12:11:37 +00:00
jiqing-feng
c4d41567fa set document_question_answering pipeline _load_tokenizer to True (#39411)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-15 12:05:49 +00:00
Matt
f56b49f48f Ignore extra position embeddings weights for ESM (#39063)
* Ignore extra position embeddings weights

* Slight name fix
2025-07-15 11:57:32 +00:00
44670
2b79f14375 support loading qwen3 gguf (#38645)
* support loading qwen3 gguf

* Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion

* Add testcase

* Fix formatting
2025-07-15 09:53:41 +00:00
Orion Weller
0e4b7938d0 Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967)
Some checks failed
Release - Conda / build_and_package (push) Has been cancelled
Secret Leaks / trufflehog (push) Has been cancelled
* working locally; need to style and test

* added docs and initial tests; need to debug and flesh out

* fixed tests

* working long context; batches

* working fa2 and eager

* update tests

* add missing confnigs

* remove default autoset

* fix spacing

* fix most tests

* fixed tests

* fix to init

* refactor to match new transformers updates

* remove static cache option

* fa2 fix

* fix docs

* in progress

* working on tests

* fixed issue with attn outputs

* remove debug

* fix local config attr

* update doc string

* fix docstring

* add docs to toc

* correct typo in toc

* add new updates from main w.r.t. ModernBERT RoPE

* fix local param

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
2025-07-15 10:40:41 +02:00
Alvaro Bartolome
0b724114cf Fix typo in /v1/models output payload (#39414) 2025-07-15 08:59:25 +01:00
Raushan Turganbay
8d6259b0b8 [refactor] set attention implementation (#38974)
* update

* fix some tests

* init from config, changes it in-place, add deepcopy in tests

* fix modernbert

* don't delete thsi config attr

* update

* style and copies

* skip tests in generation

* fix style

* accidentally removed flash-attn-3, revert

* docs

* forgot about flags set to False

* fix copies

* address a few comments

* fix copies

* custom code BC
2025-07-15 09:34:06 +02:00
Sameeraja Shyam
6017f5e8ed [siglip] fix pooling comment (#39378)
* feat(siglip2): add forward pass with pooled output logic in Siglip2TextModel

* test(siglip2): add test_text_model.py to verify pooled output behavior

* style(siglip2): fix formatting in test_text_model.py using Ruff

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* chore(siglip2): regenerate classic model after modular change

* Update
2025-07-14 17:47:19 +00:00
Tanuj Rai
8d40ca5749 Update phi4_multimodal.md (#38830)
* Update phi4_multimodal.md

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-14 10:35:17 -07:00
MilkClouds
3635415af2 [Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic (#39391)
Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
2025-07-14 09:25:06 -07:00
rasmi
3a48e9534c Use np.pad instead of np.lib.pad. (#39346)
* Use np.pad instead of np.lib.pad.

* Update audio_utils.py

Formatting
2025-07-14 16:05:28 +00:00
Matt
3d8be20cd2 Totally rewrite how pipelines load preprocessors (#38947)
* Totally rewrite how pipelines load preprocessors

* Delete more mappings

* Fix conditionals, thanks Cyril!
2025-07-14 16:40:04 +01:00
eromomon
903944a411 [examples] fix do_reduce_labels argument for run_semantic_segmentation_no_trainer (#39322)
* no use do_reduce_labels argument in model

* use do_reducer_labels in AutoImageProcessor
2025-07-14 10:16:49 +00:00
Cyril Vallez
8165c703ab Fix Lfm2 and common tests (#39398)
* fix

* better fix

* typo
2025-07-14 12:02:59 +02:00
Raushan Turganbay
878d60a3cb Deprecate AutoModelForVision2Seq (#38900)
deprecate vision2seq
2025-07-14 11:42:06 +02:00
sabari
ad333d4852 [Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor (#39333)
* Update modeling_qwen2_5_vl.py

### 🐛 Bug Description

When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42), the model crashes due to a type mismatch in the attention mask handling.

---

### 🔥 Error Traceback

* Fix dtype compatibility in attention mask processing

Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:

Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* # Fix: Use appropriate function based on dtype

* Update modular_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* Fix: Use appropriate function based on dtype

* Fix: Use appropriate function based on dtype

* Updatet modeling_glm4v.py

* Only apply conversion for floating point tensors (inverted masks)

* corrected the format issue

reformatted modeling_glm4v.py

All done!  🍰 
1 file reformatted

* Fix: Cast to float before applying torch.finfo

Corrected the format issue

* Fix torch.finfo() for integer attention mask

#39333

* Run make fix-copies and make style for CI compliance

- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC

* Fix torch.finfo() TypeError for

Fix torch.finfo() TypeError for integer attention_mask_tensor #39333

* Fix torch.finfo() TypeError for integer
2025-07-14 07:47:39 +00:00
Raushan Turganbay
c30af65521 [BLIP] remove cache from Qformer (#39335)
* remove cache from Qformer

* fix

* this was never correct...
2025-07-14 09:20:01 +02:00
Raushan Turganbay
66cd995618 [shieldgemma] fix checkpoint loading (#39348)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-14 08:34:58 +02:00
Yoni Gozlan
a1ad9197c5 Fix overriding Fast Image/Video Processors instance attributes affect other instances (#39363)
* fix and add tests

* nit
2025-07-12 23:39:06 +00:00
Yih-Dar
dc98fb3e5e update docker file to use latest timm (for perception_lm) (#39380)
update docker file for timm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-12 23:19:37 +02:00
Parag Ekbote
5c30f7e390 Update Model Card for Encoder Decoder Model (#39272)
* update model card.

* add back the model contributors for mamba and mamba2.

* update the model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update batches with correct alignment.

* update examples and remove quantization example.

* update the examples.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example.

* correct the example.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-11 11:23:08 -07:00
Xiang Chendong
0d7efe3e4b fix gpt2 usage doc (#39351)
fix typo of gpt2 doc usage
2025-07-11 10:59:41 -07:00
Muhammad Shaheer Malik
a646fd55fd Updated CamemBERT model card to new standardized format (#39227)
* Updated CamemBERT model card to new standardized format

* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution

* Updated CamemBERT usage examples, quantization, badges, and format

* Updated CamemBERT badges

* Fixed CLI Section
2025-07-11 10:59:09 -07:00
eromomon
af74ec65a7 Update Readme to Run Multiple Choice Script from Example Directory (#39323)
* Update Readme to run in current place

* Update Readme files to execute PyTorch examples from their respective folders
2025-07-11 10:58:26 -07:00
Julien Denize
70e57e4710 Add mistral common support (#38906)
* wip: correct docstrings

* Add mistral-common support.

* quality

* wip: add requested methods

* wip: fix tests

* wip: add internally some methods not being supported in mistral-common

* wip

* wip: add opencv dependency and update test list

* wip: add mistral-common to testing dependencies

* wip: revert some test changes

* wip: ci

* wip: ci

* clean

* check

* check

* check

* wip: add hf image format to apply_chat_template and return pixel_values

* wip: make mistral-common non-installed safe

* wip: clean zip

* fix: from_pretrained

* fix: path and base64

* fix: path and import root

* wip: add docs

* clean

* clean

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-11 16:26:58 +00:00
Jay Zhuang
665418dacc Remove device check in HQQ quantizer (#39299)
* Remove device check in HQQ quantizer

Fix https://github.com/huggingface/transformers/issues/38439

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-11 14:59:51 +00:00
Manuel de Prada Corral
601bea2c4e Verbose error in fix mode for utils/check_docstrings.py (#38915)
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.
2025-07-11 14:36:10 +00:00
Yih-Dar
24f771a043 fix failing test_sdpa_can_dispatch_on_flash (#39259)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-11 16:30:56 +02:00
Arthur
ee74397d20 update cb TP (#39361)
* update cb TP

* safety
2025-07-11 15:54:25 +02:00
Cyril Vallez
9bc675b3b6 Fix link for testpypi (#39360)
fix link
2025-07-11 15:34:01 +02:00
Shuming Hu
bf607f6d3b PerceptionLM (#37878)
* plm template

* A working plm with fixed image features

* hacked processor

* First version that reproduced PLM output using PE from timm.

* Simplify and fix tie_word_embeddings

* Use PIL resize. Simplify converstion.

* First version that works with video input.

* simplifed image preprocessing (not batched)

* Minor fixes after rebasing on main.

* Video processor based on new API.

* Revert to use _preprocess for image processor.

* refactor with modular

* fix tie_word_embedding

* Testing with timm PE

* check in missed converstion from modular to model.py

* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.

* address review comments

* Fixed batching if video and image examples mixed.

* Simplify PE configuration.

* Enable AutoModel for PerceptionEncoder.

* Update PE config style.

* update all headers

* Minor fixes.

* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.

* Fix for testing_modeling_perception_lm.py

* Image processing refactoring to use more common parts.

* Fix processor test.

* update tests to use model from hub

* More test fixes.

* integration test GT update after rebasing; probably due to video preprocessing

* update test media path to hub

* Stop tracking local scripts

* address some review comments

* refactor image processing.

* small fixes

* update documentation and minor fixes

* remove scripts

* Minor fix for CI

* Fix image processing

* CI and doc fix

* CI formatting fix

* ruff fix

* ruff formatting

* ran utils/sort_auto_mappings.py

* update docstring

* more docstring udpates

* add vision_input_type default fallback for image processing

* more verbose variable naming

* test update

* Remove PE and PEConfig use AutoModel(TimmWrapper) instead

* Minor cleanup.

* Minor Fix: remove any ref to PE. Ruff format and check.

* fix docstring

* Fix modular/model consistency.Improvex docstringfor  .

* Fix PerceptionLMForConditionalGenerationModelTest

* ruff fix

* fix for check_repo

* minor formatting

* dummy size arg to fix for processor test.

* Update docstring for PerceptionLMConfig

* Minor fixes from review feedback.

* Revert some minor changes per reviewer feedback.

* update base_model_prefix

* address reviewer feedback

* fix comment in modeling file

* address reviewer feedback

* ruff format

* Pre-merge test update.

* reapply modular and fix checkpoint name

* processor test path

* use modular a bit more

* remove dead code

* add token decorator

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-11 11:07:32 +02:00
Giuseppe Coccia
4b47b2b8ea Updated Switch Transformers model card with standardized format (Issue #36979) (#39305)
* Updated Switch Transformers model card with standardized format (Issue #36979)

* Apply reviewer suggestions to the new standardised Switch Transformer's model card

* Update switch_transformers.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-10 15:34:10 -07:00
Pavel Iakubovskii
fe1a5b73e6 [modular] speedup check_modular_conversion with multiprocessing (#37456)
* Change topological sort to return level-based output (lists of lists)

* Update main for modular converter

* Update test

* update check_modular_conversion

* Update gitignore

* Fix missing conversion for glm4

* Update

* Fix error msg

* Fixup

* fix docstring

* update docs

* Add comment

* delete qwen3_moe
2025-07-10 19:07:59 +01:00
Cyril Vallez
571a8c2131 Add a default value for position_ids in masking_utils (#39310)
* set default

* Update masking_utils.py

* add small test
2025-07-10 18:53:40 +02:00
Kyle Sayers
bdc8028cb3 [Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-10 18:33:30 +02:00
Joao Gante
df49b399dc [tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?

* add todo
2025-07-10 15:40:08 +00:00
Paul Pak
36e80a18da [modeling][lfm2] LFM2: Remove deprecated seen_tokens (#39342)
* [modeling][lfm2] remove deprecated seen_tokens

* [modular][lfm2] remove deprecated seen_tokens from modular file
2025-07-10 17:27:55 +02:00
Paul Pak
9682d07f92 LFM2 (#39340)
* [modeling][lfm2] LFM2 model on 4.53.0 interface

* [configuration] hook in LFM2 keys

* [modeling][lfm2] update modeling interface for 4.53.1

* [modeling][lfm2] apply mask to hidden conv states

* [misc] ruff format/lint

* [modeling][lfm2] minor: NotImplemented legacy cache conversion

* Create lfm2.md

* create nice modular

* style

* Update modeling_auto.py

* clean and start adding tests

* style

* Update test_modeling_lfm2.py

* Update __init__.py

* small test model size

* config

* small fix

* fix

* remove useless config attrs -> block_dim and conv_dim are hiden_size

* fix prepare inputs

* fix config

* test

* typo

* skip tests accordingly

* config docstrings

* add doc to .md

* skip config docstring check

---------

Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-10 16:07:33 +02:00
Joao Gante
38c3931362 [server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
edwko
6b09c8eab0 Handle DAC conversion when using weight_norm with newer PyTorch versions (#36393)
* Update convert_dac_checkpoint.py

* Update convert_dac_checkpoint.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-10 10:36:58 +00:00
Yih-Dar
92043bde29 fix phi3 tests (#39312)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-10 11:51:55 +02:00
Kingsley
520b9dcb42 fix Glm4v batch videos forward (#39172)
* changes for video

* update modular

* change get_video_features

* update video token replacement

* update modular

* add test and fix typo

* lint

* fix order

* lint

* fix

* remove dependency

* lint

* lint

* remove todo

* resize video for test

* lint..

* fix test

* new a processor for video_test

* fix test
2025-07-10 10:44:28 +02:00
Raushan Turganbay
bc161d5d06 Delete deprecated stuff (#38838)
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* bye bye `seen_tokens`

* address comments

* update typings

* ecnoder decoder models follow same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docsrting

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
Yoni Gozlan
c6ee0b1da8 Fix broken SAM after #39120 (#39289)
fix
2025-07-09 17:46:22 -04:00
jiqing-feng
aff7df8436 enable static cache on TP model (#39164)
* enable static cache on TP model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check tp size before init kv cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix docstring

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tp tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix other cache head size

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
2ef59646b8 Fix max_length_q and max_length_k types to flash_attn_varlen_func (#37206)
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-09 23:12:39 +02:00
Avihu Dekel
2d600a4363 Granite speech speedups (#39197)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum

* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.

* support audio.shape=(1,L)
2025-07-09 23:09:50 +02:00
Tom Aarsen
5111c8ea2f Fix typo: langauge -> language (#39317) 2025-07-09 12:06:46 -07:00
Priya aka Priyamvadha Balakrishnan
2781ad092d docs: update LLaVA-NeXT model card (#38894)
* docs: update LLaVA-NeXT model card

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [docs] Updated llava_next model card

* Update docs/source/en/model_doc/llava_next.md remove image sources

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [fix] Change Flash Attention to SDPA badge

* [doc] fixed quantization example

* docs: updated contribution details and badges

* Update llava_next.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 11:32:40 -07:00
Yih-Dar
16dd7f48d0 skip files in src/ for doctest (for now) (#39316)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:36:48 +02:00
Eman Risha
d61c0d087c Updated the Model docs - for the MARIAN model (#39138)
* Update marian.md

This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:

- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.

This update improves the readability, usability, and consistency of the Marian model documentation for users.

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 10:23:03 -07:00
Yih-Dar
161cf3415e add stevhliu to the list in self-comment-ci.yml (#39315)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:07:44 +02:00
Cyril Vallez
3be10c6d19 Fix consistency and a few docstrings warnings (#39314)
* Update modeling_deepseek_v2.py

* fix docstrings

* fix

* fix
2025-07-09 18:40:37 +02:00
MaCAT
4652677c89 🌐 [i18n-KO] Translated quark.md to Korean (#39268)
* initial translation

* removed english parts

* maintain consistency

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* add toctree

* fixed indentation

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-07-09 09:29:51 -07:00
Vladislav Bronzov
c980904204 Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequce doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration testzs

* cleanup BC stuff

* rope

* fix integrations tests based on a10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
Raushan Turganbay
accbd8e0fe [sliding window] revert and deprecate (#39301)
* bring back and deprecate

* oops

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-09 16:10:38 +02:00
Cyril Vallez
1cefb5d788 [modular] Allow method with the same name in case of @property decorator (#39308)
* fix

* add example

* fix

* Update modular_model_converter.py
2025-07-09 15:46:53 +02:00
Yih-Dar
4798c05c64 skip test_torchscript_* for now until the majority of the community ask for it (#39307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 15:35:48 +02:00
Yih-Dar
fe5f3c85d2 fix aria tests (#39277)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 13:49:33 +02:00
Raushan Turganbay
0687d481e2 [flash attn 3] bring back flags (#39294)
* flash attn 3 flag

* fix copies
2025-07-09 09:45:01 +02:00
JJJYmmm
25343aafee Fix SDPA attention precision issue in Qwen2.5-VL (#37363)
* solve conflicts and remove  redundant attention_mask in qwenvit

* update decoded text check

* remove trailing whitespace
2025-07-09 07:03:44 +02:00
Yaswanth Gali
0e1c281745 [Tests] Update model_id in AIMv2 Tests (#39281)
* Update model_id in tests

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 21:46:32 +02:00
Biao Zhang
7ef592c96c Update T5gemma (#39210)
* bug fix: add vocab_size to t5gemmaconfig for pipeline.

* Update checkpoint placeholder

* minor change

* minor change

* minor change: update example.

* fix: add vocab_size as an explict arg.

* buf fix:

remove vocab_size verification; instead, re-set encoder/decoder vocab size.

Note, in t5gemma, vocab size of encoder/decoder shoud be always the same.

* add `add_generation_prompt` for message preprocessing.
2025-07-08 19:08:48 +02:00
Quentin Lhoest
1ecd52e50a Add torchcodec in docstrings/tests for datasets 4.0 (#39156)
* fix dataset run_object_detection

* bump version

* keep same dataset actually

* torchcodec in docstrings and testing utils

* torchcodec in dockerfiles and requirements

* remove duplicate

* add torchocodec to all the remaining docker files

* fix tests

* support torchcodec in audio classification and ASR

* [commit to revert] build ci-dev images

* [commit to revert] trigger circleci

* [commit to revert] build ci-dev images

* fix

* fix modeling_hubert

* backward compatible run_object_detection

* revert ci trigger commits

* fix mono conversion and support torch tensor as input

* revert map_to_array docs + fix it

* revert mono

* nit in docstring

* style

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 17:06:12 +02:00
StevenBucaille
1255480fd2 [lightglue] add support for remote code DISK keypoint detector (#39253)
* feat: add trust_remote_code in LightGlueConfig

* fix: made sure trust_remote_code is provided only when necessary

* fix: make style

* docs: added missing trust_remote_code docstring

* refactor: refactored LightGlue config init

* fix: removed unnecessary argument
2025-07-08 15:03:04 +00:00
Yih-Dar
838a0268b8 fix flaky test_generate_compile_model_forward (#39276)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 15:36:05 +02:00
Pavel Iakubovskii
29d0030e23 Refactor PretrainedConfig.__init__ method to make it more explicit (#39158)
* cleanup

* fix no `__init__` test

* fix missing inits
2025-07-08 14:24:39 +01:00
Joao Gante
1580f64653 [smollm3] add tokenizer mapping for smollm3 (#39271)
add tok mapping to smollm3
2025-07-08 10:44:01 +00:00
Kashif Rasul
db05e4ff33 [pagged-attention] fix off-by-1 error in pagged attention generation (#39258)
* fix off-by-1 error in pagged attention generation

* formatting

* use update_with_token
2025-07-08 12:34:22 +02:00
Joao Gante
6f1a43896c [CI] fix docs (#39273)
* fix docs

* add ko gloassary file to toctree
2025-07-08 11:31:03 +01:00
Yaswanth Gali
fbdaa7b099 Add Aimv2 model (#36625)
* Model skelton

* changes

* temp push

* changes

* Added support for aimv2-native

* More changes

* More changes

* Stupid mistake correction

* Added config and refactor

* Added vison model

* update

* Refactor for lit variant

* Added Text Model

* Minor fixes

* nits

* update

* Preliminary tests

* More fixes

* Updated tests 🤗

* Refactor

* Updated testcase

* Updated config

* make fixup

* more fixes

* Bug fix and updates

* deadcode

* Fixes

* nit

* up

* Happy CI 

* Reduce LOC

* nit

* nit

* make style

* return_dict refactor

* bug fix

* fix

* doc update

* nit

* make fixup

* Minor update

* _init_weigths modifcation

* update tests

* Minor fixes post review

* Update w.r.t GradientCheckpointingLayer

* docs update

* update

* nit

* Use more Modular 😉

* Change name from AIMv2 to Aimv2

* Nit

* make style

* Add model doc pointer

* make style

* Update model doc section

* updates

* Modify attn mask and interface

* update test

* Final change

* Utilize flash and flex attn

* keep attn mask

* camelcase model name in test file

* Fix docstring

* Fix config warning finally and create_causal_mask

* disable torchscript

* remove unused arg

* remove from tests

* balance model size for tests

* fix device

* tests

* tests

* flaky test

* fix import

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:53:21 +02:00
Jingze Shi
d8590b4b0c Add Doge model (#35891)
* Add Doge Model

* Fix code quality

* Rollback an error commit

* Fix config for open-source weights

* Revert "Fix config for open-source weights"

This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76.

* Add modular_doge

* Update Doge inherits from Llama

* Fix import bug

* [docs] Add usage of doge model

* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils

* [docs] remove trust remote code from doge

* Fix dynamo bug in doge model

* Update docstrings

* Import apply_rotary_pos_emb and repeat_kv from Llama

* Fix all nits

* Fix code quality

* Fix some bugs

* Fix code quality

* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.

* Fix the wrong tensor orderings in DogeCDMoE

* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation

* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM

* Modify CDMoE for batch efficient implementation

* Uniform MoE configuration names, just like QwenMoE

* Fix code quality

* Fix code quality

* Fix code quality

* Add tp plan of CDMoE Module

* Hybird DMA with sliding window

* Update valid tokens greater than window size

* Fix code quality

* Add `convert_doge_weights_to_hf`

* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py

* Fix nits in modular_doge

* Fix code quality

* Fix all nits

* Fix all nits

* Make sure the attention function is updated inside the class

* Fix code quality issues in the Doge model and add a test for it

* Fix `test_generate`

* Fix code quality

* Fix nits fllowing suggestions

* Fix code quality

* Fix code quality issues

* Fix nits

* Fix code quality nits

* Fix the missing parameters in the configuration.

* Fix the missing parameters in the configuration.

* Fix nits

* Add initialization of attention

* Fix last nits

* Simplify dynamic mask generation logic

* Rename router_logits to gate_logits for matching latest changes of MixtralModel

* Rename typings for matching latest changes of MixtralModel

* Fixes typo in comment

* Update src/transformers/models/doge/modular_doge.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix code quality issues to match other modular

* Fix code quality issues to match other modular

* Fix the static compilation errors

* Update model weights link

* Fix code quality issues to match other modular

* reapply modular and support for new outputs

* style

* simplify a lot

* fix import location

* reapply modular

* fix

* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:44:29 +02:00
Joonchen Liau
d370bc64c6 Fix errors when use verl to train GLM4.1v model (#39199)
* Fix errors when use verl to train GLM4.1v model

* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}

* Update modeling_auto.py
2025-07-08 09:39:31 +00:00
Arthur
5fb8bb3e1a fix recompiles due to instance key, and deepcopy issues (#39270)
* fix recompiles due to instance key, and deepcopy issues

* dict
2025-07-08 11:38:11 +02:00
Guang Yang
356fd68109 fix(generation): stop beam search per-instance when heuristic satisfied (#38778)
* fix(decoding): stop beam search per-instance when heuristic satisfied

Previously, when early_stopping is set to `False`, the early-stopping heuristic only halted generation when **all** batch instances reached the criterion. This caused instances that are impossible (suggested by the heuristic) to improve keep generating, leading to inconsistent and overlong outputs across the batch.

Now we apply the heuristic **per-instance**: once a certain instance of batch has its all beams impossibe to improve, we mark that instance finished while letting others continue. This restores expected behavior and ensures consistency in batched generation.

* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic

* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied

* Add comments for early stop heuristic

* Update src/transformers/generation/utils.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-08 08:59:37 +00:00
Pablo Montalvo
0b0ede8b2b remove broken block (#39255)
* remove broken block

* fixup
2025-07-08 10:41:44 +02:00
Yih-Dar
a21557fa3e Skip test_eager_matches sdpa generate and update an integration test for blip-like models (#39248)
* skip

* skip

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 10:38:25 +02:00
gudwls215
ea3c2c0277 Fix license text, duplicate assignment, and typo in constant names (#39250)
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
2025-07-08 10:20:52 +02:00
Yao Matrix
b2816da802 fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU (#39187)
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable hqq uts on XPU, all passed

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix comment

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-08 10:18:26 +02:00
Yuxuan Zhang
17b3c96c00 Glm 4 doc (#39247)
* update the glm4 model readme

* update test

* update GLM-4.1V model

* update as format

* update

* fix some tests

* fix the rest

* fix on a10, not t4

* nit: dummy import

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-07-08 08:22:04 +02:00
Drew Ross
bbca9782ca Update LED model card (#39233)
* Update LED model card

* Remove extra arguments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-07 15:56:57 -07:00
Yih-Dar
41e865bb8d fix some flaky tests in tests/generation/test_utils.py (#39254)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 19:49:41 +02:00
Cyril Vallez
93747d89ea Simplify Mixtral and its modular children (#39252)
* simplify mixtral a lot

* fix

* other moes

* mixtral

* qwen3

* back

* Update modular_qwen3_moe.py
2025-07-07 19:40:41 +02:00
Mikhail Moskovchenko
3993ee1e98 Add segmentation_maps support to MobileNetV2ImageProcessor (#37312)
* Add `segmentation_maps` support to mobilenet_v2 image processor and `reduce_labels` to mobilevit

* Changed mobilenetv2 tests to support fastimageprocessor

* added `segmentation_maps` support to fast image processor

* reverted to upstream/main

* Add optional

* Use autodocstring

* Changed docs

* Docs fix

* Changed fp to match beit fp

* Change typing imports

* Fixed repo inconsistency

* Added fast-slow equivalence tests

* Removed unnecessary call

* Add `reduce_labels` to Mobilevit fast processor

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-07 13:34:59 -04:00
Shohail Ismail
b96f213fcf Clarify per_device_train_batch_size scaling in TrainingArguments (#38… (#38857)
Clarify global batch size calculation in TrainingArguments (#38484)
2025-07-07 16:57:42 +00:00
Joosun Hwang
9698052560 Add Korean translation for glossary.md (#38804)
* Add Korean translation for glossary.md

* Update docs/source/ko/glossary.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Joosun40 <77312900+Joosun40@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-07 09:12:55 -07:00
Lucain
bf203aa9da Update tiny-agents example (#39245) 2025-07-07 15:58:36 +02:00
kaixuanliu
c4e39ee59c adjust input and output texts for test_modeling_recurrent_gemma.py (#39190)
* adjust input and output texts for test_modeling_recurrent_gemma.py

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update Expectation match

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:13:25 +02:00
jiqing-feng
14cba7ad33 enable xpu on kv-cache and hqq doc (#39246)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-07 13:12:02 +00:00
Cyril Vallez
32db48db73 Fix patch helper (#39216)
remove -1
2025-07-07 15:11:48 +02:00
Pavel Iakubovskii
a3618d485a RotaryEmbeddings change is not None -> isinstance(..., dict) (#39145)
is None -> isinstance dict
2025-07-07 14:05:28 +01:00
Yih-Dar
9b09fe479f fix fastspeech2_conformer tests (#39229)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:04:26 +02:00
Zhen
00e9efceab [bugfix] fix flash attention 2 unavailable error on Ascend NPU (#39166)
[bugfix] fix flash attention 2 error on Ascend NPU
2025-07-07 13:03:39 +00:00
Cyril Vallez
056fa73fae [modular] Simplify logic and docstring handling (#39185)
* simplify a lot

* Update modular_model_converter.py

* finalize

* remove outdated functions

* apply it

* and examples
2025-07-07 14:52:57 +02:00
Xavier Dupré
f16fbfb89a Make _compute_dynamic_ntk_parameters exportable (#39171)
* Make _compute_dynamic_ntk_parameters exportable

* add unit test
2025-07-07 14:48:31 +02:00
kaixuanliu
4243bb844d fix bug using FSDP V1 will lead to model device not properly set (#39177)
* fix bug using FSDP V1 will lead to model device not properly set

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update the code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 14:47:04 +02:00
Yih-Dar
34c16167eb Don't send new comment if the previous one is less than 30 minutes (unless the content is changed) (#39170)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 14:43:50 +02:00
Daniel van Strien
b8f397e456 fix typo in Gemma3n notes (#39196) 2025-07-07 14:41:33 +02:00
Cyril Vallez
5348fbc005 [modular] Follow global indexing and attribute setting, and their dependencies (#39180)
* export global indexing statements

* add example

* style

* examples
2025-07-07 14:36:43 +02:00
Isotr0py
8570bc29f3 Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor (#39244)
* fix missing fast tokenizer in whisper processor

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix processor test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix qwen2.5 omni processor

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-07 13:54:18 +02:00
Joshua Lochner
b283d52f7f [vjepa2] replace einsum with unsqueeze (#39234) 2025-07-07 11:14:08 +01:00
Rémi Ouazan
a325409a50 Expectations re-order and corrected FA3 skip (#39195)
* Fix Expectations and a FA3 skip

* Fixed docstring

* Added context for Default expectation
2025-07-07 11:42:33 +02:00
zrohyun
b0a8e0b8d7 [video processors] Support float fps for precise frame sampling (#39134)
* [video processors] Support float fps for precise frame sampling

Enable fractional fps values (e.g., 1.5, 29.97) in video processors
for more precise frame sampling control.

- Change fps type from int to float across all video processors
- Maintain backward compatibility with integer values

Extends: #38105

* [video processors] Refine fps typing to Union[int, float]

Change fps type from Optional[float] to Optional[Union[int, float]]
for more explicit type information about supporting both integer
and floating-point frame rates.

- Update type hints and docstrings across 8 files
- Maintain backward compatibility
- Clarify support for both int and float values

Extends: #38105

* Revert "[video processors] Support float fps for precise frame sampling"

This reverts commit 7360d6e661b413ca0239e5ef61f9b1abbeab8e65.
2025-07-07 03:43:43 +00:00
Arthur
ca7e1a3756 Refactor the way we handle outputs for new llamas and new models (#39120)
* just update 2 files

* update other models as well just making fix-copies

* also add the changes needed to modeling utils

* put this on the pretrained model instead

* nits and fixes

* update generic, fix to use config value

* update other modelings

* use transformers kwargs instead

* update

* update

* update other models

* update

* updates

* update

* update

* update

* fix

* finally

* very small nits

* this fixes more tests

* fix other models as well!

* update modularqwen2

* update models based on qwen2

* update

* update

* remove the **flash stuff in favor of noraml kwargs

* update

* propagate gemma?

* remove output attentions

* propagate

* support cross attention edge case

* same

* test this

* fixes

* more fix

* update

* update

* fix conflicts

* update

* fix emu3

* fix emu3

* move the fix a bit

* quel enfer

* some fixes, loss_kwargs should never had been

* finish fixing gemma3n

* fix small lm3

* fix another one

* fix csm now

* fux csm and mistral

* fix mistral now

* small fixes

* fix janusss

* only for some models

* fixup

* phix phi3

* more fixes?

* dose this fix it?

* update

* holy shit it was just graph breaks

* protect torch

* updates

* fix samhq?

* fix moonshine

* more moonshine fixes, 3 failures left!

* nits

* generic needs to support more

* more fixes to moonshine!

* fix cross attention outputs!

* fix csm!

* nits

* fix stupid kosmos2

* current updates

* fixes

* use output recorder?

* nicer!

* a little bit of magic

* update

* fix protect

* fix

* small fixes

* protect import

* fix a bunch of more models

* fix fixups

* fix some of the last ones

* nit

* partly fix phi

* update

* fix import path

* make something that is fullgraph compatible just to be sure

* typing was wrong on llama so the rest was wrong as well

* fucking ugly but at least it is still exportable

* syle

* supposed to fix moonshine, it still breaks

* fix some default

* fix the last bits of sam

* update samhq

* more fixes to am hq

* nit

* fix all output+hidden states and output_attentions!

* fix?

* fix diffllama

* updates to fix initialization on the sam pips

* ups there was a bug

* fix the last sam hq test

* fix gotocr

* fix gotocr2!

* fixes

* skip stupid tests

* there was one left :)

* fixup

* fix fix copies issues with this test file

* fix copies for sam_hq

* rm some comments

* skip 2 more failing tests

* fix

* fix everything

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* add more doc!

* fix public init

* fix modular qwen3

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-07-05 11:34:28 +02:00
Yih-Dar
e6a8063ef1 Update expected values (after switching to A10) - part 8 - Final (#39220)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 13:35:53 +02:00
Yih-Dar
cd8a041a4f Update expected values (after switching to A10) - part 7 (#39218)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 12:48:10 +02:00
Cyril Vallez
0cf27916f0 Add packed tensor format support for flex/sdpa/eager through the mask! (#39194)
* Add the necesary logic to mask_utils

* add it everywhere

* Update masking_utils.py

* style

* Update masking_utils.py

* Update modeling_mimi.py

* Update masking_utils.py

* add support for more than batch size 1

* Update masking_utils.py

* add test

* style

* Update test_masking_utils.py

* Update masking_utils.py

* add require_token

* fix tests

* fix
2025-07-04 09:01:56 +02:00
Yih-Dar
037755ed54 Update expected values (after switching to A10) - part 6 (#39207)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 22:45:30 +02:00
Yih-Dar
1168f57abf Update expected values (after switching to A10) - part 5 (#39205)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 19:56:02 +02:00
Lysandre Debut
7d9e52f376 Fix continuous batching in transformers serve (#39149)
* Fix CB

* Nit

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add todos

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-03 18:15:31 +02:00
Joao Gante
85d93cc6e3 [serve] Cursor support, move docs into separate page, add more examples (#39133)
* jan docs

* rm

* [cursor] tmp commit

* Cursor working :D

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/commands/serving.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* cursor docs

* try to fix agents/tools docs?

* try to fix agents/tools docs?

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add transformers chat example with transformers serve

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-07-03 17:04:16 +01:00
Pavel Iakubovskii
e15b06d8dc [typing] better return typehints for from_pretrained (#39184)
* config

* processor

* feature-extractor

* jukebox

* fixup

* update other methods in config

* remove "PretrainedConfig" annotations
2025-07-03 14:22:47 +00:00
Yih-Dar
a25fc3592e Update expected values (after switching to A10) - part 4 (#39189)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 15:13:06 +02:00
Anton Vlasjuk
b31e9d19a6 [Dia] Change ckpt path in docs (#39181)
fix ckpt path
2025-07-03 10:02:58 +00:00
Ilyas Moutawwakil
18e0cae207 Fix many HPU failures in the CI (#39066)
* more torch.hpu patches

* increase top_k because it results in flaky behavior when Tempreture, TopP and TopK are used together, which ends up killing beams early.

* remove temporal fix

* fix scatter operation when input and src are the same

* trigger

* fix and reduce

* skip finding batch size as it makes the hpu go loco

* fix fsdp (yay all are passing)

* fix checking equal nan values

* style

* remove models list

* order

* rename to cuda_extensions

* Update src/transformers/trainer.py
2025-07-03 11:17:27 +02:00
Marc Sun
bff964c429 Decouple device_map='auto' and tp_plan='auto' (#38942)
* dissociate

* better place

* fix
2025-07-03 11:07:11 +02:00
Wing Lian
8178c43112 when delaying optimizer creation only prepare the model (#39152) 2025-07-03 09:04:16 +02:00
Raushan Turganbay
91221da2f1 [glm4v] fix video inference (#39174)
fix video inference
2025-07-03 05:20:41 +00:00
Rémi Ouazan
ebfbcd42da Test fixes for Aria (and some Expectation for llava_next_video) (#39131)
* Expectations for llava_next_video

* Updated image src in aria

* Fix test_small_model_integration_test

* Fix small model integration llama

* Fix a bunch of tests

* Style

* Shortened generation in test from 900 to 90
2025-07-02 23:41:14 +02:00
Yih-Dar
37a239ca50 Update expected values (after switching to A10) - part 3 (#39179)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:48:30 +02:00
Yih-Dar
9326fc332d Update expected values (after switching to A10) - part 2 (#39165)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* [skip ci]

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:47:55 +02:00
Pedro Cuenca
25cd65ac43 Random serve fixes (#39176)
* Fix index out of bounds exception on wrong kv reuse

* Prevent loading same model twice

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-07-02 22:09:58 +02:00
Lysandre Debut
548794b886 [serve] Model name or path should be required (#39178)
* Model name or path should be required

* Fix + add tests

* Change print to log so it doesn't display in transformers chat
2025-07-02 22:06:47 +02:00
Joao Gante
2d561713f8 [generate] document non-canonical beam search default behavior (#39000) 2025-07-02 18:29:16 +01:00
Steven Liu
df12d87d18 [docs] ViTPose (#38630)
* vitpose

* fix?

* fix?

* feedback

* fix

* feedback

* feedback

* update sample image
2025-07-02 07:56:29 -07:00
Cyril Vallez
2b4a12b5bf Reduce Glm4v model test size significantly (#39173)
* fix test size

* Update test_modeling_glm4v.py
2025-07-02 15:55:05 +02:00
BUI Van Tuan
e355c0a11c Fix missing initializations for models created in 2024 (#38987)
* fix GroundingDino

* fix SuperGlue

* fix GroundingDino

* fix MambaModel

* fix OmDetTurbo

* fix SegGpt

* fix Qwen2Audio

* fix Mamba2

* fix DabDetr

* fix Dac

* fix FalconMamba

* skip timm initialization

* fix Encodec and MusicgenMelody

* fix Musicgen

* skip timm initialization test

* fix OmDetTurbo

* clean the code

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add reviewed changes

* add back timm

* style

* better check for parametrizations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-02 15:03:57 +02:00
Rémi Ouazan
1125513a8d Blip2 fixes (#39080)
* Fixed some devices errors

* Fixed other device issues and more expectations

* Reverted support flags

* style

* More granular support

* Fixed some rebase stuff

* add a not None check before .to
2025-07-02 14:39:39 +02:00
Isotr0py
28df7f854a Fix multimodal processor get duplicate arguments when receive kwargs for initialization (#39125)
* fix processor tokenizer override

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* check image processor same

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-02 19:57:15 +08:00
Yaswanth Gali
b61023a1b7 🚨🚨🚨 [eomt] make EoMT compatible with pipeline (#39122)
* Make EoMT compatible with pipeline

* Implicit patch offsets

* remove patch offsets from arg

* Modify tests

* Update example

* fix proc testcase

* Add few more args

* add pipeline test suite

* fix

* docstring fixes

* add pipeline test

* changes w.r.t review

* 🙈 MB

* should fix device mismatch

* debug

* Fixes device mismatch

* use decorator

* we can split mlp

* expected values update

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2025-07-02 12:25:26 +01:00
Raushan Turganbay
4d5822e65d [smolvlm] fix video inference (#39147)
* fix smolvlm

* better do as before, set sampling params in overwritten `apply_chat_template`

* style

* update with `setdefault`
2025-07-02 12:05:10 +02:00
वेदांत
9b2f5b66d8 fix default value of config to match checkpionts in LLaVa-OV models (#39163) 2025-07-02 09:45:50 +00:00
Chong You
e8e0c76162 Add activation sparsity reference in gemma3n doc (#39160)
Add activation sparsity reference in the description of gemma3n
2025-07-02 04:11:03 +02:00
Yih-Dar
8e87adc45f fix llama tests (#39161)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 23:27:22 +02:00
Yih-Dar
4c1715b610 Update expected values (after switching to A10) (#39157)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:54:31 +02:00
Yih-Dar
ab59cc27fe Suggest jobs to use in run-slow (#39100)
* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:19:06 +02:00
jiqing-feng
db2f535443 update bnb ground truth (#39117)
* update bnb resulte

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* set seed to avoid sampling different results

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int8 tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 20:06:37 +02:00
ybkurt
260846efad fix: remove undefined variable (#39146) 2025-07-01 19:10:29 +02:00
rasmi
cdfe49a4d0 Change @lru_cache() to @lru_cache to match styles from #38883. (#39093)
Match styles in #38883
2025-07-01 18:29:16 +02:00
DavidS2106
f46798193e Fix: Ensure wandb logs config in offline mode (#38992)
* Fix: Ensure wandb logs config in offline mode

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-01 16:17:58 +00:00
Yih-Dar
fe838d6631 Fix missing fsdp & trainer jobs in daily CI (#39153)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 18:10:30 +02:00
StevenBucaille
1283877571 [superglue] fix wrong concatenation which made batching results wrong (#38850) 2025-07-01 12:14:44 +00:00
Raushan Turganbay
f8b88866f5 [VLMs] support passing embeds along with pixels (#38467)
* VLMs can work with embeds now

* update more models

* fix tests

* fix copies

* fixup

* fix

* style

* unskip tests

* fix copies

* fix tests

* style

* omni modality models

* qwen models had extra indentation

* fix some other tests

* fix copies

* fix test last time

* unrelated changes revert

* we can't rely only on embeds

* delete file

* de-flake mistral3

* fix qwen models

* fix style

* fix tests

* fix copies

* deflake the test

* modular reverted by fixes, fix again

* flaky test, overwritten

* fix copies

* style
2025-07-01 11:33:20 +00:00
Ayush Singh
20901f1d68 [typing] LlamaAttention return typehint (#38998)
* helo llama

* helo llama

* helo llama

* apply modular

* fix dia

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-07-01 11:29:52 +01:00
Raushan Turganbay
7a25f8dfdb [qwen2-vl] fix FA2 inference (#39121)
* fix FA2

* update is causal flag and remove mask for FA2

* update for FA2 with varlen path

* how the tests were passing with different devices?

* add comment and ref to the PR

* move mask preparation to base pretrained model

* seq len is the first dim, not second

* fix copies to fix GLM4V
2025-07-01 10:18:37 +00:00
Mehant Kammakomati
def9663239 feat: support indivisible shards for TP model loading and TPlizing. (#37220)
* feat: support uneven loading and sharding
resolve merge conflicts
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: allow for empty tensor computations

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: add llama1b test case

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* due to q_proj colwise it has to be multi of 2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-07-01 10:03:22 +00:00
jiqing-feng
06c4a4d499 fix caching_allocator_warmup with tie weights (#39070)
* fix caching_allocator_warmup with tie weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 11:32:20 +02:00
Raushan Turganbay
e435574721 🚨 Don't use cache in non-generative models (#38751)
* deprecate for 1 version

* style

* fix some tests

* fix esm

* skip for now, GC requires positional args but we have keyword args

* remove transpose for scores in modified models only

* skip fx trace tests
2025-07-01 09:08:21 +00:00
Cyril Vallez
dbc98328da Several fixes for Gemma3n (#39135)
* remove the skips

* fix the epsilon to a small value (does not make sense otherwise)

* safeguard

* overload test_eager_matches_sdpa

* Update test_modeling_common.py

* skip appropriate tests

* correct no_split_layer

* fix all devices issue

* fix backward

* fix
2025-07-01 10:34:53 +02:00
BUI Van Tuan
d53518c5f2 Fix key mapping for VLMs (#39029)
* fix key mapping for VLMs

* use __mro__ instead

* update key mapping in save_pretrained
2025-07-01 09:47:53 +02:00
eustlb
3457e8e73e [Whisper] update token timestamps tests (#39126)
* fixes

* update comment

* update for A10

* all a10

* all a10

* all a10

* all a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 21:55:36 +02:00
Drew Ross
fe35eca7bd Update BigBirdPegasus model card (#39104)
* Update igbird_pegasus.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 10:42:56 -07:00
Yao Matrix
29a3f5ed8c switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8 (#39024)
* switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update docs/source/en/perf_infer_gpu_multi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 08:54:05 -07:00
Vladimir Gutuev
9e0c865b8b docs: correct two typos in awesome-transformers.md (#39102)
* docs(awesome-projects): fix typo “Itt leverages” → “It leverages” (#39101)

closes #39101

* docs(awesome-projects): fix grammar “We provides” → “We provide” (#39101)

closes #39101
2025-06-30 08:53:43 -07:00
jiqing-feng
03db2700ab Enable XPU doc (#38929)
* fix example with dataset

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update chat_templating_multimodal.md

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* use full name for int8

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert int8 title

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-30 07:56:55 -07:00
Joao Gante
ea0ea392e5 Fix chat (#39128) 2025-06-30 13:47:48 +00:00
Lysandre Debut
ed36f8490e Licenses (#39127)
* Licenses

* Licenses
2025-06-30 15:25:36 +02:00
Lysandre Debut
e8f90b5397 Split transformers chat and transformers serve (#38443)
* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)

* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* streaming tool call

* abstract tool state; set tool start as eos

* todos

* server working on models without tools

* rm chat's deprecated flags

* chat defaults

* kv cache persists across calls

* add server docs

* link

* Update src/transformers/commands/serving.py

* Apply suggestions from code review

* i love merge conflicts

* solve multi turn with tiny-agents

* On the fly switching of the models

* Remove required positional arg

---------

Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>

* Protect names

* Fix tests

---------

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-06-30 15:10:53 +02:00
Yih-Dar
539c6c2fa8 All CI jobs with A10 (#39119)
all a10

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 14:23:27 +02:00
Ryan Mullins
ed9f252608 docs: Gemma 3n audio encoder (#39087)
Updating Gemma 3n docs and docstrings to clarify the relationship
between the newly trained audio encoder used in Gemma 3n and the USM
model from the original paper.
2025-06-30 14:10:51 +02:00
Yuxuan Zhang
4a79bf947d Fix some bug for finetune and batch infer For GLM-4.1V (#39090)
* update

* 1
2025-06-30 12:16:22 +02:00
Yao Matrix
2100ee6545 fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8 (#39116)
* fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* zamba2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* internvl

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tp cases

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-30 11:49:03 +02:00
Yih-Dar
ccf2ca162e skip some test_sdpa_can_dispatch_on_flash (#39092)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 23:08:14 +02:00
st81
a11f692895 Fixes the failing test test_is_split_into_words in test_pipelines_token_classification.py (#39079)
* Fix test pipelines token classification for is_split_into_words

* Fix incorrect import format
2025-06-27 19:25:32 +01:00
Sandeep Yadav
18143c76bf Sandeepyadav1478/2025 06 19 deberta v2 model card update (#38895)
* [docs]: update deberta-v2.md model card

* chore: req updates

* chore: address code review feedback and update docs

* chore: review feedback and updates

* chore: model selection updates

* chores: quantizations review updates
2025-06-27 10:35:30 -07:00
Steven Liu
02a769b058 [fix] Add FastSpeech2ConformerWithHifiGan (#38207)
* add to mapping

* oops

* oops

* add to config_mapping_names

* revert

* fix?

* config-mapping-names

* fix?

* fix?
2025-06-27 09:38:21 -07:00
Benjamin Bossan
c2dc72bb5f TST Fix PEFT integration test bitsandbytes config (#39082)
TST Fix PEFT integration test bitsandbytes config

The PEFT integration tests still used load_in_{4,8}_bit, which is
deprecated, moving to properly setting BitsAndBytesConfig. For 4bit,
also ensure that nf4 is being used to prevent

> RuntimeError: quant_type must be nf4 on CPU, got fp4
2025-06-27 18:33:11 +02:00
Matej Sirovatka
c8064bea9a Fix: unprotected import of tp plugin (#39083) 2025-06-27 17:28:05 +02:00
farrosalferro
dd7dc4a4a2 Add Fast Image Processor for Chameleon (#37140)
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 15:26:57 +00:00
Yih-Dar
6d773fc3bc fix dots1 tests (#39088)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 16:54:11 +02:00
Tijana Vukovic
c8764ab935 guard torch distributed check (#39057)
* guard torch distributed check

* Update src/transformers/pipelines/base.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-06-27 14:49:47 +00:00
MinJu-Ha
49d9fd49bd Add Fast Image Processor for mobileViT (#37143)
* Add image_processing_mobilevit_fast.py

* Fix copies

* update _preprocess for channel_flip

* Update for batched image processing

* Resolve merge conflicts with main

* Fix import order and remove trailing whitespace (ruff clean-up)

* Fix copy inconsistencies

* Add NotImplementedError for post_process_semantic_segmentation to satisfy repo checks

* Add auto_docstring

* Adjust style

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Delete not used function

* test: add missing tests for  and

* Add post_process_semantic_segmentation to mobilevit_fast.py

* Add preprocess function to image_processing_mobilebit_fast.py

* ruff check for formatting

* fix: modify preprocess method to handle BatchFeature correctly

* Remove logic for default value assignment

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove normalization adn RGB conversion logic not used in slow processor

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Simplify return_tensors logic using one-liner conditional expression

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove unused normalization and format parameters

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add **kwargs and remove default values in _preprocess

* add slow_fast equivalence tests for segmentation

* style: autoformat code with ruff

* Fix slow_fast equivalence test

* merge + remove skipped test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 14:40:24 +00:00
Nahieli
4336ecd1ea add fast image processor nougat (#37661)
* add fast image processor nougat

* test fixes

* docstring white space

* last fixes

* docstring_type

* tolerance unit test

* fix tolerance

* fix rtol

* remove traling white space

* remove white space

* note for tolerance unit test

* fix tests

* remove print

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-06-27 14:39:43 +00:00
Benjamin Bossan
0c35280e58 TST PEFT integration tests with pipeline generate (#39086)
Some PEFT integration tests involving text generation pipelines were
failing since #38129 because the base model is too small to generate
longer sequences. Setting max_new_tokens fixes this.
2025-06-27 15:58:10 +02:00
JINO ROHIT
993665a5ff fixed typo for docstring in prepare_inputs method (#39071) 2025-06-27 13:57:56 +00:00
Yih-Dar
839893c86b fix mistral3 tests (#38989)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 15:44:10 +02:00
eustlb
2b85b6ce19 [Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!! (#36632)
* timestamp token is end of token time !!!

* ensure correct alignment between tokens and timestamp tokens

* ignore input tokens for DTW computation

* use num_frames to avoid token timestamp hallucinations

* token timestamps test updates !

* num_frames: deprecate and use attention_mask instead

* avoid breaking change

* fix the pipeline usage for chunk approach

* make style

* better logging

* better logging

* make style

* update tests with correct values
2025-06-27 12:51:43 +00:00
eustlb
9c8d3a70b8 Pipeline: fix unnecessary warnings (#35753)
* return attention mask

* use correct model input name

* fix

* make
2025-06-27 14:32:03 +02:00
Yaswanth Gali
1750c518dd Add EoMT Model || 🚨 Fix Mask2Former loss calculation (#37610)
* Initial Commit

* up

* More changes

* up

* Only mask_logits mismatch

* close enough logits debug later

* fixes

* format

* Add dummy loss

* Close enough processing for semantic seg

* nit

* Added panoptic postprocessor

* refactor

* refactor

* finally fixed panoptic postprocessor

* temp update

* Refactor ForUniversalSegmentation class

* nits and config update

* Few fixes and inference matches

* change mapping

* Added training support but loss slightly off 🥲

* Loss is matching 😀

* update

* Initial tests skelton

* changes

* tests update

* more modular

* initial tests

* updates

* better docstrings

* changes

* proc tests passing :)

* Image processor update

* tiny change

* QOL changes

* Update test w.r.t latest attn refactor

* repo-consistency fixes

* up

* Image proc fix and integration tests :)

* docs update

* integration tests

* fix

* docs update 🥰

* minor fix

* Happy CI

* fix

* obvious refactoring

* refactoring w.r.t review

* Add fask image proc skelton

* Fast Image proc and cleanups

* Use more modular

* tests update

* Add more tests

* Nit

* QOL updates

* change init_weights to torch default

* add eager func coz of make style

* up

* changes

* typo fix

* Updates

* More deterministic tests

* More modular

* go more modular 🚀

* up

* dump

* add supprot for giant ckpts

* overhaul

* modular

* refactor

* instace seg is ready

* cleanup

* forgot this

* docs cleanup

* minor changes

* EoMT - > Eomt

* Happy CI

* remove redundant comment

* Change model references

* final change

* check annealing per block

* My other PR changes 😂

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-27 14:18:18 +02:00
Yao Matrix
0106a50a6b fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8 (#39069)
* fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* qwen3

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* quanto

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* models

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* idefics2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-27 14:01:53 +02:00
Mohamed Mekkouri
cb17103bd5 Uninstallling Flash attention from quantization docker (#39078)
* update

* revert
2025-06-27 13:51:46 +02:00
BUI Van Tuan
371c471113 Fix initialization of OneFormer (#38901)
* fix initialization of OneFormer

* remove redundant initializations

* remove redundant initializations

* remove redundant initializations

* keep BC
2025-06-27 12:39:37 +02:00
Yih-Dar
540a10848c fix Gemma3nProcessorTest (#39068)
* fix

* fix

* oups forgot style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-27 12:28:10 +02:00
Yaswanth Gali
0d66ef7792 Cleanup Attention class for Siglip and dependent models (#39040)
* cleanup attention class

* More models

* more models

* Changes

* make style

* Should fix CI

* This should work 🙏
2025-06-27 12:14:09 +02:00
eustlb
1ccc73dee9 [Whisper] fix shape mismatch in tests (#39074)
fix shape mismatch
2025-06-27 09:27:42 +00:00
Steven Liu
a52478253b [docs] Tensor parallelism (#38241)
* updates

* feedback

* badges

* fix?

* fix?

* fix?

* fix?
2025-06-26 14:40:45 -07:00
Steven Liu
84e8696cae [docs] @auto_docstring (#39011)
* refactor

* feedback
2025-06-26 14:21:54 -07:00
Drew Ross
018855de63 Update PEGASUS-X model card (#38971)
* Update PEGASUS-X model card

* Add cache_implementation argument in quantization code example

* Update CLI example

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove TensorFlow and Flax badges

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 13:54:48 -07:00
Steven Liu
757c26fb40 [docs] Model contribution (#38995)
improve
2025-06-26 12:25:14 -07:00
Yih-Dar
b372bb5ed1 fix layoutlmv3 tests (#39050)
* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 20:07:17 +02:00
StevenBucaille
f171e7e884 Update SuperPoint model card (#38896)
* docs: first draft to more standard SuperPoint documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs: reverted changes on Auto classes

* docs: addressed the rest of the comments

* docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation

* Update superpoint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 10:13:06 -07:00
Yih-Dar
2f50230c59 fix t5gemma tests (#39052)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:48:14 +02:00
Yih-Dar
23b7e73f05 fix test_compare_unprocessed_logit_scores (#39053)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:36:56 +02:00
Anton Vlasjuk
58c7689226 [Flex Attn] Fix torch 2.5.1 incompatibilities (#37406)
* remove compile on mask creation, ensure kv blocks do not explode on indices

* trigger ci

* switch dynamic compilation to false

* patch new masking functions as well

* add len check

* i was wrong

* last comment
2025-06-26 18:23:55 +02:00
Lysandre
5154497607 Dev version 2025-06-26 18:04:36 +02:00
1384 changed files with 129556 additions and 29892 deletions

View File

@@ -303,7 +303,7 @@ non_model_job = CircleCIJob(
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) cause some issues
# TODO: remove this once it works directly
install_steps=["uv venv && uv pip install ."],
install_steps=["uv venv && uv pip install .[serving]"],
marker="not generate",
parallelism=6,
)

View File

@@ -18,6 +18,10 @@ jobs:
notebook_folder: transformers_doc
languages: ar de en es fr hi it ko pt tr zh ja te
custom_container: huggingface/transformers-doc-builder
# Temporary pin to work around datasets exception in the docbuilder.Remove after docker images and main have
# the right dependencies (which **should** be the case by 2025-07-20). See
# https://github.com/huggingface/transformers/actions/runs/16365952006/job/46243081358?pr=38545
pre_command: uv pip install datasets>=2.15.0
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

View File

@@ -15,3 +15,7 @@ jobs:
pr_number: ${{ github.event.number }}
package: transformers
languages: en
# Temporary pin to work around datasets exception in the docbuilder. Remove after docker images and main have
# the right dependencies (which **should** be the case by 2025-07-20). See
# https://github.com/huggingface/transformers/actions/runs/16365952006/job/46243081358?pr=38545
pre_command: uv pip install datasets>=2.15.0

View File

@@ -41,7 +41,7 @@ jobs:
check_new_failures:
name: " "
runs-on:
group: aws-g4dn-4xlarge-cache
group: aws-g5-4xlarge-cache
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

View File

@@ -28,7 +28,7 @@ jobs:
matrix:
split_keys: ${{ fromJson(inputs.split_keys) }}
runs-on:
group: aws-g4dn-4xlarge-cache
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

View File

@@ -15,7 +15,7 @@ jobs:
setup:
name: Setup
runs-on:
group: aws-g4dn-4xlarge-cache
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

157
.github/workflows/get-pr-info.yml vendored Normal file
View File

@@ -0,0 +1,157 @@
name: Get PR commit SHA
on:
workflow_call:
inputs:
pr_number:
required: true
type: string
outputs:
PR_HEAD_REPO_FULL_NAME:
description: "The full name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
PR_BASE_REPO_FULL_NAME:
description: "The full name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_FULL_NAME }}
PR_HEAD_REPO_OWNER:
description: "The owner of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}
PR_BASE_REPO_OWNER:
description: "The owner of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_OWNER }}
PR_HEAD_REPO_NAME:
description: "The name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}
PR_BASE_REPO_NAME:
description: "The name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_NAME }}
PR_HEAD_REF:
description: "The branch name of the pull request in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REF }}
PR_BASE_REF:
description: "The branch name in the base repository (to merge into)"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REF }}
PR_HEAD_SHA:
description: "The head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_SHA }}
PR_BASE_SHA:
description: "The head sha of the target branch in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_SHA }}
PR_MERGE_COMMIT_SHA:
description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_HEAD_COMMIT_DATE:
description: "The date of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
PR_MERGE_COMMIT_DATE:
description: "The date of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_HEAD_COMMIT_TIMESTAMP:
description: "The timestamp of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_TIMESTAMP }}
PR_MERGE_COMMIT_TIMESTAMP:
description: "The timestamp of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
PR:
description: "The PR"
value: ${{ jobs.get-pr-info.outputs.PR }}
PR_FILES:
description: "The files touched in the PR"
value: ${{ jobs.get-pr-info.outputs.PR_FILES }}
jobs:
get-pr-info:
runs-on: ubuntu-22.04
name: Get PR commit SHA better
outputs:
PR_HEAD_REPO_FULL_NAME: ${{ steps.pr_info.outputs.head_repo_full_name }}
PR_BASE_REPO_FULL_NAME: ${{ steps.pr_info.outputs.base_repo_full_name }}
PR_HEAD_REPO_OWNER: ${{ steps.pr_info.outputs.head_repo_owner }}
PR_BASE_REPO_OWNER: ${{ steps.pr_info.outputs.base_repo_owner }}
PR_HEAD_REPO_NAME: ${{ steps.pr_info.outputs.head_repo_name }}
PR_BASE_REPO_NAME: ${{ steps.pr_info.outputs.base_repo_name }}
PR_HEAD_REF: ${{ steps.pr_info.outputs.head_ref }}
PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
PR_HEAD_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.head_commit_timestamp }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.merge_commit_timestamp }}
PR: ${{ steps.pr_info.outputs.pr }}
PR_FILES: ${{ steps.pr_info.outputs.files }}
if: ${{ inputs.pr_number != '' }}
steps:
- name: Extract PR details
id: pr_info
uses: actions/github-script@v6
with:
script: |
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
const { data: head_commit } = await github.rest.repos.getCommit({
owner: pr.head.repo.owner.login,
repo: pr.head.repo.name,
ref: pr.head.ref
});
const { data: merge_commit } = await github.rest.repos.getCommit({
owner: pr.base.repo.owner.login,
repo: pr.base.repo.name,
ref: pr.merge_commit_sha,
});
const { data: files } = await github.rest.pulls.listFiles({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
core.setOutput('head_repo_full_name', pr.head.repo.full_name);
core.setOutput('base_repo_full_name', pr.base.repo.full_name);
core.setOutput('head_repo_owner', pr.head.repo.owner.login);
core.setOutput('base_repo_owner', pr.base.repo.owner.login);
core.setOutput('head_repo_name', pr.head.repo.name);
core.setOutput('base_repo_name', pr.base.repo.name);
core.setOutput('head_ref', pr.head.ref);
core.setOutput('base_ref', pr.base.ref);
core.setOutput('head_sha', pr.head.sha);
core.setOutput('base_sha', pr.base.sha);
core.setOutput('merge_commit_sha', pr.merge_commit_sha);
core.setOutput('pr', pr);
core.setOutput('head_commit_date', head_commit.commit.committer.date);
core.setOutput('merge_commit_date', merge_commit.commit.committer.date);
core.setOutput('files', files);
console.log('PR head commit:', {
head_commit: head_commit,
commit: head_commit.commit,
date: head_commit.commit.committer.date
});
console.log('PR merge commit:', {
merge_commit: merge_commit,
commit: merge_commit.commit,
date: merge_commit.commit.committer.date
});
- name: Convert dates to timestamps
id: get_timestamps
run: |
head_commit_date=${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date=${{ steps.pr_info.outputs.merge_commit_date }}
echo $head_commit_date
echo $merge_commit_date
head_commit_timestamp=$(date -d "$head_commit_date" +%s)
merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
echo $head_commit_timestamp
echo $merge_commit_timestamp
echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT

36
.github/workflows/get-pr-number.yml vendored Normal file
View File

@@ -0,0 +1,36 @@
name: Get PR number
on:
workflow_call:
outputs:
PR_NUMBER:
description: "The extracted PR number"
value: ${{ jobs.get-pr-number.outputs.PR_NUMBER }}
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request.number }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"

View File

@@ -107,9 +107,9 @@ jobs:
run: |
echo "${{ inputs.machine_type }}"
if [ "${{ inputs.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ inputs.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ inputs.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}

199
.github/workflows/pr_run_slow_ci.yml vendored Normal file
View File

@@ -0,0 +1,199 @@
name: PR slow CI
on:
pull_request_target:
types: [opened, synchronize, reopened]
jobs:
get-pr-number:
name: Get PR number
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
# We only need to verify the timestamp if the workflow is triggered by `issue_comment`.
verity_pr_commit:
name: Verity PR commit corresponds to a specific event by comparing timestamps
if: ${{ github.event.comment.created_at != '' }}
runs-on: ubuntu-22.04
needs: get-pr-info
env:
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
steps:
- run: |
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
get-jobs:
name: Get test files to run
runs-on: ubuntu-22.04
needs: [get-pr-number, get-pr-info]
outputs:
jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
steps:
- name: Get repository content
id: repo_content
uses: actions/github-script@v6
with:
script: |
const { data: tests_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_models_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/models',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_quantization_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/quantization',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
core.setOutput('tests_dir', tests_dir);
core.setOutput('tests_models_dir', tests_models_dir);
core.setOutput('tests_quantization_dir', tests_quantization_dir);
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
run: |
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Write tests_dir file
run: |
cat > tests_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_dir }}
EOF
- name: Write tests_models_dir file
run: |
cat > tests_models_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_models_dir }}
EOF
- name: Write tests_quantization_dir file
run: |
cat > tests_quantization_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_quantization_dir }}
EOF
- name: Run script to get jobs to run
id: get_jobs
run: |
python utils/get_pr_run_slow_jobs.py | tee output.txt
echo "jobs_to_run: $(tail -n 1 output.txt)"
echo "jobs_to_run=$(tail -n 1 output.txt)" >> $GITHUB_OUTPUT
send_comment:
# Will delete the previous comment and send a new one if:
# - either the content is changed
# - or the previous comment is 30 minutes or more old
name: Send a comment to suggest jobs to run
if: ${{ needs.get-jobs.outputs.jobs != '' }}
needs: [get-pr-number, get-jobs]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Check and update comment if needed
uses: actions/github-script@v7
env:
BODY: "\n\nrun-slow: ${{ needs.get-jobs.outputs.jobs }}"
with:
script: |
const prNumber = ${{ needs.get-pr-number.outputs.PR_NUMBER }};
const commentPrefix = "**[For maintainers]** Suggested jobs to run (before merge)";
const thirtyMinutesAgo = new Date(Date.now() - 30 * 60 * 1000); // 30 minutes ago
const newBody = `${commentPrefix}${process.env.BODY}`;
// Get all comments on the PR
const { data: comments } = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber
});
// Find existing comments that start with our prefix
const existingComments = comments.filter(comment =>
comment.user.login === 'github-actions[bot]' &&
comment.body.startsWith(commentPrefix)
);
let shouldCreateNewComment = true;
let commentsToDelete = [];
if (existingComments.length > 0) {
// Get the most recent comment
const mostRecentComment = existingComments
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
const commentDate = new Date(mostRecentComment.created_at);
const isOld = commentDate < thirtyMinutesAgo;
const isDifferentContent = mostRecentComment.body !== newBody;
console.log(`Most recent comment created: ${mostRecentComment.created_at}`);
console.log(`Is older than 30 minutes: ${isOld}`);
console.log(`Has different content: ${isDifferentContent}`);
if (isOld || isDifferentContent) {
// Delete all existing comments and create new one
commentsToDelete = existingComments;
console.log(`Will delete ${commentsToDelete.length} existing comment(s) and create new one`);
} else {
// Content is same and comment is recent, skip
shouldCreateNewComment = false;
console.log('Comment is recent and content unchanged, skipping update');
}
} else {
console.log('No existing comments found, will create new one');
}
// Delete old comments if needed
for (const comment of commentsToDelete) {
console.log(`Deleting comment #${comment.id} (created: ${comment.created_at})`);
await github.rest.issues.deleteComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: comment.id
});
}
// Create new comment if needed
if (shouldCreateNewComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: newBody
});
console.log('✅ New comment created');
} else {
console.log(' No comment update needed');
}

View File

@@ -29,7 +29,7 @@ jobs:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
@@ -185,7 +185,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.models) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -239,9 +239,9 @@ jobs:
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -292,7 +292,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.quantizations) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -338,9 +338,9 @@ jobs:
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}

View File

@@ -31,7 +31,7 @@ jobs:
name: Setup
strategy:
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -131,7 +131,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g4dn-2xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -169,9 +169,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -244,7 +244,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -282,9 +282,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -357,7 +357,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -395,9 +395,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -467,7 +467,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -505,9 +505,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}

View File

@@ -84,8 +84,6 @@ jobs:
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_models_gpu
secrets: inherit
run_trainer_and_fsdp_gpu:
@@ -104,11 +102,10 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
run_pipelines_gpu:
if: ${{ inputs.job == 'run_pipelines_gpu' }}
run_pipelines_torch_gpu:
if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
name: Pipelines
strategy:
fail-fast: false
@@ -161,20 +158,20 @@ jobs:
- name: Run all pipeline tests on Intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_pipelines_gpu_test_reports tests/pipelines -m "not not_device_test"
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_pipelines_gpu_test_reports/failures_short.txt
cat reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_gpu_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_gpu_test_reports
path: reports/${{ env.machine_type }}_run_pipelines_gpu_test_reports
name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
path: reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
@@ -248,8 +245,8 @@ jobs:
name: ${{ env.machine_type }}_run_examples_gpu_test_reports
path: reports/${{ env.machine_type }}_run_examples_gpu_test_reports
run_deepspeed_gpu:
if: ${{ inputs.job == 'run_deepspeed_gpu' }}
run_torch_cuda_extensions_gpu:
if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
name: Intel Gaudi deepspeed tests
strategy:
fail-fast: false
@@ -305,20 +302,20 @@ jobs:
- name: Run all deepspeed tests on intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_deepspeed_gpu_test_reports tests/deepspeed -m "not not_device_test"
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_deepspeed_gpu_test_reports/failures_short.txt
cat reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_deepspeed_gpu_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_deepspeed_gpu_test_reports
path: reports/${{ env.machine_type }}_run_deepspeed_gpu_test_reports
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Slack Report
@@ -327,8 +324,8 @@ jobs:
setup,
run_models_gpu,
run_examples_gpu,
run_pipelines_gpu,
run_deepspeed_gpu,
run_torch_cuda_extensions_gpu,
run_pipelines_torch_gpu,
run_trainer_and_fsdp_gpu,
]
if: ${{ always() }}

View File

@@ -23,7 +23,7 @@ jobs:
name: Pipeline CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_pipelines_gpu
job: run_pipelines_torch_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
@@ -47,7 +47,7 @@ jobs:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_deepspeed_gpu
job: run_torch_cuda_extensions_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"

View File

@@ -50,7 +50,7 @@ jobs:
name: Setup
strategy:
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -128,13 +128,14 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: [0, 1]
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
@@ -145,7 +146,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -179,9 +180,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -213,7 +214,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -247,9 +248,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -282,7 +283,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -344,9 +345,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -381,7 +382,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -424,9 +425,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}

3
.gitignore vendored
View File

@@ -167,3 +167,6 @@ tags
# ruff
.ruff_cache
# modular conversion
*.modular_backup

View File

@@ -288,7 +288,7 @@ Keywords: Music understanding, Music generation
## [dalle-flow](https://github.com/jina-ai/dalle-flow)
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. Itt leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. It leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.
Keywords: High-definition image generation, Stable Diffusion, DALL-E Mega, GLID-3 XL, CLIP, SwinIR
@@ -526,7 +526,7 @@ Keywords: Model deployment, CLoud, Mobile, Edge
## [underthesea](https://github.com/undertheseanlp/underthesea)
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provides extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provide extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
Keywords: Vietnamese, NLP

View File

@@ -28,6 +28,7 @@ from transformers.testing_utils import HfDoctestModule, HfDocTestParser
NOT_DEVICE_TESTS = {
"test_tokenization",
"test_tokenization_mistral_common",
"test_processor",
"test_processing",
"test_beam_constraints",

View File

@@ -2,10 +2,10 @@ FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git ffmpeg
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer
RUN uv pip uninstall transformers

View File

@@ -2,10 +2,10 @@ FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]"
RUN uv pip uninstall transformers

View File

@@ -2,10 +2,10 @@ FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs ffmpeg
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken,num2words,video]"
RUN uv pip uninstall transformers

View File

@@ -26,10 +26,12 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA && python3 -m pip uninstall -y tensorflow tensorflow_text tensorflow_probability
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA && python3 -m pip uninstall -y tensorflow tensorflow_text tensorflow_probability
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir -U timm
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@@ -21,7 +21,7 @@ RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'p
# Install latest release PyTorch
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate

View File

@@ -19,7 +19,7 @@ RUN python3 -m pip uninstall -y torch torchvision torchaudio
# Install **nightly** release PyTorch (flag `--pre`)
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
# `datasets` requires pandas, pandas has some modules compiled with numpy=1.x causing errors
RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'pandas<2' 'numpy<2'

View File

@@ -26,7 +26,7 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@@ -93,6 +93,9 @@ RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch]
# `kernels` may give different outputs (within 1e-5 range) even with the same model (weights) and the same inputs
RUN python3 -m pip uninstall -y kernels
# Uninstall flash-attn installed by autoawq, it causes issues here : https://github.com/huggingface/transformers/actions/runs/15915442841/job/44892146131
RUN python3 -m pip uninstall -y flash-attn
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@@ -17,12 +17,12 @@
title: Customizing model components
- local: model_sharing
title: Sharing
- local: add_new_model
title: Adding a new model to Transformers
- local: modular_transformers
title: Modular Transformers
title: Contributing a new model to Transformers
- local: add_new_model
title: Legacy model contribution
- local: auto_docstring
title: Document your models
title: Documenting a model
- local: attention_interface
title: Customizing attention function
title: Models
@@ -97,11 +97,9 @@
- local: perf_infer_gpu_one
title: GPU
- local: perf_infer_gpu_multi
title: Distributed GPU inference
title: Distributed inference
- local: perf_infer_cpu
title: CPU
- local: tf_xla
title: XLA
title: Optimization
- local: agents
title: Agents
@@ -141,8 +139,6 @@
title: GPU
- local: perf_train_cpu
title: CPU
- local: perf_train_tpu_tf
title: TPU
- local: perf_train_special
title: Apple Silicon
- local: perf_train_gaudi
@@ -433,6 +429,8 @@
title: DiffLlama
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/doge
title: Doge
- local: model_doc/dots1
title: dots1
- local: model_doc/dpr
@@ -519,6 +517,8 @@
title: Jukebox
- local: model_doc/led
title: LED
- local: model_doc/lfm2
title: LFM2
- local: model_doc/llama
title: LLaMA
- local: model_doc/llama2
@@ -563,6 +563,8 @@
title: MobileBERT
- local: model_doc/modernbert
title: ModernBert
- local: model_doc/modernbert-decoder
title: ModernBERTDecoder
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mpt
@@ -693,6 +695,8 @@
title: Zamba2
title: Text models
- sections:
- local: model_doc/aimv2
title: Aimv2
- local: model_doc/beit
title: BEiT
- local: model_doc/bit
@@ -709,6 +713,8 @@
title: D-FINE
- local: model_doc/dab-detr
title: DAB-DETR
- local: model_doc/deepseek_v2
title: DeepSeek-V2
- local: model_doc/deformable_detr
title: Deformable DETR
- local: model_doc/deit
@@ -737,6 +743,8 @@
title: EfficientFormer
- local: model_doc/efficientnet
title: EfficientNet
- local: model_doc/eomt
title: EoMT
- local: model_doc/focalnet
title: FocalNet
- local: model_doc/glpn
@@ -1033,6 +1041,8 @@
title: PaliGemma
- local: model_doc/perceiver
title: Perceiver
- local: model_doc/perception_lm
title: PerceptionLM
- local: model_doc/phi4_multimodal
title: Phi4 Multimodal
- local: model_doc/pix2struct
@@ -1085,6 +1095,8 @@
title: Vision Text Dual Encoder
- local: model_doc/visual_bert
title: VisualBERT
- local: model_doc/voxtral
title: Voxtral
- local: model_doc/xclip
title: X-CLIP
title: Multimodal models
@@ -1142,4 +1154,3 @@
title: Environment Variables
title: Reference
title: API

View File

@@ -13,7 +13,7 @@ rendered properly in your Markdown viewer.
-->
# Adding a new model to Transformers
# Legacy model contribution
> [!TIP]
> Try adding new models with a more [modular](./modular_transformers) approach first. This makes it significantly easier to contribute a model to Transformers!

View File

@@ -14,5 +14,9 @@ rendered properly in your Markdown viewer.
-->
# Agents
(deprecated)
> [!WARNING]
> Agents and tools were spun out into the standalone [smolagents](https://huggingface.co/docs/smolagents/index) library. They were removed from `transformers` in v4.52.

View File

@@ -60,11 +60,11 @@ You will see it prints "I just entered the attention computation" as many times
## Dynamically switching attention function
You could dynamically change the model's attention function as well, by overriding the `config._attn_implementation` field:
You could dynamically change the model's attention function as well:
```python
# Back to use original sdpa implementation
model.config._attn_implementation = "sdpa"
model.set_attn_implementation("sdpa")
model(torch.ones(1, 5, dtype=int))
```

View File

@@ -14,43 +14,26 @@ rendered properly in your Markdown viewer.
-->
# Utilizing the @auto_docstring Decorator
# Documenting a model
The `@auto_docstring` decorator in the Hugging Face Transformers library helps generate docstrings for model classes and their methods, which will be used to build the documentation for the library. It aims to improve consistency and reduce boilerplate by automatically including standard argument descriptions and allowing for targeted overrides and additions.
The `@auto_docstring` decorator in Transformers generates consistent docstrings for model classes and their methods. It reduces boilerplate by automatically including standard argument descriptions while also allowing overrides to add new or custom arguments. [Contributing a new model](./modular_transformers) is easier because you don't need to manually add the standard docstrings, and only focus on documenting new arguments.
---
This guide describes how to use the `@auto_docstring` decorator and how it works.
## 📜 How it Works
## @auto_docstring
The `@auto_docstring` decorator constructs docstrings by:
1. **Signature Inspection:** It inspects the signature (arguments, types, defaults) of the decorated class's `__init__` method or the decorated function.
2. **Centralized Docstring Fetching:** It retrieves predefined docstrings for common arguments (e.g., `input_ids`, `attention_mask`) from internal library sources (like `ModelArgs` or `ImageProcessorArgs` in `utils/args_doc.py`).
3. **Overriding or Adding Arguments Descriptions:**
* **Direct Docstring Block:** It incorporates custom docstring content from an `r""" """` (or `""" """`) block below the method signature or within the `__init__` docstring. This is for documenting new arguments or overriding standard descriptions.
* **Decorator Arguments (`custom_args`):** A `custom_args` docstring block can be passed to the decorator to provide docstrings for specific arguments directly in the decorator call. This can be used to define the docstring block for new arguments once if they are repeated in multiple places in the modeling file.
4. **Adding Classes and Functions Introduction:**
* **`custom_intro` argument:** Allows prepending a custom introductory paragraph to a class or function docstring.
* **Automatic Introduction Generation:** For model classes with standard naming patterns (like `ModelForCausalLM`) or belonging to a pipeline, the decorator automatically generates an appropriate introductory paragraph using `ClassDocstring` in `utils/args_doc.py` as the source.
5. **Templating:** The decorator uses a templating system, allowing predefined docstrings to include dynamic information deduced from the `auto_modules` of the library, such as `{{processor_class}}` or `{{config_class}}`.
6. **Deducing Relevant Examples:** The decorator attempts to find appropriate usage examples based on the model's task or pipeline compatibility. It extracts checkpoint information from the model's configuration class to provide concrete examples with real model identifiers.
7. **Adding Return Value Documentation:** For methods like `forward`, the decorator can automatically generate the "Returns" section based on the method's return type annotation. For example, for a method returning a `ModelOutput` subclass, it will extracts field descriptions from that class's docstring to create a comprehensive return value description. A custom `Returns` section can also be manually specified in the function docstring block.
8. **Unrolling Kwargs Typed With Unpack Operator:** For specific methods (defined in `UNROLL_KWARGS_METHODS`) or classes (defined in `UNROLL_KWARGS_CLASSES`), the decorator processes `**kwargs` parameters that are typed with `Unpack[KwargsTypedDict]`. It extracts the documentation from the TypedDict and adds each parameter to the function's docstring. Currently, this functionality is only supported for `FastImageProcessorKwargs`.
---
## 🚀 How to Use @auto_docstring
### 1. Importing the Decorator
Import the decorator into your modeling file:
Start by importing the decorator in the modeling file (`modular_model.py` or `modeling_model.py`).
```python
from ...utils import auto_docstring
```
### 2. Applying to Classes
Place `@auto_docstring` directly above the class definition. It uses the `__init__` method's signature and its docstring for parameter descriptions.
Select whether you'd like to apply `@auto_docstring` to a class or function below to see how to use it.
<hfoptions id="type">
<hfoption id="classes">
Place `@auto_docstring` directly above the class definition. The decorator derives parameter descriptions from the `__init__` method's signature and docstring.
```python
from transformers.modeling_utils import PreTrainedModel
@@ -73,9 +56,7 @@ class MyAwesomeModel(PreTrainedModel):
# ... other methods
```
#### Advanced Class Decoration:
Arguments can be passed directly to `@auto_docstring` for more control:
Arguments can also be passed directly to `@auto_docstring` for more control. Use the `custom_intro` parameter to describe the argument and the `custom_args` parameter to describe the arguments.
```python
@auto_docstring(
@@ -83,9 +64,9 @@ Arguments can be passed directly to `@auto_docstring` for more control:
It builds upon the standard Transformer architecture with unique modifications.""",
custom_args="""
custom_parameter (`type`, *optional*, defaults to `default_value`):
A concise description for custom_parameter if not defined or overriding the description in `args_doc.py`.
A concise description for custom_parameter if not defined or overriding the description in `auto_docstring.py`.
internal_helper_arg (`type`, *optional*, defaults to `default_value`):
A concise description for internal_helper_arg if not defined or overriding the description in `args_doc.py`.
A concise description for internal_helper_arg if not defined or overriding the description in `auto_docstring.py`.
"""
)
class MySpecialModel(PreTrainedModel):
@@ -93,7 +74,7 @@ class MySpecialModel(PreTrainedModel):
# ...
```
Or:
You can also choose to only use `custom_intro` and define the custom arguments directly in the class.
```python
@auto_docstring(
@@ -104,15 +85,44 @@ class MySpecialModel(PreTrainedModel):
def __init__(self, config: ConfigType, custom_parameter: "type" = "default_value", internal_helper_arg=None):
r"""
custom_parameter (`type`, *optional*, defaults to `default_value`):
A concise description for custom_parameter if not defined or overriding the description in `args_doc.py`.
A concise description for custom_parameter if not defined or overriding the description in `auto_docstring.py`.
internal_helper_arg (`type`, *optional*, defaults to `default_value`):
A concise description for internal_helper_arg if not defined or overriding the description in `args_doc.py`.
A concise description for internal_helper_arg if not defined or overriding the description in `auto_docstring.py`.
"""
# ...
```
### 3. Applying to Functions (e.g., `forward` method)
Apply the decorator above method definitions, such as the `forward` method.
You should also use the `@auto_docstring` decorator for classes that inherit from [`~utils.ModelOutput`].
```python
@dataclass
@auto_docstring(
custom_intro="""
Custom model outputs with additional fields.
"""
)
class MyModelOutput(ImageClassifierOutput):
r"""
loss (`torch.FloatTensor`, *optional*):
The loss of the model.
custom_field (`torch.FloatTensor` of shape `(batch_size, hidden_size)`, *optional*):
A custom output field specific to this model.
"""
# Standard fields like hidden_states, logits, attentions etc. can be automatically documented if the description is the same as the standard arguments.
# However, given that the loss docstring is often different per model, you should document it in the docstring above.
loss: Optional[torch.FloatTensor] = None
logits: Optional[torch.FloatTensor] = None
hidden_states: Optional[tuple[torch.FloatTensor, ...]] = None
attentions: Optional[tuple[torch.FloatTensor, ...]] = None
# Custom fields need to be documented in the docstring above
custom_field: Optional[torch.FloatTensor] = None
```
</hfoption>
<hfoption id="functions">
Place `@auto_docstring` directly above the method definition. The decorator derives parameter descriptions from the function signature.
```python
@auto_docstring
@@ -131,9 +141,10 @@ Apply the decorator above method definitions, such as the `forward` method.
# ...
```
#### Advanced Function Decoration:
Arguments can also be passed directly to `@auto_docstring` for more control. Use the `custom_intro` parameter to describe the argument and the `custom_args` parameter to describe the arguments.
The `Returns` and `Examples` parts of the docstring can also be manually specified.
Arguments can be passed directly to `@auto_docstring` for more control. `Returns` and `Examples` sections can also be manually specified:
```python
MODEL_COMMON_CUSTOM_ARGS = r"""
@@ -180,100 +191,117 @@ class MyModel(PreTrainedModel):
# ...
```
---
</hfoption>
</hfoptions>
### ✍️ Documenting Arguments: Approach & Priority
## Documenting arguments
1. **Standard Arguments (e.g., `input_ids`, `attention_mask`, `pixel_values`, `encoder_hidden_states` etc.):**
* `@auto_docstring` retrieves descriptions from a central source. Do not redefine these locally if their description and shape are the same as in `args_doc.py`.
There are some rules for documenting different types of arguments and they're listed below.
- Standard arguments (`input_ids`, `attention_mask`, `pixel_values`, etc.) are defined and retrieved from `auto_docstring.py`. It is the single source of truth for standard arguments and should not be redefined locally if an argument's description and shape is the same as an argument in `auto_docstring.py`.
If a standard argument behaves differently in your model, then you can override it locally in a `r""" """` block. This local definition has a higher priority. For example, the `labels` argument is often customized per model and typically requires overriding.
- New or custom arguments should be documented within an `r""" """` block after the signature if it is a function or in the `__init__` method's docstring if it is a class.
```py
argument_name (`type`, *optional*, defaults to `X`):
Description of the argument.
Explain its purpose, expected shape/type if complex, and default behavior.
This can span multiple lines.
```
2. **New or Custom Arguments:**
* **Primary Method:** Document these within an `r""" """` docstring block following the signature (for functions) or in the `__init__` method's docstring (for class parameters).
* **Format:**
```
argument_name (`type`, *optional*, defaults to `X`):
Description of the argument.
Explain its purpose, expected shape/type if complex, and default behavior.
This can span multiple lines.
```
* Include `type` in backticks.
* Add "*optional*" if the argument is not required (has a default value).
* Add "defaults to `X`" if it has a default value (no need to specify "defaults to `None`" if the default value is `None`).
* Add *optional* if the argument is not required or has a default value.
* Add "defaults to X" if it has a default value. You don't need to add "defaults to `None`" if the default value is `None`.
3. **Overriding Standard Arguments:**
* If a standard argument behaves differently (e.g., different expected shape, model-specific behavior), provide its complete description in the local `r""" """` docstring. This local definition takes precedence.
* The `labels` argument is often customized per model and typically requires a specific docstring.
These arguments can also be passed to `@auto_docstring` as a `custom_args` argument. It is used to define the docstring block for new arguments once if they are repeated in multiple places in the modeling file.
4. **Using Decorator Arguments for Overrides or New Arguments (`custom_args`):**
* New or custom arguments docstrings can also be passed to `@auto_docstring` as a `custom_args` argument. This can be used to define the docstring block for new arguments once if they are repeated in multiple places in the modeling file.
```py
class MyModel(PreTrainedModel):
# ...
@auto_docstring(
custom_intro="""
This is a custom introduction for the function.
"""
custom_args=r"""
common_arg_1 (`torch.Tensor`, *optional*, defaults to `default_value`):
Description of common_arg_1
"""
)
```
---
## Checking the docstrings
### Usage with [modular files](./modular_transformers)
Transformers includes a utility script to validate the docstrings when you open a Pull Request which triggers CI (continuous integration) checks. The script checks for the following criteria.
When working with modular files, follow these guidelines for applying the `@auto_docstring` decorator:
* Ensures `@auto_docstring` is applied to relevant mode classes and public methods.
* Ensures arguments are complete and consistent. It checks that documented arguments exist in the signature and verifies whether the types and default values in the docstring match the signature. Arguments that aren't known standard arguments or if they lack a local description are flagged.
* Reminds you to complete placeholders like `<fill_type>` and `<fill_docstring>`.
* Ensures docstrings are formatted according to the expected docstring style.
- **For standalone models in modular files:**
Apply the `@auto_docstring` decorator just as you would in regular modeling files.
- **For models inheriting from other library models:**
- When inheriting from a parent model, decorators (including `@auto_docstring`) are automatically carried over to the generated modeling file without needing to add them in your modular file.
- If you need to modify the `@auto_docstring` behavior, apply the customized decorator in your modular file, making sure to *include all other decorators* that were present on the original function/class.
> **Warning**: When overriding any decorator in a modular file, you must include ALL decorators that were applied to that function/class in the parent model. If you only override some decorators, the others won't be included in the generated modeling file.
**Note**: The `check_auto_docstrings` tool doesn't check modular files directly, but it will check (and modify when using `--fix_and_overwrite`) the generated modeling files. If issues are found in the generated files, you'll need to update your modular files accordingly.
---
## ✅ Checking Your Docstrings with `check_auto_docstrings`
The library includes a utility script to validate docstrings. This check is typically run during Continuous Integration (CI).
#### What it Checks:
* **Decorator Presence:** Ensures `@auto_docstring` is applied to relevant model classes and public methods. (TODO)
* **Argument Completeness & Consistency:**
* Flags arguments in the signature that are not known standard arguments and lack a local description.
* Ensures documented arguments exist in the signature. (TODO)
* Verifies that types and default values in the docstring match the signature. (TODO)
* **Placeholder Detection:** Reminds you to complete placeholders like `<fill_type>` or `<fill_docstring>`.
* **Formatting:** Adherence to the expected docstring style.
#### Running the Check Locally:
Run this check locally before committing. The common command is:
You can run this check locally - before committing - by running the following command.
```bash
make fix-copies
```
Alternatively, to only perform docstrings and auto-docstring checks, you can use:
`make fix-copies` runs several other checks as well. If you don't need those checks, run the command below to only perform docstring and auto-docstring checks.
```bash
python utils/check_docstrings.py # to only check files included in the diff without fixing them
# Or: python utils/check_docstrings.py --fix_and_overwrite # to fix and overwrite the files in the diff
# Or: python utils/check_docstrings.py --fix_and_overwrite --check_all # to fix and overwrite all files
# python utils/check_docstrings.py --fix_and_overwrite # to fix and overwrite the files in the diff
# python utils/check_docstrings.py --fix_and_overwrite --check_all # to fix and overwrite all files
```
#### Workflow with the Checker:
## modular_model.py files
1. Add `@auto_docstring(...)` to the class or method.
2. For new, custom, or overridden arguments, add descriptions in an `r""" """` block.
3. Run `make fix-copies` (or the `check_docstrings.py` utility).
* For unrecognized arguments lacking documentation, the utility will create placeholder entries.
4. Manually edit these placeholders with accurate types and descriptions.
5. Re-run the check to ensure all issues are resolved.
When working with modular files (`modular_model.py`), follow the guidelines below for applying `@auto_docstring`.
---
- For standalone models in modular files, apply `@auto_docstring` like you would in a `modeling_model.py` file.
- For models that inherit from other library models, `@auto_docstring` is automatically carried over to the generated modeling file. You don't need to add `@auto_docstring` in your modular file.
## 🔑 Key Takeaways & Best Practices
If you need to modify the `@auto_docstring` behavior, apply the customized decorator in your modular file. Make sure to **include all other decorators** that are present in the original function or class.
* Use `@auto_docstring` for new PyTorch model classes (`PreTrainedModel` subclasses) and their primary for methods (e.g., `forward`, `get_text_features` etc.).
* For classes, the `__init__` method's docstring is the main source for parameter descriptions when using `@auto_docstring` on the class.
* Rely on standard docstrings; do not redefine common arguments unless their behavior is different in your specific model.
> [!WARNING]
> When overriding any decorator in a modular file, you must include **all** decorators that were applied to that function or class in the parent model. If you only override some decorators, the others won't be included in the generated modeling file.
## How it works
The `@auto_docstring` decorator automatically generates docstrings by:
1. Inspecting the signature (arguments, types, defaults) of the decorated class' `__init__` method or the decorated function.
2. Retrieving the predefined docstrings for common arguments (`input_ids`, `attention_mask`, etc.) from internal library sources like [`ModelArgs`], [`ImageProcessorArgs`], and the `auto_docstring.py` file.
3. Adding argument descriptions in one of two ways as shown below.
| method | description | usage |
|---|---|---|
| `r""" """` | add custom docstring content directly to a method signature or within the `__init__` docstring | document new arguments or override standard descriptions |
| `custom_args` | add custom docstrings for specific arguments directly in `@auto_docstring` | define docstring for new arguments once if they're repeated in multiple places in the modeling file |
4. Adding class and function descriptions. For model classes with standard naming patterns, like `ModelForCausalLM`, or if it belongs to a pipeline, `@auto_docstring` automatically generates the appropriate descriptions with `ClassDocstring` from `auto_docstring.py`.
`@auto_docstring` also accepts the `custom_intro` argument to describe a class or function.
5. Using a templating system to allow predefined docstrings to include dynamic information from Transformers' [auto_modules](https://github.com/huggingface/transformers/tree/main/src/transformers/models/auto) such as `{{processor_class}}` and `{{config_class}}`.
6. Finding appropriate usage examples based on the model's task or pipeline compatibility. It extracts checkpoint information form the model's configuration class to provide concrete examples with real model identifiers.
7. Adding return values to the docstring. For methods like `forward`, the decorator automatically generates the `Returns` field in the docstring based on the method's return type annotation.
For example, if a method returns a [`~transformers.utils.ModelOutput`] subclass, `@auto_docstring` extracts the field descriptions from the class' docstring to create a comprehensive return value description. You can also manually specifiy a custom `Returns` field in a functions docstring.
8. Unrolling kwargs typed with the unpack operator. For specific methods (defined in `UNROLL_KWARGS_METHODS`) or classes (defined in `UNROLL_KWARGS_CLASSES`), the decorator processes `**kwargs` parameters that are typed with `Unpack[KwargsTypedDict]`. It extracts the documentations from the `TypedDict` and adds each parameter to the function's docstring.
Currently only supported for [`FastImageProcessorKwargs`].
## Best practices
Follow the best practices below to help maintain consistent and informative documentation for Transformers!
* Use `@auto_docstring` for new PyTorch model classes ([`PreTrainedModel`] subclasses) and their primary methods like `forward` or `get_text_features`.
* For classes, `@auto_docstring` retrieves parameter descriptions from the `__init__` method's docstring.
* Rely on standard docstrings and do not redefine common arguments unless their behavior is different in your model.
* Document new or custom arguments clearly.
* Run `check_docstrings` locally and iteratively.
By following these guidelines, you help maintain consistent and informative documentation for the Hugging Face Transformers library 🤗.

View File

@@ -99,8 +99,6 @@ self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_stat
2. The cache grows dynamically as more tokens are processed. The sequence length dimension (`seq_len`) increases with each new token.
3. The cache maintains a count of seen tokens through `self._seen_tokens`. This is updated when the first layer processes a new token.
The example below demonstrates how to create a generation loop with [`DynamicCache`]. As discussed, the attention mask is a concatenation of past and current token values and `1` is added to the cache position for the next token.
```py
@@ -143,7 +141,7 @@ The legacy format is essentially the same data structure but organized different
- The tensors have the same shape `[batch_size, num_heads, seq_len, head_dim]`.
- The format is less flexible and doesn't support features like quantization or offloading.
If your project depends on this legacy format, you can convert between [`DynamicCache`] and a tuple of tuples as shown below with the [`~DynamicCache.from_legacy_cache`] and [`DynamicCache.to_legacy_cache`] functions. This is helpful if you have custom logic for manipulating a cache in a specific format.
If your project depends on this legacy format, we recommend to convert to [`DynamicCache`] with [`~DynamicCache.from_legacy_cache`]. Note that legacy cache format is deprecated and not used anymore in `Transformers`. You can convert back to tuple format with [`DynamicCache.to_legacy_cache`] functions, which is helpful if you have custom logic for manipulating a cache in a specific format.
```py
import torch

View File

@@ -56,7 +56,7 @@ Create a [`ImageTextToTextPipeline`] and pass the chat to it. For large models,
import torch
from transformers import pipeline
pipeline = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf", device="cuda", torch_dtype=torch.float16)
pipeline = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf", device_map="auto", torch_dtype=torch.float16)
pipeline(text=messages, max_new_tokens=50, return_full_text=False)
[{'input_text': [{'role': 'system',
'content': [{'type': 'text',
@@ -175,7 +175,7 @@ processed_chat = processor.apply_chat_template(
add_generation_prompt=True,
tokenize=True,
return_dict=True,
video_fps=32,
video_fps=16,
video_load_backend="decord",
)
print(processed_chat.keys())

View File

@@ -25,7 +25,7 @@ Check model leaderboards like [OpenLLM](https://hf.co/spaces/HuggingFaceH4/open_
This guide shows you how to quickly start chatting with Transformers from the command line, how build and format a conversation, and how to chat using the [`TextGenerationPipeline`].
## transformers CLI
## chat CLI
After you've [installed Transformers](./installation.md), chat with a model directly from the command line as shown below. It launches an interactive session with a model, with a few base commands listed at the start of the session.
@@ -49,7 +49,8 @@ For a full list of options, run the command below.
transformers chat -h
```
The chat is implemented on top of the [AutoClass](./model_doc/auto), using tooling from [text generation](./llm_tutorial) and [chat](./chat_templating).
The chat is implemented on top of the [AutoClass](./model_doc/auto), using tooling from [text generation](./llm_tutorial) and [chat](./chat_templating). It uses the `transformers serve` CLI under the hood ([docs](./serving.md#serve-cli)).
## TextGenerationPipeline

View File

@@ -26,6 +26,7 @@ Pass the audio signal, typically stored in `array`, to the feature extractor and
from transformers import AutoFeatureExtractor
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
processed_sample = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=16000)
processed_sample
{'input_values': [array([ 9.4472744e-05, 3.0777880e-03, -2.8888427e-03, ...,

View File

@@ -247,3 +247,114 @@ first and last layer will be shown. This is useful when some layers (typically c
layers.
[[autodoc]] model_addition_debugger_context
## Analyzer of skipped tests
### Scan skipped tests - for model adders and maintainers
This small util is a power user tool intended for model adders and maintainers. It lists all test methods
existing in `test_modeling_common.py`, inherited by all model tester classes, and scans the repository to measure
how many tests are being skipped and for which models.
### Rationale
When porting models to transformers, tests fail as they should, and sometimes `test_modeling_common` feels irreconcilable with the peculiarities of our brand new model. But how can we be sure we're not breaking everything by adding a seemingly innocent skip?
This utility:
- scans all test_modeling_common methods
- looks for times where a method is skipped
- returns a summary json you can load as a DataFrame/inspect
**For instance test_inputs_embeds is skipped in a whooping 39% proportion at the time of writing this util.**
![download-icon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/f7f671f69b88ce4967e19179172c248958d35742/transformers/tests_skipped_visualisation.png)
### Usage
You can run the skipped test analyzer in two ways:
#### Full scan (default)
From the root of `transformers` repo, scans all common test methods and outputs the results to a JSON file (default: `all_tests_scan_result.json`).
```bash
python utils/scan_skipped_tests.py --output_dir path/to/output
```
- `--output_dir` (optional): Directory where the JSON results will be saved. Defaults to the current directory.
**Example output:**
```
🔬 Parsing 331 model test files once each...
📝 Aggregating 224 tests...
(224/224) test_update_candidate_strategy_with_matches_1es_3d_is_nonecodet_schedule_fa_kwargs
✅ Scan complete.
📄 JSON saved to /home/pablo/git/transformers/all_tests_scan_result.json
```
And it will generate `all_tests_scan_result.json` file that you can inspect. The JSON is indexed by method name, and each entry follows this schema, indicating the origin as well (from `common`or `GenerationMixin`.)
```json
{
"<method_name>": {
"origin": "<test suite>"
"models_ran": ["<model_name>", ...],
"models_skipped": ["<model_name>", ...],
"skipped_proportion": <float>,
"reasons_skipped": ["<model_name>: <reason>",
...
]
},
...
}
```
Which you can visualise as above with e.g. `pandas`
```python
df = pd.read_json('all_tests_scan_result.json').T
df.sort_values(by=['skipped_proportion'], ascending=False)
```
### Scan a single test method
You can focus on a specific test method using `--test_method_name`:
```bash
$ python utils/scan_skipped_tests.py --test_method_name test_inputs_embeds --output_dir path/to/output
```
- `--test_method_name`: Name of the test method to scan (e.g., `test_inputs_embeds`).
- `--output_dir` (optional): Directory where the JSON result will be saved.
**Example output:**
```bash
$ python utils/scan_skipped_tests.py --test_method_name test_inputs_embeds
🔬 Parsing 331 model test files once each...
== test_inputs_embeds ==
Ran : 199/323
Skipped : 124/323 (38.4%)
- aimv2: Aimv2 does not use inputs_embeds
- align: Inputs_embeds is tested in individual model tests
- altclip: Inputs_embeds is tested in individual model tests
- audio_spectrogram_transformer: AST does not use inputs_embeds
- beit: BEiT does not use inputs_embeds
- bit: Bit does not use inputs_embeds
- blip: Blip does not use inputs_embeds
- blip_2: Inputs_embeds is tested in individual model tests
- bridgetower:
- canine: CANINE does not have a get_input_embeddings() method.
- ...
📄 JSON saved to /home/pablo/git/transformers/scan_test_inputs_embeds.json
```

View File

@@ -44,7 +44,7 @@ import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16).to("cuda:0")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("I like rock music because", return_tensors="pt").to(model.device)
model.generate(**inputs, do_sample=False, max_new_tokens=20, use_cache=False)
@@ -59,7 +59,7 @@ import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16).to("cuda:0")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("I like rock music because", return_tensors="pt").to(model.device)
past_key_values = DynamicCache()
@@ -142,13 +142,14 @@ Enable [`QuantizedCache`] by configuring `cache_implementation="quantized"` in [
For [`HQQQuantizedCache`], we recommend setting the `axis-key` and `axis-value` parameters to `1`.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, HQQQuantizedCache, QuantizedCacheConfig
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16).to("cuda:0")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("I like rock music because", return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation="quantized", cache_config={"axis-key": 1, "axis-value": 1, "backend": "hqq"})
out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation="quantized", cache_config={"backend": "HQQ"})
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
I like rock music because it's loud and energetic. It's a great way to express myself and rel
```
@@ -159,13 +160,14 @@ I like rock music because it's loud and energetic. It's a great way to express m
For [`QuantoQuantizedCache`], we recommend setting the `axis-key` and `axis-value` parameters to `0`.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, QuantoQuantizedCache, QuantizedCacheConfig
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16).to("cuda:0")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("I like rock music because", return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation="quantized", cache_config={"nbits": 4, "axis-key": 0, "axis-value": 0, "backend": "quanto"})
out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation="quantized", cache_config={"nbits": 4, "backend": "quanto"})
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
I like rock music because it's loud and energetic. It's a great way to express myself and rel
```
@@ -207,14 +209,14 @@ import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map={"": 0})
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=20, cache_implementation="offloaded_static")
tokenizer.batch_decode(out, skip_special_tokens=True)[0]
"Hello, my name is [Your Name], and I am a [Your Profession] with [Number of Years] of"
```
Cache offloading requires a CUDA GPU.
Cache offloading requires a CUDA GPU or Intel XPU.
### Sliding window cache
@@ -227,7 +229,7 @@ import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16).to("cuda:0")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto")
inputs = tokenizer("Yesterday I was on a rock concert and.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=30, cache_implementation="sliding_window")
@@ -306,15 +308,15 @@ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache, StaticCache
model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Init StaticCache with big enough max-length (1024 tokens for the below example)
# You can also init a DynamicCache, if that suits you better
prompt_cache = StaticCache(config=model.config, max_batch_size=1, max_cache_len=1024, device="cuda", dtype=torch.bfloat16)
prompt_cache = StaticCache(config=model.config, max_batch_size=1, max_cache_len=1024, device=model.device.type, dtype=torch.bfloat16)
INITIAL_PROMPT = "You are a helpful assistant. "
inputs_initial_prompt = tokenizer(INITIAL_PROMPT, return_tensors="pt").to("cuda")
inputs_initial_prompt = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(model.device.type)
# This is the common prompt cached, we need to run forward without grad to be able to copy
with torch.no_grad():
prompt_cache = model(**inputs_initial_prompt, past_key_values = prompt_cache).past_key_values
@@ -322,7 +324,7 @@ with torch.no_grad():
prompts = ["Help me to write a blogpost about travelling.", "What is the capital of France?"]
responses = []
for prompt in prompts:
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to("cuda")
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(model.device.type)
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values,max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]

View File

@@ -0,0 +1,104 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# AIMv2
## Overview
The AIMv2 model was proposed in [Multimodal Autoregressive Pre-training of Large Vision Encoders](https://arxiv.org/abs/2411.14402) by Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby.
The abstract from the paper is the following:
*We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process, scalability, and remarkable performance across a range of downstream tasks. This is achieved by pairing the vision encoder with a multimodal decoder that autoregressively generates raw image patches and text tokens. Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification. Notably, our AIMV2-3B encoder achieves 89.5% accuracy on ImageNet-1k with a frozen trunk. Furthermore, AIMV2 consistently outperforms state-of-the-art contrastive models (e.g., CLIP, SigLIP) in multimodal image understanding across diverse settings.*
This model was contributed by [Yaswanth Gali](https://huggingface.co/yaswanthgali).
The original code can be found [here](https://github.com/apple/ml-aim).
## Usage Example
Here is an example of Image Feature Extraction using specific checkpoints on resized images and native resolution images:
```python
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
processor = AutoImageProcessor.from_pretrained("apple/aimv2-large-patch14-native")
model = AutoModel.from_pretrained("apple/aimv2-large-patch14-native")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
```
Here is an example of a checkpoint performing zero-shot classification:
```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModel
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = ["Picture of a dog.", "Picture of a cat.", "Picture of a horse."]
processor = AutoProcessor.from_pretrained("apple/aimv2-large-patch14-224-lit")
model = AutoModel.from_pretrained("apple/aimv2-large-patch14-224-lit")
inputs = processor(
images=image,
text=text,
add_special_tokens=True,
truncation=True,
padding=True,
return_tensors="pt",
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
```
## Aimv2Config
[[autodoc]] Aimv2Config
## Aimv2TextConfig
[[autodoc]] Aimv2TextConfig
## Aimv2VisionConfig
[[autodoc]] Aimv2VisionConfig
## Aimv2Model
[[autodoc]] Aimv2Model
- forward
## Aimv2VisionModel
[[autodoc]] Aimv2VisionModel
- forward
## Aimv2TextModel
[[autodoc]] Aimv2TextModel
- forward
</pt>
<tf>

View File

@@ -14,59 +14,123 @@ rendered properly in your Markdown viewer.
-->
# BigBirdPegasus
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# BigBirdPegasus
The BigBird model was proposed in [Big Bird: Transformers for Longer Sequences](https://huggingface.co/papers/2007.14062) by
Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon,
Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others. BigBird, is a sparse-attention
based transformer which extends Transformer based models, such as BERT to much longer sequences. In addition to sparse
attention, BigBird also applies global attention as well as random attention to the input sequence. Theoretically, it
has been shown that applying sparse, global, and random attention approximates full attention, while being
computationally much more efficient for longer sequences. As a consequence of the capability to handle longer context,
BigBird has shown improved performance on various long document NLP tasks, such as question answering and
summarization, compared to BERT or RoBERTa.
[BigBirdPegasus](https://huggingface.co/papers/2007.14062) is an encoder-decoder (sequence-to-sequence) transformer model for long-input summarization. It extends the [BigBird](./big_bird) architecture with an additional pretraining objective borrowed from [Pegasus](./pegasus) called gap sequence generation (GSG). Whole sentences are masked and the model has to fill in the gaps in the document. BigBirdPegasus's ability to keep track of long contexts makes it effective at summarizing lengthy inputs, surpassing the performance of base Pegasus models.
The abstract from the paper is the following:
You can find all the original BigBirdPegasus checkpoints under the [Google](https://huggingface.co/google/models?search=bigbird-pegasus) organization.
*Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP.
Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence
length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that
reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and
is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our
theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire
sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to
8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context,
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*
> [!TIP]
> This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta).
>
> Click on the BigBirdPegasus models in the right sidebar for more examples of how to apply BigBirdPegasus to different language tasks.
The original code can be found [here](https://github.com/google-research/bigbird).
The example below demonstrates how to summarize text with [`Pipeline`], [`AutoModel`], and from the command line.
## Usage tips
<hfoptions id="usage">
<hfoption id="Pipeline">
- For an in-detail explanation on how BigBird's attention works, see [this blog post](https://huggingface.co/blog/big-bird).
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
**original_full** is advised as there is no benefit in using **block_sparse** attention.
- The code currently uses window size of 3 blocks and 2 global blocks.
- Sequence length must be divisible by block size.
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**.
- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
```py
import torch
from transformers import pipeline
pipeline = pipeline(
task="summarization",
model="google/bigbird-pegasus-large-arxiv",
torch_dtype=torch.float32,
device=0
)
pipeline("""Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle.""")
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained(
"google/bigbird-pegasus-large-arxiv"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/bigbird-pegasus-large-arxiv",
torch_dtype=torch.bfloat16,
device_map="auto",
)
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers-cli">
```bash
echo -e "Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet. Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts." | transformers-cli run --task summarization --model google/bigbird-pegasus-large-arxiv --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to int4.
```py
import torch
from transformers import BitsAndBytesConfig, AutoModelForSeq2SeqLM, AutoTokenizer
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/bigbird-pegasus-large-arxiv",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(
"google/bigbird-pegasus-large-arxiv"
)
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Notes
- BigBirdPegasus also uses the [`PegasusTokenizer`].
- Inputs should be padded on the right because BigBird uses absolute position embeddings.
- BigBirdPegasus supports `original_full` and `block_sparse` attention. If the input sequence length is less than 1024, it is recommended to use `original_full` since sparse patterns don't offer much benefit for smaller inputs.
- The current implementation uses window size of 3 blocks and 2 global blocks, only supports the ITC-implementation, and doesn't support `num_random_blocks=0`.
- The sequence length must be divisible by the block size.
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)
Read the [Understanding BigBird's Block Sparse Attention](https://huggingface.co/blog/big-bird) blog post for more details about how BigBird's attention works.
## BigBirdPegasusConfig

View File

@@ -14,49 +14,105 @@ rendered properly in your Markdown viewer.
-->
# CamemBERT
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# CamemBERT
The CamemBERT model was proposed in [CamemBERT: a Tasty French Language Model](https://huggingface.co/papers/1911.03894) by
[Louis Martin](https://huggingface.co/louismartin), [Benjamin Muller](https://huggingface.co/benjamin-mlr), [Pedro Javier Ortiz Suárez](https://huggingface.co/pjox), Yoann Dupont, Laurent Romary, Éric Villemonte de la
Clergerie, [Djamé Seddah](https://huggingface.co/Djame), and [Benoît Sagot](https://huggingface.co/sagot). It is based on Facebook's RoBERTa model released in 2019. It is a model
trained on 138GB of French text.
[CamemBERT](https://huggingface.co/papers/1911.03894) is a language model based on [RoBERTa](./roberta), but trained specifically on French text from the OSCAR dataset, making it more effective for French language tasks.
The abstract from the paper is the following:
What sets CamemBERT apart is that it learned from a huge, high quality collection of French data, as opposed to mixing lots of languages. This helps it really understand French better than many multilingual models.
*Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available
models have either been trained on English data or on the concatenation of data in multiple languages. This makes
practical use of such models --in all languages except English-- very limited. Aiming to address this issue for French,
we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the
performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging,
dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and
downstream applications for French NLP.*
Common applications of CamemBERT include masked language modeling (Fill-mask prediction), text classification (sentiment analysis), token classification (entity recognition) and sentence pair classification (entailment tasks).
This model was contributed by [the ALMAnaCH team (Inria)](https://huggingface.co/almanach). The original code can be found [here](https://camembert-model.fr/).
You can find all the original CamemBERT checkpoints under the [ALMAnaCH](https://huggingface.co/almanach/models?search=camembert) organization.
<Tip>
> [!TIP]
> This model was contributed by the [ALMAnaCH (Inria)](https://huggingface.co/almanach) team.
>
> Click on the CamemBERT models in the right sidebar for more examples of how to apply CamemBERT to different NLP tasks.
This implementation is the same as RoBERTa. Refer to the [documentation of RoBERTa](roberta) for usage examples as well
as the information relative to the inputs and outputs.
The examples below demonstrate how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.
</Tip>
<hfoptions id="usage">
## Resources
<hfoption id="Pipeline">
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
```python
import torch
from transformers import pipeline
pipeline = pipeline("fill-mask", model="camembert-base", torch_dtype=torch.float16, device=0)
pipeline("Le camembert est un délicieux fromage <mask>.")
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base", torch_dtype="auto", device_map="auto", attn_implementation="sdpa")
inputs = tokenizer("Le camembert est un délicieux fromage <mask>.", return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model(**inputs)
predictions = outputs.logits
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```
</hfoption>
<hfoption id="transformers CLI">
```bash
echo -e "Le camembert est un délicieux fromage <mask>." | transformers run --task fill-mask --model camembert-base --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing weights in lower precision. Refer to the [Quantization](../quantization/overview) overview for available options.
The example below uses [bitsandbytes](../quantization/bitsandbytes) quantization to quantize the weights to 8-bits.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, BitsAndBytesConfig
import torch
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForMaskedLM.from_pretrained(
"almanach/camembert-large",
quantization_config=quant_config,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("almanach/camembert-large")
inputs = tokenizer("Le camembert est un délicieux fromage <mask>.", return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model(**inputs)
predictions = outputs.logits
masked_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```
## CamembertConfig
@@ -137,5 +193,4 @@ as the information relative to the inputs and outputs.
[[autodoc]] TFCamembertForQuestionAnswering
</tf>
</frameworkcontent>
</frameworkcontent>

View File

@@ -191,6 +191,11 @@ model = ChameleonForConditionalGeneration.from_pretrained(
[[autodoc]] ChameleonImageProcessor
- preprocess
## ChameleonImageProcessorFast
[[autodoc]] ChameleonImageProcessorFast
- preprocess
## ChameleonVQVAE
[[autodoc]] ChameleonVQVAE

View File

@@ -3,6 +3,7 @@
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -4,6 +4,7 @@
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -14,66 +14,111 @@ rendered properly in your Markdown viewer.
-->
# DeBERTa-v2
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white" >
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
</div>
</div>
## Overview
The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
# DeBERTa-v2
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
RoBERTa.
[DeBERTa-v2](https://huggingface.co/papers/2006.03654) improves on the original [DeBERTa](./deberta) architecture by using a SentencePiece-based tokenizer and a new vocabulary size of 128K. It also adds an additional convolutional layer within the first transformer layer to better learn local dependencies of input tokens. Finally, the position projection and content projection matrices are shared in the attention layer to reduce the number of parameters.
The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
You can find all the original [DeBERTa-v2] checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=deberta-v2) organization.
The following information is visible directly on the [original implementation
repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model. It includes
the 1.5B model used for the SuperGLUE single-model submission and achieving 89.9, versus human baseline 89.8. You can
find more details about this submission in the authors'
[blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/)
> [!TIP]
> This model was contributed by [Pengcheng He](https://huggingface.co/DeBERTa).
>
> Click on the DeBERTa-v2 models in the right sidebar for more examples of how to apply DeBERTa-v2 to different language tasks.
New in v2:
The example below demonstrates how to classify text with [`Pipeline`] or the [`AutoModel`] class.
- **Vocabulary** In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data.
Instead of a GPT2-based tokenizer, the tokenizer is now
[sentencepiece-based](https://github.com/google/sentencepiece) tokenizer.
- **nGiE(nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer aside with the first
transformer layer to better learn the local dependency of input tokens.
- **Sharing position projection matrix with content projection matrix in attention layer** Based on previous
experiments, this can save parameters without affecting the performance.
- **Apply bucket to encode relative positions** The DeBERTa-v2 model uses log bucket to encode relative positions
similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
performance of downstream tasks.
<hfoptions id="usage">
<hfoption id="Pipeline">
This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). This model TF 2.0 implementation was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
```py
import torch
from transformers import pipeline
## Resources
pipeline = pipeline(
task="text-classification",
model="microsoft/deberta-v2-xlarge-mnli",
device=0,
torch_dtype=torch.float16
)
result = pipeline("DeBERTa-v2 is great at understanding context!")
print(result)
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained(
"microsoft/deberta-v2-xlarge-mnli"
)
model = AutoModelForSequenceClassification.from_pretrained(
"microsoft/deberta-v2-xlarge-mnli",
torch_dtype=torch.float16,
device_map="auto"
)
inputs = tokenizer("DeBERTa-v2 is great at understanding context!", return_tensors="pt").to("cuda")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_id = logits.argmax().item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```
</hfoption>
<hfoption id="transformers CLI">
```bash
echo -e "DeBERTa-v2 is great at understanding context!" | transformers-cli run --task fill-mask --model microsoft/deberta-v2-xlarge-mnli --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes quantization](../quantization/bitsandbytes) to only quantize the weights to 4-bit.
```py
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
model_id = "microsoft/deberta-v2-xlarge-mnli"
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype="float16",
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
model_id,
quantization_config=quantization_config,
torch_dtype="float16"
)
inputs = tokenizer("DeBERTa-v2 is great at understanding context!", return_tensors="pt").to("cuda")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_id = logits.argmax().item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
## DebertaV2Config

View File

@@ -0,0 +1,49 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DeepSeek-V2
## Overview
The DeepSeek-V2 model was proposed in [DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model](https://arxiv.org/abs/2405.04434) by DeepSeek-AI Team.
The abstract from the paper is the following:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
This model was contributed by [VladOS95-cyber](https://github.com/VladOS95-cyber).
The original code can be found [here](https://huggingface.co/deepseek-ai/DeepSeek-V2).
### Usage tips
The model uses Multi-head Latent Attention (MLA) and DeepSeekMoE architectures for efficient inference and cost-effective training. It employs an auxiliary-loss-free strategy for load balancing and multi-token prediction training objective. The model can be used for various language tasks after being pre-trained on 14.8 trillion tokens and going through Supervised Fine-Tuning and Reinforcement Learning stages.
## DeepseekV2Config
[[autodoc]] DeepseekV2Config
## DeepseekV2Model
[[autodoc]] DeepseekV2Model
- forward
## DeepseekV2ForCausalLM
[[autodoc]] DeepseekV2ForCausalLM
- forward
## DeepseekV2ForSequenceClassification
[[autodoc]] DeepseekV2ForSequenceClassification
- forward

View File

@@ -44,7 +44,7 @@ tokens and decodes them back into audio.
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
model_checkpoint = "nari-labs/Dia-1.6B-0626"
text = ["[S1] Dia is an open weights text to dialogue model."]
processor = AutoProcessor.from_pretrained(model_checkpoint)
@@ -66,7 +66,7 @@ from datasets import load_dataset, Audio
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
model_checkpoint = "nari-labs/Dia-1.6B-0626"
ds = load_dataset("hf-internal-testing/dailytalk-dummy", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=44100))
@@ -93,7 +93,7 @@ from datasets import load_dataset, Audio
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
model_checkpoint = "nari-labs/Dia-1.6B-0626"
ds = load_dataset("hf-internal-testing/dailytalk-dummy", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=44100))

View File

@@ -0,0 +1,103 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Doge
## Overview
Doge is a series of small language models based on the [Doge](https://github.com/SmallDoges/small-doge) architecture, aiming to combine the advantages of state-space and self-attention algorithms, calculate dynamic masks from cached value states using the zero-order hold method, and solve the problem of existing mainstream language models getting lost in context. It uses the `wsd_scheduler` scheduler to pre-train on the `smollm-corpus`, and can continue training on new datasets or add sparse activation feedforward networks from stable stage checkpoints.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/refs%2Fpr%2F426/transformers/model_doc/doge_architecture.png" alt="drawing" width="600"/>
As shown in the figure below, the sequence transformation part of the Doge architecture uses `Dynamic Mask Attention`, which can be understood as using self-attention related to value states during training, and using state-space without past state decay during inference, to solve the problem of existing Transformers or SSMs getting lost in long text. The state transformation part of Doge uses `Cross Domain Mixture of Experts`, which consists of dense linear layers and sparse embedding layers, and can additionally increase sparse parameters to continue training from dense weight checkpoints without retraining the entire model, thereby reducing the cost of continuous iteration of the model. In addition, Doge also uses `RMSNorm` and `Residual` with learnable parameters to adapt the gradient range of deep models.
Checkout all Doge model checkpoints [here](https://huggingface.co/collections/SmallDoge/doge-slm-679cc991f027c4a3abbded4a).
## Usage
<details>
<summary>Using Doge-Base for text generation</summary>
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M")
inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs))
```
</details>
<details>
<summary>Using Doge-Instruct for question answering</summary>
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct")
generation_config = GenerationConfig(
max_new_tokens=100,
use_cache=True,
do_sample=True,
temperature=0.8,
top_p=0.9,
repetition_penalty=1.0
)
steamer = TextStreamer(tokenizer=tokenizer, skip_prompt=True)
prompt = "Hi, how are you doing today?"
conversation = [
{"role": "user", "content": prompt}
]
inputs = tokenizer.apply_chat_template(
conversation=conversation,
tokenize=True,
return_tensors="pt",
)
outputs = model.generate(
inputs,
tokenizer=tokenizer,
generation_config=generation_config,
streamer=steamer
)
```
</details>
## DogeConfig
[[autodoc]] DogeConfig
## DogeModel
[[autodoc]] DogeModel
- forward
## DogeForCausalLM
[[autodoc]] DogeForCausalLM
- forward
## DogeForSequenceClassification
[[autodoc]] DogeForSequenceClassification
- forward

View File

@@ -14,115 +14,88 @@ rendered properly in your Markdown viewer.
-->
# Encoder Decoder Models
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# Encoder Decoder Models
The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
[`EncoderDecoderModel`](https://huggingface.co/papers/1706.03762) initializes a sequence-to-sequence model with any pretrained autoencoder and pretrained autoregressive model. It is effective for sequence generation tasks as demonstrated in [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) which uses [`BertModel`] as the encoder and decoder.
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://huggingface.co/papers/1907.12461) by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
> [!TIP]
> This model was contributed by [thomwolf](https://huggingface.co/thomwolf) and the TensorFlow/Flax version by [ydshieh](https://huggingface.co/ydshieh).
>
> Click on the Encoder Decoder models in the right sidebar for more examples of how to apply Encoder Decoder to different language tasks.
After such an [`EncoderDecoderModel`] has been trained/fine-tuned, it can be saved/loaded just like
any other models (see the examples for more information).
The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.
An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://huggingface.co/papers/1908.08345) by Yang Liu and Mirella Lapata.
## Randomly initializing `EncoderDecoderModel` from model configurations.
[`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
from transformers import pipeline
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()
summarizer = pipeline(
"summarization",
model="patrickvonplaten/bert2bert-cnn_dailymail-fp16",
device=0
)
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
text = "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen."
print(summarizer(text))
```
## Initialising `EncoderDecoderModel` from a pretrained encoder and a pretrained decoder.
[`EncoderDecoderModel`] can be initialized from a pretrained encoder checkpoint and a pretrained decoder checkpoint. Note that any pretrained auto-encoding model, *e.g.* BERT, can serve as the encoder and both pretrained auto-encoding models, *e.g.* BERT, pretrained causal language models, *e.g.* GPT2, as well as the pretrained decoder part of sequence-to-sequence models, *e.g.* decoder of BART, can be used as the decoder.
Depending on which architecture you choose as the decoder, the cross-attention layers might be randomly initialized.
Initializing [`EncoderDecoderModel`] from a pretrained encoder and decoder checkpoint requires the model to be fine-tuned on a downstream task, as has been shown in [the *Warm-starting-encoder-decoder blog post*](https://huggingface.co/blog/warm-starting-encoder-decoder).
To do so, the `EncoderDecoderModel` class provides a [`EncoderDecoderModel.from_encoder_decoder_pretrained`] method.
</hfoption>
<hfoption id="AutoModel">
```python
>>> from transformers import EncoderDecoderModel, BertTokenizer
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "google-bert/bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
model = AutoModelForCausalLM.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16", torch_dtype=torch.bfloat16, device_map="auto",attn_implementation="sdpa")
text = "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
summary = model.generate(**inputs, max_length=60, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary[0], skip_special_tokens=True))
```
## Loading an existing `EncoderDecoderModel` checkpoint and perform inference.
</hfoption>
<hfoption id="transformers CLI">
To load fine-tuned checkpoints of the `EncoderDecoderModel` class, [`EncoderDecoderModel`] provides the `from_pretrained(...)` method just like any other model architecture in Transformers.
```bash
echo -e "Plants create energy through a process known as photosynthesis. This involves capturing sunlight and converting carbon dioxide and water into glucose and oxygen." | transformers-cli run --task summarization --model "patrickvonplaten/bert2bert-cnn_dailymail-fp16" --device 0
```
To perform inference, one uses the [`generate`] method, which allows to autoregressively generate text. This method supports various forms of decoding, such as greedy, beam search and multinomial sampling.
</hfoption>
</hfoptions>
## Notes
- [`EncoderDecoderModel`] can be initialized using any pretrained encoder and decoder. But depending on the decoder architecture, the cross-attention layers may be randomly initialized.
These models require downstream fine-tuning, as discussed in this [blog post](https://huggingface.co/blog/warm-starting-encoder-decoder). Use [`~EncoderDecoderModel.from_encoder_decoder_pretrained`] to combine encoder and decoder checkpoints.
```python
>>> from transformers import AutoTokenizer, EncoderDecoderModel
from transformers import EncoderDecoderModel, BertTokenizer
>>> # load a fine-tuned seq2seq model and corresponding tokenizer
>>> model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> # let's perform inference on a long piece of text
>>> ARTICLE_TO_SUMMARIZE = (
... "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
... "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
... "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
... )
>>> input_ids = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors="pt").input_ids
>>> # autoregressively generate summary (uses greedy decoding by default)
>>> generated_ids = model.generate(input_ids)
>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> print(generated_text)
nearly 800 thousand customers were affected by the shutoffs. the aim is to reduce the risk of wildfires. nearly 800, 000 customers were expected to be affected by high winds amid dry conditions. pg & e said it scheduled the blackouts to last through at least midday tomorrow.
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
"google-bert/bert-base-uncased",
"google-bert/bert-base-uncased"
)
```
## Loading a PyTorch checkpoint into `TFEncoderDecoderModel`.
[`TFEncoderDecoderModel.from_pretrained`] currently doesn't support initializing the model from a
pytorch checkpoint. Passing `from_pt=True` to this method will throw an exception. If there are only pytorch
checkpoints for a particular encoder-decoder model, a workaround is:
```python
>>> # a workaround to load from pytorch checkpoint
>>> from transformers import EncoderDecoderModel, TFEncoderDecoderModel
>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
>>> _model.encoder.save_pretrained("./encoder")
>>> _model.decoder.save_pretrained("./decoder")
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
... "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
... )
>>> # This is only for copying some specific attributes of this particular model.
>>> model.config = _model.config
```
## Training
Once the model is created, it can be fine-tuned similar to BART, T5 or any other encoder-decoder model.
As you can see, only 2 inputs are required for the model in order to compute a loss: `input_ids` (which are the
`input_ids` of the encoded input sequence) and `labels` (which are the `input_ids` of the encoded
target sequence).
- Encoder Decoder models can be fine-tuned like BART, T5 or any other encoder-decoder model. Only 2 inputs are required to compute a loss, `input_ids` and `labels`. Refer to this [notebook](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for a more detailed training example.
```python
>>> from transformers import BertTokenizer, EncoderDecoderModel
@@ -147,11 +120,42 @@ target sequence).
>>> loss = model(input_ids=input_ids, labels=labels).loss
```
Detailed [colab](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for training.
- [`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config as shown below.
This model was contributed by [thomwolf](https://github.com/thomwolf). This model's TensorFlow and Flax versions
were contributed by [ydshieh](https://github.com/ydshieh).
```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```
- The Encoder Decoder Model can also be used for translation as shown below.
```python
from transformers import AutoTokenizer, EncoderDecoderModel
# Load a pre-trained translation model
model_name = "google/bert2bert_L-24_wmt_en_de"
tokenizer = AutoTokenizer.from_pretrained(model_name, pad_token="<pad>", eos_token="</s>", bos_token="<s>")
model = EncoderDecoderModel.from_pretrained(model_name)
# Input sentence to translate
input_text = "Plants create energy through a process known as"
# Encode the input text
inputs = tokenizer(input_text, return_tensors="pt", add_special_tokens=False).input_ids
# Generate the translated output
outputs = model.generate(inputs)[0]
# Decode the output tokens to get the translated sentence
translated_text = tokenizer.decode(outputs, skip_special_tokens=True)
print("Translated text:", translated_text)
```
## EncoderDecoderConfig

View File

@@ -0,0 +1,210 @@
<!--Copyright 2025 Mobile Perception Systems Lab at TU/e and The HuggingFace Inc. team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# EoMT
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
## Overview
The Encoder-only Mask Transformer (EoMT) model was introduced in the CVPR 2025 Highlight Paper [Your ViT is Secretly an Image Segmentation Model](https://www.tue-mps.org/eomt) by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.
EoMT reveals Vision Transformers can perform image segmentation efficiently without task-specific components.
The abstract from the paper is the following:
*Vision Transformers (ViTs) have shown remarkable performance and scalability across various computer vision tasks. To apply single-scale ViTs to image segmentation, existing methods adopt a convolutional adapter to generate multi-scale features, a pixel decoder to fuse these features, and a Transformer decoder that uses the fused features to make predictions. In this paper, we show that the inductive biases introduced by these task-specific components can instead be learned by the ViT itself, given sufficiently large models and extensive pre-training. Based on these findings, we introduce the Encoder-only Mask Transformer (EoMT), which repurposes the plain ViT architecture to conduct image segmentation. With large-scale models and pre-training, EoMT obtains a segmentation accuracy similar to state-of-the-art models that use task-specific components. At the same time, EoMT is significantly faster than these methods due to its architectural simplicity, e.g., up to 4x faster with ViT-L. Across a range of model sizes, EoMT demonstrates an optimal balance between segmentation accuracy and prediction speed, suggesting that compute resources are better spent on scaling the ViT itself rather than adding architectural complexity.*
This model was contributed by [Yaswanth Gali](https://huggingface.co/yaswanthgali).
The original code can be found [here](https://github.com/tue-mps/eomt).
## Architecture Info
The `EoMT` model uses a DINOv2-pretrained Vision Transformer with **register tokens** as its backbone. EoMT simplifies the segmentation pipeline by relying solely on the encoder, eliminating the need for task-specific decoders commonly used in prior approaches.
Architecturally, EoMT introduces a small set of **learned queries** and a lightweight **mask prediction module**. These queries are injected into the final encoder blocks, enabling **joint attention** between image patches and object queries. During training, **masked attention** is applied to constrain each query to focus on its corresponding region—effectively mimicking cross-attention. This constraint is gradually phased out via a **mask annealing strategy**, allowing for **efficient, decoder-free inference** without compromising segmentation performance.
<div style="text-align: center;">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/eomt_architecture.png"
alt="drawing" width="500"/>
</div>
The model supports semantic, instance, and panoptic segmentation using a unified architecture and task-specific post-processing.
## Usage Examples
Use the Hugging Face implementation of EoMT for inference with pre-trained models.
### Semantic Segmentation
The EoMT model performs semantic segmentation using sliding-window inference. The input image is resized such that the shorter side matches the target input size, then it is split into overlapping crops. Each crop is then passed through the model. After inference, the predicted logits from each crop are stitched back together and rescaled to the original image size to get the final segmentation mask.
> **Note:**
> If you want to use a custom target size for **semantic segmentation**, specify it in the following format:
> `{"shortest_edge": 512}`
> Notice that `longest_edge` is not provided here — this is intentional. For semantic segmentation, images are typically **scaled so that the shortest edge is greater than or equal to the target size** hence longest_edge is not necessary.
```python
import matplotlib.pyplot as plt
import requests
import torch
from PIL import Image
from transformers import EomtForUniversalSegmentation, AutoImageProcessor
model_id = "tue-mps/ade20k_semantic_eomt_large_512"
processor = AutoImageProcessor.from_pretrained(model_id)
model = EomtForUniversalSegmentation.from_pretrained(model_id)
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(
images=image,
return_tensors="pt",
)
with torch.inference_mode():
outputs = model(**inputs)
# Prepare the original image size in the format (height, width)
target_sizes = [(image.height, image.width)]
# Post-process the model outputs to get final segmentation prediction
preds = processor.post_process_semantic_segmentation(
outputs,
target_sizes=target_sizes,
)
# Visualize the segmentation mask
plt.imshow(preds[0])
plt.axis("off")
plt.title("Semantic Segmentation")
plt.show()
```
### Instance Segmentation
The EoMT model performs instance segmentation using padded inference. The input image is resized so that the longer side matches the target input size, and the shorter side is zero-padded to form a square. The resulting mask and class logits are combined through post-processing (adapted from Mask2Former) to produce a unified instance segmentation map, along with segment metadata like segment id, class labels and confidence scores.
> **Note:**
> To use a custom target size, specify the size as a dictionary in the following format:
> `{"shortest_edge": 512, "longest_edge": 512}`
> For both instance and panoptic segmentation, input images will be **scaled and padded** to this target size.
```python
import matplotlib.pyplot as plt
import requests
import torch
from PIL import Image
from transformers import EomtForUniversalSegmentation, AutoImageProcessor
model_id = "tue-mps/coco_instance_eomt_large_640"
processor = AutoImageProcessor.from_pretrained(model_id)
model = EomtForUniversalSegmentation.from_pretrained(model_id)
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(
images=image,
return_tensors="pt",
)
with torch.inference_mode():
outputs = model(**inputs)
# Prepare the original image size in the format (height, width)
target_sizes = [(image.height, image.width)]
# Post-process the model outputs to get final segmentation prediction
preds = processor.post_process_instance_segmentation(
outputs,
target_sizes=target_sizes,
)
# Visualize the segmentation mask
plt.imshow(preds[0]["segmentation"])
plt.axis("off")
plt.title("Instance Segmentation")
plt.show()
```
### Panoptic Segmentation
The EoMT model performs panoptic segmentation using the same padded inference strategy as in instance segmentation. After padding and normalization, the model predicts both thing (instances) and stuff (amorphous regions) classes. The resulting mask and class logits are combined through post-processing (adapted from Mask2Former) to produce a unified panoptic segmentation map, along with segment metadata like segment id, class labels and confidence scores.
```python
import matplotlib.pyplot as plt
import requests
import torch
from PIL import Image
from transformers import EomtForUniversalSegmentation, AutoImageProcessor
model_id = "tue-mps/coco_panoptic_eomt_large_640"
processor = AutoImageProcessor.from_pretrained(model_id)
model = EomtForUniversalSegmentation.from_pretrained(model_id)
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(
images=image,
return_tensors="pt",
)
with torch.inference_mode():
outputs = model(**inputs)
# Prepare the original image size in the format (height, width)
target_sizes = [(image.height, image.width)]
# Post-process the model outputs to get final segmentation prediction
preds = processor.post_process_panoptic_segmentation(
outputs,
target_sizes=target_sizes,
)
# Visualize the panoptic segmentation mask
plt.imshow(preds[0]["segmentation"])
plt.axis("off")
plt.title("Panoptic Segmentation")
plt.show()
```
## EomtImageProcessor
[[autodoc]] EomtImageProcessor
- preprocess
- post_process_semantic_segmentation
- post_process_instance_segmentation
- post_process_panoptic_segmentation
## EomtImageProcessorFast
[[autodoc]] EomtImageProcessorFast
- preprocess
- post_process_semantic_segmentation
- post_process_instance_segmentation
- post_process_panoptic_segmentation
## EomtConfig
[[autodoc]] EomtConfig
## EomtForUniversalSegmentation
[[autodoc]] EomtForUniversalSegmentation
- forward

View File

@@ -23,6 +23,7 @@ rendered properly in your Markdown viewer.
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -22,6 +22,7 @@ rendered properly in your Markdown viewer.
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -29,11 +29,11 @@ rendered properly in your Markdown viewer.
Gemma3n is a multimodal model with pretrained and instruction-tuned variants, available in E4B and E2B sizes. While
large portions of the language model architecture are shared with prior Gemma releases, there are many new additions in
this model, including [Alternating Updates][altup] (AltUp), [Learned Augmented Residual Layer][laurel] (LAuReL),
[MatFormer][matformer], Per-Layer Embeddings (PLE), activation sparsity, and KV cache sharing. The language model uses
[MatFormer][matformer], Per-Layer Embeddings (PLE), [Activation Sparsity with Statistical Top-k][spark-transformer], and KV cache sharing. The language model uses
a similar attention pattern to [Gemma 3](./gemma3.md) with alternating 4 local sliding window self-attention layers for
every global self-attention layer with a maximum context length of 32k tokens. Gemma 3n introduces
[MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a
[Universal Speech Model][usm] (USM) as the audio encoder.
[MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a newly
trained audio encoder based on the [Universal Speech Model][usm] (USM) architecture.
The instruction-tuned variant was post-trained with knowledge distillation and reinforcement learning.
@@ -121,7 +121,7 @@ echo -e "Plants create energy through a process known as" | transformers run --t
## Notes
- Use [`Gemma3nForConditionalGeneration`] for image-audio-and-text, image-and-text, image-and-audio, audio-and-text,
image-only and aduio-only inputs.
image-only and audio-only inputs.
- Gemma 3n supports multiple images per input, but make sure the images are correctly batched before passing them to
the processor. Each batch should be a list of one or more images.
@@ -201,4 +201,5 @@ echo -e "Plants create energy through a process known as" | transformers run --t
[gemma3n-collection]: https://huggingface.co/collections/google/gemma-3n
[laurel]: https://arxiv.org/abs/2411.07501
[matformer]: https://arxiv.org/abs/2310.07707
[spark-transformer]: https://arxiv.org/abs/2506.06644
[usm]: https://arxiv.org/abs/2303.01037

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -18,7 +18,37 @@ rendered properly in your Markdown viewer.
## Overview
To be released with the official model launch.
The GLM family welcomes new members [GLM-4-0414](https://arxiv.org/pdf/2406.12793) series models.
The **GLM-4-32B-0414** series models, featuring 32 billion parameters. Its performance is comparable to OpenAIs GPT
series and DeepSeeks V3/R1 series. It also supports very user-friendly local deployment features. GLM-4-32B-Base-0414
was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data. This lays the
foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference
alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforcement learning, we
enhanced the models performance in instruction following, engineering code, and function calling, thus strengthening
the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, Artifact
generation, function calling, search-based Q&A, and report generation. In particular, on several benchmarks, such as
code generation or specific Q&A tasks, GLM-4-32B-Base-0414 achieves comparable performance with those larger models like
GPT-4o and DeepSeek-V3-0324 (671B).
**GLM-Z1-32B-0414** is a reasoning model with deep thinking capabilities. This was developed based on GLM-4-32B-0414
through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and
logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to
solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking
feedback, which enhances the model's general capabilities.
**GLM-Z1-Rumination-32B-0414** is a deep reasoning model with rumination capabilities (against OpenAI's Deep Research).
Unlike typical deep thinking models, the rumination model is capable of deeper and longer thinking to solve more
open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future
development plans). Z1-Rumination is trained through scaling end-to-end reinforcement learning with responses graded by
the ground truth answers or rubrics and can make use of search tools during its deep thinking process to handle complex
tasks. The model shows significant improvements in research-style writing and complex tasks.
Finally, **GLM-Z1-9B-0414** is a surprise. We employed all the aforementioned techniques to train a small model (9B).
GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is
top-ranked among all open-source models of the same size. Especially in resource-constrained scenarios, this model
achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking
lightweight deployment.
## Glm4Config

View File

@@ -23,6 +23,29 @@ rendered properly in your Markdown viewer.
# GLM-4.1V
## Overview
**GLM-4.1V-9B-Thinking** is a bilingual vision-language model optimized for reasoning, built on GLM-4-9B. It introduces
a "thinking paradigm" with reinforcement learning, achieving state-of-the-art results among 10B-class models and
rivaling 72B-scale models. It supports 64k context, 4K resolution, and arbitrary aspect ratios, with an open-source base
model for further research. You can check our paper [here](https://huggingface.co/papers/2507.01006). and below is a abstract.
*We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding
and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework.
We first develop a capable vision foundation model with significant potential through large-scale pre-training, which
arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum
Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a
diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding,
GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art
performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model
outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks
relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or
superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document
understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information
are released at https://github.com/THUDM/GLM-4.1V-Thinking.*
## Usage
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
<hfoptions id="usage">

View File

@@ -57,7 +57,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2", torch_dtype=torch.float16, device_map="auto", attn_implementation="sdpa")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
input_ids = tokenzier("Hello, I'm a language model". return_tensors="pt").to("cuda")
input_ids = tokenizer("Hello, I'm a language model", return_tensors="pt").to("cuda")
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))

View File

@@ -19,6 +19,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
# Granite

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,53 +14,107 @@ rendered properly in your Markdown viewer.
-->
# I-JEPA
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# I-JEPA
The I-JEPA model was proposed in [Image-based Joint-Embedding Predictive Architecture](https://huggingface.co/papers/2301.08243) by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas.
I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.
[I-JEPA](https://huggingface.co/papers/2301.08243) is a self-supervised learning method that learns semantic image representations by predicting parts of an image from other parts of the image. It compares the abstract representations of the image (rather than pixel level comparisons), which avoids the typical pitfalls of data augmentation bias and pixel-level details that don't capture semantic meaning.
The abstract from the paper is the following:
You can find the original I-JEPA checkpoints under the [AI at Meta](https://huggingface.co/facebook/models?search=ijepa) organization.
> [!TIP]
> This model was contributed by [jmtzt](https://huggingface.co/jmtzt).
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image- based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample tar- get blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transform- ers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/ijepa_architecture.jpg"
alt="drawing" width="600"/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/ijepa_architecture.jpg">
<small> I-JEPA architecture. Taken from the <a href="https://huggingface.co/papers/2301.08243">original paper.</a> </small>
This model was contributed by [jmtzt](https://huggingface.co/jmtzt).
The original code can be found [here](https://github.com/facebookresearch/ijepa).
> Click on the I-JEPA models in the right sidebar for more examples of how to apply I-JEPA to different image representation and classification tasks.
## How to use
The example below demonstrates how to extract image features with [`Pipeline`] or the [`AutoModel`] class.
Here is how to use this model for image feature extraction:
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
```py
import torch
from transformers import pipeline
feature_extractor = pipeline(
task="image-feature-extraction",
model="facebook/ijepa_vith14_1k",
device=0,
torch_dtype=torch.bfloat16
)
features = feature_extractor("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg", return_tensors=True)
print(f"Feature shape: {features.shape}")
```
</hfoption>
<hfoption id="AutoModel">
```py
import requests
import torch
from PIL import Image
from torch.nn.functional import cosine_similarity
from transformers import AutoModel, AutoProcessor
from transformers import AutoModel, AutoProcessor
url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
url_2 = "http://images.cocodataset.org/val2017/000000219578.jpg"
image_1 = Image.open(requests.get(url_1, stream=True).raw)
image_2 = Image.open(requests.get(url_2, stream=True).raw)
processor = AutoProcessor.from_pretrained("facebook/ijepa_vith14_1k")
model = AutoModel.from_pretrained("facebook/ijepa_vith14_1k", torch_dtype="auto", attn_implementation="sdpa")
def infer(image):
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
return outputs.last_hidden_state.mean(dim=1)
embed_1 = infer(image_1)
embed_2 = infer(image_2)
similarity = cosine_similarity(embed_1, embed_2)
print(similarity)
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to 4-bits.
```py
import torch
from transformers import BitsAndBytesConfig, AutoModel, AutoProcessor
from datasets import load_dataset
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
url_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
url_2 = "http://images.cocodataset.org/val2017/000000219578.jpg"
image_1 = Image.open(requests.get(url_1, stream=True).raw)
image_2 = Image.open(requests.get(url_2, stream=True).raw)
model_id = "facebook/ijepa_vith14_1k"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained("facebook/ijepa_vitg16_22k")
model = AutoModel.from_pretrained("facebook/ijepa_vitg16_22k", quantization_config=quantization_config, torch_dtype="auto", attn_implementation="sdpa")
@torch.no_grad()
def infer(image):
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
@@ -74,15 +128,6 @@ similarity = cosine_similarity(embed_1, embed_2)
print(similarity)
```
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with I-JEPA.
<PipelineTag pipeline="image-classification"/>
- [`IJepaForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
- See also: [Image classification task guide](../tasks/image_classification)
## IJepaConfig
[[autodoc]] IJepaConfig
@@ -95,4 +140,5 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
## IJepaForImageClassification
[[autodoc]] IJepaForImageClassification
- forward
- forward

View File

@@ -14,62 +14,135 @@ rendered properly in your Markdown viewer.
-->
# LED
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
</div>
</div>
## Overview
# LED
The LED model was proposed in [Longformer: The Long-Document Transformer](https://huggingface.co/papers/2004.05150) by Iz
Beltagy, Matthew E. Peters, Arman Cohan.
[Longformer-Encoder-Decoder (LED)](https://huggingface.co/papers/2004.05150) is an encoder-decoder transformer model for sequence-to-sequence tasks like summarization. It extends [Longformer](.longformer), an encoder-only model designed to handle long inputs, by adding a decoder layer. The decoder uses full self-attention on the encoded tokens and previously decoded locations. Because of Longformer's linear self-attention mechanism, LED is more efficient than standard encoder-decoder models when processing long sequences.
The abstract from the paper is the following:
You can find all the original [LED] checkpoints under the [Ai2](https://huggingface.co/allenai/models?search=led) organization.
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting
long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization
dataset.*
> [!TIP]
> This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
>
> Click on the LED models in the right sidebar for more examples of how to apply LED to different language tasks.
## Usage tips
The example below demonstrates how to summarize text with [`Pipeline`], [`AutoModel`], and from the command line.
- [`LEDForConditionalGeneration`] is an extension of
[`BartForConditionalGeneration`] exchanging the traditional *self-attention* layer with
*Longformer*'s *chunked self-attention* layer. [`LEDTokenizer`] is an alias of
[`BartTokenizer`].
- LED works very well on long-range *sequence-to-sequence* tasks where the `input_ids` largely exceed a length of
1024 tokens.
- LED pads the `input_ids` to be a multiple of `config.attention_window` if required. Therefore a small speed-up is
gained, when [`LEDTokenizer`] is used with the `pad_to_multiple_of` argument.
- LED makes use of *global attention* by means of the `global_attention_mask` (see
[`LongformerModel`]). For summarization, it is advised to put *global attention* only on the first
`<s>` token. For question answering, it is advised to put *global attention* on all tokens of the question.
- To fine-tune LED on all 16384, *gradient checkpointing* can be enabled in case training leads to out-of-memory (OOM)
errors. This can be done by executing `model.gradient_checkpointing_enable()`.
Moreover, the `use_cache=False`
flag can be used to disable the caching mechanism to save memory.
- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
<hfoptions id="usage">
<hfoption id="Pipeline">
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
```python
import torch
from transformers import pipeline
pipeline = pipeline(
task="summarization",
model="allenai/led-base-16384",
torch_dtype=torch.float16,
device=0
)
pipeline("""Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle.""")
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained(
"allenai/led-base-16384"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"allenai/led-base-16384",
torch_dtype=torch.float16,
device_map="auto"
)
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
# Place global attention on the first token
global_attention_mask = torch.zeros_like(input_ids.input_ids).to("cuda")
global_attention_mask[:, 0] = 1
output = model.generate(**input_ids, global_attention_mask=global_attention_mask, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers-cli">
```bash
!echo -e "Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet. Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts." | transformers-cli run --task summarization --model allenai/led-base-16384 --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to int4.
```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForSeq2SeqLM, AutoTokenizer
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"allenai/led-large-16384",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(
"allenai/led-large-16384"
)
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
# Place global attention on the first token
global_attention_mask = torch.zeros_like(input_ids.input_ids).to("cuda")
global_attention_mask[:, 0] = 1
output = model.generate(**input_ids, global_attention_mask=global_attention_mask, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Notes
- [`LEDForConditionalGeneration`] is an extension of [`BartForConditionalGeneration`] exchanging the traditional self-attention layer with Longformer's chunked self-attention layer. [`LEDTokenizer`] is an alias of [`BartTokenizer`].
- LED pads the `input_ids` to be a multiple of `config.attention_window` if required. A small speedup is gained when [`LEDTokenizer`] is used with the `pad_to_multiple_of` argument.
- LED works best on long-range sequence-to-sequence tasks where the `input_ids` are significantly longer than 1024 tokens.
- LED uses global attention by means of the `global_attention_mask` (see [`LongformerModel`]). For summarization, it is advised to put global attention only on the first `<s>` token. For question answering, it is advised to put global attention on all tokens of the question.
- To fine-tune LED on all 16384 parameters, gradient checkpointing can be enabled in case training leads to out-of-memory (OOM) errors. Enable gradient checkpointing by adding `model.gradient_checkpointing_enable()` and setting `use_cache=False` to disable the caching mechanism to save memory.
- Inputs should be padded on the right because LED uses absolute position embeddings.
## Resources
- [A notebook showing how to evaluate LED](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
- [A notebook showing how to fine-tune LED](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
- [Text classification task guide](../tasks/sequence_classification)
- [Question answering task guide](../tasks/question_answering)
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)
- Read the [LED on Arxiv notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing) to see how LED can achieve state-of-the-art performance on Arxiv article summarization.
- Read the [Fine-tune LED notebook](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing) to learn how to fine-tune LED on PubMed articles.
## LEDConfig

View File

@@ -0,0 +1,84 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
# LFM2
## Overview
[LFM2](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models) represents a new generation of Liquid Foundation Models developed by [Liquid AI](https://liquid.ai/), specifically designed for edge AI and on-device deployment.
The models are available in three sizes (350M, 700M, and 1.2B parameters) and are engineered to run efficiently on CPU, GPU, and NPU hardware, making them particularly well-suited for applications requiring low latency, offline operation, and privacy.
## Architecture
The architecture consists of 16 blocks total: 10 double-gated short-range convolution blocks and 6 blocks of grouped query attention. This design stems from the concept of dynamical systems, where linear operations are modulated by input-dependent gates, allowing for "liquid" dynamics that can adapt in real-time. The short convolutions are particularly optimized for embedded SoC CPUs, making them ideal for devices that require fast, local inference without relying on cloud connectivity.
The key architectural innovation of LFM2 lies in its systematic approach to balancing quality, latency, and memory efficiency through our STAR neural architecture search engine. Using STAR, Liquid AI optimized the models for real-world performance on embedded hardware, measuring actual peak memory usage and inference speed on Qualcomm Snapdragon processors. This results in models that achieve 2x faster decode and prefill performance compared to similar-sized models, while maintaining superior benchmark performance across knowledge, mathematics, instruction following, and multilingual tasks.
## Example
The following example shows how to generate an answer using the `AutoModelForCausalLM` class.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_id = "LiquidAI/LFM2-1.2B"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Generate answer
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_tensors="pt",
tokenize=True,
)
output = model.generate(
input_ids,
do_sample=True,
temperature=0.3,
min_p=0.15,
repetition_penalty=1.05,
max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```
## Lfm2Config
[[autodoc]] Lfm2Config
## Lfm2Model
[[autodoc]] Lfm2Model
- forward
## Lfm2ForCausalLM
[[autodoc]] Lfm2ForCausalLM
- forward

View File

@@ -10,37 +10,31 @@ specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white" >
</div>
</div>
# LightGlue
## Overview
[LightGlue](https://arxiv.org/abs/2306.13643) is a deep neural network that learns to match local features across images. It revisits multiple design decisions of SuperGlue and derives simple but effective improvements. Cumulatively, these improvements make LightGlue more efficient - in terms of both memory and computation, more accurate, and much easier to train. Similar to [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor), this model consists of matching two sets of local features extracted from two images, with the goal of being faster than SuperGlue. Paired with the [SuperPoint model](https://huggingface.co/magic-leap-community/superpoint), it can be used to match two images and estimate the pose between them.
The LightGlue model was proposed in [LightGlue: Local Feature Matching at Light Speed](https://arxiv.org/abs/2306.13643)
by Philipp Lindenberger, Paul-Edouard Sarlin and Marc Pollefeys.
You can find all the original LightGlue checkpoints under the [ETH-CVG](https://huggingface.co/ETH-CVG) organization.
Similar to [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor), this model consists of matching
two sets of local features extracted from two images, its goal is to be faster than SuperGlue. Paired with the
[SuperPoint model](https://huggingface.co/magic-leap-community/superpoint), it can be used to match two images and
estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.
> [!TIP]
> This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
>
> Click on the LightGlue models in the right sidebar for more examples of how to apply LightGlue to different computer vision tasks.
The abstract from the paper is the following:
The example below demonstrates how to match keypoints between two images with the [`AutoModel`] class.
*We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple
design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements.
Cumulatively, they make LightGlue more efficient - in terms of both memory and computation, more accurate, and much
easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much
faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited
appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like
3D reconstruction. The code and trained models are publicly available at this [https URL](https://github.com/cvg/LightGlue)*
<hfoptions id="usage">
<hfoption id="AutoModel">
## How to use
Here is a quick example of using the model. Since this model is an image matching model, it requires pairs of images to be matched.
The raw outputs contain the list of keypoints detected by the keypoint detector as well as the list of matches with their corresponding
matching scores.
```python
```py
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
@@ -59,31 +53,70 @@ model = AutoModel.from_pretrained("ETH-CVG/lightglue_superpoint")
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
```
You can use the `post_process_keypoint_matching` method from the `LightGlueImageProcessor` to get the keypoints and matches in a readable format:
```python
# Post-process to get keypoints and matches
image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
print("For the image pair", i)
for keypoint0, keypoint1, matching_score in zip(
output["keypoints0"], output["keypoints1"], output["matching_scores"]
):
print(
f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
)
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
```
You can visualize the matches between the images by providing the original images as well as the outputs to this method:
```python
processor.plot_keypoint_matching(images, outputs)
```
</hfoption>
</hfoptions>
![image/png](https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/duPp09ty8NRZlMZS18ccP.png)
## Notes
This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
The original code can be found [here](https://github.com/cvg/LightGlue).
- LightGlue is adaptive to the task difficulty. Inference is much faster on image pairs that are intuitively easy to match, for example, because of a larger visual overlap or limited appearance change.
```py
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests
processor = AutoImageProcessor.from_pretrained("ETH-CVG/lightglue_superpoint")
model = AutoModel.from_pretrained("ETH-CVG/lightglue_superpoint")
# LightGlue requires pairs of images
images = [image1, image2]
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
# Extract matching information
keypoints0 = outputs.keypoints0 # Keypoints in first image
keypoints1 = outputs.keypoints1 # Keypoints in second image
matches = outputs.matches # Matching indices
matching_scores = outputs.matching_scores # Confidence scores
```
- The model outputs matching indices, keypoints, and confidence scores for each match, similar to SuperGlue but with improved efficiency.
- For better visualization and analysis, use the [`LightGlueImageProcessor.post_process_keypoint_matching`] method to get matches in a more readable format.
```py
# Process outputs for visualization
image_sizes = [[(image.height, image.width) for image in images]]
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(processed_outputs):
print(f"For the image pair {i}")
for keypoint0, keypoint1, matching_score in zip(
output["keypoints0"], output["keypoints1"], output["matching_scores"]
):
print(f"Keypoint at {keypoint0.numpy()} matches with keypoint at {keypoint1.numpy()} with score {matching_score}")
```
- Visualize the matches between the images using the built-in plotting functionality.
```py
# Easy visualization using the built-in plotting method
processor.plot_keypoint_matching(images, processed_outputs)
```
<div class="flex justify-center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/duPp09ty8NRZlMZS18ccP.png">
</div>
## Resources
- Refer to the [original LightGlue repository](https://github.com/cvg/LightGlue) for more examples and implementation details.
## LightGlueConfig
@@ -97,8 +130,13 @@ The original code can be found [here](https://github.com/cvg/LightGlue).
- post_process_keypoint_matching
- plot_keypoint_matching
<frameworkcontent>
<pt>
## LightGlueForKeypointMatching
[[autodoc]] LightGlueForKeypointMatching
- forward
</pt>
</frameworkcontent>

View File

@@ -21,6 +21,7 @@ rendered properly in your Markdown viewer.
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -19,6 +19,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
```py3

View File

@@ -21,6 +21,7 @@ rendered properly in your Markdown viewer.
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -14,287 +14,178 @@ rendered properly in your Markdown viewer.
-->
# LLaVA-NeXT
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# LLaVA-NeXT
The LLaVA-NeXT model was proposed in [LLaVA-NeXT: Improved reasoning, OCR, and world knowledge](https://llava-vl.github.io/blog/2024-01-30-llava-next/) by Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee. LLaVa-NeXT (also called LLaVa-1.6) improves upon [LLaVa](llava) by increasing the input image resolution and training on an improved visual instruction tuning dataset to improve OCR and common sense reasoning.
[LLaVANeXT](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/) improves on [Llava](./llava) by increasing the input image resolution by 4x more pixels and supporting 3 aspect ratios (up to 672x672, 336x1344, 1344x336) to better grasp visual details. It is also trained on an improved visual instruction tuning dataset covering more scenarios and applications to improve OCR and common sense reasoning.
The introduction from the blog is the following:
You can find all the original LLaVANeXT checkpoints under the [LLaVA-NeXT](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf) collection.
*In October 2023, we released LLaVA-1.5 with a simple and efficient design along with great performance on a benchmark suite of 12 datasets. It has since served as the foundation of many comprehensive studies of data, model, and capabilities of large multimodal models (LMM), and has enabled various new applications.
> [!TIP]
> This model was contributed by [nielsr](https://huggingface.co/nielsr).
>
> Click on the LLaVANeXT models in the right sidebar for more examples of how to apply Llava-NeXT to different multimodal tasks.
Today, we are thrilled to present LLaVA-NeXT, with improved reasoning, OCR, and world knowledge. LLaVA-NeXT even exceeds Gemini Pro on several benchmarks.
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
Compared with LLaVA-1.5, LLaVA-NeXT has several improvements:
<hfoptions id="usage">
Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.
Better visual conversation for more scenarios, covering different applications. Better world knowledge and logical reasoning.
Efficient deployment and inference with SGLang.
Along with performance improvements, LLaVA-NeXT maintains the minimalist design and data efficiency of LLaVA-1.5. It re-uses the pretrained connector of LLaVA-1.5, and still uses less than 1M visual instruction tuning samples. The largest 34B variant finishes training in ~1 day with 32 A100s.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava_next_overview.png"
alt="drawing" width="600"/>
<small> LLaVa-NeXT incorporates a higher input resolution by encoding various patches of the input image. Taken from the <a href="https://huggingface.co/papers/2310.03744">original paper.</a> </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/haotian-liu/LLaVA/tree/main).
## Usage tips
- We advise users to use `padding_side="left"` when computing batched generation as it leads to more accurate results. Simply make sure to call `processor.tokenizer.padding_side = "left"` before generating.
<Tip warning={true}>
- Llava-Next uses different number of patches for images and thus has to pad the inputs inside modeling code, aside from the padding done when processing the inputs. The default setting is "left-padding" if model is in `eval()` mode, otherwise "right-padding".
</Tip>
> [!NOTE]
> LLaVA models after release v4.46 will raise warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add the attributes to the processor if you own the model checkpoint, or open a PR if it is not owned by you.
Adding these attributes means that LLaVA will try to infer the number of image tokens required per image and expand the text with as many `<image>` placeholders as there will be tokens. Usually it is around 500 tokens per image, so make sure that the text is not truncated as otherwise there will be failure when merging the embeddings.
The attributes can be obtained from model config, as `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`. The `num_additional_image_tokens` should be `1` if the vision backbone adds a CLS token or `0` if nothing extra is added to the vision patches.
### Formatting Prompts with Chat Templates
Each **checkpoint** is trained with a specific prompt format, depending on the underlying large language model backbone. To ensure correct formatting, use the processors `apply_chat_template` method.
**Important:**
- You must construct a conversation history — passing a plain string won't work.
- Each message should be a dictionary with `"role"` and `"content"` keys.
- The `"content"` should be a list of dictionaries for different modalities like `"text"` and `"image"`.
Heres an example of how to structure your input. We will use [llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) and a conversation history of text and image.
<hfoption id="Pipeline">
```python
from transformers import LlavaNextProcessor
import torch
from transformers import pipeline
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
conversation = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "Whats shown in this image?"},
],
},
{
"role": "assistant",
"content": [{"type": "text", "text": "This image shows a red stop sign."},]
},
{
"role": "user",
"content": [
{"type": "text", "text": "Describe the image in more details."},
],
},
]
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Note that the template simply formats your prompt, you still have to tokenize it and obtain pixel values for your images
print(text_prompt)
>>> "[INST] <image>\nWhat's shown in this image? [/INST] This image shows a red stop sign. [INST] Describe the image in more details. [/INST]"
pipeline = pipeline(
task="image-text-to-text",
model="llava-hf/llava-v1.6-mistral-7b-hf",
device=0,
torch_dtype=torch.bfloat16
)
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
},
{ "type": "text", "text": "Describe this image."},
]
}
]
pipeline(text=messages, max_new_tokens=20, return_full_text=False)
```
- If you want to construct a chat prompt yourself, below is a list of possible formats
.
[llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf) requires the following format:
```bash
"[INST] <image>\nWhat is shown in this image? [/INST]"
</hfoption>
<hfoption id="AutoModel">
```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaNextForConditionalGeneration
processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16).to("cuda")
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
conversation = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What is shown in this image?"},
],
},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
[llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf) and [llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) require the following format:
```bash
"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\nWhat is shown in this image? ASSISTANT:"
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to int4.
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_quant_type="nf4"
)
processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", quantization_config=quant_config, device_map="auto")
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava_next_ocr.png"
image = Image.open(requests.get(url, stream=True).raw)
conversation = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What does this chart show?"},
],
},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to("cuda")
with torch.inference_mode():
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
[llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf) requires the following format:
```bash
"<|im_start|>system\nAnswer the questions.<|im_end|><|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"
## Notes
* Different checkpoints (Mistral, Vicuna, etc.) require a specific prompt format depending on the underlying LLM. Always use [`~ProcessorMixin.apply_chat_template`] to ensure correct formatting. Refer to the [Templates](../chat_templating) guide for more details.
* Set `padding_side="left"` during batched generation for more accurate results.
```py
processor.tokenizer.padding_side = "left"
```
[llama3-llava-next-8b-hf](https://huggingface.co/llava-hf/llava-next-8b-hf) requires the following format:
* LLaVA-NeXT uses different numbers of patches for images and pads the inputs inside the modeling code except when padding is done during processing. The default setting is *left-padding* if the model is in `eval()` mode, otherwise it is *right-padding*.
```bash
"<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language.<|eot_id|><|start_header_id|><|start_header_id|>user<|end_header_id|>\n\n<image>\nWhat is shown in this image?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
```
* LLaVA models after v4.46 raises warnings about adding `processor.patch_size = {{patch_size}}`, `processor.num_additional_image_tokens = {{num_additional_image_tokens}}`, and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. It is strongly recommended to add these attributes to the processor if you own the model checkpoint or open a PR if it isn't.
[llava-next-72b-hf](https://huggingface.co/llava-hf/llava-next-72b-hf) and [llava-next-110b-hf](https://huggingface.co/llava-hf/llava-next-110b-hf) require the following format:
Adding these attributes means LLaVA will try to infer the number of image tokens required per image and expand the text with the same number of `<image>` token placeholders. There are usually ~500 tokens per image, so make sure the text is not truncated because it will cause a failure when merging the embeddings. The attributes can be found in `model.config.vision_config.patch_size` or `model.config.vision_feature_select_strategy`.
```bash
"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|>\n<|im_start|>assistant\n"
```
The `num_additional_image_tokens` should be `1` if the vision backbone adds a `CLS` token or `0` if nothing extra is added.
🚀 **Bonus:** If you're using `transformers>=4.49.0`, you can also get a vectorized output from `apply_chat_template`. See the **Usage Examples** below for more details on how to use it.
## Usage example
### Single image inference
Here's how to load the model and perform inference in half-precision (`torch.float16`):
* The example below demonstrates inference with multiple input images.
```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests
import requests, torch
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
"llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16
).to("cuda")
model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16)
model.to("cuda:0")
# Load multiple images
url1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava_next_ocr.png"
url2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava_next_comparison.png"
# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
image1 = Image.open(requests.get(url1, stream=True).raw)
image2 = Image.open(requests.get(url2, stream=True).raw)
conversation = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What is shown in this image?"},
],
},
{"role": "user", "content": [{"type": "image"}, {"type": "image"}, {"type": "text", "text": "Compare these two images and describe the differences."}]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to("cuda:0")
inputs = processor([image1, image2], prompt, return_tensors="pt").to("cuda")
# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```
### Multi image inference
LLaVa-Next can perform inference with multiple images as input, where images either belong to the same prompt or different prompts (in batched inference). Here is how you can do it:
```python
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
# Load the model in half-precision
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
# Get three different images
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image_stop = Image.open(requests.get(url, stream=True).raw)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_cats = Image.open(requests.get(url, stream=True).raw)
url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"
image_snowman = Image.open(requests.get(url, stream=True).raw)
# Prepare a batch of two prompts, where the first one is a multi-turn conversation and the second is not
conversation_1 = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What is shown in this image?"},
],
},
{
"role": "assistant",
"content": [
{"type": "text", "text": "There is a red stop sign in the image."},
],
},
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What about this image? How many cats do you see?"},
],
},
]
conversation_2 = [
{
"role": "user",
"content": [
{"type": "image"},
{"type": "text", "text": "What is shown in this image?"},
],
},
]
prompt_1 = processor.apply_chat_template(conversation_1, add_generation_prompt=True)
prompt_2 = processor.apply_chat_template(conversation_2, add_generation_prompt=True)
prompts = [prompt_1, prompt_2]
# We can simply feed images in the order they have to be used in the text prompt
# Each "<image>" token uses one image leaving the next for the subsequent "<image>" tokens
inputs = processor(images=[image_stop, image_cats, image_snowman], text=prompts, padding=True, return_tensors="pt").to(model.device)
# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
```
## Model optimization
### Quantization using Bitsandbytes
The model can be loaded in 8 or 4 bits, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes, `pip install bitsandbytes`, and to have access to a GPU/accelerator that is supported by the library.
<Tip>
bitsandbytes is being refactored to support multiple backends beyond CUDA. Currently, ROCm (AMD GPU) and Intel CPU implementations are mature, with Intel XPU in progress and Apple Silicon support expected by Q4/Q1. For installation instructions and the latest backend updates, visit [this link](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend).
We value your feedback to help identify bugs before the full release! Check out [these docs](https://huggingface.co/docs/bitsandbytes/main/en/non_cuda_backends) for more details and feedback links.
</Tip>
Simply change the snippet above with:
```python
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig
# specify how to quantize the model
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf", quantization_config=quantization_config, device_map="auto")
```
### Use Flash-Attention 2 to further speed-up generation
First make sure to install flash-attn. Refer to the [original repository of Flash Attention](https://github.com/Dao-AILab/flash-attention) regarding that package installation. Simply change the snippet above with:
```python
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained(
model_id,
torch_dtype=torch.float16,
use_flash_attention_2=True
).to(0)
```
## LlavaNextConfig

View File

@@ -28,6 +28,7 @@ You can find all the original Mamba checkpoints under the [State Space Models](h
> [!TIP]
> This model was contributed by [Molbap](https://huggingface.co/Molbap) and [AntonV](https://huggingface.co/AntonV).
> Click on the Mamba models in the right sidebar for more examples of how to apply Mamba to different language tasks.
The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.

View File

@@ -26,6 +26,7 @@ rendered properly in your Markdown viewer.
You can find all the original Mamba 2 checkpoints under the [State Space Models](https://huggingface.co/state-spaces) organization, but the examples shown below use [mistralai/Mamba-Codestral-7B-v0.1](https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1) because a Hugging Face implementation isn't supported yet for the original checkpoints.
> [!TIP]
> This model was contributed by [ArthurZ](https://huggingface.co/ArthurZ).
> Click on the Mamba models in the right sidebar for more examples of how to apply Mamba to different language tasks.
The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.

View File

@@ -14,159 +14,138 @@ rendered properly in your Markdown viewer.
-->
# MarianMT
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
A framework for translation models, using the same models as BART. Translations should be similar, but not identical to output in the test set linked to in each model card.
This model was contributed by [sshleifer](https://huggingface.co/sshleifer).
# MarianMT
## Implementation Notes
- Each model is about 298 MB on disk, there are more than 1,000 models.
- The list of supported language pairs can be found [here](https://huggingface.co/Helsinki-NLP).
- Models were originally trained by [Jörg Tiedemann](https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann) using the [Marian](https://marian-nmt.github.io/) C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented
in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as [`BartForConditionalGeneration`] with a few minor modifications:
[MarianMT](https://huggingface.co/papers/1804.00344) is a machine translation model trained with the Marian framework which is written in pure C++. The framework includes its own custom auto-differentiation engine and efficient meta-algorithms to train encoder-decoder models like BART.
- static (sinusoid) positional embeddings (`MarianConfig.static_position_embeddings=True`)
- no layernorm_embedding (`MarianConfig.normalize_embedding=False`)
- the model starts generating with `pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
`<s/>`),
- Code to bulk convert models can be found in `convert_marian_to_pytorch.py`.
All MarianMT models are transformer encoder-decoders with 6 layers in each component, use static sinusoidal positional embeddings, don't have a layernorm embedding, and the model starts generating with the prefix `pad_token_id` instead of `<s/>`.
## Naming
- All model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`
- The language codes used to name models are inconsistent. Two digit codes can usually be found [here](https://developers.google.com/admin-sdk/directory/v1/languages), three digit codes require googling "language
code {code}".
- Codes formatted like `es_AR` are usually `code_{region}`. That one is Spanish from Argentina.
- The models were converted in two stages. The first 1000 models use ISO-639-2 codes to identify languages, the second
group use a combination of ISO-639-5 codes and ISO-639-2 codes.
You can find all the original MarianMT checkpoints under the [Language Technology Research Group at the University of Helsinki](https://huggingface.co/Helsinki-NLP/models?search=opus-mt) organization.
## Examples
> [!TIP]
> This model was contributed by [sshleifer](https://huggingface.co/sshleifer).
>
> Click on the MarianMT models in the right sidebar for more examples of how to apply MarianMT to translation tasks.
- Since Marian models are smaller than many other translation models available in the library, they can be useful for
fine-tuning experiments and integration tests.
- [Fine-tune on GPU](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq/train_distil_marian_enro.sh)
## Multilingual Models
The example below demonstrates how to translate text using [`Pipeline`] or the [`AutoModel`] class.
- All model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`:
- If a model can output multiple languages, and you should specify a language code by prepending the desired output
language to the `src_text`.
- You can see a models's supported language codes in its model card, under target constituents, like in [opus-mt-en-roa](https://huggingface.co/Helsinki-NLP/opus-mt-en-roa).
- Note that if a model is only multilingual on the source side, like `Helsinki-NLP/opus-mt-roa-en`, no language
codes are required.
New multi-lingual models from the [Tatoeba-Challenge repo](https://github.com/Helsinki-NLP/Tatoeba-Challenge)
require 3 character language codes:
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
>>> from transformers import MarianMTModel, MarianTokenizer
>>> src_text = [
... ">>fra<< this is a sentence in english that we want to translate to french",
... ">>por<< This should go to portuguese",
... ">>esp<< And this to Spanish",
... ]
import torch
from transformers import pipeline
>>> model_name = "Helsinki-NLP/opus-mt-en-roa"
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> print(tokenizer.supported_language_codes)
['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']
pipeline = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de", torch_dtype=torch.float16, device=0)
pipeline("Hello, how are you?")
>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
["c'est une phrase en anglais que nous voulons traduire en français",
'Isto deve ir para o português.',
'Y esto al español']
```
Here is the code to see all available pretrained models on the hub:
</hfoption>
<hfoption id="AutoModel">
```python
from huggingface_hub import list_models
model_list = list_models()
org = "Helsinki-NLP"
model_ids = [x.id for x in model_list if x.id.startswith(org)]
suffix = [x.split("/")[1] for x in model_ids]
old_style_multi_models = [f"{org}/{s}" for s in suffix if s != s.lower()]
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de", torch_dtype=torch.float16, attn_implementation="sdpa", device_map="auto")
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, cache_implementation="static")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Old Style Multi-Lingual Models
</hfoption>
</hfoptions>
These are the old style multi-lingual models ported from the OPUS-MT-Train repo: and the members of each language
group:
```python no-style
['Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU',
'Helsinki-NLP/opus-mt-ROMANCE-en',
'Helsinki-NLP/opus-mt-SCANDINAVIA-SCANDINAVIA',
'Helsinki-NLP/opus-mt-de-ZH',
'Helsinki-NLP/opus-mt-en-CELTIC',
'Helsinki-NLP/opus-mt-en-ROMANCE',
'Helsinki-NLP/opus-mt-es-NORWAY',
'Helsinki-NLP/opus-mt-fi-NORWAY',
'Helsinki-NLP/opus-mt-fi-ZH',
'Helsinki-NLP/opus-mt-fi_nb_no_nn_ru_sv_en-SAMI',
'Helsinki-NLP/opus-mt-sv-NORWAY',
'Helsinki-NLP/opus-mt-sv-ZH']
GROUP_MEMBERS = {
'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo', 'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE', 'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv']
}
```
Example of translating english to many romance languages, using old-style 2 character language codes
Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to.
```python
>>> from transformers import MarianMTModel, MarianTokenizer
from transformers.utils.attention_visualizer import AttentionMaskVisualizer
>>> src_text = [
... ">>fr<< this is a sentence in english that we want to translate to french",
... ">>pt<< This should go to portuguese",
... ">>es<< And this to Spanish",
... ]
>>> model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> model = MarianMTModel.from_pretrained(model_name)
>>> translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
>>> tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
["c'est une phrase en anglais que nous voulons traduire en français",
'Isto deve ir para o português.',
'Y esto al español']
visualizer = AttentionMaskVisualizer("Helsinki-NLP/opus-mt-en-de")
visualizer("Hello, how are you?")
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/marianmt-attn-mask.png"/>
</div>
## Resources
## Notes
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)
- [Causal language modeling task guide](../tasks/language_modeling)
- MarianMT models are ~298MB on disk and there are more than 1000 models. Check this [list](https://huggingface.co/Helsinki-NLP) for supported language pairs. The language codes may be inconsistent. Two digit codes can be found [here](https://developers.google.com/admin-sdk/directory/v1/languages) while three digit codes may require further searching.
- Models that require BPE preprocessing are not supported.
- All model names use the following format: `Helsinki-NLP/opus-mt-{src}-{tgt}`. Language codes formatted like `es_AR` usually refer to the `code_{region}`. For example, `es_AR` refers to Spanish from Argentina.
- If a model can output multiple languages, prepend the desired output language to `src_txt` as shown below. New multilingual models from the [Tatoeba-Challenge](https://github.com/Helsinki-NLP/Tatoeba-Challenge) require 3 character language codes.
```python
from transformers import MarianMTModel, MarianTokenizer
# Model trained on multiple source languages → multiple target languages
# Example: multilingual to Arabic (arb)
model_name = "Helsinki-NLP/opus-mt-mul-mul" # Tatoeba Challenge model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
# Prepend the desired output language code (3-letter ISO 639-3)
src_texts = ["arb>> Hello, how are you today?"]
# Tokenize and translate
inputs = tokenizer(src_texts, return_tensors="pt", padding=True, truncation=True)
translated = model.generate(**inputs)
# Decode and print result
translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated_texts[0])
```
- Older multilingual models use 2 character language codes.
```python
from transformers import MarianMTModel, MarianTokenizer
# Example: older multilingual model (like en → many)
model_name = "Helsinki-NLP/opus-mt-en-ROMANCE" # English → French, Spanish, Italian, etc.
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
# Prepend the 2-letter ISO 639-1 target language code (older format)
src_texts = [">>fr<< Hello, how are you today?"]
# Tokenize and translate
inputs = tokenizer(src_texts, return_tensors="pt", padding=True, truncation=True)
translated = model.generate(**inputs)
# Decode and print result
translated_texts = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated_texts[0])
```
## MarianConfig

View File

@@ -22,6 +22,7 @@ rendered properly in your Markdown viewer.
">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>
@@ -138,6 +139,10 @@ Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/bl
[[autodoc]] MistralConfig
## MistralCommonTokenizer
[[autodoc]] MistralCommonTokenizer
## MistralModel
[[autodoc]] MistralModel

View File

@@ -227,6 +227,10 @@ This example also how to use `BitsAndBytes` to load the model in 4bit quantizati
[[autodoc]] Mistral3Config
## MistralCommonTokenizer
[[autodoc]] MistralCommonTokenizer
## Mistral3Model
[[autodoc]] Mistral3Model

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview
@@ -196,6 +197,10 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] MixtralConfig
## MistralCommonTokenizer
[[autodoc]] MistralCommonTokenizer
## MixtralModel
[[autodoc]] MixtralModel

View File

@@ -114,6 +114,7 @@ print(f"The predicted class label is: {predicted_class_label}")
[[autodoc]] MobileNetV2ImageProcessor
- preprocess
- post_process_semantic_segmentation
## MobileNetV2ImageProcessorFast

View File

@@ -95,6 +95,12 @@ If you're interested in submitting a resource to be included here, please feel f
- preprocess
- post_process_semantic_segmentation
## MobileViTImageProcessorFast
[[autodoc]] MobileViTImageProcessorFast
- preprocess
- post_process_semantic_segmentation
<frameworkcontent>
<pt>

View File

@@ -0,0 +1,155 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
# ModernBERT Decoder
ModernBERT Decoder is the same architecture as [ModernBERT](https://huggingface.co/papers/2412.13663) but trained from scratch with a causal language modeling (CLM) objective. This allows for using the same architecture for comparing encoders and decoders. This is the decoder architecture implementation of ModernBERT, designed for autoregressive text generation tasks.
Like the encoder version, ModernBERT Decoder incorporates modern architectural improvements such as rotary positional embeddings to support sequences of up to 8192 tokens, unpadding to avoid wasting compute on padding tokens, GeGLU layers, and alternating attention patterns. However, it uses causal (unidirectional) attention to enable autoregressive generation.
> [!TIP]
> Click on the ModernBERT Decoder models in the right sidebar for more examples of how to apply ModernBERT Decoder to different text generation tasks.
The example below demonstrates how to use ModernBERT Decoder for text generation with [`Pipeline`], [`AutoModel`], and from the command line.
<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline
generator = pipeline(
task="text-generation",
model="blab-jhu/test-32m-dec",
torch_dtype=torch.float16,
device=0
)
generator("The future of artificial intelligence is", max_length=50, num_return_sequences=1)
# For sequence classification
classifier = pipeline(
task="text-classification",
model="blab-jhu/test-32m-dec",
torch_dtype=torch.float16,
device=0
)
classifier("This movie is really great!")
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("blab-jhu/test-32m-dec")
model = AutoModelForCausalLM.from_pretrained(
"blab-jhu/test-32m-dec",
torch_dtype=torch.float16,
device_map="auto",
)
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=50,
num_return_sequences=1,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# For sequence classification
from transformers import AutoModelForSequenceClassification
classifier_model = AutoModelForSequenceClassification.from_pretrained(
"blab-jhu/test-32m-dec",
torch_dtype=torch.float16,
device_map="auto",
num_labels=2
)
text = "This movie is really great!"
inputs = tokenizer(text, return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = classifier_model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1)
print(f"Predicted class: {predicted_class.item()}")
print(f"Prediction probabilities: {predictions}")
```
</hfoption>
<hfoption id="transformers CLI">
```bash
echo "The future of artificial intelligence is" | transformers run --task text-generation --model your-username/modernbert-decoder-base --device 0
```
</hfoption>
</hfoptions>
## ModernBertDecoderConfig
[[autodoc]] ModernBertDecoderConfig
<frameworkcontent>
<pt>
## ModernBertDecoderModel
[[autodoc]] ModernBertDecoderModel
- forward
## ModernBertDecoderForCausalLM
[[autodoc]] ModernBertDecoderForCausalLM
- forward
## ModernBertDecoderForSequenceClassification
[[autodoc]] ModernBertDecoderForSequenceClassification
- forward
### Usage tips
The ModernBertDecoder model can be fine-tuned for various text generation tasks using the HuggingFace Transformers library. It supports efficient inference with features like:
- **Causal attention**: Ensures autoregressive generation by masking future tokens
- **Sliding window attention**: Alternates between local and global attention patterns for efficiency
- **Rotary positional embeddings**: Enables handling of longer sequences up to 8000 tokens
- **FlashAttention support**: Optimized attention computation for faster training and inference
</pt>
</frameworkcontent>

View File

@@ -107,6 +107,11 @@ The model is identical to [Donut](donut) in terms of architecture.
[[autodoc]] NougatImageProcessor
- preprocess
## NougatImageProcessorFast
[[autodoc]] NougatImageProcessorFast
- preprocess
## NougatTokenizerFast
[[autodoc]] NougatTokenizerFast

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -14,35 +14,115 @@ rendered properly in your Markdown viewer.
-->
# PEGASUS-X
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
</div>
</div>
## Overview
# PEGASUS-X
The PEGASUS-X model was proposed in [Investigating Efficiently Extending Transformers for Long Input Summarization](https://huggingface.co/papers/2208.04347) by Jason Phang, Yao Zhao and Peter J. Liu.
[PEGASUS-X](https://huggingface.co/papers/2208.04347) is an encoder-decoder (sequence-to-sequence) transformer model for long-input summarization. It extends the [Pegasus](./pegasus) model with staggered block-local attention, global encoder tokens, and additional pretraining on long text sequences, enabling it to handle inputs of up to 16,000 tokens. PEGASUS-X matches the performance of much larger models while using fewer parameters.
PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.
You can find all the original PEGASUS-X checkpoints under the [Google](https://huggingface.co/google/models?search=pegasus-x) organization.
The abstract from the paper is the following:
> [!TIP]
> This model was contributed by [zphang](https://huggingface.co/zphang).
>
> Click on the PEGASUS-X models in the right sidebar for more examples of how to apply PEGASUS-X to different language tasks.
*While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate what model architectural changes and pretraining paradigms can most efficiently adapt a pretrained Transformer for long input summarization. We find that a staggered, block-local Transformer with global encoder tokens strikes a good balance of performance and efficiency, and that an additional pretraining phase on long sequences meaningfully improves downstream summarization performance. Based on our findings, we introduce PEGASUS-X, an extension of the PEGASUS model with additional long input pretraining to handle inputs of up to 16K tokens. PEGASUS-X achieves strong performance on long input summarization tasks comparable with much larger models while adding few additional parameters and not requiring model parallelism to train.*
The example below demonstrates how to summarize text with [`Pipeline`], [`AutoModel`], and from the command line.
This model was contributed by [zphang](https://huggingface.co/zphang). The original code can be found [here](https://github.com/google-research/pegasus).
<hfoptions id="usage">
<hfoption id="Pipeline">
## Documentation resources
```py
import torch
from transformers import pipeline
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)
pipeline = pipeline(
task="summarization",
model="google/pegasus-x-large",
torch_dtype=torch.bfloat16,
device=0
)
pipeline("""Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle.""")
```
</hfoption>
<hfoption id="AutoModel">
<Tip>
```py
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
PEGASUS-X uses the same tokenizer as [PEGASUS](pegasus).
tokenizer = AutoTokenizer.from_pretrained(
"google/pegasus-x-large"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/pegasus-x-large",
torch_dtype=torch.bfloat16,
device_map="auto",
)
</Tip>
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers-cli">
```bash
echo -e "Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet. Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts." | transformers-cli run --task summarization --model google/pegasus-x-large --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to only quantize the weights to int4.
```py
import torch
from transformers import BitsAndBytesConfig, AutoModelForSeq2SeqLM, AutoTokenizer
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4"
)
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/pegasus-x-large",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained(
"google/pegasus-x-large"
)
input_text = """Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle."""
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Notes
- PEGASUS-X also uses the [`PegasusTokenizer`].
## PegasusXConfig

View File

@@ -0,0 +1,68 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PerceptionLM
## Overview
The PerceptionLM model was proposed in [PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding](https://ai.meta.com/research/publications/perceptionlm-open-access-data-and-models-for-detailed-visual-understanding/) by Jang Hyun Cho et al. It's a fully open, reproducible model for transparent research in image and video understanding. PLM consists of
a vision encoder with a small scale (<8B parameters) LLM decoder.
The abstract from the paper is the following:
*Vision-language models are integral to computer vision research, yet many high-performing models
remain closed-source, obscuring their data, design and training recipe. The research community
has responded by using distillation from black-box models to label training data, achieving strong
benchmark results, at the cost of measurable scientific progress. However, without knowing the details
of the teacher model and its data sources, scientific progress remains difficult to measure. In this
paper, we study building a Perception Language Model (PLM) in a fully open and reproducible
framework for transparent research in image and video understanding. We analyze standard training
pipelines without distillation from proprietary models and explore large-scale synthetic data to identify
critical data gaps, particularly in detailed video understanding. To bridge these gaps, we release 2.8M
human-labeled instances of fine-grained video question-answer pairs and spatio-temporally grounded
video captions. Additionally, we introduce PLMVideoBench, a suite for evaluating challenging video
understanding tasks focusing on the ability to reason about “what”, “where”, “when”, and “how” of a
video. We make our work fully reproducible by providing data, training recipes, code & models.*
This model was contributed by [shumingh](https://huggingface.co/shumingh).
The original code can be found [here](https://github.com/facebookresearch/perception_models).
## PerceptionLMConfig
[[autodoc]] PerceptionLMConfig
## PerceptionLMProcessor
[[autodoc]] PerceptionLMProcessor
## PerceptionLMImageProcessorFast
[[autodoc]] PerceptionLMImageProcessorFast
## PerceptionLMVideoProcessor
[[autodoc]] PerceptionLMVideoProcessor
## PerceptionLMModel
[[autodoc]] PerceptionLMModel
## PerceptionLMForConditionalGeneration
[[autodoc]] PerceptionLMForConditionalGeneration
- forward

View File

@@ -18,6 +18,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -9,44 +9,53 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
# Phi4 Multimodal
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white&style=flat">
</div>
</div>
## Overview
## Phi4 Multimodal
Phi4 Multimodal is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generating text outputs, and comes with 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning, direct preference optimization and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. The languages that each modal supports are the following:
[Phi4 Multimodal](https://huggingface.co/papers/2503.01743) is a multimodal model capable of text, image, and speech and audio inputs or any combination of these. It features a mixture of LoRA adapters for handling different inputs, and each input is routed to the appropriate encoder.
- Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
- Vision: English
- Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
You can find all the original Phi4 Multimodal checkpoints under the [Phi4](https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4) collection.
This model was contributed by [Cyril Vallez](https://huggingface.co/cyrilvallez). The most recent code can be
found [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/phi4_multimodal/modeling_phi4_multimodal.py).
> [!TIP]
> This model was contributed by [cyrilvallez](https://huggingface.co/cyrilvallez).
>
> Click on the Phi-4 Multimodal in the right sidebar for more examples of how to apply Phi-4 Multimodal to different tasks.
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
## Usage tips
<hfoptions id="usage">
<hfoption id="Pipeline">
`Phi4-multimodal-instruct` can be found on the [Huggingface Hub](https://huggingface.co/microsoft/Phi-4-multimodal-instruct)
```python
from transformers import pipeline
generator = pipeline("text-generation", model="microsoft/Phi-4-multimodal-instruct", torch_dtype="auto", device=0)
In the following, we demonstrate how to use it for inference depending on the input modalities (text, image, audio).
prompt = "Explain the concept of multimodal AI in simple terms."
result = generator(prompt, max_length=50)
print(result[0]['generated_text'])
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
# Define model path
model_path = "microsoft/Phi-4-multimodal-instruct"
device = "cuda:0"
# Load model and processor
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.float16)
# Optional: load the adapters (note that without them, the base model will very likely not work well)
model.load_adapter(model_path, adapter_name="speech", device_map=device, adapter_kwargs={"subfolder": 'speech-lora'})
model.load_adapter(model_path, adapter_name="vision", device_map=device, adapter_kwargs={"subfolder": 'vision-lora'})
# Part : Image Processing
messages = [
{
"role": "user",
@@ -57,7 +66,7 @@ messages = [
},
]
model.set_adapter("vision") # if loaded, activate the vision adapter
model.set_adapter("vision")
inputs = processor.apply_chat_template(
messages,
add_generation_prompt=True,
@@ -66,7 +75,6 @@ inputs = processor.apply_chat_template(
return_tensors="pt",
).to(device)
# Generate response
generate_ids = model.generate(
**inputs,
max_new_tokens=1000,
@@ -77,10 +85,27 @@ response = processor.batch_decode(
generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(f'>>> Response\n{response}')
```
</hfoption>
</hfoptions>
# Part 2: Audio Processing
model.set_adapter("speech") # if loaded, activate the speech adapter
## Notes
The example below demonstrates inference with an audio and text input.
```py
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
model_path = "microsoft/Phi-4-multimodal-instruct"
device = "cuda:0"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.float16)
model.load_adapter(model_path, adapter_name="speech", device_map=device, adapter_kwargs={"subfolder": 'speech-lora'})
model.set_adapter("speech")
audio_url = "https://upload.wikimedia.org/wikipedia/commons/b/b0/Barbara_Sahakian_BBC_Radio4_The_Life_Scientific_29_May_2012_b01j5j24.flac"
messages = [
{
@@ -110,6 +135,7 @@ response = processor.batch_decode(
generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(f'>>> Response\n{response}')
```
## Phi4MultimodalFeatureExtractor

View File

@@ -86,6 +86,10 @@ output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up
[[autodoc]] PixtralVisionConfig
## MistralCommonTokenizer
[[autodoc]] MistralCommonTokenizer
## PixtralVisionModel
[[autodoc]] PixtralVisionModel

View File

@@ -19,6 +19,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
</div>

View File

@@ -18,6 +18,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
# Qwen2MoE

View File

@@ -19,6 +19,7 @@ rendered properly in your Markdown viewer.
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -25,7 +25,7 @@ rendered properly in your Markdown viewer.
SAM (Segment Anything Model) was proposed in [Segment Anything](https://huggingface.co/papers/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
The model can be used to predict segmentation masks of any object of interest given an input image.
The model can be used to predict segmentation masks of any object of interest given an input image.
![example image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/sam-output.png)
@@ -37,9 +37,9 @@ Tips:
- The model predicts binary masks that states the presence or not of the object of interest given an image.
- The model predicts much better results if input 2D points and/or input bounding boxes are provided
- You can prompt multiple points for the same image, and predict a single mask.
- You can prompt multiple points for the same image, and predict a single mask.
- Fine-tuning the model is not supported yet
- According to the paper, textual input should be also supported. However, at this time of writing this seems not to be supported according to [the official repository](https://github.com/facebookresearch/segment-anything/issues/4#issuecomment-1497626844).
- According to the paper, textual input should be also supported. However, at this time of writing this seems not to be supported according to [the official repository](https://github.com/facebookresearch/segment-anything/issues/4#issuecomment-1497626844).
This model was contributed by [ybelkada](https://huggingface.co/ybelkada) and [ArthurZ](https://huggingface.co/ArthurZ).
@@ -149,6 +149,11 @@ alt="drawing" width="900"/>
[[autodoc]] SamImageProcessor
## SamImageProcessorFast
[[autodoc]] SamImageProcessorFast
## SamVisionModel
[[autodoc]] SamVisionModel

View File

@@ -61,19 +61,16 @@ predicted token ids.
- Step-by-step Speech Translation
```python
>>> import torch
>>> from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel
>>> from datasets import load_dataset
>>> import soundfile as sf
>>> model = SpeechEncoderDecoderModel.from_pretrained("facebook/s2t-wav2vec2-large-en-de")
>>> processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-de")
>>> def map_to_array(batch):
... speech, _ = sf.read(batch["file"])
... batch["speech"] = speech
... return batch
>>> def map_to_array(example):
... example["speech"] = example["audio"]["array"]
... return example
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

View File

@@ -20,6 +20,7 @@ rendered properly in your Markdown viewer.
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
</div>
## Overview

View File

@@ -10,40 +10,31 @@ specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white" >
</div>
</div>
# SuperGlue
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
[SuperGlue](https://huggingface.co/papers/1911.11763) is a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. SuperGlue introduces a flexible context aggregation mechanism based on attention, enabling it to reason about the underlying 3D scene and feature assignments jointly. Paired with the [SuperPoint model](https://huggingface.co/magic-leap-community/superpoint), it can be used to match two images and estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.
## Overview
You can find all the original SuperGlue checkpoints under the [Magic Leap Community](https://huggingface.co/magic-leap-community) organization.
The SuperGlue model was proposed in [SuperGlue: Learning Feature Matching with Graph Neural Networks](https://huggingface.co/papers/1911.11763) by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
> [!TIP]
> This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
>
> Click on the SuperGlue models in the right sidebar for more examples of how to apply SuperGlue to different computer vision tasks.
This model consists of matching two sets of interest points detected in an image. Paired with the
[SuperPoint model](https://huggingface.co/magic-leap-community/superpoint), it can be used to match two images and
estimate the pose between them. This model is useful for tasks such as image matching, homography estimation, etc.
The example below demonstrates how to match keypoints between two images with the [`AutoModel`] class.
The abstract from the paper is the following:
<hfoptions id="usage">
<hfoption id="AutoModel">
*This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences
and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs
are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling
SuperGlue to reason about the underlying 3D scene and feature assignments jointly. Compared to traditional, hand-designed heuristics,
our technique learns priors over geometric transformations and regularities of the 3D world through end-to-end training from image
pairs. SuperGlue outperforms other learned approaches and achieves state-of-the-art results on the task of pose estimation in
challenging real-world indoor and outdoor environments. The proposed method performs matching in real-time on a modern GPU and
can be readily integrated into modern SfM or SLAM systems. The code and trained weights are publicly available at this [URL](https://github.com/magicleap/SuperGluePretrainedNetwork).*
## How to use
Here is a quick example of using the model. Since this model is an image matching model, it requires pairs of images to be matched.
The raw outputs contain the list of keypoints detected by the keypoint detector as well as the list of matches with their corresponding
matching scores.
```python
```py
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
@@ -52,7 +43,7 @@ import requests
url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image_2 = Image.open(requests.get(url_image2, stream=True).raw)
image2 = Image.open(requests.get(url_image2, stream=True).raw)
images = [image1, image2]
@@ -62,67 +53,97 @@ model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")
inputs = processor(images, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
```
You can use the `post_process_keypoint_matching` method from the `SuperGlueImageProcessor` to get the keypoints and matches in a more readable format:
```python
# Post-process to get keypoints and matches
image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
print("For the image pair", i)
for keypoint0, keypoint1, matching_score in zip(
output["keypoints0"], output["keypoints1"], output["matching_scores"]
):
print(
f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
```
</hfoption>
</hfoptions>
## Notes
- SuperGlue performs feature matching between two images simultaneously, requiring pairs of images as input.
```python
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = AutoModel.from_pretrained("magic-leap-community/superglue_outdoor")
# SuperGlue requires pairs of images
images = [image1, image2]
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
# Extract matching information
keypoints0 = outputs.keypoints0 # Keypoints in first image
keypoints1 = outputs.keypoints1 # Keypoints in second image
matches = outputs.matches # Matching indices
matching_scores = outputs.matching_scores # Confidence scores
```
- The model outputs matching indices, keypoints, and confidence scores for each match.
- For better visualization and analysis, use the [`SuperGlueImageProcessor.post_process_keypoint_matching`] method to get matches in a more readable format.
```py
# Process outputs for visualization
image_sizes = [[(image.height, image.width) for image in images]]
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(processed_outputs):
print(f"For the image pair {i}")
for keypoint0, keypoint1, matching_score in zip(
output["keypoints0"], output["keypoints1"], output["matching_scores"]
):
print(f"Keypoint at {keypoint0.numpy()} matches with keypoint at {keypoint1.numpy()} with score {matching_score}")
```
- The example below demonstrates how to visualize matches between two images.
```py
import matplotlib.pyplot as plt
import numpy as np
# Create side by side image
merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
plt.imshow(merged_image)
plt.axis("off")
# Retrieve the keypoints and matches
output = processed_outputs[0]
keypoints0 = output["keypoints0"]
keypoints1 = output["keypoints1"]
matching_scores = output["matching_scores"]
# Plot the matches
for keypoint0, keypoint1, matching_score in zip(keypoints0, keypoints1, matching_scores):
plt.plot(
[keypoint0[0], keypoint1[0] + image1.width],
[keypoint0[1], keypoint1[1]],
color=plt.get_cmap("RdYlGn")(matching_score.item()),
alpha=0.9,
linewidth=0.5,
)
plt.scatter(keypoint0[0], keypoint0[1], c="black", s=2)
plt.scatter(keypoint1[0] + image1.width, keypoint1[1], c="black", s=2)
```
plt.savefig("matched_image.png", dpi=300, bbox_inches='tight')
```
From the outputs, you can visualize the matches between the two images using the following code:
```python
import matplotlib.pyplot as plt
import numpy as np
<div class="flex justify-center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/01ZYaLB1NL5XdA8u7yCo4.png">
</div>
# Create side by side image
merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
plt.imshow(merged_image)
plt.axis("off")
## Resources
# Retrieve the keypoints and matches
output = outputs[0]
keypoints0 = output["keypoints0"]
keypoints1 = output["keypoints1"]
matching_scores = output["matching_scores"]
keypoints0_x, keypoints0_y = keypoints0[:, 0].numpy(), keypoints0[:, 1].numpy()
keypoints1_x, keypoints1_y = keypoints1[:, 0].numpy(), keypoints1[:, 1].numpy()
# Plot the matches
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, matching_scores
):
plt.plot(
[keypoint0_x, keypoint1_x + image1.width],
[keypoint0_y, keypoint1_y],
color=plt.get_cmap("RdYlGn")(matching_score.item()),
alpha=0.9,
linewidth=0.5,
)
plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
plt.scatter(keypoint1_x + image1.width, keypoint1_y, c="black", s=2)
# Save the plot
plt.savefig("matched_image.png", dpi=300, bbox_inches='tight')
plt.close()
```
![image/png](https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/01ZYaLB1NL5XdA8u7yCo4.png)
This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
The original code can be found [here](https://github.com/magicleap/SuperGluePretrainedNetwork).
- Refer to the [original SuperGlue repository](https://github.com/magicleap/SuperGluePretrainedNetwork) for more examples and implementation details.
## SuperGlueConfig
@@ -133,10 +154,15 @@ The original code can be found [here](https://github.com/magicleap/SuperGluePret
[[autodoc]] SuperGlueImageProcessor
- preprocess
- post_process_keypoint_matching
<frameworkcontent>
<pt>
## SuperGlueForKeypointMatching
[[autodoc]] SuperGlueForKeypointMatching
- forward
- post_process_keypoint_matching
</pt>
</frameworkcontent>

View File

@@ -10,48 +10,35 @@ specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white" >
</div>
</div>
# SuperPoint
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
## Overview
The SuperPoint model was proposed
in [SuperPoint: Self-Supervised Interest Point Detection and Description](https://huggingface.co/papers/1712.07629) by Daniel
DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
This model is the result of a self-supervised training of a fully-convolutional network for interest point detection and
description. The model is able to detect interest points that are repeatable under homographic transformations and
provide a descriptor for each point. The use of the model in its own is limited, but it can be used as a feature
extractor for other tasks such as homography estimation, image matching, etc.
The abstract from the paper is the following:
*This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a
large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our
fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and
associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography
approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g.,
synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able
to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other
traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches
when compared to LIFT, SIFT and ORB.*
[SuperPoint](https://huggingface.co/papers/1712.07629) is the result of self-supervised training of a fully-convolutional network for interest point detection and description. The model is able to detect interest points that are repeatable under homographic transformations and provide a descriptor for each point. Usage on it's own is limited, but it can be used as a feature extractor for other tasks such as homography estimation and image matching.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/superpoint_architecture.png"
alt="drawing" width="500"/>
<small> SuperPoint overview. Taken from the <a href="https://huggingface.co/papers/1712.07629v4">original paper.</a> </small>
You can find all the original SuperPoint checkpoints under the [Magic Leap Community](https://huggingface.co/magic-leap-community) organization.
## Usage tips
> [!TIP]
> This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
>
> Click on the SuperPoint models in the right sidebar for more examples of how to apply SuperPoint to different computer vision tasks.
Here is a quick example of using the model to detect interest points in an image:
```python
The example below demonstrates how to detect interest points in an image with the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="AutoModel">
```py
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
@@ -64,67 +51,76 @@ processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint"
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
with torch.no_grad():
outputs = model(**inputs)
# Post-process to get keypoints, scores, and descriptors
image_size = (image.height, image.width)
processed_outputs = processor.post_process_keypoint_detection(outputs, [image_size])
```
The outputs contain the list of keypoint coordinates with their respective score and description (a 256-long vector).
</hfoption>
</hfoptions>
You can also feed multiple images to the model. Due to the nature of SuperPoint, to output a dynamic number of keypoints,
you will need to use the mask attribute to retrieve the respective information :
## Notes
```python
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests
- SuperPoint outputs a dynamic number of keypoints per image, which makes it suitable for tasks requiring variable-length feature representations.
url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
image_2 = Image.open(requests.get(url_image_2, stream=True).raw)
```py
from transformers import AutoImageProcessor, SuperPointForKeypointDetection
import torch
from PIL import Image
import requests
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
image_2 = Image.open(requests.get(url_image_2, stream=True).raw)
images = [image_1, image_2]
inputs = processor(images, return_tensors="pt")
# Example of handling dynamic keypoint output
outputs = model(**inputs)
keypoints = outputs.keypoints # Shape varies per image
scores = outputs.scores # Confidence scores for each keypoint
descriptors = outputs.descriptors # 256-dimensional descriptors
mask = outputs.mask # Value of 1 corresponds to a keypoint detection
```
images = [image_1, image_2]
- The model provides both keypoint coordinates and their corresponding descriptors (256-dimensional vectors) in a single forward pass.
- For batch processing with multiple images, you need to use the mask attribute to retrieve the respective information for each image. You can use the `post_process_keypoint_detection` from the `SuperPointImageProcessor` to retrieve the each image information.
processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
```py
# Batch processing example
images = [image1, image2, image3]
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
image_sizes = [(img.height, img.width) for img in images]
processed_outputs = processor.post_process_keypoint_detection(outputs, image_sizes)
```
inputs = processor(images, return_tensors="pt")
outputs = model(**inputs)
image_sizes = [(image.height, image.width) for image in images]
outputs = processor.post_process_keypoint_detection(outputs, image_sizes)
- You can then print the keypoints on the image of your choice to visualize the result:
```py
import matplotlib.pyplot as plt
plt.axis("off")
plt.imshow(image_1)
plt.scatter(
outputs[0]["keypoints"][:, 0],
outputs[0]["keypoints"][:, 1],
c=outputs[0]["scores"] * 100,
s=outputs[0]["scores"] * 50,
alpha=0.8
)
plt.savefig(f"output_image.png")
```
for output in outputs:
for keypoints, scores, descriptors in zip(output["keypoints"], output["scores"], output["descriptors"]):
print(f"Keypoints: {keypoints}")
print(f"Scores: {scores}")
print(f"Descriptors: {descriptors}")
```
You can then print the keypoints on the image of your choice to visualize the result:
```python
import matplotlib.pyplot as plt
plt.axis("off")
plt.imshow(image_1)
plt.scatter(
outputs[0]["keypoints"][:, 0],
outputs[0]["keypoints"][:, 1],
c=outputs[0]["scores"] * 100,
s=outputs[0]["scores"] * 50,
alpha=0.8
)
plt.savefig(f"output_image.png")
```
![image/png](https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/ZtFmphEhx8tcbEQqOolyE.png)
This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
The original code can be found [here](https://github.com/magicleap/SuperPointPretrainedNetwork).
<div class="flex justify-center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/ZtFmphEhx8tcbEQqOolyE.png">
</div>
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
- A notebook showcasing inference and visualization with SuperPoint can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SuperPoint/Inference_with_SuperPoint_to_detect_interest_points_in_an_image.ipynb). 🌎
- Refer to this [noteboook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SuperPoint/Inference_with_SuperPoint_to_detect_interest_points_in_an_image.ipynb) for an inference and visualization example.
## SuperPointConfig
@@ -137,8 +133,12 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- preprocess
- post_process_keypoint_detection
<frameworkcontent>
<pt>
## SuperPointForKeypointDetection
[[autodoc]] SuperPointForKeypointDetection
- forward
</pt>

View File

@@ -14,35 +14,90 @@ rendered properly in your Markdown viewer.
-->
# SwitchTransformers
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# Switch Transformers
The SwitchTransformers model was proposed in [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://huggingface.co/papers/2101.03961) by William Fedus, Barret Zoph, Noam Shazeer.
[Switch Transformers](https://huggingface.co/papers/2101.03961) is a sparse T5 model where the MLP layer is replaced by a Mixture-of-Experts (MoE). A routing mechanism associates each token with an expert and each expert is a dense MLP. Sparsity enables better scaling and the routing mechanism allows the model to select relevant weights on the fly which increases model capacity.
The Switch Transformer model uses a sparse T5 encoder-decoder architecture, where the MLP are replaced by a Mixture of Experts (MoE). A routing mechanism (top 1 in this case) associates each token to one of the expert, where each expert is a dense MLP. While switch transformers have a lot more weights than their equivalent dense models, the sparsity allows better scaling and better finetuning performance at scale.
During a forward pass, only a fraction of the weights are used. The routing mechanism allows the model to select relevant weights on the fly which increases the model capacity without increasing the number of operations.
You can find all the original Switch Transformers checkpoints under the [Switch Transformer](https://huggingface.co/collections/google/switch-transformers-release-6548c35c6507968374b56d1f) collection.
The abstract from the paper is the following:
*In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability -- we address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. Our proposed training techniques help wrangle the instabilities and we show large sparse models may be trained, for the first time, with lower precision (bfloat16) formats. We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources. These improvements extend into multilingual settings where we measure gains over the mT5-Base version across all 101 languages. Finally, we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.*
> [!TIP]
> This model was contributed by [ybelkada](https://huggingface.co/ybelkada) and [ArthurZ](https://huggingface.co/ArthurZ).
>
> Click on the Switch Transformers models in the right sidebar for more examples of how to apply Switch Transformers to different natural language tasks.
This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) and [Arthur Zucker](https://huggingface.co/ArthurZ).
The original code can be found [here](https://github.com/google/flaxformer/tree/main/flaxformer/architectures/moe).
The example below demonstrates how to predict the masked token with [`Pipeline`], [`AutoModel`], and from the command line.
## Usage tips
<hfoptions id="usage">
<hfoption id="Pipeline">
- SwitchTransformers uses the [`T5Tokenizer`], which can be loaded directly from each model's repository.
- The released weights are pretrained on English [Masked Language Modeling](https://moon-ci-docs.huggingface.co/docs/transformers/pr_19323/en/glossary#general-terms) task, and should be finetuned.
```python
import torch
from transformers import pipeline
## Resources
pipeline = pipeline(
task="text2text-generation",
model="google/switch-base-8",
torch_dtype=torch.float16,
device=0
)
print(pipeline("The capital of France is <extra_id_0>."))
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = AutoModelForSeq2SeqLM.from_pretrained("google/switch-base-8", device_map="auto", torch_dtype=torch.float16)
input_text = "The capital of France is <extra_id_0>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
</hfoption>
<hfoption id="transformers CLI">
```bash
echo -e "The capital of France is <extra_id_0>." | transformers run --task text2text-generation --model google/switch-base-8 --device 0
# [{'generated_text': 'Paris.'}]
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes/) to only quantize the weights to 8-bits.
```py
# pip install bitsandbytes
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForSeq2SeqLM.from_pretrained("google/switch-base-8", device_map="auto", quantization_config=quantization_config)
input_text = "The capital of France is <extra_id_0>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)
## SwitchTransformersConfig

View File

@@ -14,16 +14,25 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
# T5Gemma
T5Gemma (aka encoder-decoder Gemma) was proposed in a [research paper](https://arxiv.org/abs/2504.06225) by Google. It is a family of encoder-decoder large langauge models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
T5Gemma (aka encoder-decoder Gemma) was proposed in a [research paper](https://arxiv.org/abs/2504.06225) by Google. It is a family of encoder-decoder large language models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
T5Gemma has two groups of model sizes: 1) [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2) sizes (2B-2B, 9B-2B, and 9B-9B), which are based on the offical Gemma 2 models (2B and 9B); and 2) [T5](https://arxiv.org/abs/1910.10683) sizes (Small, Base, Large, and XL), where are pretrained under the Gemma 2 framework following T5 configuration. In addition, we also provide a model at ML size (medium large, ~2B in total), which is in-between T5 Large and T5 XL.
The pretrained varaints are trained with two objectives: prefix language modeling with knowledge distillation (PrefixLM) and UL2, separately. We release both variants for each model size. The instruction-turned varaints was post-trained with supervised fine-tuning and reinforcement learning.
> [!TIP]
> Click on the T5Gemma models in the right sidebar for more examples of how to apply T5Gemma to different language tasks.
The example below demonstrates how to chat with the model with [`Pipeline`] or the [`AutoModel`] class, and from the command line.
<hfoptions id="usage">
@@ -35,43 +44,52 @@ import torch
from transformers import pipeline
pipe = pipeline(
task="text2text-generation",
model="google/t5gemma-placeholder",
"text2text-generation",
model="google/t5gemma-2b-2b-prefixlm-it",
torch_dtype=torch.bfloat16,
device="cuda",
device="cuda", # replace with "mps" to run on a Mac device
)
pipe("Question: Why is the sky blue?\nAnswer:", max_new_tokens=50)
messages = [
{"role": "user", "content": "Tell me an unknown interesting biology fact about the brain."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe(prompt, max_new_tokens=32)
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-placeholder")
tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-2b-2b-prefixlm-it")
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/t5gemma-placeholder",
"google/t5gemma-2b-2b-prefixlm-it",
device_map="auto",
torch_dtype=torch.bfloat16,
device_map="auto"
)
input_text = "Question: Why is the sky blue?\nAnswer:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
messages = [
{"role": "user", "content": "Tell me an unknown interesting biology fact about the brain."},
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True, add_generation_prompt=True).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(tokenizer.decode(outputs[0]))
```
</hfoption>
<hfoption id="transformers CLI">
```
echo -e "Question: Why is the sky blue? Answer:" | transformers run --task text2text-generation --model google/t5gemma-placeholder --device 0
echo -e "Write me a poem about Machine Learning. Answer:" | transformers run --task text2text-generation --model google/t5gemma-2b-2b-prefixlm --device 0
```
</hfoption>
</hfoptions>
## T5GemmaConfig

View File

@@ -37,6 +37,7 @@ The original code can be found [here](https://github.com/google-research/timesfm
To use the model:
```python
import numpy as np
import torch
from transformers import TimesFmModelForPrediction

View File

@@ -10,52 +10,39 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# ViTPose
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
# ViTPose
The ViTPose model was proposed in [ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation](https://huggingface.co/papers/2204.12484) by Yufei Xu, Jing Zhang, Qiming Zhang, Dacheng Tao. ViTPose employs a standard, non-hierarchical [Vision Transformer](vit) as backbone for the task of keypoint estimation. A simple decoder head is added on top to predict the heatmaps from a given image. Despite its simplicity, the model gets state-of-the-art results on the challenging MS COCO Keypoint Detection benchmark. The model was further improved in [ViTPose++: Vision Transformer for Generic Body Pose Estimation](https://huggingface.co/papers/2212.04246) where the authors employ
a mixture-of-experts (MoE) module in the ViT backbone along with pre-training on more data, which further enhances the performance.
[ViTPose](https://huggingface.co/papers/2204.12484) is a vision transformer-based model for keypoint (pose) estimation. It uses a simple, non-hierarchical [ViT](./vit) backbone and a lightweight decoder head. This architecture simplifies model design, takes advantage of transformer scalability, and can be adapted to different training strategies.
The abstract from the paper is the following:
*Although no specific domain knowledge is considered in the design, plain vision transformers have shown excellent performance in visual recognition tasks. However, little effort has been made to reveal the potential of such simple structures for pose estimation tasks. In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose. Specifically, ViTPose employs plain and non-hierarchical vision transformers as backbones to extract features for a given person instance and a lightweight decoder for pose estimation. It can be scaled up from 100M to 1B parameters by taking the advantages of the scalable model capacity and high parallelism of transformers, setting a new Pareto front between throughput and performance. Besides, ViTPose is very flexible regarding the attention type, input resolution, pre-training and finetuning strategy, as well as dealing with multiple pose tasks. We also empirically demonstrate that the knowledge of large ViTPose models can be easily transferred to small ones via a simple knowledge token. Experimental results show that our basic ViTPose model outperforms representative methods on the challenging MS COCO Keypoint Detection benchmark, while the largest model sets a new state-of-the-art.*
[ViTPose++](https://huggingface.co/papers/2212.04246) improves on ViTPose by incorporating a mixture-of-experts (MoE) module in the backbone and using more diverse pretraining data.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vitpose-architecture.png"
alt="drawing" width="600"/>
<small> ViTPose architecture. Taken from the <a href="https://huggingface.co/papers/2204.12484">original paper.</a> </small>
You can find all ViTPose and ViTPose++ checkpoints under the [ViTPose collection](https://huggingface.co/collections/usyd-community/vitpose-677fcfd0a0b2b5c8f79c4335).
This model was contributed by [nielsr](https://huggingface.co/nielsr) and [sangbumchoi](https://github.com/SangbumChoi).
The original code can be found [here](https://github.com/ViTAE-Transformer/ViTPose).
## Usage Tips
ViTPose is a so-called top-down keypoint detection model. This means that one first uses an object detector, like [RT-DETR](rt_detr.md), to detect people (or other instances) in an image. Next, ViTPose takes the cropped images as input and predicts the keypoints for each of them.
The example below demonstrates pose estimation with the [`VitPoseForPoseEstimation`] class.
```py
import torch
import requests
import numpy as np
import supervision as sv
from PIL import Image
from transformers import AutoProcessor, RTDetrForObjectDetection, VitPoseForPoseEstimation
device = "cuda" if torch.cuda.is_available() else "cpu"
url = "http://images.cocodataset.org/val2017/000000000139.jpg"
url = "https://www.fcbarcelona.com/fcbarcelona/photo/2021/01/31/3c55a19f-dfc1-4451-885e-afd14e890a11/mini_2021-01-31-BARCELONA-ATHLETIC-BILBAOI-30.JPG"
image = Image.open(requests.get(url, stream=True).raw)
# ------------------------------------------------------------------------
# Stage 1. Detect humans on the image
# ------------------------------------------------------------------------
# You can choose any detector of your choice
# Detect humans in the image
person_image_processor = AutoProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
person_model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd_coco_o365", device_map=device)
@@ -67,7 +54,7 @@ with torch.no_grad():
results = person_image_processor.post_process_object_detection(
outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3
)
result = results[0] # take first image results
result = results[0]
# Human label refers 0 index in COCO dataset
person_boxes = result["boxes"][result["labels"] == 0]
@@ -77,10 +64,7 @@ person_boxes = person_boxes.cpu().numpy()
person_boxes[:, 2] = person_boxes[:, 2] - person_boxes[:, 0]
person_boxes[:, 3] = person_boxes[:, 3] - person_boxes[:, 1]
# ------------------------------------------------------------------------
# Stage 2. Detect keypoints for each person found
# ------------------------------------------------------------------------
# Detect keypoints for each person found
image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-base-simple")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-base-simple", device_map=device)
@@ -90,54 +74,7 @@ with torch.no_grad():
outputs = model(**inputs)
pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
image_pose_result = pose_results[0] # results for first image
```
### ViTPose++ models
The best [checkpoints](https://huggingface.co/collections/usyd-community/vitpose-677fcfd0a0b2b5c8f79c4335) are those of the [ViTPose++ paper](https://huggingface.co/papers/2212.04246). ViTPose++ models employ a so-called [Mixture-of-Experts (MoE)](https://huggingface.co/blog/moe) architecture for the ViT backbone, resulting in better performance.
The ViTPose+ checkpoints use 6 experts, hence 6 different dataset indices can be passed.
An overview of the various dataset indices is provided below:
- 0: [COCO validation 2017](https://cocodataset.org/#overview) dataset, using an object detector that gets 56 AP on the "person" class
- 1: [AiC](https://github.com/fabbrimatteo/AiC-Dataset) dataset
- 2: [MPII](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset) dataset
- 3: [AP-10K](https://github.com/AlexTheBad/AP-10K) dataset
- 4: [APT-36K](https://github.com/pandorgan/APT-36K) dataset
- 5: [COCO-WholeBody](https://github.com/jin-s13/COCO-WholeBody) dataset
Pass the `dataset_index` argument in the forward of the model to indicate which experts to use for each example in the batch. Example usage is shown below:
```python
image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-plus-base")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-plus-base", device=device)
inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)
dataset_index = torch.tensor([0], device=device) # must be a tensor of shape (batch_size,)
with torch.no_grad():
outputs = model(**inputs, dataset_index=dataset_index)
```
The ViTPose+ checkpoints use 6 experts, hence 6 different dataset indices can be passed.
An overview of the various dataset indices is provided below:
- 0: [COCO validation 2017](https://cocodataset.org/#overview) dataset, using an object detector that gets 56 AP on the "person" class
- 1: [AiC](https://github.com/fabbrimatteo/AiC-Dataset) dataset
- 2: [MPII](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset) dataset
- 3: [AP-10K](https://github.com/AlexTheBad/AP-10K) dataset
- 4: [APT-36K](https://github.com/pandorgan/APT-36K) dataset
- 5: [COCO-WholeBody](https://github.com/jin-s13/COCO-WholeBody) dataset
### Visualization
To visualize the various keypoints, one can either leverage the `supervision` [library](https://github.com/roboflow/supervision (requires `pip install supervision`):
```python
import supervision as sv
image_pose_result = pose_results[0]
xy = torch.stack([pose_result['keypoints'] for pose_result in image_pose_result]).cpu().numpy()
scores = torch.stack([pose_result['scores'] for pose_result in image_pose_result]).cpu().numpy()
@@ -162,119 +99,192 @@ annotated_frame = vertex_annotator.annotate(
scene=annotated_frame,
key_points=key_points
)
annotated_frame
```
Alternatively, one can also visualize the keypoints using [OpenCV](https://opencv.org/) (requires `pip install opencv-python`):
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vitpose.png"/>
</div>
```python
import math
import cv2
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
def draw_points(image, keypoints, scores, pose_keypoint_color, keypoint_score_threshold, radius, show_keypoint_weight):
if pose_keypoint_color is not None:
assert len(pose_keypoint_color) == len(keypoints)
for kid, (kpt, kpt_score) in enumerate(zip(keypoints, scores)):
x_coord, y_coord = int(kpt[0]), int(kpt[1])
if kpt_score > keypoint_score_threshold:
color = tuple(int(c) for c in pose_keypoint_color[kid])
if show_keypoint_weight:
cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
transparency = max(0, min(1, kpt_score))
cv2.addWeighted(image, transparency, image, 1 - transparency, 0, dst=image)
else:
cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
The example below uses [torchao](../quantization/torchao) to only quantize the weights to int4.
def draw_links(image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold, thickness, show_keypoint_weight, stick_width = 2):
height, width, _ = image.shape
if keypoint_edges is not None and link_colors is not None:
assert len(link_colors) == len(keypoint_edges)
for sk_id, sk in enumerate(keypoint_edges):
x1, y1, score1 = (int(keypoints[sk[0], 0]), int(keypoints[sk[0], 1]), scores[sk[0]])
x2, y2, score2 = (int(keypoints[sk[1], 0]), int(keypoints[sk[1], 1]), scores[sk[1]])
if (
x1 > 0
and x1 < width
and y1 > 0
and y1 < height
and x2 > 0
and x2 < width
and y2 > 0
and y2 < height
and score1 > keypoint_score_threshold
and score2 > keypoint_score_threshold
):
color = tuple(int(c) for c in link_colors[sk_id])
```py
# pip install torchao
import torch
import requests
import numpy as np
from PIL import Image
from transformers import AutoProcessor, RTDetrForObjectDetection, VitPoseForPoseEstimation, TorchAoConfig
url = "https://www.fcbarcelona.com/fcbarcelona/photo/2021/01/31/3c55a19f-dfc1-4451-885e-afd14e890a11/mini_2021-01-31-BARCELONA-ATHLETIC-BILBAOI-30.JPG"
image = Image.open(requests.get(url, stream=True).raw)
person_image_processor = AutoProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
person_model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd_coco_o365", device_map=device)
inputs = person_image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
outputs = person_model(**inputs)
results = person_image_processor.post_process_object_detection(
outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3
)
result = results[0]
person_boxes = result["boxes"][result["labels"] == 0]
person_boxes = person_boxes.cpu().numpy()
person_boxes[:, 2] = person_boxes[:, 2] - person_boxes[:, 0]
person_boxes[:, 3] = person_boxes[:, 3] - person_boxes[:, 1]
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-plus-huge")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-plus-huge", device_map=device, quantization_config=quantization_config)
inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)
with torch.no_grad():
outputs = model(**inputs)
pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
image_pose_result = pose_results[0]
```
## Notes
- Use [`AutoProcessor`] to automatically prepare bounding box and image inputs.
- ViTPose is a top-down pose estimator. It uses a object detector to detect individuals first before keypoint prediction.
- ViTPose++ has 6 different MoE expert heads (COCO validation `0`, AiC `1`, MPII `2`, AP-10K `3`, APT-36K `4`, COCO-WholeBody `5`) which supports 6 different datasets. Pass a specific value corresponding to the dataset to the `dataset_index` to indicate which expert to use.
```py
from transformers import AutoProcessor, VitPoseForPoseEstimation
device = "cuda" if torch.cuda.is_available() else "cpu"
image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-plus-base")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-plus-base", device=device)
inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)
dataset_index = torch.tensor([0], device=device) # must be a tensor of shape (batch_size,)
with torch.no_grad():
outputs = model(**inputs, dataset_index=dataset_index)
```
- [OpenCV](https://opencv.org/) is an alternative option for visualizing the estimated pose.
```py
# pip install opencv-python
import math
import cv2
def draw_points(image, keypoints, scores, pose_keypoint_color, keypoint_score_threshold, radius, show_keypoint_weight):
if pose_keypoint_color is not None:
assert len(pose_keypoint_color) == len(keypoints)
for kid, (kpt, kpt_score) in enumerate(zip(keypoints, scores)):
x_coord, y_coord = int(kpt[0]), int(kpt[1])
if kpt_score > keypoint_score_threshold:
color = tuple(int(c) for c in pose_keypoint_color[kid])
if show_keypoint_weight:
X = (x1, x2)
Y = (y1, y2)
mean_x = np.mean(X)
mean_y = np.mean(Y)
length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5
angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1]))
polygon = cv2.ellipse2Poly(
(int(mean_x), int(mean_y)), (int(length / 2), int(stick_width)), int(angle), 0, 360, 1
)
cv2.fillConvexPoly(image, polygon, color)
transparency = max(0, min(1, 0.5 * (keypoints[sk[0], 2] + keypoints[sk[1], 2])))
cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
transparency = max(0, min(1, kpt_score))
cv2.addWeighted(image, transparency, image, 1 - transparency, 0, dst=image)
else:
cv2.line(image, (x1, y1), (x2, y2), color, thickness=thickness)
cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
def draw_links(image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold, thickness, show_keypoint_weight, stick_width = 2):
height, width, _ = image.shape
if keypoint_edges is not None and link_colors is not None:
assert len(link_colors) == len(keypoint_edges)
for sk_id, sk in enumerate(keypoint_edges):
x1, y1, score1 = (int(keypoints[sk[0], 0]), int(keypoints[sk[0], 1]), scores[sk[0]])
x2, y2, score2 = (int(keypoints[sk[1], 0]), int(keypoints[sk[1], 1]), scores[sk[1]])
if (
x1 > 0
and x1 < width
and y1 > 0
and y1 < height
and x2 > 0
and x2 < width
and y2 > 0
and y2 < height
and score1 > keypoint_score_threshold
and score2 > keypoint_score_threshold
):
color = tuple(int(c) for c in link_colors[sk_id])
if show_keypoint_weight:
X = (x1, x2)
Y = (y1, y2)
mean_x = np.mean(X)
mean_y = np.mean(Y)
length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5
angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1]))
polygon = cv2.ellipse2Poly(
(int(mean_x), int(mean_y)), (int(length / 2), int(stick_width)), int(angle), 0, 360, 1
)
cv2.fillConvexPoly(image, polygon, color)
transparency = max(0, min(1, 0.5 * (keypoints[sk[0], 2] + keypoints[sk[1], 2])))
cv2.addWeighted(image, transparency, image, 1 - transparency, 0, dst=image)
else:
cv2.line(image, (x1, y1), (x2, y2), color, thickness=thickness)
# Note: keypoint_edges and color palette are dataset-specific
keypoint_edges = model.config.edges
# Note: keypoint_edges and color palette are dataset-specific
keypoint_edges = model.config.edges
palette = np.array(
[
[255, 128, 0],
[255, 153, 51],
[255, 178, 102],
[230, 230, 0],
[255, 153, 255],
[153, 204, 255],
[255, 102, 255],
[255, 51, 255],
[102, 178, 255],
[51, 153, 255],
[255, 153, 153],
[255, 102, 102],
[255, 51, 51],
[153, 255, 153],
[102, 255, 102],
[51, 255, 51],
[0, 255, 0],
[0, 0, 255],
[255, 0, 0],
[255, 255, 255],
]
)
palette = np.array(
[
[255, 128, 0],
[255, 153, 51],
[255, 178, 102],
[230, 230, 0],
[255, 153, 255],
[153, 204, 255],
[255, 102, 255],
[255, 51, 255],
[102, 178, 255],
[51, 153, 255],
[255, 153, 153],
[255, 102, 102],
[255, 51, 51],
[153, 255, 153],
[102, 255, 102],
[51, 255, 51],
[0, 255, 0],
[0, 0, 255],
[255, 0, 0],
[255, 255, 255],
]
)
link_colors = palette[[0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]]
keypoint_colors = palette[[16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]]
link_colors = palette[[0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]]
keypoint_colors = palette[[16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]]
numpy_image = np.array(image)
numpy_image = np.array(image)
for pose_result in image_pose_result:
scores = np.array(pose_result["scores"])
keypoints = np.array(pose_result["keypoints"])
for pose_result in image_pose_result:
scores = np.array(pose_result["scores"])
keypoints = np.array(pose_result["keypoints"])
# draw each point on image
draw_points(numpy_image, keypoints, scores, keypoint_colors, keypoint_score_threshold=0.3, radius=4, show_keypoint_weight=False)
# draw each point on image
draw_points(numpy_image, keypoints, scores, keypoint_colors, keypoint_score_threshold=0.3, radius=4, show_keypoint_weight=False)
# draw links
draw_links(numpy_image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold=0.3, thickness=1, show_keypoint_weight=False)
# draw links
draw_links(numpy_image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold=0.3, thickness=1, show_keypoint_weight=False)
pose_image = Image.fromarray(numpy_image)
pose_image
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vitpose-coco.jpg" alt="drawing" width="600"/>
pose_image = Image.fromarray(numpy_image)
pose_image
```
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViTPose. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
Refer to resources below to learn more about using ViTPose.
- A demo of ViTPose on images and video can be found [here](https://huggingface.co/spaces/hysts/ViTPose-transformers).
- A notebook illustrating inference and visualization can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTPose/Inference_with_ViTPose_for_human_pose_estimation.ipynb).
- This [notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTPose/Inference_with_ViTPose_for_body_pose_estimation.ipynb) demonstrates inference and visualization.
- This [Space](https://huggingface.co/spaces/hysts/ViTPose-transformers) demonstrates ViTPose on images and video.
## VitPoseImageProcessor

View File

@@ -0,0 +1,300 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Voxtral
Voxtral is an upgrade of [Ministral 3B and Mistral Small 3B](https://mistral.ai/news/ministraux), extending its language capabilities with audio input support. It is designed to handle tasks such as speech transcription, translation, and audio understanding.
You can read more in Mistral's [realease blog post](https://mistral.ai/news/voxtral).
The model is available in two checkpoints:
- 3B: [mistralai/Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507)
- 24B: [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
## Key Features
Voxtral builds on Ministral-3B by adding audio processing capabilities:
- **Transcription mode**: Includes a dedicated mode for speech transcription. By default, Voxtral detects the spoken language and transcribes it accordingly.
- **Long-form context**: With a 32k token context window, Voxtral can process up to 30 minutes of audio for transcription or 40 minutes for broader audio understanding.
- **Integrated Q&A and summarization**: Supports querying audio directly and producing structured summaries without relying on separate ASR and language models.
- **Multilingual support**: Automatically detects language and performs well across several widely spoken languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.
- **Function calling via voice**: Can trigger functions or workflows directly from spoken input based on detected user intent.
- **Text capabilities**: Maintains the strong text processing performance of its Ministral-3B foundation.
## Usage
Let's first load the model!
```python
from transformers import VoxtralForConditionalGeneration, AutoProcessor
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
repo_id = "mistralai/Voxtral-Mini-3B-2507"
processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)
```
### Audio Instruct Mode
The model supports audio-text instructions, including multi-turn and multi-audio interactions, all processed in batches.
➡️ audio + text instruction
```python
conversation = [
{
"role": "user",
"content": [
{
"type": "audio",
"url": "https://huggingface.co/datasets/eustlb/audio-samples/resolve/main/dude_where_is_my_car.wav",
},
{"type": "text", "text": "What can you tell me about this audio?"},
],
}
]
inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)
```
➡️ multi-audio + text instruction
```python
conversation = [
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/mary_had_lamb.mp3",
},
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/winning_call.mp3",
},
{"type": "text", "text": "What sport and what nursery rhyme are referenced?"},
],
}
]
inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)
```
➡️ multi-turn:
```python
conversation = [
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
},
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3",
},
{"type": "text", "text": "Describe briefly what you can hear."},
],
},
{
"role": "assistant",
"content": "The audio begins with the speaker delivering a farewell address in Chicago, reflecting on his eight years as president and expressing gratitude to the American people. The audio then transitions to a weather report, stating that it was 35 degrees in Barcelona the previous day, but the temperature would drop to minus 20 degrees the following day.",
},
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/dude_where_is_my_car.wav",
},
{"type": "text", "text": "Ok, now compare this new audio with the previous one."},
],
},
]
inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)
```
➡️ text only:
```python
conversation = [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What if a cyber brain could possibly generate its own ghost, and create a soul all by itself?",
},
],
}
]
inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)
```
➡️ audio only:
```python
conversation = [
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/dude_where_is_my_car.wav",
},
],
}
]
inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)
```
➡️ batched inference!
```python
conversations = [
[
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
},
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3",
},
{
"type": "text",
"text": "Who's speaking in the speach and what city's weather is being discussed?",
},
],
}
],
[
{
"role": "user",
"content": [
{
"type": "audio",
"path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/winning_call.mp3",
},
{"type": "text", "text": "What can you tell me about this audio?"},
],
}
],
]
inputs = processor.apply_chat_template(conversations)
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated responses:")
print("=" * 80)
for decoded_output in decoded_outputs:
print(decoded_output)
print("=" * 80)
```
### Transcription Mode
Use the model to transcribe audio (supports English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian)!
```python
inputs = processor.apply_transcrition_request(language="en", audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3")
inputs = inputs.to(device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print("\nGenerated responses:")
print("=" * 80)
for decoded_output in decoded_outputs:
print(decoded_output)
print("=" * 80)
```
This model was contributed by [Eustache Le Bihan](https://huggingface.co/eustlb).
## VoxtralConfig
[[autodoc]] VoxtralConfig
## VoxtralEncoderConfig
[[autodoc]] VoxtralEncoderConfig
## VoxtralProcessor
[[autodoc]] VoxtralProcessor
## VoxtralEncoder
[[autodoc]] VoxtralEncoder
- forward
## VoxtralForConditionalGeneration
[[autodoc]] VoxtralForConditionalGeneration
- forward

View File

@@ -172,9 +172,9 @@ Otherwise, [`~Wav2Vec2ProcessorWithLM.batch_decode`] performance will be slower
>>> dataset = dataset.cast_column("audio", datasets.Audio(sampling_rate=16_000))
>>> def map_to_array(batch):
... batch["speech"] = batch["audio"]["array"]
... return batch
>>> def map_to_array(example):
... example["speech"] = example["audio"]["array"]
... return example
>>> # prepare speech data for batch inference

View File

@@ -1,4 +1,4 @@
# Modular Transformers
# Contributing a new model to Transformers
Modular Transformers lowers the bar for contributing models and significantly reduces the code required to add a model by allowing imports and inheritance.
@@ -540,6 +540,9 @@ This makes it very easy to switch decorators and makes it explicit that the only
## Docstring variables
> [!TIP]
> Refer to the [Documeting a model](./auto_docstring) guide for more information about how you can use the `@auto_docstring` decorator to help automatically generate consistent docstring arguments.
If an object defined in both the modular and modeling file from which it inherits, the modular definition has precedence unless for assignments containing the pattern `DOCSTRING`. These variables are typically used in `MODEL_START_DOCSTRING` and `MODEL_INPUT_DOCSTRING` in the modeling files. They are big blocks of docstrings and the linter rewrites the names everywhere. For this reason, assignments containing the `DOCSTRING` variable can use the definition found in the source file without copying the whole docstring, by simply setting the variable to `None` in the modular file.
This is very useful if you need the variable reference somewhere but you don't want to clutter the modular file with docstrings which are always the same. The example code below allows you to automatically use the same docstrings from [Mistral](./model_doc/mistral) in [Starcoder2](./model_doc/starcoder2).

View File

@@ -164,7 +164,7 @@ args = TrainingArguments(
output_dir="./test-schedulefree",
max_steps=1000,
per_device_train_batch_size=4,
+ optim="schedule_free_radamw,
+ optim="schedule_free_radamw",
+ lr_scheduler_type="constant",
gradient_checkpointing=True,
logging_strategy="steps",
@@ -174,3 +174,29 @@ args = TrainingArguments(
run_name="sfo",
)
```
## StableAdamW
```bash
pip install torch-optimi
```
[StableAdamW](https://arxiv.org/pdf/2304.13013) is a hybrid between AdamW and AdaFactor. It ports AdaFactor's update clipping into AdamW, which removes the need for gradient clipping. Otherwise, it behaves as a drop-in replacement for AdamW.
> [!TIP]
> If training on large batch sizes or still observing training loss spikes, consider reducing beta_2 between [0.95, 0.99].
```diff
args = TrainingArguments(
output_dir="./test-stable-adamw",
max_steps=1000,
per_device_train_batch_size=4,
+ optim="stable_adamw",
gradient_checkpointing=True,
logging_strategy="steps",
logging_steps=1,
learning_rate=2e-6,
save_strategy="no",
run_name="stable-adamw",
)
```

View File

@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.
# Build your own machine
One of the most important consideration when building a machine for deep learning is the GPU choice. GPUs are the standard workhorse for deep learning owing to their tensor cores for performing very efficient matrix multiplication and high memory bandwidth. To train large models, you either need a more powerful GPU, multiple GPUs, or take advantage of techniques that offload some of the load to the CPU or NVMe.
One of the most important considerations when building a machine for deep learning is the GPU choice. GPUs are the standard workhorse for deep learning owing to their tensor cores for performing very efficient matrix multiplication and high memory bandwidth. To train large models, you either need a more powerful GPU, multiple GPUs, or take advantage of techniques that offload some of the load to the CPU or NVMe.
This guide provides some practical tips for setting up a GPU for deep learning. For a more detailed discussion and comparison of GPUs, take a look at the [Which GPU(s) to Get for Deep Learning](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/) blog post.
@@ -25,11 +25,11 @@ High-end consumer GPUs may have two or three PCIe 8-pin power sockets, and you s
Each PCIe 8-pin power cable should be connected to a 12V rail on the power supply unit (PSU) and can deliver up to 150W. Other GPUs may use a PCIe 12-pin connector which can deliver up to 500-600W. Lower-end GPUs may only use a PCIe 6-pin connector which supplies up to 75W.
It is important the PSU has stable voltage otherwise it may not be able to supply the GPU with enough power to function properly during peak usage.
It is important that the PSU maintains stable voltage; otherwise, it may fail to supply the GPU with enough power during peak usage.
## Cooling
An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158 - 167F, is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194F, the GPU may begin to throttle performance.
An overheated GPU throttles its performance and can even shutdown if it's too hot to prevent damage. Keeping the GPU temperature low, anywhere between 158167°F, is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194°F, the GPU may begin to throttle performance.
## Multi-GPU connectivity

View File

@@ -13,21 +13,19 @@ rendered properly in your Markdown viewer.
-->
# Tensor parallelism in transformers
# Distributed inference
[Tensor parallelism](./perf_train_gpu_many#tensor-parallelism) shards a model onto multiple GPUs and parallelizes computations such as matrix multiplication. It enables fitting larger model sizes into memory and is faster because each GPU can process a tensor slice.
This document assumes that you are already familiar with the basics of tensor parallelism. If you are not, please refer to the [Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=tensor_parallelism) section on tensor parallelism.
When a model doesn't fit on a single GPU, distributed inference with [tensor parallelism](./perf_train_gpu_many#tensor-parallelism) can help. Tensor parallelism shards a model onto multiple accelerators (CUDA GPU, Intel XPU, etc.) and parallelizes computations such as matrix multiplication. It enables fitting larger model sizes into memory and is faster because each accelerator can process a tensor slice.
However, tensor parallelism adds communication overhead and should be used on single machine setups with multiple accelerators to take advantage of fast intra-node communication. For multi-node training, it may be more efficient to use pipeline or data parallelism depending on your use case.
> [!TIP]
> Tensor parallelism is very communication intensive, therefore it is reccomended to use it on a single machine with multiple GPUs, utilizing fast intra-node communication. For multi-node training, methods as pipeline or data parallelism are more efficient (depending on your use case).
> Refer to the [Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=tensor_parallelism) section on tensor parallelism to learn more.
Tensor parallelism requires slight changes to the model parameters, therefore in transformers, we support some of the popular models out of the box.
> [!TIP]
> Expand the list below to see which models support tensor parallelism. Open a GitHub issue or pull request to add support for a model not currently below.
Check the list below for models that natively support tensor parallelism. Open a GitHub issue or pull request to add support for a model.
<details>
<summary>Supported models</summary>
<summary>Show supported models</summary>
* [Cohere](./model_doc/cohere) and [Cohere 2](./model_doc/cohere2)
* [Gemma](./model_doc/gemma) and [Gemma 2](./model_doc/gemma2)
@@ -43,19 +41,74 @@ Tensor parallelism requires slight changes to the model parameters, therefore in
</details>
## Using 🤗 transformers
This guide shows how to enable tensor parallelism with Transformers and different partitioning strategies.
Transformers provides a simple interface to use for tensor parallelism. We provide multiple classes implementing different partitioning
strategies and a simple entrypoint to parallelize `nn.Module` instance. You won't have to interact with this interface directly, everything is done in `PretrainedModel.from_pretrained` method for you. This section will first talk about the partitioning strategies
we support, then the user interface you will be interacting with, and finally it will teach you how to extend it with your own partitioning
strategies.
## Partitioning a model
### Partitioning strategies
Transformers supports tensor parallelism if a model has a `tp_plan`. There are two plans to partition a model.
In transformers, partitioning strategies reside in a class `ParallelInterface` which works like a mapping from string to the strategy implementation.
- The `auto` tensor parallelism plan partitions a model (see the supported models above) based on a predefined configuration.
- You can also manually specify your own partitioning plan and pass it to the `tp_plan` parameter in [`~PreTrainedModel.from_pretrained`].
<hfoptions id="sharding">
<hfoption id="auto plan">
```python
```py
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct" # better to visualize all the possible strategies
model_id = "meta-llama/Meta-Llama-3-8B-Instruct" # better for smaller number of GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan="auto")
print(model._tp_plan)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = "Can I help"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# distributed run
outputs = model(inputs)
```
Launch the inference script above on [torchrun](https://pytorch.org/docs/stable/elastic/run.html) with 4 processes per GPU.
```bash
torchrun --nproc-per-node 4 demo.py
```
</hfoption>
<hfoption id="manual plan">
Define a tensor parallel plan for each layer in `tp_plan` and pass it to [`~PreTrainedModel.from_pretrained`]. The example below uses a combination of column and row partitioning. Refer to the [Partitioning strategies](#partitioning-strategies) section to learn about other supported partitioning strategies.
> [!WARNING]
> Manually specifying your own partitioning plan requires a good understanding of the model architecture and how the partitioning strategies interact together. If you are not sure about the partitioning strategies, the resulting model can be very slow, even failing or incorrect. Refer to the [Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=tensor_parallelism) to learn more.
```py
from transformers import AutoModelForCausalLM
tp_plan = {
"model.layers.*.self_attn.q_proj": "colwise",
"model.layers.*.self_attn.k_proj": "colwise",
"model.layers.*.self_attn.v_proj": "colwise",
"model.layers.*.self_attn.o_proj": "rowwise",
...
}
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan=tp_plan)
print(model._tp_plan)
```
</hfoption>
</hfoptions>
## Partitioning strategies
All partitioning strategies are defined in the [`ParallelInterface`] class which maps a string to the strategy implementation. You don't need to interact with this class directly since all the strategies are set with `tp_plan` in [`~PreTrainedModel.from_pretrained`], but it is useful for checking what strategies are available.
```py
class ParallelInterface(MutableMapping):
"""
Dict-like object keeping track of allowed attention functions. You can easily add a new attention function
@@ -77,66 +130,32 @@ class ParallelInterface(MutableMapping):
}
```
We support the following strategies:
Refer to the table below to learn more about each strategy.
- `ColwiseParallel` - A simple column-wise partitioning, being able to handle both weights and biases, does exactly what we've discussed before.
- `RowwiseParallel` - Again, row-wise partitioning as dicussed before, supports weights and biases, on top of that it also supports `nn.Embedding` modules.
- `SequenceParallel` - Sequence parallel implementation, for support of `LayerNorm` and `Dropout` layers. Also supports Python implementation of `RMSNorm` (see [this](https://github.com/facebookresearch/llama/blob/main/llama/model.py#L34))
- `PackedColwiseParallel` - A variant of column-wise partitioning, however it works on packed weights (i.e. `up_proj` and `gate_proj` being packed together). For more details, see [this comment](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L79-#L108)
- `PackedRowwiseParallel` - A variant of row-wise partitioning, works on packed weights, for more details check the comment linked above.
- `GatherParallel` - A very simple class, that only makes the outputs of the module to be gathered across devices.
- `IsolatedParallel` - This is a special case, where we want to *isolate* the module from the rest of the devices (world). This is used for Experts in MoE layers, basically creating Expert parallelism of sorts.
- `ReplicateParallel` - Many `torch.distributed` APIs break if model is partially sharded, so this class is used to replicate the module across all devices.
| Strategy | Description |
|---|---|
| `ColwiseParallel` | Column-wise partitioning of weights and biases. |
| `RowwiseParallel` | Row-wise partitioning of weights and biases. Also supports partitioning `nn.Embedding` modules. |
| `SequenceParallel` | Sequence parallel implementation to support `LayerNorm` and `Dropout` layers. Also supports Python implementation of [RMSNorm](https://github.com/facebookresearch/llama/blob/main/llama/model.py#L34). |
| `PackedColwiseParallel` | Variant of `ColwiseParallel` to support packed weights (for example, packing `up_proj` and `gate_proj` together). Refer to the [code](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L79-#L108) for more details. |
| `PackedRowwiseParallel` | Variant of `RowwiseParallel` to support packed weights (refer to the [code](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L79-#L108) for more details). |
| `GatherParallel` | Gather outputs of the module across devices. |
| `IsolatedParallel` | Used for Experts in Mixture-of-Experts (MoE) layers to isolates module from other devices. |
| `ReplicateParallel` | Replicate modules across all devices to prevent `torch.distributed` APIs from breaking due to a partially sharded model. |
### Sharding a model
### Packed strategies
We provide two ways to shard a model, first one is to use `auto` tensor parallelism plan, which will automatically shard the model based on our predefined configuration. This requires the model to have predefined tensor parallel plan in transformers.
Weight packing packs multiple linear layers into a single, bigger layer. Packed strategies, `PackedColwiseParallel` and `PackedRowwiseParallel`, are used to shard packed weights. The more basic `ColwiseParallel` or `RowwiseParallel` will incorrectly shard the packed weights.
```python
from transformers import AutoModelForCausalLM
The example below packs `up_proj` and `gate_proj` into a single `gate_up_proj` module and requires the `PackedRowwiseParallel` strategy to shard `gate_up_proj`.
# model_id = "meta-llama/Meta-Llama-3-8B-Instruct" # better for smaller number of GPUs
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct" # better to visualize all the possible strategies
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan="auto")
print(model._tp_plan)
```
> [!TIP]
> For a list of models that support tensor parallelism, see the [Supported models](#supported-models) section above.
The second way is to manually specify your own partitioning plan.
```python
from transformers import AutoModelForCausalLM
tp_plan = {
"model.layers.*.self_attn.q_proj": "colwise",
"model.layers.*.self_attn.k_proj": "colwise",
"model.layers.*.self_attn.v_proj": "colwise",
"model.layers.*.self_attn.o_proj": "rowwise",
...
}
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan=tp_plan)
print(model._tp_plan)
```
You might have noticed that there are some special cases in the `ParallelInterface` mapping, let's now talk about them. This will help you understand their purpose and help with extending to other strategies.
### PackedRowwiseParallel
This class is a special case of `RowwiseParallel`, it's used to shard packed weights. Weight packing is a common technique used in models. It's a technique where we pack multiple linear layers into a single, bigger one.
For example in `Llama4` model, we pack `up_proj` and `gate_proj` into a single `gate_up_proj` module.
```python
class Llama4TextExperts(nn.Module):
...
self.gate_up_proj = nn.Parameter(torch.empty(self.num_experts, self.hidden_size, 2 * self.expert_dim))
```
Then in forward, we can use batch matrix multiplication to compute the output of the `gate_up_proj` module.
Batch matrix multiplication can be used in the `forward` pass to compute the output of the `gate_up_proj` module.
```python
def forward(self, hidden_states):
@@ -145,185 +164,148 @@ def forward(self, hidden_states):
gate, up = gate_up.chunk(2, dim=-1) # Split the output into gate and up
```
In this case, we need to use the `PackedRowwiseParallel` strategy to shard the `gate_up_proj` module, as using a simple `RowwiseParallel` will shard the layers wrongly.
> [!TIP]
> If this is a bit difficult to wrap your head around, check out [this comment](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L79-#L108) for an amazing visual representation of why `Packed*` needs to be used.
> Refer to [this comment](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L79-#L108) for an visual representation of why `Packed*` needs to be used.
### Local strategies
### `local*` strategies
Local strategies (`local_colwise`, `local_rowwise`, `local_packed_rowwise`) don't use [DTensor](https://docs.pytorch.org/docs/stable/distributed.tensor.html) because it isn't supported for some operations such as [torch.chunk](https://docs.pytorch.org/docs/stable/generated/torch.chunk.html). Instead, local strategies use the basic [torch.Tensor](https://docs.pytorch.org/docs/stable/tensors.html) and performs some of the distributed logic manually.
You could have noticed that there are `local*` strategies, which use the same layers as `*` strategy, but don't use `DTensor` at all.
This is because `DTensor` is not supported for some of the operations: such as `torch.chunk`. Therefore, sometimes we need to use the `local*` strategies, which use vanilla `torch.Tensor` and do some of the distributed logic manually.
<!---
<!--
Readd this when I get the exact error message
> [!TIP]
> If you are using a custom partitioning strategy, and it's not working with `... is not supported` error, try using the `local*` strategies to see if they work better.
-->
> [!WARNING]
> Manually specifying your own partitiong plan requires a good understanding of the model architecture and how the partitioning strategies interact together. If you are not sure about this, the resulting model can be very slow, even failing or incorrect. Again, refer to the [Ultra-Scale Playbook](https://huggingface.co/spaces/nanotron/ultrascale-playbook?section=tensor_parallelism) which can teach you everything required.
## Custom partitioning strategies
### Extending the interface with your own partitioning strategies
A custom partitioning strategy should inherit from [`TensorParallelLayer`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py) and implement `partition_tensor`, `_prepare_input_fn` and `_prepare_output_fn`.
This is a very advanced topic, which requires a good understanding of distributed collectives and the model architecture.
Your custom partitioning strategy should inherit from `TensorParallelLayer` defined in [integrations/tensor_parallel.py](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py) and implement: `partition_tensor`, `_prepare_input_fn` and `_prepare_output_fn`. Then it should be registered in the `ParallelInterface` mapping, so our dispatching logic can find it when specified in the `tp_plan`.
Then it needs to be registered in the `ParallelInterface` mapping so the dispatching logic can find it when specified in `tp_plan`.
Let's go through this workflow step by step, on an already existing example: `ColwiseParallel`.
The example below shows how to implement `ColwiseParallel` with this workflow.
1. Inherit from `TensorParallelLayer` and initialization
1. Inherit from `TensorParallelLayer`. In the `__init__` method, define `input_layouts` and `output_layouts` to describe how the input and output tensors should be placed on devices. The `desired_input_layouts` attribute is used to specify how the input *should* be placed on devices.
```python
class ColwiseParallel(TensorParallelLayer):
def __init__(
```python
class ColwiseParallel(TensorParallelLayer):
def __init__(
self,
*,
input_layouts: Optional[Placement] = None, # The input layout coming from the previous layer
output_layouts: Optional[Placement] = None, # The output layout we want to achieve
use_local_output: bool = True, # Whether to use local output or not
use_dtensor=True, # Whether to use DTensor or not
):
self.input_layouts = (input_layouts or Replicate(),) # The input sharding coming from the previous layer
self.output_layouts = (output_layouts or Shard(-1),) # Desired output sharding
self.desired_input_layouts = (Replicate(),) # Desired input sharding, inputs should be replicated across GPUs
self.use_local_output = use_local_output
self.use_dtensor = use_dtensor
```
2. Implement the `partition_tensor`, `_prepare_input_fn` and `_prepare_output_fn` methods.
The `partition_tensor` method partitions the tensor and fills `empty_param` with the partitioned tensor. Use the utility function `get_tensor_shard` to help you get the correct shard of the original parameter for a given rank and `get_packed_weights` to help with packed weights.
```python
def partition_tensor(
self,
*,
input_layouts: Optional[Placement] = None, # The input layout coming from the previous layer
output_layouts: Optional[Placement] = None, # The output layout we want to achieve
use_local_output: bool = True, # Whether to use local output or not
use_dtensor=True, # Whether to use DTensor or not
):
self.input_layouts = (input_layouts or Replicate(),) # The input sharding coming from the previous layer
self.output_layouts = (output_layouts or Shard(-1),) # Desired output sharding
self.desired_input_layouts = (Replicate(),) # Desired input sharding, inputs should be replicated across GPUs
self.use_local_output = use_local_output
self.use_dtensor = use_dtensor
```
param, # Full tensor of the parameter
empty_param, # Empty tensor of the parameter, will be filled with the partitioned tensor
param_type, # Type of the parameter, `bias` or `weight`
param_casting_dtype, # The type to cast the parameter to
to_contiguous, # Whether to convert the tensor to a contiguous memory layout
rank, # The rank of the current device
device_mesh, # The device mesh
) -> nn.Parameter: # Return the partitioned parameter
...
```
In the `__init__` method, we define these attributes, where `input_layouts` and `output_layouts` describing, how the input and output tensors should be placed on the devices. `desired_input_layouts` is used to specify, how the input *SHOULD* be placed on the devices.
The `_prepare_input_fn` and `_prepare_output_fn` methods are used in the [pre-forward](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_pre_hook.html) and [forward](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_hook.html) hooks. They redistribute the inputs and outputs to the desired layout as specified in the `__init__`.
2a. Implement `partition_tensor` method
```python
def _prepare_input_fn(input_layouts, desired_input_layouts, mod, inputs, device_mesh):
...
# Do some custom logic, cast to DTensor etc.
...
return inputs.redistribute(placements=desired_input_layouts, device_mesh=device_mesh)
def _prepare_output_fn(output_layouts, use_local_output, mod, outputs, device_mesh):
...
# Do some custom logic, cast to DTensor etc.
...
return outputs.redistribute(placements=output_layouts, device_mesh=device_mesh)
```
```python
def partition_tensor(
self,
param, # Full tensor of the parameter
empty_param, # Empty tensor of the parameter, will be filled with the partitioned tensor
param_type, # Type of the parameter, `bias` or `weight`
param_casting_dtype, # The type to cast the parameter to
to_contiguous, # Whether to convert the tensor to a contiguous memory layout
rank, # The rank of the current device
device_mesh, # The device mesh
) -> nn.Parameter: # Return the partitioned parameter
...
```
3. Register the strategy to [`ParallelInterface`] to enable it for use with `tp_plan`.
This method is used to partition the tensor, and fill the `empty_param` with the partitioned tensor.
We provide some utility functions to help you with this, such as `get_tensor_shard` which will get you the correct shard of the original parameter for this rank or `get_packed_weights` to help with packed weights.
```python
from transformers.integrations.tensor_parallel import ParallelInterface
2b. Implement `_prepare_input_fn` and `_prepare_output_fn` methods
ParallelInterface.register_strategy("colwise_custom", ColwiseParallel)
tp_plan = {
"model.layers.*.self_attn.q_proj": "colwise_custom",
...
}
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan=tp_plan)
```
These methods are used as [`pre-forward`](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_pre_hook.html) and [`forward`](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.module.register_module_forward_hook.html) hooks respectively. Their purpose is to re-distribute the inputs and outputs to the desired layout, passed in the `__init__` method.
## Benchmarks
```python
def _prepare_input_fn(input_layouts, desired_input_layouts, mod, inputs, device_mesh):
...
# Do some custom logic, cast to DTensor etc.
...
return inputs.redistribute(placements=desired_input_layouts, device_mesh=device_mesh)
Tensor parallelism can considerably speedup inference, especially for inputs with large batch sizes or long sequences.
def _prepare_output_fn(output_layouts, use_local_output, mod, outputs, device_mesh):
...
# Do some custom logic, cast to DTensor etc.
...
return outputs.redistribute(placements=output_layouts, device_mesh=device_mesh)
```
3. Register the strategy
Congratulations! You've implemented your own partitioning strategy. Now, to use it with your own `tp_plan`, you need to register it in the `ParallelInterface` mapping.
```python
from transformers.integrations.tensor_parallel import ParallelInterface
ParallelInterface.register_strategy("colwise_custom", ColwiseParallel)
```
And now you can use it in your `tp_plan` as such:
```python
tp_plan = {
"model.layers.*.self_attn.q_proj": "colwise_custom",
...
}
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan=tp_plan)
```
## Full example
Let's go through a full example of inference with tensor parallelism.
```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# enable tensor parallelism
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Meta-Llama-3-8B-Instruct",
tp_plan="auto",
)
# prepare input tokens
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = "Can I help"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# distributed run
outputs = model(inputs)
```
Launch the inference script above on [torchrun](https://pytorch.org/docs/stable/elastic/run.html) with 4 processes per GPU.
```bash
torchrun --nproc-per-node 4 demo.py
```
You can benefit from considerable speed ups for inference, especially for inputs with large batch size or long sequences.
For a single forward pass on [Llama](./model_doc/llama) with a sequence length of 512 and various batch sizes, you can expect the following speed ups.
Refer to the chart below for the expected speedup for a single forward pass on [Llama](./model_doc/llama) with a sequence length of 512.
<div style="text-align: center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct%2C%20seqlen%20%3D%20512%2C%20python%2C%20w_%20compile.png">
</div>
## Tensor parallelism in-depth
Our implementation of tensor parallelism is framework-agnostic in design, but the specific implementations we've developed rely on the torch.distributed package. We heavily utilize abstractions such as `DeviceMesh` or `DTensor` to provide a simple and extensible interface to the user.
## Design implementation
The Transformers tensor parallelism implementation is framework-agnostic, but for specific implementations, we rely on [DeviceMesh](https://docs.pytorch.org/tutorials/recipes/distributed_device_mesh.html) and [DTensor](https://docs.pytorch.org/docs/stable/distributed.tensor.html) from [torch.distributed](https://docs.pytorch.org/tutorials/beginner/dist_overview.html) to provide a simple and extensible interface.
### DeviceMesh
Imagine `DeviceMesh` as a multi-dimensional grid of devices that communicate together. Different parallelization strategies require different types of communication patterns, therefore we can create a `DeviceMesh` with multiple submeshes:
Imagine `DeviceMesh` as a multi-dimensional grid of devices that communicate together. Different parallelization strategies require different types of communication patterns, so you can create a `DeviceMesh` with multiple sub-meshes.
```python
from torch.distributed.device_mesh import init_device_mesh
# Create a 1D mesh of 4 GPUs
device_mesh = init_device_mesh("cuda", (4,), mesh_dim_names=["tp"])
```
Then, most of the `torch.distributed` defined parallelization strategies can be applied to a mesh itself, or its submesh, automatically handling the communication patterns.
Most of the `torch.distributed` defined parallelization strategies can be applied to the mesh itself, or its sub-mesh, and it automatically handles the communication patterns.
### DTensor
Abbreviation for Distributed Tensor, `DTensor` is a tensor subclass that handles the distributed logic on-top of the usual tensor operations. Most of the model weights in case of tensor parallelism are stored as `DTensor`s (with some exceptions, more on that later).
The most important part of DTensor, that is crucial to understand, is the `placement` attribute. It's an attribute that tells PyTorch how is the tensor placed on the devices of the `DeviceMesh`.
`DTensor` (Distributed Tensor) is a tensor subclass that handles the distributed logic on top of the usual tensor operations. Most of the model weights in tensor parallelism are stored as `DTensor`s.
It can have the following values:
The most important part of DTensor is the `placement` attribute because it tells PyTorch how a tensor is placed on the devices in `DeviceMesh`. The `placement` attribute can take the following values.
- `Shard(dimension)` - Annotates that this `DTensor` is sharded across a given dimension, over the `DeviceMesh` it was constructed under. For example, if we would like to shard weights for column-wise partitioning, we would do:
```python
weight = ...
weight = DTensor.from_local(weight, device_mesh["tp"], placements=[Shard(0)]) # Shard across the 1st (column-wise) dimension
bias = ...
bias = DTensor.from_local(bias, device_mesh["tp"], placements=[Shard(-1)]) # Shard across the ONLY dimension
```
- `Shard(dimension)` - Indicates how a `DTensor` is sharded across a given dimension, over the `DeviceMesh` it was constructed under. The example below demonstrates how to shard weights over different dimensions for column-wise partitioning.
To give another example, for row-wise partitioning, we would do:
```python
weight = ...
weight = DTensor.from_local(weight, device_mesh["tp"], placements=[Shard(1)]) # Shard across the 2nd (row-wise) dimension
bias = ...
bias = DTensor.from_local(bias, device_mesh["tp"], placements=[Replicate()]) # Replicate bias across all GPUs
```
```python
weight = ...
weight = DTensor.from_local(weight, device_mesh["tp"], placements=[Shard(0)]) # Shard across the 1st (column-wise) dimension
bias = ...
bias = DTensor.from_local(bias, device_mesh["tp"], placements=[Shard(-1)]) # Shard across the ONLY dimension
```
- `Replicate()` - Annotates that this `DTensor` is replicated across the `DeviceMesh`. Very straight-forward, only creates a full copy of the tensor on each device.
- `Partial()` - This placement is mostly of no interest to us, it's used to annotate that this tensor is pending a reduction operation.
This example demonstrates how to shard weights over different dimensions for row-wise partitioning.
```python
weight = ...
weight = DTensor.from_local(weight, device_mesh["tp"], placements=[Shard(1)]) # Shard across the 2nd (row-wise) dimension
bias = ...
bias = DTensor.from_local(bias, device_mesh["tp"], placements=[Replicate()]) # Replicate bias across all GPUs
```
- `Replicate()` - Indicates a `DTensor` is replicated across the `DeviceMesh`. It only creates a full copy of the tensor on each device.
```py
bias = ...
bias = DTensor.from_local(bias, device_mesh["tp"], placements=[Replicate()]) # Replicate bias across all GPUs
```
- `Partial()` - Indicates a tensor is pending a reduction operation (not typically relevant for usage in Transformers).

Some files were not shown because too many files have changed in this diff Show More