Auto Classes

In many cases, the architecture you want to use can be guessed from the name or path of the pretrained model you pass to the from_pretrained() method. The AutoClasses do this for you: given the name or path of the pretrained weights, config, or vocabulary, they automatically retrieve the relevant model.
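To make the idea of "guessing the architecture" concrete, here is a toy, standard-library-only sketch of this kind of lookup. It assumes a checkpoint's config may carry a model_type key, and otherwise falls back to matching known keys against the name. The mapping TOY_MODEL_MAPPING and the function resolve_architecture are hypothetical names made up for illustration; the actual library logic is more involved.

```python
# Hypothetical, simplified registry for illustration only;
# this is NOT the transformers implementation.
TOY_MODEL_MAPPING = {
    "bert": "BertModel",
    "roberta": "RobertaModel",
    "gpt2": "GPT2Model",
}


def resolve_architecture(name_or_path, config=None):
    """Return a class name for a checkpoint, preferring an explicit model_type."""
    # Preferred signal: an explicit model_type found in the checkpoint's config.
    if config and "model_type" in config:
        return TOY_MODEL_MAPPING[config["model_type"]]
    # Fallback: look for a known key inside the name, longest key first,
    # so that "roberta" is tried before "bert".
    lowered = name_or_path.lower()
    for key in sorted(TOY_MODEL_MAPPING, key=len, reverse=True):
        if key in lowered:
            return TOY_MODEL_MAPPING[key]
    raise ValueError(f"Could not infer an architecture from {name_or_path!r}")


print(resolve_architecture("google-bert/bert-base-cased"))
print(resolve_architecture("my-checkpoint", config={"model_type": "gpt2"}))
```

Trying longer keys first is a deliberate choice in this sketch: a name containing "roberta" also contains "bert", so the more specific key has to be checked first.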

Instantiating one of [AutoConfig], [AutoModel], or [AutoTokenizer] directly creates an instance of the class for the relevant architecture. For instance,

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

will create a model that is an instance of [BertModel].

There is one class of AutoModel for each task, and for each backend (PyTorch, TensorFlow, or Flax).
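As a small offline sketch of this per-task dispatch, the snippet below builds a deliberately tiny BERT config with AutoConfig.for_model and passes it to from_config on several auto classes. Note that from_config creates randomly initialized models and does not load pretrained weights. The tiny sizes are arbitrary values chosen for illustration, and the example assumes the PyTorch backend is installed.

```python
from transformers import (
    AutoConfig,
    AutoModel,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
)

# A very small BERT configuration (illustrative sizes, not a real checkpoint).
config = AutoConfig.for_model(
    "bert",
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# The same config resolves to a different concrete class per auto class.
base_model = AutoModel.from_config(config)
masked_lm = AutoModelForMaskedLM.from_config(config)
classifier = AutoModelForSequenceClassification.from_config(config)

for model in (base_model, masked_lm, classifier):
    print(type(model).__name__)
```

Each auto class picks the BERT variant that matches its task, so one config is enough to obtain the base model or a model with a task-specific head.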

Extending the Auto Classes

Each auto class has a method for registering your custom classes. For instance, if you have defined a custom model class NewModel, make sure you also have a NewModelConfig; you can then add both to the auto classes like this:

from transformers import AutoConfig, AutoModel

AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

You will then be able to use the auto classes as you usually would!

If your NewModelConfig is a subclass of [~transformers.PretrainedConfig], make sure its model_type attribute is set to the same key you use when registering the config (here "new-model").

Likewise, if your NewModel is a subclass of [PreTrainedModel], make sure its config_class attribute is set to the same class you use when registering the model (here NewModelConfig).
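The two requirements above can be sketched end to end as follows. This is a minimal illustration, assuming the PyTorch backend: the hidden_size parameter and the single linear layer are hypothetical placeholders, not part of any real architecture.

```python
import torch
from torch import nn
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class NewModelConfig(PretrainedConfig):
    # Must match the key passed to AutoConfig.register below.
    model_type = "new-model"

    def __init__(self, hidden_size=8, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size  # hypothetical parameter for this sketch


class NewModel(PreTrainedModel):
    # Must match the config class passed to AutoModel.register below.
    config_class = NewModelConfig

    def __init__(self, config):
        super().__init__(config)
        self.linear = nn.Linear(config.hidden_size, config.hidden_size)
        self.post_init()

    def forward(self, inputs):
        return self.linear(inputs)


AutoConfig.register("new-model", NewModelConfig)
AutoModel.register(NewModelConfig, NewModel)

# The auto classes now resolve the custom key and config to the custom classes.
config = AutoConfig.for_model("new-model", hidden_size=4)
model = AutoModel.from_config(config)
outputs = model(torch.zeros(1, 4))
```

If model_type or config_class did not match what is registered, the registration calls would raise an error instead of silently accepting the mismatch.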

AutoConfig

autodoc AutoConfig

AutoTokenizer

autodoc AutoTokenizer

AutoFeatureExtractor

autodoc AutoFeatureExtractor

AutoImageProcessor

autodoc AutoImageProcessor

AutoVideoProcessor

autodoc AutoVideoProcessor

AutoProcessor

autodoc AutoProcessor

Generic model classes

The following auto classes are available for instantiating a base model class without a specific head.

AutoModel

autodoc AutoModel

TFAutoModel

autodoc TFAutoModel

FlaxAutoModel

autodoc FlaxAutoModel

Generic pretraining classes

The following auto classes are available for instantiating a model with a pretraining head.

AutoModelForPreTraining

autodoc AutoModelForPreTraining

TFAutoModelForPreTraining

autodoc TFAutoModelForPreTraining

FlaxAutoModelForPreTraining

autodoc FlaxAutoModelForPreTraining

Natural Language Processing

The auto classes below are available for the following natural language processing tasks.

AutoModelForCausalLM

autodoc AutoModelForCausalLM

TFAutoModelForCausalLM

autodoc TFAutoModelForCausalLM

FlaxAutoModelForCausalLM

autodoc FlaxAutoModelForCausalLM

AutoModelForMaskedLM

autodoc AutoModelForMaskedLM

TFAutoModelForMaskedLM

autodoc TFAutoModelForMaskedLM

FlaxAutoModelForMaskedLM

autodoc FlaxAutoModelForMaskedLM

AutoModelForMaskGeneration

autodoc AutoModelForMaskGeneration

TFAutoModelForMaskGeneration

autodoc TFAutoModelForMaskGeneration

AutoModelForSeq2SeqLM

autodoc AutoModelForSeq2SeqLM

TFAutoModelForSeq2SeqLM

autodoc TFAutoModelForSeq2SeqLM

FlaxAutoModelForSeq2SeqLM

autodoc FlaxAutoModelForSeq2SeqLM

AutoModelForSequenceClassification

autodoc AutoModelForSequenceClassification

TFAutoModelForSequenceClassification

autodoc TFAutoModelForSequenceClassification

FlaxAutoModelForSequenceClassification

autodoc FlaxAutoModelForSequenceClassification

AutoModelForMultipleChoice

autodoc AutoModelForMultipleChoice

TFAutoModelForMultipleChoice

autodoc TFAutoModelForMultipleChoice

FlaxAutoModelForMultipleChoice

autodoc FlaxAutoModelForMultipleChoice

AutoModelForNextSentencePrediction

autodoc AutoModelForNextSentencePrediction

TFAutoModelForNextSentencePrediction

autodoc TFAutoModelForNextSentencePrediction

FlaxAutoModelForNextSentencePrediction

autodoc FlaxAutoModelForNextSentencePrediction

AutoModelForTokenClassification

autodoc AutoModelForTokenClassification

TFAutoModelForTokenClassification

autodoc TFAutoModelForTokenClassification

FlaxAutoModelForTokenClassification

autodoc FlaxAutoModelForTokenClassification

AutoModelForQuestionAnswering

autodoc AutoModelForQuestionAnswering

TFAutoModelForQuestionAnswering

autodoc TFAutoModelForQuestionAnswering

FlaxAutoModelForQuestionAnswering

autodoc FlaxAutoModelForQuestionAnswering

AutoModelForTextEncoding

autodoc AutoModelForTextEncoding

TFAutoModelForTextEncoding

autodoc TFAutoModelForTextEncoding

Computer vision

The auto classes below are available for the following computer vision tasks.

AutoModelForDepthEstimation

autodoc AutoModelForDepthEstimation

AutoModelForImageClassification

autodoc AutoModelForImageClassification

TFAutoModelForImageClassification

autodoc TFAutoModelForImageClassification

FlaxAutoModelForImageClassification

autodoc FlaxAutoModelForImageClassification

AutoModelForVideoClassification

autodoc AutoModelForVideoClassification

AutoModelForKeypointDetection

autodoc AutoModelForKeypointDetection

AutoModelForKeypointMatching

autodoc AutoModelForKeypointMatching

AutoModelForMaskedImageModeling

autodoc AutoModelForMaskedImageModeling

TFAutoModelForMaskedImageModeling

autodoc TFAutoModelForMaskedImageModeling

AutoModelForObjectDetection

autodoc AutoModelForObjectDetection

AutoModelForImageSegmentation

autodoc AutoModelForImageSegmentation

AutoModelForImageToImage

autodoc AutoModelForImageToImage

AutoModelForSemanticSegmentation

autodoc AutoModelForSemanticSegmentation

TFAutoModelForSemanticSegmentation

autodoc TFAutoModelForSemanticSegmentation

AutoModelForInstanceSegmentation

autodoc AutoModelForInstanceSegmentation

AutoModelForUniversalSegmentation

autodoc AutoModelForUniversalSegmentation

AutoModelForZeroShotImageClassification

autodoc AutoModelForZeroShotImageClassification

TFAutoModelForZeroShotImageClassification

autodoc TFAutoModelForZeroShotImageClassification

AutoModelForZeroShotObjectDetection

autodoc AutoModelForZeroShotObjectDetection

Audio

The auto classes below are available for the following audio tasks.

AutoModelForAudioClassification

autodoc AutoModelForAudioClassification

TFAutoModelForAudioClassification

autodoc TFAutoModelForAudioClassification

AutoModelForAudioFrameClassification

autodoc AutoModelForAudioFrameClassification

AutoModelForCTC

autodoc AutoModelForCTC

AutoModelForSpeechSeq2Seq

autodoc AutoModelForSpeechSeq2Seq

TFAutoModelForSpeechSeq2Seq

autodoc TFAutoModelForSpeechSeq2Seq

FlaxAutoModelForSpeechSeq2Seq

autodoc FlaxAutoModelForSpeechSeq2Seq

AutoModelForAudioXVector

autodoc AutoModelForAudioXVector

AutoModelForTextToSpectrogram

autodoc AutoModelForTextToSpectrogram

AutoModelForTextToWaveform

autodoc AutoModelForTextToWaveform

AutoModelForAudioTokenization

autodoc AutoModelForAudioTokenization

Multimodal

The auto classes below are available for the following multimodal tasks.

AutoModelForTableQuestionAnswering

autodoc AutoModelForTableQuestionAnswering

TFAutoModelForTableQuestionAnswering

autodoc TFAutoModelForTableQuestionAnswering

AutoModelForDocumentQuestionAnswering

autodoc AutoModelForDocumentQuestionAnswering

TFAutoModelForDocumentQuestionAnswering

autodoc TFAutoModelForDocumentQuestionAnswering

AutoModelForVisualQuestionAnswering

autodoc AutoModelForVisualQuestionAnswering

AutoModelForVision2Seq

autodoc AutoModelForVision2Seq

TFAutoModelForVision2Seq

autodoc TFAutoModelForVision2Seq

FlaxAutoModelForVision2Seq

autodoc FlaxAutoModelForVision2Seq

AutoModelForImageTextToText

autodoc AutoModelForImageTextToText

Time Series

AutoModelForTimeSeriesPrediction

autodoc AutoModelForTimeSeriesPrediction