Deprecate low use models (#30781)

* Deprecate models
  - graphormer
  - time_series_transformer
  - xlm_prophetnet
  - qdqbert
  - nat
  - ernie_m
  - tvlt
  - nezha
  - mega
  - jukebox
  - vit_hybrid
  - x_clip
  - deta
  - speech_to_text_2
  - efficientformer
  - realm
  - gptsan_japanese
* Fix up
* Fix speech2text2 imports
* Make sure message isn't indented
* Fix docstrings
* Correctly map for deprecated models from model_type
* Uncomment out
* Add back time series transformer and x-clip
* Import fix and fix-up
* Fix up with updated ruff
2024-05-28 18:07:07 +01:00
parent 7f08817be4
commit a564d10afe
142 changed files with 1308 additions and 11908 deletions
--- a/docs/source/en/model_doc/deta.md
+++ b/docs/source/en/model_doc/deta.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # DETA
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The DETA model was proposed in [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
--- a/docs/source/en/model_doc/efficientformer.md
+++ b/docs/source/en/model_doc/efficientformer.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # EfficientFormer
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)
--- a/docs/source/en/model_doc/ernie_m.md
+++ b/docs/source/en/model_doc/ernie_m.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # ErnieM
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The ErnieM model was proposed in [ERNIE-M: Enhanced Multilingual Representation by Aligning
--- a/docs/source/en/model_doc/gptsan-japanese.md
+++ b/docs/source/en/model_doc/gptsan-japanese.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # GPTSAN-japanese
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama).
--- a/docs/source/en/model_doc/graphormer.md
+++ b/docs/source/en/model_doc/graphormer.md
@@ -14,6 +14,14 @@ rendered properly in your Markdown viewer.
 # Graphormer
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234)  by
--- a/docs/source/en/model_doc/jukebox.md
+++ b/docs/source/en/model_doc/jukebox.md
@@ -15,6 +15,14 @@ rendered properly in your Markdown viewer.
 -->
 # Jukebox
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf)
--- a/docs/source/en/model_doc/mega.md
+++ b/docs/source/en/model_doc/mega.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # MEGA
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The MEGA model was proposed in [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
--- a/docs/source/en/model_doc/nat.md
+++ b/docs/source/en/model_doc/nat.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # Neighborhood Attention Transformer
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143)
--- a/docs/source/en/model_doc/nezha.md
+++ b/docs/source/en/model_doc/nezha.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # Nezha
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al.
--- a/docs/source/en/model_doc/qdqbert.md
+++ b/docs/source/en/model_doc/qdqbert.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # QDQBERT
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The QDQBERT model can be referenced in [Integer Quantization for Deep Learning Inference: Principles and Empirical
--- a/docs/source/en/model_doc/realm.md
+++ b/docs/source/en/model_doc/realm.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # REALM
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a
--- a/docs/source/en/model_doc/speech_to_text_2.md
+++ b/docs/source/en/model_doc/speech_to_text_2.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # Speech2Text2
+ <Tip warning={true}>
+ This model is in maintenance mode only, we don't accept any new PRs changing its code.
+ If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+ You can do so by running the following command: `pip install -U transformers==4.40.2`.
+ </Tip>
 ## Overview
 The Speech2Text2 model is used together with [Wav2Vec2](wav2vec2) for Speech Translation models proposed in
--- a/docs/source/en/model_doc/tvlt.md
+++ b/docs/source/en/model_doc/tvlt.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # TVLT
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The TVLT model was proposed in [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156)
--- a/docs/source/en/model_doc/vit_hybrid.md
+++ b/docs/source/en/model_doc/vit_hybrid.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # Hybrid Vision Transformer (ViT Hybrid)
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 ## Overview
 The hybrid Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition
--- a/docs/source/en/model_doc/xlm-prophetnet.md
+++ b/docs/source/en/model_doc/xlm-prophetnet.md
@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
 # XLM-ProphetNet
+<Tip warning={true}>
+This model is in maintenance mode only, we don't accept any new PRs changing its code.
+If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
+You can do so by running the following command: `pip install -U transformers==4.40.2`.
+</Tip>
 <div class="flex flex-wrap space-x-1">
 <a href="https://huggingface.co/models?filter=xprophetnet">
 <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">
--- a/src/transformers/__init__.py
+++ b/src/transformers/__init__.py
--- a/src/transformers/models/__init__.py
+++ b/src/transformers/models/__init__.py
@@ -67,7 +67,6 @@ from . import (
    deit,
    deprecated,
    depth_anything,
-    deta,
    detr,
    dialogpt,
    dinat,
@@ -77,13 +76,11 @@ from . import (
    donut,
    dpr,
    dpt,
-    efficientformer,
    efficientnet,
    electra,
    encodec,
    encoder_decoder,
    ernie,
-    ernie_m,
    esm,
    falcon,
    fastspeech2_conformer,
@@ -104,8 +101,6 @@ from . import (
    gpt_neox_japanese,
    gpt_sw3,
    gptj,
-    gptsan_japanese,
-    graphormer,
    grounding_dino,
    groupvit,
    herbert,
@@ -118,7 +113,6 @@ from . import (
    instructblip,
    jamba,
    jetmoe,
-    jukebox,
    kosmos2,
    layoutlm,
    layoutlmv2,
@@ -142,7 +136,6 @@ from . import (
    maskformer,
    mbart,
    mbart50,
-    mega,
    megatron_bert,
    megatron_gpt2,
    mgp_str,
@@ -161,8 +154,6 @@ from . import (
    musicgen,
    musicgen_melody,
    mvp,
-    nat,
-    nezha,
    nllb,
    nllb_moe,
    nougat,
@@ -190,11 +181,9 @@ from . import (
    prophetnet,
    pvt,
    pvt_v2,
-    qdqbert,
    qwen2,
    qwen2_moe,
    rag,
-    realm,
    recurrent_gemma,
    reformer,
    regnet,
@@ -215,7 +204,6 @@ from . import (
    siglip,
    speech_encoder_decoder,
    speech_to_text,
-    speech_to_text_2,
    speecht5,
    splinter,
    squeezebert,
@@ -234,7 +222,6 @@ from . import (
    timesformer,
    timm_backbone,
    trocr,
-    tvlt,
    tvp,
    udop,
    umt5,
@@ -250,7 +237,6 @@ from . import (
    vision_text_dual_encoder,
    visual_bert,
    vit,
-    vit_hybrid,
    vit_mae,
    vit_msn,
    vitdet,
@@ -267,7 +253,6 @@ from . import (
    x_clip,
    xglm,
    xlm,
-    xlm_prophetnet,
    xlm_roberta,
    xlm_roberta_xl,
    xlnet,
--- a/src/transformers/models/auto/configuration_auto.py
+++ b/src/transformers/models/auto/configuration_auto.py
@@ -585,14 +585,29 @@ MODEL_NAMES_MAPPING = OrderedDict(
 # `transfo-xl` (as in `CONFIG_MAPPING_NAMES`), we should use `transfo_xl`.
 DEPRECATED_MODELS = [
    "bort",
+    "deta",
+    "efficientformer",
+    "ernie_m",
+    "gptsan_japanese",
+    "graphormer",
+    "jukebox",
    "mctct",
+    "mega",
    "mmbt",
+    "nat",
+    "nezha",
    "open_llama",
+    "qdqbert",
+    "realm",
    "retribert",
+    "speech_to_text_2",
    "tapex",
    "trajectory_transformer",
    "transfo_xl",
+    "tvlt",
    "van",
+    "vit_hybrid",
+    "xlm_prophetnet",
 ]
 SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
@@ -616,7 +631,11 @@ def model_type_to_module_name(key):
    """Converts a config key to the corresponding module."""
    # Special treatment
    if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
-        return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
+        key = SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
+        if key in DEPRECATED_MODELS:
+            key = f"deprecated.{key}"
+        return key
    key = key.replace("-", "_")
    if key in DEPRECATED_MODELS:
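
Note on the hunk above: before this change, a model type found in `SPECIAL_MODEL_TYPE_TO_MODULE_NAME` returned early and never received the `deprecated.` prefix. The following is a minimal standalone sketch of the new behavior, not the library code itself. It assumes abbreviated stand-ins for the two lookup tables, uses a hypothetical alias `legacy-nat`, and assumes the tail of the function (cut off in the hunk) applies the same prefix and returns the key.

```python
from collections import OrderedDict

# Abbreviated stand-in for the list in configuration_auto.py (assumption: trimmed).
DEPRECATED_MODELS = ["deta", "nat", "transfo_xl", "xlm_prophetnet"]

# Stand-in table; "legacy-nat" is a hypothetical alias used only for illustration.
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict([("legacy-nat", "nat")])


def model_type_to_module_name(key):
    """Converts a config key to the corresponding module."""
    # Special treatment: map the alias first, then apply the deprecated prefix.
    if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
        key = SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
        if key in DEPRECATED_MODELS:
            key = f"deprecated.{key}"
        return key

    # Regular path (assumed continuation of the truncated hunk).
    key = key.replace("-", "_")
    if key in DEPRECATED_MODELS:
        key = f"deprecated.{key}"
    return key


special_case = model_type_to_module_name("legacy-nat")
dashed_case = model_type_to_module_name("xlm-prophetnet")
active_case = model_type_to_module_name("bert")
print(special_case, dashed_case, active_case)
```

With the old early `return`, the first call would have produced `nat` and the lookup would have pointed at a module path that no longer exists after the move.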
--- a/src/transformers/models/deprecated/deta/__init__.py
+++ b/src/transformers/models/deprecated/deta/__init__.py
@@ -14,7 +14,7 @@
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
 _import_structure = {
--- a/src/transformers/models/deprecated/deta/configuration_deta.py
+++ b/src/transformers/models/deprecated/deta/configuration_deta.py
@@ -14,9 +14,9 @@
 # limitations under the License.
 """DETA model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
-from ..auto import CONFIG_MAPPING
+from ...auto import CONFIG_MAPPING
 logger = logging.get_logger(__name__)
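
Note on the import changes in this commit: each model package moved one level deeper (for example from `models/deta` to `models/deprecated/deta`), so every relative import needs one more leading dot to reach the same target. A small sketch of that resolution, using the standard library's `importlib.util.resolve_name`, which resolves a relative name against a package without importing anything. The helper name `resolve` and the variable names are illustrative only.

```python
from importlib.util import resolve_name

OLD_PACKAGE = "transformers.models.deta"
NEW_PACKAGE = "transformers.models.deprecated.deta"


def resolve(relative_name, package):
    """Return the absolute module name a relative import would target."""
    return resolve_name(relative_name, package)


# Before the move: three dots climb from models/deta up to the top-level package.
old_utils = resolve("...utils", OLD_PACKAGE)
# After the move: four dots are needed to reach the same module.
new_utils = resolve("....utils", NEW_PACKAGE)
# Leaving the old three-dot import in the new location points somewhere else.
stale_utils = resolve("...utils", NEW_PACKAGE)

# Sibling-package imports shift by one level as well.
old_auto = resolve("..auto", OLD_PACKAGE)
new_auto = resolve("...auto", NEW_PACKAGE)

for label, value in [
    ("old ...utils", old_utils),
    ("new ....utils", new_utils),
    ("stale ...utils", stale_utils),
    ("old ..auto", old_auto),
    ("new ...auto", new_auto),
]:
    print(f"{label:>15} -> {value}")
```

The stale case resolves to a `utils` module under `transformers.models`, which is not the intended target, which is why the commit touches the import block of every moved file.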
--- a/src/transformers/models/deprecated/deta/convert_deta_resnet_to_pytorch.py
+++ b/src/transformers/models/deprecated/deta/convert_deta_resnet_to_pytorch.py
--- a/src/transformers/models/deprecated/deta/convert_deta_swin_to_pytorch.py
+++ b/src/transformers/models/deprecated/deta/convert_deta_swin_to_pytorch.py
--- a/src/transformers/models/deprecated/deta/image_processing_deta.py
+++ b/src/transformers/models/deprecated/deta/image_processing_deta.py
@@ -19,9 +19,9 @@ from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union
 import numpy as np
-from ...feature_extraction_utils import BatchFeature
+from ....feature_extraction_utils import BatchFeature
-from ...image_processing_utils import BaseImageProcessor, get_size_dict
+from ....image_processing_utils import BaseImageProcessor, get_size_dict
-from ...image_transforms import (
+from ....image_transforms import (
    PaddingMode,
    center_to_corners_format,
    corners_to_center_format,
@@ -31,7 +31,7 @@ from ...image_transforms import (
    rgb_to_id,
    to_channel_dimension_format,
 )
-from ...image_utils import (
+from ....image_utils import (
    IMAGENET_DEFAULT_MEAN,
    IMAGENET_DEFAULT_STD,
    AnnotationFormat,
@@ -48,7 +48,7 @@ from ...image_utils import (
    validate_annotations,
    validate_preprocess_arguments,
 )
-from ...utils import (
+from ....utils import (
    is_flax_available,
    is_jax_tensor,
    is_tf_available,
@@ -59,7 +59,7 @@ from ...utils import (
    is_vision_available,
    logging,
 )
-from ...utils.generic import TensorType
+from ....utils.generic import TensorType
 if is_torch_available():
--- a/src/transformers/models/deprecated/deta/modeling_deta.py
+++ b/src/transformers/models/deprecated/deta/modeling_deta.py
@@ -28,8 +28,8 @@ from torch import Tensor, nn
 from torch.autograd import Function
 from torch.autograd.function import once_differentiable
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...file_utils import (
+from ....file_utils import (
    ModelOutput,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
@@ -38,12 +38,12 @@ from ...file_utils import (
    is_vision_available,
    replace_return_docstrings,
 )
-from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
+from ....modeling_attn_mask_utils import _prepare_4d_attention_mask
-from ...modeling_outputs import BaseModelOutput
+from ....modeling_outputs import BaseModelOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import meshgrid
+from ....pytorch_utils import meshgrid
-from ...utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends
+from ....utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends
-from ...utils.backbone_utils import load_backbone
+from ....utils.backbone_utils import load_backbone
 from .configuration_deta import DetaConfig
--- a/src/transformers/models/deprecated/efficientformer/__init__.py
+++ b/src/transformers/models/deprecated/efficientformer/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import (
+from ....utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_tf_available,
--- a/src/transformers/models/deprecated/efficientformer/configuration_efficientformer.py
+++ b/src/transformers/models/deprecated/efficientformer/configuration_efficientformer.py
@@ -16,8 +16,8 @@
 from typing import List
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/efficientformer/convert_efficientformer_original_pytorch_checkpoint_to_pytorch.py
+++ b/src/transformers/models/deprecated/efficientformer/convert_efficientformer_original_pytorch_checkpoint_to_pytorch.py
--- a/src/transformers/models/deprecated/efficientformer/image_processing_efficientformer.py
+++ b/src/transformers/models/deprecated/efficientformer/image_processing_efficientformer.py
@@ -18,13 +18,13 @@ from typing import Dict, List, Optional, Union
 import numpy as np
-from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
+from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
-from ...image_transforms import (
+from ....image_transforms import (
    get_resize_output_image_size,
    resize,
    to_channel_dimension_format,
 )
-from ...image_utils import (
+from ....image_utils import (
    IMAGENET_DEFAULT_MEAN,
    IMAGENET_DEFAULT_STD,
    ChannelDimension,
@@ -38,7 +38,7 @@ from ...image_utils import (
    validate_kwargs,
    validate_preprocess_arguments,
 )
-from ...utils import TensorType, logging
+from ....utils import TensorType, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py
+++ b/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py
@@ -23,10 +23,10 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
+from ....modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import (
+from ....utils import (
    ModelOutput,
    add_code_sample_docstrings,
    add_start_docstrings,
--- a/src/transformers/models/deprecated/efficientformer/modeling_tf_efficientformer.py
+++ b/src/transformers/models/deprecated/efficientformer/modeling_tf_efficientformer.py
@@ -20,13 +20,13 @@ from typing import Optional, Tuple, Union
 import tensorflow as tf
-from ...activations_tf import ACT2FN
+from ....activations_tf import ACT2FN
-from ...modeling_tf_outputs import (
+from ....modeling_tf_outputs import (
    TFBaseModelOutput,
    TFBaseModelOutputWithPooling,
    TFImageClassifierOutput,
 )
-from ...modeling_tf_utils import (
+from ....modeling_tf_utils import (
    TFPreTrainedModel,
    TFSequenceClassificationLoss,
    get_initializer,
@@ -34,8 +34,8 @@ from ...modeling_tf_utils import (
    keras_serializable,
    unpack_inputs,
 )
-from ...tf_utils import shape_list, stable_softmax
+from ....tf_utils import shape_list, stable_softmax
-from ...utils import (
+from ....utils import (
    ModelOutput,
    add_code_sample_docstrings,
    add_start_docstrings,
--- a/src/transformers/models/deprecated/ernie_m/__init__.py
+++ b/src/transformers/models/deprecated/ernie_m/__init__.py
@@ -14,7 +14,7 @@
 from typing import TYPE_CHECKING
 # rely on isort to merge the imports
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/ernie_m/configuration_ernie_m.py
+++ b/src/transformers/models/deprecated/ernie_m/configuration_ernie_m.py
@@ -19,7 +19,7 @@ from __future__ import annotations
 from typing import Dict
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
 class ErnieMConfig(PretrainedConfig):
--- a/src/transformers/models/deprecated/ernie_m/modeling_ernie_m.py
+++ b/src/transformers/models/deprecated/ernie_m/modeling_ernie_m.py
@@ -22,8 +22,8 @@ import torch.utils.checkpoint
 from torch import nn, tensor
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions,
    MultipleChoiceModelOutput,
@@ -31,9 +31,9 @@ from ...modeling_outputs import (
    SequenceClassifierOutput,
    TokenClassifierOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
+from ....utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
 from .configuration_ernie_m import ErnieMConfig
--- a/src/transformers/models/deprecated/ernie_m/tokenization_ernie_m.py
+++ b/src/transformers/models/deprecated/ernie_m/tokenization_ernie_m.py
@@ -21,8 +21,8 @@ from typing import Any, Dict, List, Optional, Tuple
 import sentencepiece as spm
-from ...tokenization_utils import PreTrainedTokenizer
+from ....tokenization_utils import PreTrainedTokenizer
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/gptsan_japanese/__init__.py
+++ b/src/transformers/models/deprecated/gptsan_japanese/__init__.py
@@ -14,7 +14,7 @@
 from typing import TYPE_CHECKING
-from ...utils import (
+from ....utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_flax_available,
--- a/src/transformers/models/deprecated/gptsan_japanese/configuration_gptsan_japanese.py
+++ b/src/transformers/models/deprecated/gptsan_japanese/configuration_gptsan_japanese.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """GPTSAN-japanese model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/gptsan_japanese/convert_gptsan_tf_checkpoint_to_pytorch.py
+++ b/src/transformers/models/deprecated/gptsan_japanese/convert_gptsan_tf_checkpoint_to_pytorch.py
--- a/src/transformers/models/deprecated/gptsan_japanese/modeling_gptsan_japanese.py
+++ b/src/transformers/models/deprecated/gptsan_japanese/modeling_gptsan_japanese.py
@@ -20,10 +20,10 @@ from typing import List, Optional, Tuple, Union
 import torch
 import torch.nn as nn
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions
+from ....modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import (
+from ....utils import (
    DUMMY_INPUTS,
    DUMMY_MASK,
    add_start_docstrings,
--- a/src/transformers/models/deprecated/gptsan_japanese/tokenization_gptsan_japanese.py
+++ b/src/transformers/models/deprecated/gptsan_japanese/tokenization_gptsan_japanese.py
@@ -22,8 +22,8 @@ from typing import List, Optional, Tuple, Union
 import numpy as np
-from ...tokenization_utils import PreTrainedTokenizer
+from ....tokenization_utils import PreTrainedTokenizer
-from ...tokenization_utils_base import (
+from ....tokenization_utils_base import (
    BatchEncoding,
    PreTokenizedInput,
    PreTokenizedInputPair,
@@ -31,7 +31,7 @@ from ...tokenization_utils_base import (
    TextInputPair,
    TruncationStrategy,
 )
-from ...utils import PaddingStrategy, logging
+from ....utils import PaddingStrategy, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/graphormer/__init__.py
+++ b/src/transformers/models/deprecated/graphormer/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/graphormer/algos_graphormer.pyx
+++ b/src/transformers/models/deprecated/graphormer/algos_graphormer.pyx
--- a/src/transformers/models/deprecated/graphormer/collating_graphormer.py
+++ b/src/transformers/models/deprecated/graphormer/collating_graphormer.py
@@ -6,7 +6,7 @@ from typing import Any, Dict, List, Mapping
 import numpy as np
 import torch
-from ...utils import is_cython_available, requires_backends
+from ....utils import is_cython_available, requires_backends
 if is_cython_available():
--- a/src/transformers/models/deprecated/graphormer/configuration_graphormer.py
+++ b/src/transformers/models/deprecated/graphormer/configuration_graphormer.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """Graphormer model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/graphormer/modeling_graphormer.py
+++ b/src/transformers/models/deprecated/graphormer/modeling_graphormer.py
@@ -21,13 +21,13 @@ import torch
 import torch.nn as nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithNoAttention,
    SequenceClassifierOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import logging
+from ....utils import logging
 from .configuration_graphormer import GraphormerConfig
--- a/src/transformers/models/deprecated/jukebox/__init__.py
+++ b/src/transformers/models/deprecated/jukebox/__init__.py
@@ -14,7 +14,7 @@
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/jukebox/configuration_jukebox.py
+++ b/src/transformers/models/deprecated/jukebox/configuration_jukebox.py
@@ -17,8 +17,8 @@
 import os
 from typing import List, Union
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/jukebox/convert_jukebox.py
+++ b/src/transformers/models/deprecated/jukebox/convert_jukebox.py
--- a/src/transformers/models/deprecated/jukebox/modeling_jukebox.py
+++ b/src/transformers/models/deprecated/jukebox/modeling_jukebox.py
@@ -24,10 +24,10 @@ import torch.nn.functional as F
 from torch import nn
 from torch.nn import LayerNorm as FusedLayerNorm
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import add_start_docstrings, logging
+from ....utils import add_start_docstrings, logging
-from ...utils.logging import tqdm
+from ....utils.logging import tqdm
 from .configuration_jukebox import ATTENTION_PATTERNS, JukeboxConfig, JukeboxPriorConfig, JukeboxVQVAEConfig
--- a/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py
+++ b/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py
@@ -24,10 +24,10 @@ from typing import Any, Dict, List, Optional, Tuple, Union
 import numpy as np
 import regex
-from ...tokenization_utils import AddedToken, PreTrainedTokenizer
+from ....tokenization_utils import AddedToken, PreTrainedTokenizer
-from ...tokenization_utils_base import BatchEncoding
+from ....tokenization_utils_base import BatchEncoding
-from ...utils import TensorType, is_flax_available, is_tf_available, is_torch_available, logging
+from ....utils import TensorType, is_flax_available, is_tf_available, is_torch_available, logging
-from ...utils.generic import _is_jax, _is_numpy
+from ....utils.generic import _is_jax, _is_numpy
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/mega/__init__.py
+++ b/src/transformers/models/deprecated/mega/__init__.py
@@ -14,7 +14,7 @@
 from typing import TYPE_CHECKING
-from ...utils import (
+from ....utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
--- a/src/transformers/models/deprecated/mega/configuration_mega.py
+++ b/src/transformers/models/deprecated/mega/configuration_mega.py
@@ -17,9 +17,9 @@
 from collections import OrderedDict
 from typing import Mapping
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...onnx import OnnxConfig
+from ....onnx import OnnxConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/mega/convert_mega_original_pytorch_checkpoint_to_pytorch.py
+++ b/src/transformers/models/deprecated/mega/convert_mega_original_pytorch_checkpoint_to_pytorch.py
--- a/src/transformers/models/deprecated/mega/modeling_mega.py
+++ b/src/transformers/models/deprecated/mega/modeling_mega.py
@@ -23,8 +23,8 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithPoolingAndCrossAttentions,
    CausalLMOutputWithCrossAttentions,
    MaskedLMOutput,
@@ -33,9 +33,9 @@ from ...modeling_outputs import (
    SequenceClassifierOutput,
    TokenClassifierOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import ALL_LAYERNORM_LAYERS
+from ....pytorch_utils import ALL_LAYERNORM_LAYERS
-from ...utils import (
+from ....utils import (
    add_code_sample_docstrings,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
--- a/src/transformers/models/deprecated/nat/__init__.py
+++ b/src/transformers/models/deprecated/nat/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
 _import_structure = {"configuration_nat": ["NatConfig"]}
--- a/src/transformers/models/deprecated/nat/configuration_nat.py
+++ b/src/transformers/models/deprecated/nat/configuration_nat.py
@@ -14,9 +14,9 @@
 # limitations under the License.
 """Neighborhood Attention Transformer model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
-from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
+from ....utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/nat/modeling_nat.py
+++ b/src/transformers/models/deprecated/nat/modeling_nat.py
@@ -23,11 +23,11 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import BackboneOutput
+from ....modeling_outputs import BackboneOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import (
+from ....utils import (
    ModelOutput,
    OptionalDependencyNotAvailable,
    add_code_sample_docstrings,
@@ -38,7 +38,7 @@ from ...utils import (
    replace_return_docstrings,
    requires_backends,
 )
-from ...utils.backbone_utils import BackboneMixin
+from ....utils.backbone_utils import BackboneMixin
 from .configuration_nat import NatConfig
--- a/src/transformers/models/deprecated/nezha/__init__.py
+++ b/src/transformers/models/deprecated/nezha/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/nezha/configuration_nezha.py
+++ b/src/transformers/models/deprecated/nezha/configuration_nezha.py
@@ -1,4 +1,4 @@
-from ... import PretrainedConfig
+from .... import PretrainedConfig
 class NezhaConfig(PretrainedConfig):
--- a/src/transformers/models/deprecated/nezha/modeling_nezha.py
+++ b/src/transformers/models/deprecated/nezha/modeling_nezha.py
@@ -25,8 +25,8 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions,
    MaskedLMOutput,
@@ -36,9 +36,9 @@ from ...modeling_outputs import (
    SequenceClassifierOutput,
    TokenClassifierOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import (
+from ....utils import (
    ModelOutput,
    add_code_sample_docstrings,
    add_start_docstrings,
--- a/src/transformers/models/deprecated/qdqbert/__init__.py
+++ b/src/transformers/models/deprecated/qdqbert/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
 _import_structure = {"configuration_qdqbert": ["QDQBertConfig"]}
--- a/src/transformers/models/deprecated/qdqbert/configuration_qdqbert.py
+++ b/src/transformers/models/deprecated/qdqbert/configuration_qdqbert.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """QDQBERT model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/qdqbert/modeling_qdqbert.py
+++ b/src/transformers/models/deprecated/qdqbert/modeling_qdqbert.py
@@ -25,8 +25,8 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions,
    CausalLMOutputWithCrossAttentions,
@@ -37,9 +37,9 @@ from ...modeling_outputs import (
    SequenceClassifierOutput,
    TokenClassifierOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import (
+from ....utils import (
    add_code_sample_docstrings,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
--- a/src/transformers/models/deprecated/realm/__init__.py
+++ b/src/transformers/models/deprecated/realm/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/realm/configuration_realm.py
+++ b/src/transformers/models/deprecated/realm/configuration_realm.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """REALM model configuration."""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/realm/modeling_realm.py
+++ b/src/transformers/models/deprecated/realm/modeling_realm.py
@@ -23,16 +23,16 @@ import torch
 from torch import nn
 from torch.nn import CrossEntropyLoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import (
+from ....modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions,
    MaskedLMOutput,
    ModelOutput,
 )
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings
+from ....utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings
 from .configuration_realm import RealmConfig
--- a/src/transformers/models/deprecated/realm/retrieval_realm.py
+++ b/src/transformers/models/deprecated/realm/retrieval_realm.py
@@ -20,8 +20,8 @@ from typing import Optional, Union
 import numpy as np
 from huggingface_hub import hf_hub_download
-from ... import AutoTokenizer
+from .... import AutoTokenizer
-from ...utils import logging
+from ....utils import logging
 _REALM_BLOCK_RECORDS_FILENAME = "block_records.npy"
--- a/src/transformers/models/deprecated/realm/tokenization_realm.py
+++ b/src/transformers/models/deprecated/realm/tokenization_realm.py
@@ -19,9 +19,9 @@ import os
 import unicodedata
 from typing import List, Optional, Tuple
-from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
+from ....tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
-from ...tokenization_utils_base import BatchEncoding
+from ....tokenization_utils_base import BatchEncoding
-from ...utils import PaddingStrategy, logging
+from ....utils import PaddingStrategy, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/realm/tokenization_realm_fast.py
+++ b/src/transformers/models/deprecated/realm/tokenization_realm_fast.py
@@ -19,9 +19,9 @@ from typing import List, Optional, Tuple
 from tokenizers import normalizers
-from ...tokenization_utils_base import BatchEncoding
+from ....tokenization_utils_base import BatchEncoding
-from ...tokenization_utils_fast import PreTrainedTokenizerFast
+from ....tokenization_utils_fast import PreTrainedTokenizerFast
-from ...utils import PaddingStrategy, logging
+from ....utils import PaddingStrategy, logging
 from .tokenization_realm import RealmTokenizer
--- a/src/transformers/models/deprecated/speech_to_text_2/__init__.py
+++ b/src/transformers/models/deprecated/speech_to_text_2/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import (
+from ....utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_sentencepiece_available,
--- a/src/transformers/models/deprecated/speech_to_text_2/configuration_speech_to_text_2.py
+++ b/src/transformers/models/deprecated/speech_to_text_2/configuration_speech_to_text_2.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """Speech2Text model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/speech_to_text_2/modeling_speech_to_text_2.py
+++ b/src/transformers/models/deprecated/speech_to_text_2/modeling_speech_to_text_2.py
@@ -22,11 +22,11 @@ import torch
 from torch import nn
 from torch.nn import CrossEntropyLoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_causal_attention_mask
+from ....modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_causal_attention_mask
-from ...modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions
+from ....modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import add_start_docstrings, logging, replace_return_docstrings
+from ....utils import add_start_docstrings, logging, replace_return_docstrings
 from .configuration_speech_to_text_2 import Speech2Text2Config
--- a/src/transformers/models/deprecated/speech_to_text_2/processing_speech_to_text_2.py
+++ b/src/transformers/models/deprecated/speech_to_text_2/processing_speech_to_text_2.py
@@ -19,7 +19,7 @@ Speech processor class for Speech2Text2
 import warnings
 from contextlib import contextmanager
-from ...processing_utils import ProcessorMixin
+from ....processing_utils import ProcessorMixin
 class Speech2Text2Processor(ProcessorMixin):
--- a/src/transformers/models/deprecated/speech_to_text_2/tokenization_speech_to_text_2.py
+++ b/src/transformers/models/deprecated/speech_to_text_2/tokenization_speech_to_text_2.py
@@ -18,8 +18,8 @@ import json
 import os
 from typing import Dict, List, Optional, Tuple
-from ...tokenization_utils import PreTrainedTokenizer
+from ....tokenization_utils import PreTrainedTokenizer
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/tvlt/__init__.py
+++ b/src/transformers/models/deprecated/tvlt/__init__.py
@@ -17,7 +17,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import (
+from ....utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
--- a/src/transformers/models/deprecated/tvlt/configuration_tvlt.py
+++ b/src/transformers/models/deprecated/tvlt/configuration_tvlt.py
@@ -14,8 +14,8 @@
 # limitations under the License.
 """TVLT model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py
+++ b/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py
@@ -19,9 +19,9 @@ from typing import List, Optional, Union
 import numpy as np
-from ...audio_utils import mel_filter_bank, spectrogram, window_function
+from ....audio_utils import mel_filter_bank, spectrogram, window_function
-from ...feature_extraction_sequence_utils import BatchFeature, SequenceFeatureExtractor
+from ....feature_extraction_sequence_utils import BatchFeature, SequenceFeatureExtractor
-from ...utils import TensorType, logging
+from ....utils import TensorType, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/tvlt/image_processing_tvlt.py
+++ b/src/transformers/models/deprecated/tvlt/image_processing_tvlt.py
@@ -18,13 +18,13 @@ from typing import Dict, List, Optional, Union
 import numpy as np
-from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
+from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
-from ...image_transforms import (
+from ....image_transforms import (
    get_resize_output_image_size,
    resize,
    to_channel_dimension_format,
 )
-from ...image_utils import (
+from ....image_utils import (
    IMAGENET_STANDARD_MEAN,
    IMAGENET_STANDARD_STD,
    ChannelDimension,
@@ -38,7 +38,7 @@ from ...image_utils import (
    validate_kwargs,
    validate_preprocess_arguments,
 )
-from ...utils import TensorType, logging
+from ....utils import TensorType, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/tvlt/modeling_tvlt.py
+++ b/src/transformers/models/deprecated/tvlt/modeling_tvlt.py
@@ -25,11 +25,11 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import BaseModelOutput, SequenceClassifierOutput
+from ....modeling_outputs import BaseModelOutput, SequenceClassifierOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import (
+from ....utils import (
    ModelOutput,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
--- a/src/transformers/models/deprecated/tvlt/processing_tvlt.py
+++ b/src/transformers/models/deprecated/tvlt/processing_tvlt.py
@@ -16,7 +16,7 @@
 Processor class for TVLT.
 """
-from ...processing_utils import ProcessorMixin
+from ....processing_utils import ProcessorMixin
 class TvltProcessor(ProcessorMixin):
--- a/src/transformers/models/deprecated/vit_hybrid/__init__.py
+++ b/src/transformers/models/deprecated/vit_hybrid/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
 _import_structure = {"configuration_vit_hybrid": ["ViTHybridConfig"]}
--- a/src/transformers/models/deprecated/vit_hybrid/configuration_vit_hybrid.py
+++ b/src/transformers/models/deprecated/vit_hybrid/configuration_vit_hybrid.py
@@ -14,10 +14,10 @@
 # limitations under the License.
 """ViT Hybrid model configuration"""
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
-from ..auto.configuration_auto import CONFIG_MAPPING
+from ...auto.configuration_auto import CONFIG_MAPPING
-from ..bit import BitConfig
+from ...bit import BitConfig
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/vit_hybrid/convert_vit_hybrid_timm_to_pytorch.py
+++ b/src/transformers/models/deprecated/vit_hybrid/convert_vit_hybrid_timm_to_pytorch.py
--- a/src/transformers/models/deprecated/vit_hybrid/image_processing_vit_hybrid.py
+++ b/src/transformers/models/deprecated/vit_hybrid/image_processing_vit_hybrid.py
@@ -18,14 +18,14 @@ from typing import Dict, List, Optional, Union
 import numpy as np
-from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
+from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
-from ...image_transforms import (
+from ....image_transforms import (
    convert_to_rgb,
    get_resize_output_image_size,
    resize,
    to_channel_dimension_format,
 )
-from ...image_utils import (
+from ....image_utils import (
    OPENAI_CLIP_MEAN,
    OPENAI_CLIP_STD,
    ChannelDimension,
@@ -39,7 +39,7 @@ from ...image_utils import (
    validate_kwargs,
    validate_preprocess_arguments,
 )
-from ...utils import TensorType, is_vision_available, logging
+from ....utils import TensorType, is_vision_available, logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/vit_hybrid/modeling_vit_hybrid.py
+++ b/src/transformers/models/deprecated/vit_hybrid/modeling_vit_hybrid.py
@@ -23,12 +23,12 @@ import torch.utils.checkpoint
 from torch import nn
 from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
+from ....modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
-from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
+from ....utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
-from ...utils.backbone_utils import load_backbone
+from ....utils.backbone_utils import load_backbone
 from .configuration_vit_hybrid import ViTHybridConfig
--- a/src/transformers/models/deprecated/xlm_prophetnet/__init__.py
+++ b/src/transformers/models/deprecated/xlm_prophetnet/__init__.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 from typing import TYPE_CHECKING
-from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
+from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
 _import_structure = {
--- a/src/transformers/models/deprecated/xlm_prophetnet/configuration_xlm_prophetnet.py
+++ b/src/transformers/models/deprecated/xlm_prophetnet/configuration_xlm_prophetnet.py
@@ -16,8 +16,8 @@
 from typing import Callable, Optional, Union
-from ...configuration_utils import PretrainedConfig
+from ....configuration_utils import PretrainedConfig
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/deprecated/xlm_prophetnet/modeling_xlm_prophetnet.py
+++ b/src/transformers/models/deprecated/xlm_prophetnet/modeling_xlm_prophetnet.py
@@ -25,10 +25,10 @@ import torch.utils.checkpoint
 from torch import Tensor, nn
 from torch.nn import LayerNorm
-from ...activations import ACT2FN
+from ....activations import ACT2FN
-from ...modeling_outputs import BaseModelOutput
+from ....modeling_outputs import BaseModelOutput
-from ...modeling_utils import PreTrainedModel
+from ....modeling_utils import PreTrainedModel
-from ...utils import (
+from ....utils import (
    ModelOutput,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
--- a/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py
+++ b/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py
@@ -18,8 +18,8 @@ import os
 from shutil import copyfile
 from typing import Any, Dict, List, Optional, Tuple
-from ...tokenization_utils import PreTrainedTokenizer
+from ....tokenization_utils import PreTrainedTokenizer
-from ...utils import logging
+from ....utils import logging
 logger = logging.get_logger(__name__)
--- a/src/transformers/models/dinat/modeling_dinat.py
+++ b/src/transformers/models/dinat/modeling_dinat.py
@@ -71,7 +71,6 @@ _IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"
 @dataclass
-# Copied from transformers.models.nat.modeling_nat.NatEncoderOutput with Nat->Dinat
 class DinatEncoderOutput(ModelOutput):
    """
    Dinat encoder's outputs, with potential hidden states and attentions.
@@ -105,7 +104,6 @@ class DinatEncoderOutput(ModelOutput):
 @dataclass
-# Copied from transformers.models.nat.modeling_nat.NatModelOutput with Nat->Dinat
 class DinatModelOutput(ModelOutput):
    """
    Dinat model's outputs that also contains a pooling of the last hidden states.
@@ -142,7 +140,6 @@ class DinatModelOutput(ModelOutput):
 @dataclass
-# Copied from transformers.models.nat.modeling_nat.NatImageClassifierOutput with Nat->Dinat
 class DinatImageClassifierOutput(ModelOutput):
    """
    Dinat outputs for image classification.
@@ -178,7 +175,6 @@ class DinatImageClassifierOutput(ModelOutput):
    reshaped_hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
-# Copied from transformers.models.nat.modeling_nat.NatEmbeddings with Nat->Dinat
 class DinatEmbeddings(nn.Module):
    """
    Construct the patch and position embeddings.
@@ -201,7 +197,6 @@ class DinatEmbeddings(nn.Module):
        return embeddings
-# Copied from transformers.models.nat.modeling_nat.NatPatchEmbeddings with Nat->Dinat
 class DinatPatchEmbeddings(nn.Module):
    """
    This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
@@ -238,7 +233,6 @@ class DinatPatchEmbeddings(nn.Module):
        return embeddings
-# Copied from transformers.models.nat.modeling_nat.NatDownsampler with Nat->Dinat
 class DinatDownsampler(nn.Module):
    """
    Convolutional Downsampling Layer.
@@ -321,7 +315,6 @@ class NeighborhoodAttention(nn.Module):
        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
-    # Copied from transformers.models.nat.modeling_nat.NeighborhoodAttention.transpose_for_scores with Nat->Dinat
    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(new_x_shape)
@@ -361,7 +354,6 @@ class NeighborhoodAttention(nn.Module):
        return outputs
-# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionOutput
 class NeighborhoodAttentionOutput(nn.Module):
    def __init__(self, config, dim):
        super().__init__()
@@ -382,7 +374,6 @@ class NeighborhoodAttentionModule(nn.Module):
        self.output = NeighborhoodAttentionOutput(config, dim)
        self.pruned_heads = set()
-    # Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.prune_heads
    def prune_heads(self, heads):
        if len(heads) == 0:
            return
@@ -401,7 +392,6 @@ class NeighborhoodAttentionModule(nn.Module):
        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)
-    # Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.forward
    def forward(
        self,
        hidden_states: torch.Tensor,
@@ -413,7 +403,6 @@ class NeighborhoodAttentionModule(nn.Module):
        return outputs
-# Copied from transformers.models.nat.modeling_nat.NatIntermediate with Nat->Dinat
 class DinatIntermediate(nn.Module):
    def __init__(self, config, dim):
        super().__init__()
@@ -429,7 +418,6 @@ class DinatIntermediate(nn.Module):
        return hidden_states
-# Copied from transformers.models.nat.modeling_nat.NatOutput with Nat->Dinat
 class DinatOutput(nn.Module):
    def __init__(self, config, dim):
        super().__init__()
@@ -539,7 +527,6 @@ class DinatStage(nn.Module):
        self.pointing = False
-    # Copied from transformers.models.nat.modeling_nat.NatStage.forward
    def forward(
        self,
        hidden_states: torch.Tensor,
@@ -582,7 +569,6 @@ class DinatEncoder(nn.Module):
            ]
        )
-    # Copied from transformers.models.nat.modeling_nat.NatEncoder.forward with Nat->Dinat
    def forward(
        self,
        hidden_states: torch.Tensor,
@@ -687,7 +673,6 @@ DINAT_INPUTS_DOCSTRING = r"""
    "The bare Dinat Model transformer outputting raw hidden-states without any specific head on top.",
    DINAT_START_DOCSTRING,
 )
-# Copied from transformers.models.nat.modeling_nat.NatModel with Nat->Dinat, NAT->DINAT
 class DinatModel(DinatPreTrainedModel):
    def __init__(self, config, add_pooling_layer=True):
        super().__init__(config)
--- a/src/transformers/utils/dummy_pt_objects.py
+++ b/src/transformers/utils/dummy_pt_objects.py
--- a/src/transformers/utils/dummy_sentencepiece_objects.py
+++ b/src/transformers/utils/dummy_sentencepiece_objects.py
@@ -72,6 +72,13 @@ class ErnieMTokenizer(metaclass=DummyObject):
        requires_backends(self, ["sentencepiece"])
+class XLMProphetNetTokenizer(metaclass=DummyObject):
+    _backends = ["sentencepiece"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["sentencepiece"])
 class FNetTokenizer(metaclass=DummyObject):
    _backends = ["sentencepiece"]
@@ -233,13 +240,6 @@ class XGLMTokenizer(metaclass=DummyObject):
        requires_backends(self, ["sentencepiece"])
-class XLMProphetNetTokenizer(metaclass=DummyObject):
-    _backends = ["sentencepiece"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["sentencepiece"])
 class XLMRobertaTokenizer(metaclass=DummyObject):
    _backends = ["sentencepiece"]
--- a/src/transformers/utils/dummy_tf_objects.py
+++ b/src/transformers/utils/dummy_tf_objects.py
@@ -1038,6 +1038,34 @@ class TFDeiTPreTrainedModel(metaclass=DummyObject):
        requires_backends(self, ["tf"])
+class TFEfficientFormerForImageClassification(metaclass=DummyObject):
+    _backends = ["tf"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["tf"])
+class TFEfficientFormerForImageClassificationWithTeacher(metaclass=DummyObject):
+    _backends = ["tf"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["tf"])
+class TFEfficientFormerModel(metaclass=DummyObject):
+    _backends = ["tf"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["tf"])
+class TFEfficientFormerPreTrainedModel(metaclass=DummyObject):
+    _backends = ["tf"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["tf"])
 class TFAdaptiveEmbedding(metaclass=DummyObject):
    _backends = ["tf"]
@@ -1178,34 +1206,6 @@ class TFDPRReader(metaclass=DummyObject):
        requires_backends(self, ["tf"])
-class TFEfficientFormerForImageClassification(metaclass=DummyObject):
-    _backends = ["tf"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["tf"])
-class TFEfficientFormerForImageClassificationWithTeacher(metaclass=DummyObject):
-    _backends = ["tf"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["tf"])
-class TFEfficientFormerModel(metaclass=DummyObject):
-    _backends = ["tf"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["tf"])
-class TFEfficientFormerPreTrainedModel(metaclass=DummyObject):
-    _backends = ["tf"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["tf"])
 class TFElectraForMaskedLM(metaclass=DummyObject):
    _backends = ["tf"]
--- a/src/transformers/utils/dummy_tokenizers_objects.py
+++ b/src/transformers/utils/dummy_tokenizers_objects.py
@@ -121,6 +121,13 @@ class DebertaV2TokenizerFast(metaclass=DummyObject):
        requires_backends(self, ["tokenizers"])
+class RealmTokenizerFast(metaclass=DummyObject):
+    _backends = ["tokenizers"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["tokenizers"])
 class RetriBertTokenizerFast(metaclass=DummyObject):
    _backends = ["tokenizers"]
@@ -352,13 +359,6 @@ class Qwen2TokenizerFast(metaclass=DummyObject):
        requires_backends(self, ["tokenizers"])
-class RealmTokenizerFast(metaclass=DummyObject):
-    _backends = ["tokenizers"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["tokenizers"])
 class ReformerTokenizerFast(metaclass=DummyObject):
    _backends = ["tokenizers"]
--- a/src/transformers/utils/dummy_vision_objects.py
+++ b/src/transformers/utils/dummy_vision_objects.py
@@ -142,6 +142,27 @@ class DetaImageProcessor(metaclass=DummyObject):
        requires_backends(self, ["vision"])
+class EfficientFormerImageProcessor(metaclass=DummyObject):
+    _backends = ["vision"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["vision"])
+class TvltImageProcessor(metaclass=DummyObject):
+    _backends = ["vision"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["vision"])
+class ViTHybridImageProcessor(metaclass=DummyObject):
+    _backends = ["vision"]
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["vision"])
 class DetrFeatureExtractor(metaclass=DummyObject):
    _backends = ["vision"]
@@ -184,13 +205,6 @@ class DPTImageProcessor(metaclass=DummyObject):
        requires_backends(self, ["vision"])
-class EfficientFormerImageProcessor(metaclass=DummyObject):
-    _backends = ["vision"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["vision"])
 class EfficientNetImageProcessor(metaclass=DummyObject):
    _backends = ["vision"]
@@ -520,13 +534,6 @@ class Swin2SRImageProcessor(metaclass=DummyObject):
        requires_backends(self, ["vision"])
-class TvltImageProcessor(metaclass=DummyObject):
-    _backends = ["vision"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["vision"])
 class TvpImageProcessor(metaclass=DummyObject):
    _backends = ["vision"]
@@ -590,13 +597,6 @@ class ViTImageProcessor(metaclass=DummyObject):
        requires_backends(self, ["vision"])
-class ViTHybridImageProcessor(metaclass=DummyObject):
-    _backends = ["vision"]
-    def __init__(self, *args, **kwargs):
-        requires_backends(self, ["vision"])
 class VitMatteImageProcessor(metaclass=DummyObject):
    _backends = ["vision"]
--- a/tests/models/deta/__init__.py
+++ b/tests/models/deta/__init__.py
--- a/tests/models/deta/test_image_processing_deta.py
+++ b/tests/models/deta/test_image_processing_deta.py
@@ -1,535 +0,0 @@
 # coding=utf-8
 # Copyright 2022 HuggingFace Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import json
 import pathlib
 import unittest
 from transformers.testing_utils import require_torch, require_vision, slow
 from transformers.utils import is_torch_available, is_vision_available
 from ...test_image_processing_common import AnnotationFormatTestMixin, ImageProcessingTestMixin, prepare_image_inputs
 if is_torch_available():
    import torch
 if is_vision_available():
    from PIL import Image
    from transformers import DetaImageProcessor
 class DetaImageProcessingTester(unittest.TestCase):
    def __init__(
        self,
        parent,
        batch_size=7,
        num_channels=3,
        min_resolution=30,
        max_resolution=400,
        do_resize=True,
        size=None,
        do_normalize=True,
        image_mean=[0.5, 0.5, 0.5],
        image_std=[0.5, 0.5, 0.5],
        do_rescale=True,
        rescale_factor=1 / 255,
        do_pad=True,
    ):
        # by setting size["longest_edge"] > max_resolution we're effectively not testing this :p
        size = size if size is not None else {"shortest_edge": 18, "longest_edge": 1333}
        self.parent = parent
        self.batch_size = batch_size
        self.num_channels = num_channels
        self.min_resolution = min_resolution
        self.max_resolution = max_resolution
        self.do_resize = do_resize
        self.size = size
        self.do_normalize = do_normalize
        self.image_mean = image_mean
        self.image_std = image_std
        self.do_rescale = do_rescale
        self.rescale_factor = rescale_factor
        self.do_pad = do_pad
    def prepare_image_processor_dict(self):
        return {
            "do_resize": self.do_resize,
            "size": self.size,
            "do_normalize": self.do_normalize,
            "image_mean": self.image_mean,
            "image_std": self.image_std,
            "do_rescale": self.do_rescale,
            "rescale_factor": self.rescale_factor,
            "do_pad": self.do_pad,
        }
    def get_expected_values(self, image_inputs, batched=False):
        """
        This function computes the expected height and width when providing images to DetaImageProcessor,
        assuming do_resize is set to True with a scalar size.
        """
        if not batched:
            image = image_inputs[0]
            if isinstance(image, Image.Image):
                w, h = image.size
            else:
                h, w = image.shape[1], image.shape[2]
            if w < h:
                expected_height = int(self.size["shortest_edge"] * h / w)
                expected_width = self.size["shortest_edge"]
            elif w > h:
                expected_height = self.size["shortest_edge"]
                expected_width = int(self.size["shortest_edge"] * w / h)
            else:
                expected_height = self.size["shortest_edge"]
                expected_width = self.size["shortest_edge"]
        else:
            expected_values = []
            for image in image_inputs:
                expected_height, expected_width = self.get_expected_values([image])
                expected_values.append((expected_height, expected_width))
            expected_height = max(expected_values, key=lambda item: item[0])[0]
            expected_width = max(expected_values, key=lambda item: item[1])[1]
        return expected_height, expected_width
    def expected_output_image_shape(self, images):
        height, width = self.get_expected_values(images, batched=True)
        return self.num_channels, height, width
    def prepare_image_inputs(self, equal_resolution=False, numpify=False, torchify=False):
        return prepare_image_inputs(
            batch_size=self.batch_size,
            num_channels=self.num_channels,
            min_resolution=self.min_resolution,
            max_resolution=self.max_resolution,
            equal_resolution=equal_resolution,
            numpify=numpify,
            torchify=torchify,
        )
@require_torch
@require_vision
 class DetaImageProcessingTest(AnnotationFormatTestMixin, ImageProcessingTestMixin, unittest.TestCase):
    image_processing_class = DetaImageProcessor if is_vision_available() else None
    def setUp(self):
        self.image_processor_tester = DetaImageProcessingTester(self)
    @property
    def image_processor_dict(self):
        return self.image_processor_tester.prepare_image_processor_dict()
    def test_image_processor_properties(self):
        image_processing = self.image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processing, "image_mean"))
        self.assertTrue(hasattr(image_processing, "image_std"))
        self.assertTrue(hasattr(image_processing, "do_normalize"))
        self.assertTrue(hasattr(image_processing, "do_resize"))
        self.assertTrue(hasattr(image_processing, "do_rescale"))
        self.assertTrue(hasattr(image_processing, "do_pad"))
        self.assertTrue(hasattr(image_processing, "size"))
    def test_image_processor_from_dict_with_kwargs(self):
        image_processor = self.image_processing_class.from_dict(self.image_processor_dict)
        self.assertEqual(image_processor.size, {"shortest_edge": 18, "longest_edge": 1333})
        self.assertEqual(image_processor.do_pad, True)
    @slow
    def test_call_pytorch_with_coco_detection_annotations(self):
        # prepare image and target
        image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
        with open("./tests/fixtures/tests_samples/COCO/coco_annotations.txt", "r") as f:
            target = json.loads(f.read())
        target = {"image_id": 39769, "annotations": target}
        # encode them
        image_processing = DetaImageProcessor()
        encoding = image_processing(images=image, annotations=target, return_tensors="pt")
        # verify pixel values
        expected_shape = torch.Size([1, 3, 800, 1066])
        self.assertEqual(encoding["pixel_values"].shape, expected_shape)
        expected_slice = torch.tensor([0.2796, 0.3138, 0.3481])
        self.assertTrue(torch.allclose(encoding["pixel_values"][0, 0, 0, :3], expected_slice, atol=1e-4))
        # verify area
        expected_area = torch.tensor([5887.9600, 11250.2061, 489353.8438, 837122.7500, 147967.5156, 165732.3438])
        self.assertTrue(torch.allclose(encoding["labels"][0]["area"], expected_area))
        # verify boxes
        expected_boxes_shape = torch.Size([6, 4])
        self.assertEqual(encoding["labels"][0]["boxes"].shape, expected_boxes_shape)
        expected_boxes_slice = torch.tensor([0.5503, 0.2765, 0.0604, 0.2215])
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"][0], expected_boxes_slice, atol=1e-3))
        # verify image_id
        expected_image_id = torch.tensor([39769])
        self.assertTrue(torch.allclose(encoding["labels"][0]["image_id"], expected_image_id))
        # verify is_crowd
        expected_is_crowd = torch.tensor([0, 0, 0, 0, 0, 0])
        self.assertTrue(torch.allclose(encoding["labels"][0]["iscrowd"], expected_is_crowd))
        # verify class_labels
        expected_class_labels = torch.tensor([75, 75, 63, 65, 17, 17])
        self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
        # verify orig_size
        expected_orig_size = torch.tensor([480, 640])
        self.assertTrue(torch.allclose(encoding["labels"][0]["orig_size"], expected_orig_size))
        # verify size
        expected_size = torch.tensor([800, 1066])
        self.assertTrue(torch.allclose(encoding["labels"][0]["size"], expected_size))
    @slow
    def test_call_pytorch_with_coco_panoptic_annotations(self):
        # prepare image, target and masks_path
        image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
        with open("./tests/fixtures/tests_samples/COCO/coco_panoptic_annotations.txt", "r") as f:
            target = json.loads(f.read())
        target = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
        masks_path = pathlib.Path("./tests/fixtures/tests_samples/COCO/coco_panoptic")
        # encode them
        image_processing = DetaImageProcessor(format="coco_panoptic")
        encoding = image_processing(images=image, annotations=target, masks_path=masks_path, return_tensors="pt")
        # verify pixel values
        expected_shape = torch.Size([1, 3, 800, 1066])
        self.assertEqual(encoding["pixel_values"].shape, expected_shape)
        expected_slice = torch.tensor([0.2796, 0.3138, 0.3481])
        self.assertTrue(torch.allclose(encoding["pixel_values"][0, 0, 0, :3], expected_slice, atol=1e-4))
        # verify area
        expected_area = torch.tensor([147979.6875, 165527.0469, 484638.5938, 11292.9375, 5879.6562, 7634.1147])
        self.assertTrue(torch.allclose(encoding["labels"][0]["area"], expected_area))
        # verify boxes
        expected_boxes_shape = torch.Size([6, 4])
        self.assertEqual(encoding["labels"][0]["boxes"].shape, expected_boxes_shape)
        expected_boxes_slice = torch.tensor([0.2625, 0.5437, 0.4688, 0.8625])
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"][0], expected_boxes_slice, atol=1e-3))
        # verify image_id
        expected_image_id = torch.tensor([39769])
        self.assertTrue(torch.allclose(encoding["labels"][0]["image_id"], expected_image_id))
        # verify is_crowd
        expected_is_crowd = torch.tensor([0, 0, 0, 0, 0, 0])
        self.assertTrue(torch.allclose(encoding["labels"][0]["iscrowd"], expected_is_crowd))
        # verify class_labels
        expected_class_labels = torch.tensor([17, 17, 63, 75, 75, 93])
        self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
        # verify masks
        expected_masks_sum = 822873
        self.assertEqual(encoding["labels"][0]["masks"].sum().item(), expected_masks_sum)
        # verify orig_size
        expected_orig_size = torch.tensor([480, 640])
        self.assertTrue(torch.allclose(encoding["labels"][0]["orig_size"], expected_orig_size))
        # verify size
        expected_size = torch.tensor([800, 1066])
        self.assertTrue(torch.allclose(encoding["labels"][0]["size"], expected_size))
    @slow
    # Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_batched_coco_detection_annotations with Detr->Deta
    def test_batched_coco_detection_annotations(self):
        image_0 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
        image_1 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png").resize((800, 800))
        with open("./tests/fixtures/tests_samples/COCO/coco_annotations.txt", "r") as f:
            target = json.loads(f.read())
        annotations_0 = {"image_id": 39769, "annotations": target}
        annotations_1 = {"image_id": 39769, "annotations": target}
        # Adjust the bounding boxes for the resized image
        w_0, h_0 = image_0.size
        w_1, h_1 = image_1.size
        for i in range(len(annotations_1["annotations"])):
            coords = annotations_1["annotations"][i]["bbox"]
            new_bbox = [
                coords[0] * w_1 / w_0,
                coords[1] * h_1 / h_0,
                coords[2] * w_1 / w_0,
                coords[3] * h_1 / h_0,
            ]
            annotations_1["annotations"][i]["bbox"] = new_bbox
        images = [image_0, image_1]
        annotations = [annotations_0, annotations_1]
        image_processing = DetaImageProcessor()
        encoding = image_processing(
            images=images,
            annotations=annotations,
            return_segmentation_masks=True,
            return_tensors="pt",  # do_convert_annotations=True
        )
        # Check the pixel values have been padded
        postprocessed_height, postprocessed_width = 800, 1066
        expected_shape = torch.Size([2, 3, postprocessed_height, postprocessed_width])
        self.assertEqual(encoding["pixel_values"].shape, expected_shape)
        # Check the bounding boxes have been adjusted for padded images
        self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
        self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
        expected_boxes_0 = torch.tensor(
            [
                [0.6879, 0.4609, 0.0755, 0.3691],
                [0.2118, 0.3359, 0.2601, 0.1566],
                [0.5011, 0.5000, 0.9979, 1.0000],
                [0.5010, 0.5020, 0.9979, 0.9959],
                [0.3284, 0.5944, 0.5884, 0.8112],
                [0.8394, 0.5445, 0.3213, 0.9110],
            ]
        )
        expected_boxes_1 = torch.tensor(
            [
                [0.4130, 0.2765, 0.0453, 0.2215],
                [0.1272, 0.2016, 0.1561, 0.0940],
                [0.3757, 0.4933, 0.7488, 0.9865],
                [0.3759, 0.5002, 0.7492, 0.9955],
                [0.1971, 0.5456, 0.3532, 0.8646],
                [0.5790, 0.4115, 0.3430, 0.7161],
            ]
        )
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1e-3))
        self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1e-3))
        # Check the masks have also been padded
        self.assertEqual(encoding["labels"][0]["masks"].shape, torch.Size([6, 800, 1066]))
        self.assertEqual(encoding["labels"][1]["masks"].shape, torch.Size([6, 800, 1066]))
        # Check if do_convert_annotations=False, then the annotations are not converted to centre_x, centre_y, width, height
        # format and not in the range [0, 1]
        encoding = image_processing(
            images=images,
            annotations=annotations,
            return_segmentation_masks=True,
            do_convert_annotations=False,
            return_tensors="pt",
        )
        self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
        self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
        # Convert to absolute coordinates
        unnormalized_boxes_0 = torch.vstack(
            [
                expected_boxes_0[:, 0] * postprocessed_width,
                expected_boxes_0[:, 1] * postprocessed_height,
                expected_boxes_0[:, 2] * postprocessed_width,
                expected_boxes_0[:, 3] * postprocessed_height,
            ]
        ).T
        unnormalized_boxes_1 = torch.vstack(
            [
                expected_boxes_1[:, 0] * postprocessed_width,
                expected_boxes_1[:, 1] * postprocessed_height,
                expected_boxes_1[:, 2] * postprocessed_width,
                expected_boxes_1[:, 3] * postprocessed_height,
            ]
        ).T
        # Convert from centre_x, centre_y, width, height to x_min, y_min, x_max, y_max
        expected_boxes_0 = torch.vstack(
            [
                unnormalized_boxes_0[:, 0] - unnormalized_boxes_0[:, 2] / 2,
                unnormalized_boxes_0[:, 1] - unnormalized_boxes_0[:, 3] / 2,
                unnormalized_boxes_0[:, 0] + unnormalized_boxes_0[:, 2] / 2,
                unnormalized_boxes_0[:, 1] + unnormalized_boxes_0[:, 3] / 2,
            ]
        ).T
        expected_boxes_1 = torch.vstack(
            [
                unnormalized_boxes_1[:, 0] - unnormalized_boxes_1[:, 2] / 2,
                unnormalized_boxes_1[:, 1] - unnormalized_boxes_1[:, 3] / 2,
                unnormalized_boxes_1[:, 0] + unnormalized_boxes_1[:, 2] / 2,
                unnormalized_boxes_1[:, 1] + unnormalized_boxes_1[:, 3] / 2,
            ]
        ).T
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1))
        self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1))
    # Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_batched_coco_panoptic_annotations with Detr->Deta
    def test_batched_coco_panoptic_annotations(self):
        # prepare image, target and masks_path
        image_0 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
        image_1 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png").resize((800, 800))
        with open("./tests/fixtures/tests_samples/COCO/coco_panoptic_annotations.txt", "r") as f:
            target = json.loads(f.read())
        annotation_0 = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
        annotation_1 = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
        w_0, h_0 = image_0.size
        w_1, h_1 = image_1.size
        for i in range(len(annotation_1["segments_info"])):
            coords = annotation_1["segments_info"][i]["bbox"]
            new_bbox = [
                coords[0] * w_1 / w_0,
                coords[1] * h_1 / h_0,
                coords[2] * w_1 / w_0,
                coords[3] * h_1 / h_0,
            ]
            annotation_1["segments_info"][i]["bbox"] = new_bbox
        masks_path = pathlib.Path("./tests/fixtures/tests_samples/COCO/coco_panoptic")
        images = [image_0, image_1]
        annotations = [annotation_0, annotation_1]
        # encode them
        image_processing = DetaImageProcessor(format="coco_panoptic")
        encoding = image_processing(
            images=images,
            annotations=annotations,
            masks_path=masks_path,
            return_tensors="pt",
            return_segmentation_masks=True,
        )
        # Check the pixel values have been padded
        postprocessed_height, postprocessed_width = 800, 1066
        expected_shape = torch.Size([2, 3, postprocessed_height, postprocessed_width])
        self.assertEqual(encoding["pixel_values"].shape, expected_shape)
        # Check the bounding boxes have been adjusted for padded images
        self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
        self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
        expected_boxes_0 = torch.tensor(
            [
                [0.2625, 0.5437, 0.4688, 0.8625],
                [0.7719, 0.4104, 0.4531, 0.7125],
                [0.5000, 0.4927, 0.9969, 0.9854],
                [0.1688, 0.2000, 0.2063, 0.0917],
                [0.5492, 0.2760, 0.0578, 0.2187],
                [0.4992, 0.4990, 0.9984, 0.9979],
            ]
        )
        expected_boxes_1 = torch.tensor(
            [
                [0.1576, 0.3262, 0.2814, 0.5175],
                [0.4634, 0.2463, 0.2720, 0.4275],
                [0.3002, 0.2956, 0.5985, 0.5913],
                [0.1013, 0.1200, 0.1238, 0.0550],
                [0.3297, 0.1656, 0.0347, 0.1312],
                [0.2997, 0.2994, 0.5994, 0.5987],
            ]
        )
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1e-3))
        self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1e-3))
        # Check the masks have also been padded
        self.assertEqual(encoding["labels"][0]["masks"].shape, torch.Size([6, 800, 1066]))
        self.assertEqual(encoding["labels"][1]["masks"].shape, torch.Size([6, 800, 1066]))
        # Check if do_convert_annotations=False, then the annotations are not converted to centre_x, centre_y, width, height
        # format and not in the range [0, 1]
        encoding = image_processing(
            images=images,
            annotations=annotations,
            masks_path=masks_path,
            return_segmentation_masks=True,
            do_convert_annotations=False,
            return_tensors="pt",
        )
        self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
        self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
        # Convert to absolute coordinates
        unnormalized_boxes_0 = torch.vstack(
            [
                expected_boxes_0[:, 0] * postprocessed_width,
                expected_boxes_0[:, 1] * postprocessed_height,
                expected_boxes_0[:, 2] * postprocessed_width,
                expected_boxes_0[:, 3] * postprocessed_height,
            ]
        ).T
        unnormalized_boxes_1 = torch.vstack(
            [
                expected_boxes_1[:, 0] * postprocessed_width,
                expected_boxes_1[:, 1] * postprocessed_height,
                expected_boxes_1[:, 2] * postprocessed_width,
                expected_boxes_1[:, 3] * postprocessed_height,
            ]
        ).T
        # Convert from centre_x, centre_y, width, height to x_min, y_min, x_max, y_max
        expected_boxes_0 = torch.vstack(
            [
                unnormalized_boxes_0[:, 0] - unnormalized_boxes_0[:, 2] / 2,
                unnormalized_boxes_0[:, 1] - unnormalized_boxes_0[:, 3] / 2,
                unnormalized_boxes_0[:, 0] + unnormalized_boxes_0[:, 2] / 2,
                unnormalized_boxes_0[:, 1] + unnormalized_boxes_0[:, 3] / 2,
            ]
        ).T
        expected_boxes_1 = torch.vstack(
            [
                unnormalized_boxes_1[:, 0] - unnormalized_boxes_1[:, 2] / 2,
                unnormalized_boxes_1[:, 1] - unnormalized_boxes_1[:, 3] / 2,
                unnormalized_boxes_1[:, 0] + unnormalized_boxes_1[:, 2] / 2,
                unnormalized_boxes_1[:, 1] + unnormalized_boxes_1[:, 3] / 2,
            ]
        ).T
        self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1))
        self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1))
    # Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_max_width_max_height_resizing_and_pad_strategy with Detr->Deta
    def test_max_width_max_height_resizing_and_pad_strategy(self):
        image_1 = torch.ones([200, 100, 3], dtype=torch.uint8)
        # do_pad=False, max_height=100, max_width=100, image=200x100 -> 100x50
        image_processor = DetaImageProcessor(
            size={"max_height": 100, "max_width": 100},
            do_pad=False,
        )
        inputs = image_processor(images=[image_1], return_tensors="pt")
        self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 100, 50]))
        # do_pad=False, max_height=300, max_width=100, image=200x100 -> 200x100
        image_processor = DetaImageProcessor(
            size={"max_height": 300, "max_width": 100},
            do_pad=False,
        )
        inputs = image_processor(images=[image_1], return_tensors="pt")
        # do_pad=True, max_height=100, max_width=100, image=200x100 -> 100x100
        image_processor = DetaImageProcessor(
            size={"max_height": 100, "max_width": 100}, do_pad=True, pad_size={"height": 100, "width": 100}
        )
        inputs = image_processor(images=[image_1], return_tensors="pt")
        self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 100, 100]))
        # do_pad=True, max_height=300, max_width=100, pad_size=301x101, image=200x100 -> 301x101
        image_processor = DetaImageProcessor(
            size={"max_height": 300, "max_width": 100},
            do_pad=True,
            pad_size={"height": 301, "width": 101},
        )
        inputs = image_processor(images=[image_1], return_tensors="pt")
        self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 301, 101]))
        ### Check for batch
        image_2 = torch.ones([100, 150, 3], dtype=torch.uint8)
        # do_pad=True, max_height=150, max_width=100, images=[200x100, 100x150] -> 150x100
        image_processor = DetaImageProcessor(
            size={"max_height": 150, "max_width": 100},
            do_pad=True,
            pad_size={"height": 150, "width": 100},
        )
        inputs = image_processor(images=[image_1, image_2], return_tensors="pt")
        self.assertEqual(inputs["pixel_values"].shape, torch.Size([2, 3, 150, 100]))
--- a/tests/models/deta/test_modeling_deta.py
+++ b/tests/models/deta/test_modeling_deta.py
@@ -1,671 +0,0 @@
 # coding=utf-8
 # Copyright 2022 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Testing suite for the PyTorch DETA model."""
 import collections
 import inspect
 import math
 import re
 import unittest
 from transformers import DetaConfig, ResNetConfig, is_torch_available, is_torchvision_available, is_vision_available
 from transformers.file_utils import cached_property
 from transformers.testing_utils import require_torchvision, require_vision, slow, torch_device
 from ...generation.test_utils import GenerationTesterMixin
 from ...test_configuration_common import ConfigTester
 from ...test_modeling_common import ModelTesterMixin, _config_zero_init, floats_tensor
 from ...test_pipeline_mixin import PipelineTesterMixin
 if is_torch_available():
    import torch
    from transformers.pytorch_utils import id_tensor_storage
 if is_torchvision_available():
    from transformers import DetaForObjectDetection, DetaModel
 if is_vision_available():
    from PIL import Image
    from transformers import AutoImageProcessor
 class DetaModelTester:
    def __init__(
        self,
        parent,
        batch_size=8,
        is_training=True,
        use_labels=True,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=8,
        intermediate_size=4,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        num_queries=12,
        two_stage_num_proposals=12,
        num_channels=3,
        image_size=224,
        n_targets=8,
        num_labels=91,
        num_feature_levels=4,
        encoder_n_points=2,
        decoder_n_points=6,
        two_stage=True,
        assign_first_stage=True,
        assign_second_stage=True,
    ):
        self.parent = parent
        self.batch_size = batch_size
        self.is_training = is_training
        self.use_labels = use_labels
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.num_queries = num_queries
        self.two_stage_num_proposals = two_stage_num_proposals
        self.num_channels = num_channels
        self.image_size = image_size
        self.n_targets = n_targets
        self.num_labels = num_labels
        self.num_feature_levels = num_feature_levels
        self.encoder_n_points = encoder_n_points
        self.decoder_n_points = decoder_n_points
        self.two_stage = two_stage
        self.assign_first_stage = assign_first_stage
        self.assign_second_stage = assign_second_stage
        # we also set the expected seq length for both encoder and decoder
        self.encoder_seq_length = (
            math.ceil(self.image_size / 8) ** 2
            + math.ceil(self.image_size / 16) ** 2
            + math.ceil(self.image_size / 32) ** 2
            + math.ceil(self.image_size / 64) ** 2
        )
        self.decoder_seq_length = self.num_queries
    def prepare_config_and_inputs(self, model_class_name):
        pixel_values = floats_tensor([self.batch_size, self.num_channels, self.image_size, self.image_size])
        pixel_mask = torch.ones([self.batch_size, self.image_size, self.image_size], device=torch_device)
        labels = None
        if self.use_labels:
            # labels is a list of Dict (each Dict being the labels for a given example in the batch)
            labels = []
            for i in range(self.batch_size):
                target = {}
                target["class_labels"] = torch.randint(
                    high=self.num_labels, size=(self.n_targets,), device=torch_device
                )
                target["boxes"] = torch.rand(self.n_targets, 4, device=torch_device)
                target["masks"] = torch.rand(self.n_targets, self.image_size, self.image_size, device=torch_device)
                labels.append(target)
        config = self.get_config(model_class_name)
        return config, pixel_values, pixel_mask, labels
    def get_config(self, model_class_name):
        resnet_config = ResNetConfig(
            num_channels=3,
            embeddings_size=10,
            hidden_sizes=[10, 20, 30, 40],
            depths=[1, 1, 2, 1],
            hidden_act="relu",
            num_labels=3,
            out_features=["stage2", "stage3", "stage4"],
            out_indices=[2, 3, 4],
        )
        two_stage = model_class_name == "DetaForObjectDetection"
        assign_first_stage = model_class_name == "DetaForObjectDetection"
        assign_second_stage = model_class_name == "DetaForObjectDetection"
        return DetaConfig(
            d_model=self.hidden_size,
            encoder_layers=self.num_hidden_layers,
            decoder_layers=self.num_hidden_layers,
            encoder_attention_heads=self.num_attention_heads,
            decoder_attention_heads=self.num_attention_heads,
            encoder_ffn_dim=self.intermediate_size,
            decoder_ffn_dim=self.intermediate_size,
            dropout=self.hidden_dropout_prob,
            attention_dropout=self.attention_probs_dropout_prob,
            num_queries=self.num_queries,
            two_stage_num_proposals=self.two_stage_num_proposals,
            num_labels=self.num_labels,
            num_feature_levels=self.num_feature_levels,
            encoder_n_points=self.encoder_n_points,
            decoder_n_points=self.decoder_n_points,
            two_stage=two_stage,
            assign_first_stage=assign_first_stage,
            assign_second_stage=assign_second_stage,
            backbone_config=resnet_config,
            backbone=None,
        )
    def prepare_config_and_inputs_for_common(self, model_class_name="DetaModel"):
        config, pixel_values, pixel_mask, labels = self.prepare_config_and_inputs(model_class_name)
        inputs_dict = {"pixel_values": pixel_values, "pixel_mask": pixel_mask}
        return config, inputs_dict
    def create_and_check_deta_model(self, config, pixel_values, pixel_mask, labels):
        model = DetaModel(config=config)
        model.to(torch_device)
        model.eval()
        result = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
        result = model(pixel_values)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.num_queries, self.hidden_size))
    def create_and_check_deta_freeze_backbone(self, config, pixel_values, pixel_mask, labels):
        model = DetaModel(config=config)
        model.to(torch_device)
        model.eval()
        model.freeze_backbone()
        for _, param in model.backbone.model.named_parameters():
            self.parent.assertEqual(False, param.requires_grad)
    def create_and_check_deta_unfreeze_backbone(self, config, pixel_values, pixel_mask, labels):
        model = DetaModel(config=config)
        model.to(torch_device)
        model.eval()
        model.unfreeze_backbone()
        for _, param in model.backbone.model.named_parameters():
            self.parent.assertEqual(True, param.requires_grad)
    def create_and_check_deta_object_detection_head_model(self, config, pixel_values, pixel_mask, labels):
        model = DetaForObjectDetection(config=config)
        model.to(torch_device)
        model.eval()
        result = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
        result = model(pixel_values)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.two_stage_num_proposals, self.num_labels))
        self.parent.assertEqual(result.pred_boxes.shape, (self.batch_size, self.two_stage_num_proposals, 4))
        result = model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
        self.parent.assertEqual(result.loss.shape, ())
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.two_stage_num_proposals, self.num_labels))
        self.parent.assertEqual(result.pred_boxes.shape, (self.batch_size, self.two_stage_num_proposals, 4))
@require_torchvision
 class DetaModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase):
    all_model_classes = (DetaModel, DetaForObjectDetection) if is_torchvision_available() else ()
    pipeline_model_mapping = (
        {"image-feature-extraction": DetaModel, "object-detection": DetaForObjectDetection}
        if is_torchvision_available()
        else {}
    )
    is_encoder_decoder = True
    test_torchscript = False
    test_pruning = False
    test_head_masking = False
    test_missing_keys = False
    # TODO: Fix the failed tests when this model gets more usage
    def is_pipeline_test_to_skip(
        self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
    ):
        if pipeline_test_casse_name == "ObjectDetectionPipelineTests":
            return True
        return False
    @unittest.skip("Skip for now. PR #22437 causes some loading issue. See (not merged) #22656 for some discussions.")
    def test_can_use_safetensors(self):
        super().test_can_use_safetensors()
    # special case for head models
    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
        inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
        if return_labels:
            if model_class.__name__ == "DetaForObjectDetection":
                labels = []
                for i in range(self.model_tester.batch_size):
                    target = {}
                    target["class_labels"] = torch.ones(
                        size=(self.model_tester.n_targets,), device=torch_device, dtype=torch.long
                    )
                    target["boxes"] = torch.ones(
                        self.model_tester.n_targets, 4, device=torch_device, dtype=torch.float
                    )
                    target["masks"] = torch.ones(
                        self.model_tester.n_targets,
                        self.model_tester.image_size,
                        self.model_tester.image_size,
                        device=torch_device,
                        dtype=torch.float,
                    )
                    labels.append(target)
                inputs_dict["labels"] = labels
        return inputs_dict
    def setUp(self):
        self.model_tester = DetaModelTester(self)
        self.config_tester = ConfigTester(self, config_class=DetaConfig, has_text_modality=False)
    def test_config(self):
        # we don't test common_properties and arguments_init as these don't apply for DETA
        self.config_tester.create_and_test_config_to_json_string()
        self.config_tester.create_and_test_config_to_json_file()
        self.config_tester.create_and_test_config_from_and_save_pretrained()
        self.config_tester.create_and_test_config_with_num_labels()
        self.config_tester.check_config_can_be_init_without_params()
    def test_deta_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
        self.model_tester.create_and_check_deta_model(*config_and_inputs)
    def test_deta_freeze_backbone(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
        self.model_tester.create_and_check_deta_freeze_backbone(*config_and_inputs)
    def test_deta_unfreeze_backbone(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
        self.model_tester.create_and_check_deta_unfreeze_backbone(*config_and_inputs)
    def test_deta_object_detection_head_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaForObjectDetection")
        self.model_tester.create_and_check_deta_object_detection_head_model(*config_and_inputs)
    @unittest.skip(reason="DETA does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass
    @unittest.skip(reason="DETA does not use inputs_embeds")
    def test_inputs_embeds_matches_input_ids(self):
        pass
    @unittest.skip(reason="DETA does not have a get_input_embeddings method")
    def test_model_common_attributes(self):
        pass
    @unittest.skip(reason="DETA is not a generative model")
    def test_generate_without_input_ids(self):
        pass
    @unittest.skip(reason="DETA does not use token embeddings")
    def test_resize_tokens_embeddings(self):
        pass
    @unittest.skip(reason="Feed forward chunking is not implemented")
    def test_feed_forward_chunking(self):
        pass
    def test_attention_outputs(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.return_dict = True
        for model_class in self.all_model_classes:
            inputs_dict["output_attentions"] = True
            inputs_dict["output_hidden_states"] = False
            config.return_dict = True
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            attentions = outputs.encoder_attentions
            self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
            # check that output_attentions also work using config
            del inputs_dict["output_attentions"]
            config.output_attentions = True
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            attentions = outputs.encoder_attentions
            self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
            self.assertListEqual(
                list(attentions[0].shape[-3:]),
                [
                    self.model_tester.num_attention_heads,
                    self.model_tester.num_feature_levels,
                    self.model_tester.encoder_n_points,
                ],
            )
            out_len = len(outputs)
            correct_outlen = 8
            # loss is at first position
            if "labels" in inputs_dict:
                correct_outlen += 1  # loss is added to beginning
            # Object Detection model returns pred_logits and pred_boxes
            if model_class.__name__ == "DetaForObjectDetection":
                correct_outlen += 2
            self.assertEqual(out_len, correct_outlen)
            # decoder attentions
            decoder_attentions = outputs.decoder_attentions
            self.assertIsInstance(decoder_attentions, (list, tuple))
            self.assertEqual(len(decoder_attentions), self.model_tester.num_hidden_layers)
            self.assertListEqual(
                list(decoder_attentions[0].shape[-3:]),
                [self.model_tester.num_attention_heads, self.model_tester.num_queries, self.model_tester.num_queries],
            )
            # cross attentions
            cross_attentions = outputs.cross_attentions
            self.assertIsInstance(cross_attentions, (list, tuple))
            self.assertEqual(len(cross_attentions), self.model_tester.num_hidden_layers)
            self.assertListEqual(
                list(cross_attentions[0].shape[-3:]),
                [
                    self.model_tester.num_attention_heads,
                    self.model_tester.num_feature_levels,
                    self.model_tester.decoder_n_points,
                ],
            )
            # Check attention is always last and order is fine
            inputs_dict["output_attentions"] = True
            inputs_dict["output_hidden_states"] = True
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            if hasattr(self.model_tester, "num_hidden_states_types"):
                added_hidden_states = self.model_tester.num_hidden_states_types
            elif self.is_encoder_decoder:
                added_hidden_states = 2
            else:
                added_hidden_states = 1
            self.assertEqual(out_len + added_hidden_states, len(outputs))
            self_attentions = outputs.encoder_attentions
            self.assertEqual(len(self_attentions), self.model_tester.num_hidden_layers)
            self.assertListEqual(
                list(self_attentions[0].shape[-3:]),
                [
                    self.model_tester.num_attention_heads,
                    self.model_tester.num_feature_levels,
                    self.model_tester.encoder_n_points,
                ],
            )
    # removed retain_grad and grad on decoder_hidden_states, as queries don't require grad
    def test_retain_grad_hidden_states_attentions(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.output_hidden_states = True
        config.output_attentions = True
        # no need to test all models as different heads yield the same functionality
        model_class = self.all_model_classes[0]
        model = model_class(config)
        model.to(torch_device)
        inputs = self._prepare_for_class(inputs_dict, model_class)
        outputs = model(**inputs)
        # we take the second output since last_hidden_state is the second item
        output = outputs[1]
        encoder_hidden_states = outputs.encoder_hidden_states[0]
        encoder_attentions = outputs.encoder_attentions[0]
        encoder_hidden_states.retain_grad()
        encoder_attentions.retain_grad()
        decoder_attentions = outputs.decoder_attentions[0]
        decoder_attentions.retain_grad()
        cross_attentions = outputs.cross_attentions[0]
        cross_attentions.retain_grad()
        output.flatten()[0].backward(retain_graph=True)
        self.assertIsNotNone(encoder_hidden_states.grad)
        self.assertIsNotNone(encoder_attentions.grad)
        self.assertIsNotNone(decoder_attentions.grad)
        self.assertIsNotNone(cross_attentions.grad)
    def test_forward_auxiliary_loss(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.auxiliary_loss = True
        # only test the object detection model (every class after the base model)
        for model_class in self.all_model_classes[1:]:
            model = model_class(config)
            model.to(torch_device)
            inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
            outputs = model(**inputs)
            self.assertIsNotNone(outputs.auxiliary_outputs)
            self.assertEqual(len(outputs.auxiliary_outputs), self.model_tester.num_hidden_layers - 1)
    def test_forward_signature(self):
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            model = model_class(config)
            signature = inspect.signature(model.forward)
            # signature.parameters is an OrderedDict => so arg_names order is deterministic
            arg_names = [*signature.parameters.keys()]
            if model.config.is_encoder_decoder:
                expected_arg_names = ["pixel_values", "pixel_mask"]
                expected_arg_names.extend(
                    ["head_mask", "decoder_head_mask", "encoder_outputs"]
                    if "head_mask" in arg_names and "decoder_head_mask" in arg_names
                    else []
                )
                self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names)
            else:
                expected_arg_names = ["pixel_values", "pixel_mask"]
                self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names)
    @unittest.skip(reason="Model doesn't use tied weights")
    def test_tied_model_weights_key_ignore(self):
        pass
    def test_initialization(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        configs_no_init = _config_zero_init(config)
        for model_class in self.all_model_classes:
            model = model_class(config=configs_no_init)
            # Skip the check for the backbone
            for name, module in model.named_modules():
                if module.__class__.__name__ == "DetaBackboneWithPositionalEncodings":
                    backbone_params = [f"{name}.{key}" for key in module.state_dict().keys()]
                    break
            for name, param in model.named_parameters():
                if param.requires_grad:
                    if (
                        "level_embed" in name
                        or "sampling_offsets.bias" in name
                        or "value_proj" in name
                        or "output_proj" in name
                        or "reference_points" in name
                        or name in backbone_params
                    ):
                        continue
                    self.assertIn(
                        ((param.data.mean() * 1e9).round() / 1e9).item(),
                        [0.0, 1.0],
                        msg=f"Parameter {name} of model {model_class} seems not properly initialized",
                    )
    @unittest.skip("No support for low_cpu_mem_usage=True.")
    def test_save_load_low_cpu_mem_usage(self):
        pass
    @unittest.skip("No support for low_cpu_mem_usage=True.")
    def test_save_load_low_cpu_mem_usage_checkpoints(self):
        pass
    @unittest.skip("No support for low_cpu_mem_usage=True.")
    def test_save_load_low_cpu_mem_usage_no_safetensors(self):
        pass
    # Inspired by tests.test_modeling_common.ModelTesterMixin.test_tied_weights_keys
    def test_tied_weights_keys(self):
        for model_class in self.all_model_classes:
            # We need to pass model class name to correctly initialize the config.
            # If we don't pass it, the config for `DetaForObjectDetection` will be initialized
            # with `two_stage=False` and the test will fail because for that case `class_embed`
            # weights are not tied.
            config, _ = self.model_tester.prepare_config_and_inputs_for_common(model_class_name=model_class.__name__)
            config.tie_word_embeddings = True
            model_tied = model_class(config)
            ptrs = collections.defaultdict(list)
            for name, tensor in model_tied.state_dict().items():
                ptrs[id_tensor_storage(tensor)].append(name)
            # These are all the pointers of shared tensors.
            tied_params = [names for _, names in ptrs.items() if len(names) > 1]
            tied_weight_keys = model_tied._tied_weights_keys if model_tied._tied_weights_keys is not None else []
            # Detect we get a hit for each key
            for key in tied_weight_keys:
                is_tied_key = any(re.search(key, p) for group in tied_params for p in group)
                self.assertTrue(is_tied_key, f"{key} is not a tied weight key for {model_class}.")
            # Remove the declared tied keys from each group -> at most one name should be left per group
            for key in tied_weight_keys:
                for i in range(len(tied_params)):
                    tied_params[i] = [p for p in tied_params[i] if re.search(key, p) is None]
            tied_params = [group for group in tied_params if len(group) > 1]
            self.assertListEqual(
                tied_params,
                [],
                f"Missing `_tied_weights_keys` for {model_class}: add all of {tied_params} except one.",
            )
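The tied-weights check above groups state-dict names by the storage they point to, then strips every name matched by a declared key pattern. The following is a minimal, torch-free sketch of that bookkeeping. The helper name `find_undeclared_ties` and the integer storage ids are assumptions for illustration; the real test gets its ids from `id_tensor_storage`.

```python
import collections
import re


def find_undeclared_ties(storage_ids, tied_key_patterns):
    """Return (patterns that matched no shared name, groups still tied after filtering).

    `storage_ids` maps a parameter name to a hashable storage id. Plain integers
    stand in here for what `id_tensor_storage` would return (an assumption).
    """
    groups = collections.defaultdict(list)
    for name, storage_id in storage_ids.items():
        groups[storage_id].append(name)

    # Only groups with more than one name share a tensor.
    tied_groups = [names for names in groups.values() if len(names) > 1]

    # Every declared pattern should hit at least one shared name.
    unmatched = [
        pattern
        for pattern in tied_key_patterns
        if not any(re.search(pattern, name) for group in tied_groups for name in group)
    ]

    # After dropping declared names, each group should keep at most one name.
    remaining = [
        [name for name in group if not any(re.search(p, name) for p in tied_key_patterns)]
        for group in tied_groups
    ]
    undeclared = [group for group in remaining if len(group) > 1]
    return unmatched, undeclared


# Hypothetical parameter names, loosely modelled on a two-stage detection head.
storage_ids = {
    "class_embed.0.weight": 1,
    "decoder.class_embed.0.weight": 1,
    "bbox_embed.0.weight": 2,
    "decoder.bbox_embed.0.weight": 2,
    "backbone.conv.weight": 3,
}

complete = find_undeclared_ties(storage_ids, [r"decoder\.class_embed", r"decoder\.bbox_embed"])
partial = find_undeclared_ties(storage_ids, [r"decoder\.class_embed"])
```

With both patterns declared, nothing is left over. With only one declared, the `bbox_embed` pair is reported as an undeclared tie, which is the condition the test's final `assertListEqual` guards against.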
 TOLERANCE = 1e-4
 # We will verify our results on an image of cute cats
 def prepare_img():
    image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
    return image
@require_torchvision
@require_vision
@slow
 class DetaModelIntegrationTests(unittest.TestCase):
    @cached_property
    def default_image_processor(self):
        return AutoImageProcessor.from_pretrained("jozhang97/deta-resnet-50") if is_vision_available() else None
    def test_inference_object_detection_head(self):
        model = DetaForObjectDetection.from_pretrained("jozhang97/deta-resnet-50").to(torch_device)
        image_processor = self.default_image_processor
        image = prepare_img()
        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
        with torch.no_grad():
            outputs = model(**inputs)
        expected_shape_logits = torch.Size((1, 300, model.config.num_labels))
        self.assertEqual(outputs.logits.shape, expected_shape_logits)
        expected_logits = torch.tensor(
            [[-7.3978, -2.5406, -4.1668], [-8.2684, -3.9933, -3.8096], [-7.0515, -3.7973, -5.8516]]
        ).to(torch_device)
        expected_boxes = torch.tensor(
            [[0.5043, 0.4973, 0.9998], [0.2542, 0.5489, 0.4748], [0.5490, 0.2765, 0.0570]]
        ).to(torch_device)
        self.assertTrue(torch.allclose(outputs.logits[0, :3, :3], expected_logits, atol=1e-4))
        expected_shape_boxes = torch.Size((1, 300, 4))
        self.assertEqual(outputs.pred_boxes.shape, expected_shape_boxes)
        self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
        # verify postprocessing
        results = image_processor.post_process_object_detection(
            outputs, threshold=0.3, target_sizes=[image.size[::-1]]
        )[0]
        expected_scores = torch.tensor([0.6392, 0.6276, 0.5546, 0.5260, 0.4706], device=torch_device)
        expected_labels = [75, 17, 17, 75, 63]
        expected_slice_boxes = torch.tensor([40.5866, 73.2107, 176.1421, 117.1751], device=torch_device)
        self.assertTrue(torch.allclose(results["scores"], expected_scores, atol=1e-4))
        self.assertSequenceEqual(results["labels"].tolist(), expected_labels)
        self.assertTrue(torch.allclose(results["boxes"][0, :], expected_slice_boxes))
    def test_inference_object_detection_head_swin_backbone(self):
        model = DetaForObjectDetection.from_pretrained("jozhang97/deta-swin-large").to(torch_device)
        image_processor = self.default_image_processor
        image = prepare_img()
        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
        with torch.no_grad():
            outputs = model(**inputs)
        expected_shape_logits = torch.Size((1, 300, model.config.num_labels))
        self.assertEqual(outputs.logits.shape, expected_shape_logits)
        expected_logits = torch.tensor(
            [[-7.6308, -2.8485, -5.3737], [-7.2037, -4.5505, -4.8027], [-7.2943, -4.2611, -4.6617]]
        ).to(torch_device)
        expected_boxes = torch.tensor(
            [[0.4987, 0.4969, 0.9999], [0.2549, 0.5498, 0.4805], [0.5498, 0.2757, 0.0569]]
        ).to(torch_device)
        self.assertTrue(torch.allclose(outputs.logits[0, :3, :3], expected_logits, atol=1e-4))
        expected_shape_boxes = torch.Size((1, 300, 4))
        self.assertEqual(outputs.pred_boxes.shape, expected_shape_boxes)
        self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
        # verify postprocessing
        results = image_processor.post_process_object_detection(
            outputs, threshold=0.3, target_sizes=[image.size[::-1]]
        )[0]
        expected_scores = torch.tensor([0.6831, 0.6826, 0.5684, 0.5464, 0.4392], device=torch_device)
        expected_labels = [17, 17, 75, 75, 63]
        expected_slice_boxes = torch.tensor([345.8478, 23.6754, 639.8562, 372.8265], device=torch_device)
        self.assertTrue(torch.allclose(results["scores"], expected_scores, atol=1e-4))
        self.assertSequenceEqual(results["labels"].tolist(), expected_labels)
        self.assertTrue(torch.allclose(results["boxes"][0, :], expected_slice_boxes))
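`test_forward_signature` in the file above chooses its expected argument names by checking membership in `arg_names`. The sketch below shows why such a check needs one `in` test per name. The `forward` function is a hypothetical stand-in, not the model's real signature.

```python
import inspect


def forward(pixel_values, pixel_mask=None, decoder_head_mask=None):
    """Hypothetical forward signature, used only to produce a list of argument names."""


arg_names = list(inspect.signature(forward).parameters)

# `in` binds tighter than `and`, so this only tests the last name:
# "head_mask" and ("decoder_head_mask" in arg_names)
chained = bool("head_mask" and "decoder_head_mask" in arg_names)

# One membership test per name gives the intended result.
explicit = "head_mask" in arg_names and "decoder_head_mask" in arg_names

# Equivalent form that scales to longer lists of names.
all_present = all(name in arg_names for name in ("head_mask", "decoder_head_mask"))
```

Here `chained` is true even though `head_mask` is absent, while `explicit` and `all_present` are both false.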
--- a/tests/models/efficientformer/__init__.py
+++ b/tests/models/efficientformer/__init__.py
--- a/tests/models/efficientformer/test_image_processing_efficientformer.py
+++ b/tests/models/efficientformer/test_image_processing_efficientformer.py
@@ -1,99 +0,0 @@
 # coding=utf-8
 # Copyright 2021 HuggingFace Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import unittest
 from transformers.testing_utils import require_torch, require_vision
 from transformers.utils import is_vision_available
 from ...test_image_processing_common import ImageProcessingTestMixin, prepare_image_inputs
 if is_vision_available():
    from transformers import ViTImageProcessor
 class EfficientFormerImageProcessorTester(unittest.TestCase):
    def __init__(
        self,
        parent,
        batch_size=13,
        num_channels=3,
        image_size=224,
        min_resolution=30,
        max_resolution=400,
        do_resize=True,
        size=None,
        do_normalize=True,
        image_mean=[0.5, 0.5, 0.5],
        image_std=[0.5, 0.5, 0.5],
    ):
        size = size if size is not None else {"height": 18, "width": 18}
        self.parent = parent
        self.batch_size = batch_size
        self.num_channels = num_channels
        self.image_size = image_size
        self.min_resolution = min_resolution
        self.max_resolution = max_resolution
        self.do_resize = do_resize
        self.size = size
        self.do_normalize = do_normalize
        self.image_mean = image_mean
        self.image_std = image_std
    def prepare_image_processor_dict(self):
        return {
            "image_mean": self.image_mean,
            "image_std": self.image_std,
            "do_normalize": self.do_normalize,
            "do_resize": self.do_resize,
            "size": self.size,
        }
    def expected_output_image_shape(self, images):
        return self.num_channels, self.size["height"], self.size["width"]
    def prepare_image_inputs(self, equal_resolution=False, numpify=False, torchify=False):
        return prepare_image_inputs(
            batch_size=self.batch_size,
            num_channels=self.num_channels,
            min_resolution=self.min_resolution,
            max_resolution=self.max_resolution,
            equal_resolution=equal_resolution,
            numpify=numpify,
            torchify=torchify,
        )
@require_torch
@require_vision
 class EfficientFormerImageProcessorTest(ImageProcessingTestMixin, unittest.TestCase):
    image_processing_class = ViTImageProcessor if is_vision_available() else None
    def setUp(self):
        self.image_processor_tester = EfficientFormerImageProcessorTester(self)
    @property
    def image_processor_dict(self):
        return self.image_processor_tester.prepare_image_processor_dict()
    def test_image_proc_properties(self):
        image_processor = self.image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processor, "image_mean"))
        self.assertTrue(hasattr(image_processor, "image_std"))
        self.assertTrue(hasattr(image_processor, "do_normalize"))
        self.assertTrue(hasattr(image_processor, "do_resize"))
        self.assertTrue(hasattr(image_processor, "size"))
--- a/tests/models/efficientformer/test_modeling_efficientformer.py
+++ b/tests/models/efficientformer/test_modeling_efficientformer.py
@@ -1,478 +0,0 @@
 # coding=utf-8
 # Copyright 2022 The HuggingFace Inc. team. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """Testing suite for the PyTorch EfficientFormer model."""
 import unittest
 import warnings
 from typing import List
 from transformers import EfficientFormerConfig
 from transformers.testing_utils import require_torch, require_vision, slow, torch_device
 from transformers.utils import cached_property, is_torch_available, is_vision_available
 from ...test_configuration_common import ConfigTester
 from ...test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor
 from ...test_pipeline_mixin import PipelineTesterMixin
 if is_torch_available():
    import torch
    from transformers import (
        EfficientFormerForImageClassification,
        EfficientFormerForImageClassificationWithTeacher,
        EfficientFormerModel,
    )
    from transformers.models.auto.modeling_auto import (
        MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES,
        MODEL_MAPPING_NAMES,
    )
 if is_vision_available():
    from PIL import Image
    from transformers import EfficientFormerImageProcessor
 class EfficientFormerModelTester:
    def __init__(
        self,
        parent,
        batch_size: int = 13,
        image_size: int = 64,
        patch_size: int = 2,
        embed_dim: int = 3,
        num_channels: int = 3,
        is_training: bool = True,
        use_labels: bool = True,
        hidden_size: int = 128,
        hidden_sizes=[16, 32, 64, 128],
        num_hidden_layers: int = 7,
        num_attention_heads: int = 4,
        intermediate_size: int = 37,
        hidden_act: str = "gelu",
        hidden_dropout_prob: float = 0.1,
        attention_probs_dropout_prob: float = 0.1,
        type_sequence_label_size: int = 10,
        initializer_range: float = 0.02,
        encoder_stride: int = 2,
        num_attention_outputs: int = 1,
        dim: int = 128,
        depths: List[int] = [2, 2, 2, 2],
        resolution: int = 2,
        mlp_expansion_ratio: int = 2,
    ):
        self.parent = parent
        self.batch_size = batch_size
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_channels = num_channels
        self.is_training = is_training
        self.use_labels = use_labels
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.intermediate_size = intermediate_size
        self.hidden_act = hidden_act
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.type_sequence_label_size = type_sequence_label_size
        self.initializer_range = initializer_range
        self.encoder_stride = encoder_stride
        self.num_attention_outputs = num_attention_outputs
        self.embed_dim = embed_dim
        self.seq_length = embed_dim + 1
        self.resolution = resolution
        self.depths = depths
        self.hidden_sizes = hidden_sizes
        self.dim = dim
        self.mlp_expansion_ratio = mlp_expansion_ratio
    def prepare_config_and_inputs(self):
        pixel_values = floats_tensor([self.batch_size, self.num_channels, self.image_size, self.image_size])
        labels = None
        if self.use_labels:
            labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
        config = self.get_config()
        return config, pixel_values, labels
    def get_config(self):
        return EfficientFormerConfig(
            image_size=self.image_size,
            patch_size=self.patch_size,
            num_channels=self.num_channels,
            hidden_size=self.hidden_size,
            num_hidden_layers=self.num_hidden_layers,
            num_attention_heads=self.num_attention_heads,
            intermediate_size=self.intermediate_size,
            hidden_act=self.hidden_act,
            hidden_dropout_prob=self.hidden_dropout_prob,
            attention_probs_dropout_prob=self.attention_probs_dropout_prob,
            is_decoder=False,
            initializer_range=self.initializer_range,
            encoder_stride=self.encoder_stride,
            resolution=self.resolution,
            depths=self.depths,
            hidden_sizes=self.hidden_sizes,
            dim=self.dim,
            mlp_expansion_ratio=self.mlp_expansion_ratio,
        )
    def create_and_check_model(self, config, pixel_values, labels):
        model = EfficientFormerModel(config=config)
        model.to(torch_device)
        model.eval()
        result = model(pixel_values)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
    def create_and_check_for_image_classification(self, config, pixel_values, labels):
        config.num_labels = self.type_sequence_label_size
        model = EfficientFormerForImageClassification(config)
        model.to(torch_device)
        model.eval()
        result = model(pixel_values, labels=labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.type_sequence_label_size))
        # test greyscale images
        config.num_channels = 1
        model = EfficientFormerForImageClassification(config)
        model.to(torch_device)
        model.eval()
        pixel_values = floats_tensor([self.batch_size, 1, self.image_size, self.image_size])
        result = model(pixel_values)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.type_sequence_label_size))
    def prepare_config_and_inputs_for_common(self):
        config_and_inputs = self.prepare_config_and_inputs()
        (
            config,
            pixel_values,
            labels,
        ) = config_and_inputs
        inputs_dict = {"pixel_values": pixel_values}
        return config, inputs_dict
@require_torch
 class EfficientFormerModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
    """
    Here we also overwrite some of the tests of test_modeling_common.py, as EfficientFormer does not use input_ids, inputs_embeds,
    attention_mask and seq_length.
    """
    all_model_classes = (
        (
            EfficientFormerModel,
            EfficientFormerForImageClassificationWithTeacher,
            EfficientFormerForImageClassification,
        )
        if is_torch_available()
        else ()
    )
    pipeline_model_mapping = (
        {
            "image-feature-extraction": EfficientFormerModel,
            "image-classification": (
                EfficientFormerForImageClassification,
                EfficientFormerForImageClassificationWithTeacher,
            ),
        }
        if is_torch_available()
        else {}
    )
    fx_compatible = False
    test_pruning = False
    test_resize_embeddings = False
    test_head_masking = False
    def setUp(self):
        self.model_tester = EfficientFormerModelTester(self)
        self.config_tester = ConfigTester(
            self, config_class=EfficientFormerConfig, has_text_modality=False, hidden_size=37
        )
    def test_config(self):
        self.config_tester.run_common_tests()
    @unittest.skip(reason="EfficientFormer does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass
    @unittest.skip(reason="EfficientFormer does not support input and output embeddings")
    def test_model_common_attributes(self):
        pass
    def test_hidden_states_output(self):
        def check_hidden_states_output(inputs_dict, config, model_class):
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            hidden_states = outputs.encoder_hidden_states if config.is_encoder_decoder else outputs.hidden_states
            expected_num_layers = getattr(
                self.model_tester, "expected_num_hidden_layers", self.model_tester.num_hidden_layers + 1
            )
            self.assertEqual(len(hidden_states), expected_num_layers)
            if hasattr(self.model_tester, "encoder_seq_length"):
                seq_length = self.model_tester.encoder_seq_length
                if hasattr(self.model_tester, "chunk_length") and self.model_tester.chunk_length > 1:
                    seq_length = seq_length * self.model_tester.chunk_length
            else:
                seq_length = self.model_tester.seq_length
            self.assertListEqual(
                list(hidden_states[-1].shape[-2:]),
                [seq_length, self.model_tester.hidden_size],
            )
            if config.is_encoder_decoder:
                hidden_states = outputs.decoder_hidden_states
                self.assertIsInstance(hidden_states, (list, tuple))
                self.assertEqual(len(hidden_states), expected_num_layers)
                seq_len = getattr(self.model_tester, "seq_length", None)
                decoder_seq_length = getattr(self.model_tester, "decoder_seq_length", seq_len)
                self.assertListEqual(
                    list(hidden_states[-1].shape[-2:]),
                    [decoder_seq_length, self.model_tester.hidden_size],
                )
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            inputs_dict["output_hidden_states"] = True
            check_hidden_states_output(inputs_dict, config, model_class)
            # check that output_hidden_states also works using config
            del inputs_dict["output_hidden_states"]
            config.output_hidden_states = True
            check_hidden_states_output(inputs_dict, config, model_class)
    def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
        inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
        if return_labels:
            if model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher":
                del inputs_dict["labels"]
        return inputs_dict
    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)
    @unittest.skip(reason="EfficientFormer does not implement masked image modeling yet")
    def test_for_masked_image_modeling(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_masked_image_modeling(*config_and_inputs)
    def test_for_image_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_image_classification(*config_and_inputs)
    # special case for EfficientFormerForImageClassificationWithTeacher model
    def test_training(self):
        if not self.model_tester.is_training:
            return
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.return_dict = True
        for model_class in self.all_model_classes:
            # EfficientFormerForImageClassificationWithTeacher supports inference-only
            if (
                model_class.__name__ in MODEL_MAPPING_NAMES.values()
                or model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher"
            ):
                continue
            model = model_class(config)
            model.to(torch_device)
            model.train()
            inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
            loss = model(**inputs).loss
            loss.backward()
    def test_problem_types(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        problem_types = [
            {"title": "multi_label_classification", "num_labels": 2, "dtype": torch.float},
            {"title": "single_label_classification", "num_labels": 1, "dtype": torch.long},
            {"title": "regression", "num_labels": 1, "dtype": torch.float},
        ]
        for model_class in self.all_model_classes:
            if (
                model_class.__name__
                not in [
                    *MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES.values(),
                ]
                or model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher"
            ):
                continue
            for problem_type in problem_types:
                with self.subTest(msg=f"Testing {model_class} with {problem_type['title']}"):
                    config.problem_type = problem_type["title"]
                    config.num_labels = problem_type["num_labels"]
                    model = model_class(config)
                    model.to(torch_device)
                    model.train()
                    inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
                    if problem_type["num_labels"] > 1:
                        inputs["labels"] = inputs["labels"].unsqueeze(1).repeat(1, problem_type["num_labels"])
                    inputs["labels"] = inputs["labels"].to(problem_type["dtype"])
                    # This tests that we do not trigger the warning from PyTorch "Using a target size that is different
                    # to the input size. This will likely lead to incorrect results due to broadcasting. Please ensure
                    # they have the same size." which is a symptom that something is wrong for the regression problem.
                    # See https://github.com/huggingface/transformers/issues/11780
                    with warnings.catch_warnings(record=True) as warning_list:
                        loss = model(**inputs).loss
                    for w in warning_list:
                        if "Using a target size that is different to the input size" in str(w.message):
                            raise ValueError(
                                f"Something is going wrong in the regression problem: intercepted {w.message}"
                            )
                    loss.backward()
    @slow
    def test_model_from_pretrained(self):
        model_name = "snap-research/efficientformer-l1-300"
        model = EfficientFormerModel.from_pretrained(model_name)
        self.assertIsNotNone(model)
    def test_attention_outputs(self):
        config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
        config.return_dict = True
        seq_len = getattr(self.model_tester, "seq_length", None)
        encoder_seq_length = getattr(self.model_tester, "encoder_seq_length", seq_len)
        encoder_key_length = getattr(self.model_tester, "key_length", encoder_seq_length)
        chunk_length = getattr(self.model_tester, "chunk_length", None)
        if chunk_length is not None and hasattr(self.model_tester, "num_hashes"):
            encoder_seq_length = encoder_seq_length * self.model_tester.num_hashes
        for model_class in self.all_model_classes:
            inputs_dict["output_attentions"] = True
            inputs_dict["output_hidden_states"] = False
            config.return_dict = True
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
            self.assertEqual(len(attentions), self.model_tester.num_attention_outputs)
            # check that output_attentions also works using config
            del inputs_dict["output_attentions"]
            config.output_attentions = True
            model = model_class(config)
            model.to(torch_device)
            model.eval()
            with torch.no_grad():
                outputs = model(**self._prepare_for_class(inputs_dict, model_class))
            attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
            self.assertEqual(len(attentions), self.model_tester.num_attention_outputs)
            if chunk_length is not None:
                self.assertListEqual(
                    list(attentions[0].shape[-4:]),
                    [self.model_tester.num_attention_heads, encoder_seq_length, chunk_length, encoder_key_length],
                )
            else:
                self.assertListEqual(
                    list(attentions[0].shape[-3:]),
                    [self.model_tester.num_attention_heads, encoder_seq_length, encoder_key_length],
                )
# We will verify our results on an image of cute cats
def prepare_img():
    image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
    return image
@require_torch
@require_vision
class EfficientFormerModelIntegrationTest(unittest.TestCase):
    @cached_property
    def default_image_processor(self):
        return (
            EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
            if is_vision_available()
            else None
        )
    @slow
    def test_inference_image_classification_head(self):
        model = EfficientFormerForImageClassification.from_pretrained("snap-research/efficientformer-l1-300").to(
            torch_device
        )
        image_processor = self.default_image_processor
        image = prepare_img()
        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
        # forward pass
        with torch.no_grad():
            outputs = model(**inputs)
        # verify the logits
        expected_shape = (1, 1000)
        self.assertEqual(outputs.logits.shape, expected_shape)
        expected_slice = torch.tensor([-0.0555, 0.4825, -0.0852]).to(torch_device)
        self.assertTrue(torch.allclose(outputs.logits[0][:3], expected_slice, atol=1e-4))
    @slow
    def test_inference_image_classification_head_with_teacher(self):
        model = EfficientFormerForImageClassificationWithTeacher.from_pretrained(
            "snap-research/efficientformer-l1-300"
        ).to(torch_device)
        image_processor = self.default_image_processor
        image = prepare_img()
        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
        # forward pass
        with torch.no_grad():
            outputs = model(**inputs)
        # verify the logits
        expected_shape = (1, 1000)
        self.assertEqual(outputs.logits.shape, expected_shape)
        expected_slice = torch.tensor([-0.1312, 0.4353, -1.0499]).to(torch_device)
        self.assertTrue(torch.allclose(outputs.logits[0][:3], expected_slice, atol=1e-4))
--- a/Show More
+++ b/Show More
@@ -1,4 +1,4 @@
-from ... import PretrainedConfig
+from .... import PretrainedConfig
 
 
 class NezhaConfig(PretrainedConfig):