Deprecate low use models (#30781)

* Deprecate models
- graphormer
- time_series_transformer
- xlm_prophetnet
- qdqbert
- nat
- ernie_m
- tvlt
- nezha
- mega
- jukebox
- vit_hybrid
- x_clip
- deta
- speech_to_text_2
- efficientformer
- realm
- gptsan_japanese

* Fix up

* Fix speech2text2 imports

* Make sure message isn't indented

* Fix docstrings

* Correctly map for deprecated models from model_type

* Uncomment out

* Add back time series transformer and x-clip

* Import fix and fix-up

* Fix up with updated ruff
This commit is contained in:
amyeroberts
2024-05-28 18:07:07 +01:00
committed by GitHub
parent 7f08817be4
commit a564d10afe
142 changed files with 1308 additions and 11908 deletions

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# DETA # DETA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The DETA model was proposed in [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl. The DETA model was proposed in [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# EfficientFormer # EfficientFormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191) The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# ErnieM # ErnieM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The ErnieM model was proposed in [ERNIE-M: Enhanced Multilingual Representation by Aligning The ErnieM model was proposed in [ERNIE-M: Enhanced Multilingual Representation by Aligning

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# GPTSAN-japanese # GPTSAN-japanese
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama). The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama).

View File

@@ -14,6 +14,14 @@ rendered properly in your Markdown viewer.
# Graphormer # Graphormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by

View File

@@ -15,6 +15,14 @@ rendered properly in your Markdown viewer.
--> -->
# Jukebox # Jukebox
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf) The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf)

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# MEGA # MEGA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The MEGA model was proposed in [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. The MEGA model was proposed in [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Neighborhood Attention Transformer # Neighborhood Attention Transformer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143)

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Nezha # Nezha
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al. The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al.

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# QDQBERT # QDQBERT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The QDQBERT model can be referenced in [Integer Quantization for Deep Learning Inference: Principles and Empirical The QDQBERT model can be referenced in [Integer Quantization for Deep Learning Inference: Principles and Empirical

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# REALM # REALM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Speech2Text2 # Speech2Text2
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The Speech2Text2 model is used together with [Wav2Vec2](wav2vec2) for Speech Translation models proposed in The Speech2Text2 model is used together with [Wav2Vec2](wav2vec2) for Speech Translation models proposed in

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# TVLT # TVLT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The TVLT model was proposed in [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156) The TVLT model was proposed in [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156)

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Hybrid Vision Transformer (ViT Hybrid) # Hybrid Vision Transformer (ViT Hybrid)
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview ## Overview
The hybrid Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition The hybrid Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition

View File

@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# XLM-ProphetNet # XLM-ProphetNet
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
<div class="flex flex-wrap space-x-1"> <div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=xprophetnet"> <a href="https://huggingface.co/models?filter=xprophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet"> <img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">

File diff suppressed because it is too large Load Diff

View File

@@ -67,7 +67,6 @@ from . import (
deit, deit,
deprecated, deprecated,
depth_anything, depth_anything,
deta,
detr, detr,
dialogpt, dialogpt,
dinat, dinat,
@@ -77,13 +76,11 @@ from . import (
donut, donut,
dpr, dpr,
dpt, dpt,
efficientformer,
efficientnet, efficientnet,
electra, electra,
encodec, encodec,
encoder_decoder, encoder_decoder,
ernie, ernie,
ernie_m,
esm, esm,
falcon, falcon,
fastspeech2_conformer, fastspeech2_conformer,
@@ -104,8 +101,6 @@ from . import (
gpt_neox_japanese, gpt_neox_japanese,
gpt_sw3, gpt_sw3,
gptj, gptj,
gptsan_japanese,
graphormer,
grounding_dino, grounding_dino,
groupvit, groupvit,
herbert, herbert,
@@ -118,7 +113,6 @@ from . import (
instructblip, instructblip,
jamba, jamba,
jetmoe, jetmoe,
jukebox,
kosmos2, kosmos2,
layoutlm, layoutlm,
layoutlmv2, layoutlmv2,
@@ -142,7 +136,6 @@ from . import (
maskformer, maskformer,
mbart, mbart,
mbart50, mbart50,
mega,
megatron_bert, megatron_bert,
megatron_gpt2, megatron_gpt2,
mgp_str, mgp_str,
@@ -161,8 +154,6 @@ from . import (
musicgen, musicgen,
musicgen_melody, musicgen_melody,
mvp, mvp,
nat,
nezha,
nllb, nllb,
nllb_moe, nllb_moe,
nougat, nougat,
@@ -190,11 +181,9 @@ from . import (
prophetnet, prophetnet,
pvt, pvt,
pvt_v2, pvt_v2,
qdqbert,
qwen2, qwen2,
qwen2_moe, qwen2_moe,
rag, rag,
realm,
recurrent_gemma, recurrent_gemma,
reformer, reformer,
regnet, regnet,
@@ -215,7 +204,6 @@ from . import (
siglip, siglip,
speech_encoder_decoder, speech_encoder_decoder,
speech_to_text, speech_to_text,
speech_to_text_2,
speecht5, speecht5,
splinter, splinter,
squeezebert, squeezebert,
@@ -234,7 +222,6 @@ from . import (
timesformer, timesformer,
timm_backbone, timm_backbone,
trocr, trocr,
tvlt,
tvp, tvp,
udop, udop,
umt5, umt5,
@@ -250,7 +237,6 @@ from . import (
vision_text_dual_encoder, vision_text_dual_encoder,
visual_bert, visual_bert,
vit, vit,
vit_hybrid,
vit_mae, vit_mae,
vit_msn, vit_msn,
vitdet, vitdet,
@@ -267,7 +253,6 @@ from . import (
x_clip, x_clip,
xglm, xglm,
xlm, xlm,
xlm_prophetnet,
xlm_roberta, xlm_roberta,
xlm_roberta_xl, xlm_roberta_xl,
xlnet, xlnet,

View File

@@ -585,14 +585,29 @@ MODEL_NAMES_MAPPING = OrderedDict(
# `transfo-xl` (as in `CONFIG_MAPPING_NAMES`), we should use `transfo_xl`. # `transfo-xl` (as in `CONFIG_MAPPING_NAMES`), we should use `transfo_xl`.
DEPRECATED_MODELS = [ DEPRECATED_MODELS = [
"bort", "bort",
"deta",
"efficientformer",
"ernie_m",
"gptsan_japanese",
"graphormer",
"jukebox",
"mctct", "mctct",
"mega",
"mmbt", "mmbt",
"nat",
"nezha",
"open_llama", "open_llama",
"qdqbert",
"realm",
"retribert", "retribert",
"speech_to_text_2",
"tapex", "tapex",
"trajectory_transformer", "trajectory_transformer",
"transfo_xl", "transfo_xl",
"tvlt",
"van", "van",
"vit_hybrid",
"xlm_prophetnet",
] ]
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict( SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
@@ -616,7 +631,11 @@ def model_type_to_module_name(key):
"""Converts a config key to the corresponding module.""" """Converts a config key to the corresponding module."""
# Special treatment # Special treatment
if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME: if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key] key = SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
if key in DEPRECATED_MODELS:
key = f"deprecated.{key}"
return key
key = key.replace("-", "_") key = key.replace("-", "_")
if key in DEPRECATED_MODELS: if key in DEPRECATED_MODELS:

View File

@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
_import_structure = { _import_structure = {

View File

@@ -14,9 +14,9 @@
# limitations under the License. # limitations under the License.
"""DETA model configuration""" """DETA model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
from ..auto import CONFIG_MAPPING from ...auto import CONFIG_MAPPING
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -19,9 +19,9 @@ from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union
import numpy as np import numpy as np
from ...feature_extraction_utils import BatchFeature from ....feature_extraction_utils import BatchFeature
from ...image_processing_utils import BaseImageProcessor, get_size_dict from ....image_processing_utils import BaseImageProcessor, get_size_dict
from ...image_transforms import ( from ....image_transforms import (
PaddingMode, PaddingMode,
center_to_corners_format, center_to_corners_format,
corners_to_center_format, corners_to_center_format,
@@ -31,7 +31,7 @@ from ...image_transforms import (
rgb_to_id, rgb_to_id,
to_channel_dimension_format, to_channel_dimension_format,
) )
from ...image_utils import ( from ....image_utils import (
IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_MEAN,
IMAGENET_DEFAULT_STD, IMAGENET_DEFAULT_STD,
AnnotationFormat, AnnotationFormat,
@@ -48,7 +48,7 @@ from ...image_utils import (
validate_annotations, validate_annotations,
validate_preprocess_arguments, validate_preprocess_arguments,
) )
from ...utils import ( from ....utils import (
is_flax_available, is_flax_available,
is_jax_tensor, is_jax_tensor,
is_tf_available, is_tf_available,
@@ -59,7 +59,7 @@ from ...utils import (
is_vision_available, is_vision_available,
logging, logging,
) )
from ...utils.generic import TensorType from ....utils.generic import TensorType
if is_torch_available(): if is_torch_available():

View File

@@ -28,8 +28,8 @@ from torch import Tensor, nn
from torch.autograd import Function from torch.autograd import Function
from torch.autograd.function import once_differentiable from torch.autograd.function import once_differentiable
from ...activations import ACT2FN from ....activations import ACT2FN
from ...file_utils import ( from ....file_utils import (
ModelOutput, ModelOutput,
add_start_docstrings, add_start_docstrings,
add_start_docstrings_to_model_forward, add_start_docstrings_to_model_forward,
@@ -38,12 +38,12 @@ from ...file_utils import (
is_vision_available, is_vision_available,
replace_return_docstrings, replace_return_docstrings,
) )
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask from ....modeling_attn_mask_utils import _prepare_4d_attention_mask
from ...modeling_outputs import BaseModelOutput from ....modeling_outputs import BaseModelOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import meshgrid from ....pytorch_utils import meshgrid
from ...utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends from ....utils import is_accelerate_available, is_ninja_available, is_torchvision_available, logging, requires_backends
from ...utils.backbone_utils import load_backbone from ....utils.backbone_utils import load_backbone
from .configuration_deta import DetaConfig from .configuration_deta import DetaConfig

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import ( from ....utils import (
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
_LazyModule, _LazyModule,
is_tf_available, is_tf_available,

View File

@@ -16,8 +16,8 @@
from typing import List from typing import List
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -18,13 +18,13 @@ from typing import Dict, List, Optional, Union
import numpy as np import numpy as np
from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_transforms import ( from ....image_transforms import (
get_resize_output_image_size, get_resize_output_image_size,
resize, resize,
to_channel_dimension_format, to_channel_dimension_format,
) )
from ...image_utils import ( from ....image_utils import (
IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_MEAN,
IMAGENET_DEFAULT_STD, IMAGENET_DEFAULT_STD,
ChannelDimension, ChannelDimension,
@@ -38,7 +38,7 @@ from ...image_utils import (
validate_kwargs, validate_kwargs,
validate_preprocess_arguments, validate_preprocess_arguments,
) )
from ...utils import TensorType, logging from ....utils import TensorType, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -23,10 +23,10 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput from ....modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
add_code_sample_docstrings, add_code_sample_docstrings,
add_start_docstrings, add_start_docstrings,

View File

@@ -20,13 +20,13 @@ from typing import Optional, Tuple, Union
import tensorflow as tf import tensorflow as tf
from ...activations_tf import ACT2FN from ....activations_tf import ACT2FN
from ...modeling_tf_outputs import ( from ....modeling_tf_outputs import (
TFBaseModelOutput, TFBaseModelOutput,
TFBaseModelOutputWithPooling, TFBaseModelOutputWithPooling,
TFImageClassifierOutput, TFImageClassifierOutput,
) )
from ...modeling_tf_utils import ( from ....modeling_tf_utils import (
TFPreTrainedModel, TFPreTrainedModel,
TFSequenceClassificationLoss, TFSequenceClassificationLoss,
get_initializer, get_initializer,
@@ -34,8 +34,8 @@ from ...modeling_tf_utils import (
keras_serializable, keras_serializable,
unpack_inputs, unpack_inputs,
) )
from ...tf_utils import shape_list, stable_softmax from ....tf_utils import shape_list, stable_softmax
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
add_code_sample_docstrings, add_code_sample_docstrings,
add_start_docstrings, add_start_docstrings,

View File

@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
# rely on isort to merge the imports # rely on isort to merge the imports
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -19,7 +19,7 @@ from __future__ import annotations
from typing import Dict from typing import Dict
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
class ErnieMConfig(PretrainedConfig): class ErnieMConfig(PretrainedConfig):

View File

@@ -22,8 +22,8 @@ import torch.utils.checkpoint
from torch import nn, tensor from torch import nn, tensor
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions,
MultipleChoiceModelOutput, MultipleChoiceModelOutput,
@@ -31,9 +31,9 @@ from ...modeling_outputs import (
SequenceClassifierOutput, SequenceClassifierOutput,
TokenClassifierOutput, TokenClassifierOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging from ....utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from .configuration_ernie_m import ErnieMConfig from .configuration_ernie_m import ErnieMConfig

View File

@@ -21,8 +21,8 @@ from typing import Any, Dict, List, Optional, Tuple
import sentencepiece as spm import sentencepiece as spm
from ...tokenization_utils import PreTrainedTokenizer from ....tokenization_utils import PreTrainedTokenizer
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import ( from ....utils import (
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
_LazyModule, _LazyModule,
is_flax_available, is_flax_available,

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""GPTSAN-japanese model configuration""" """GPTSAN-japanese model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -20,10 +20,10 @@ from typing import List, Optional, Tuple, Union
import torch import torch
import torch.nn as nn import torch.nn as nn
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions from ....modeling_outputs import MoECausalLMOutputWithPast, MoEModelOutputWithPastAndCrossAttentions
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import ( from ....utils import (
DUMMY_INPUTS, DUMMY_INPUTS,
DUMMY_MASK, DUMMY_MASK,
add_start_docstrings, add_start_docstrings,

View File

@@ -22,8 +22,8 @@ from typing import List, Optional, Tuple, Union
import numpy as np import numpy as np
from ...tokenization_utils import PreTrainedTokenizer from ....tokenization_utils import PreTrainedTokenizer
from ...tokenization_utils_base import ( from ....tokenization_utils_base import (
BatchEncoding, BatchEncoding,
PreTokenizedInput, PreTokenizedInput,
PreTokenizedInputPair, PreTokenizedInputPair,
@@ -31,7 +31,7 @@ from ...tokenization_utils_base import (
TextInputPair, TextInputPair,
TruncationStrategy, TruncationStrategy,
) )
from ...utils import PaddingStrategy, logging from ....utils import PaddingStrategy, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -6,7 +6,7 @@ from typing import Any, Dict, List, Mapping
import numpy as np import numpy as np
import torch import torch
from ...utils import is_cython_available, requires_backends from ....utils import is_cython_available, requires_backends
if is_cython_available(): if is_cython_available():

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""Graphormer model configuration""" """Graphormer model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -21,13 +21,13 @@ import torch
import torch.nn as nn import torch.nn as nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithNoAttention, BaseModelOutputWithNoAttention,
SequenceClassifierOutput, SequenceClassifierOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import logging from ....utils import logging
from .configuration_graphormer import GraphormerConfig from .configuration_graphormer import GraphormerConfig

View File

@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -17,8 +17,8 @@
import os import os
from typing import List, Union from typing import List, Union
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -24,10 +24,10 @@ import torch.nn.functional as F
from torch import nn from torch import nn
from torch.nn import LayerNorm as FusedLayerNorm from torch.nn import LayerNorm as FusedLayerNorm
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import add_start_docstrings, logging from ....utils import add_start_docstrings, logging
from ...utils.logging import tqdm from ....utils.logging import tqdm
from .configuration_jukebox import ATTENTION_PATTERNS, JukeboxConfig, JukeboxPriorConfig, JukeboxVQVAEConfig from .configuration_jukebox import ATTENTION_PATTERNS, JukeboxConfig, JukeboxPriorConfig, JukeboxVQVAEConfig

View File

@@ -24,10 +24,10 @@ from typing import Any, Dict, List, Optional, Tuple, Union
import numpy as np import numpy as np
import regex import regex
from ...tokenization_utils import AddedToken, PreTrainedTokenizer from ....tokenization_utils import AddedToken, PreTrainedTokenizer
from ...tokenization_utils_base import BatchEncoding from ....tokenization_utils_base import BatchEncoding
from ...utils import TensorType, is_flax_available, is_tf_available, is_torch_available, logging from ....utils import TensorType, is_flax_available, is_tf_available, is_torch_available, logging
from ...utils.generic import _is_jax, _is_numpy from ....utils.generic import _is_jax, _is_numpy
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import ( from ....utils import (
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
_LazyModule, _LazyModule,
is_torch_available, is_torch_available,

View File

@@ -17,9 +17,9 @@
from collections import OrderedDict from collections import OrderedDict
from typing import Mapping from typing import Mapping
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...onnx import OnnxConfig from ....onnx import OnnxConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -23,8 +23,8 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithPoolingAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions,
CausalLMOutputWithCrossAttentions, CausalLMOutputWithCrossAttentions,
MaskedLMOutput, MaskedLMOutput,
@@ -33,9 +33,9 @@ from ...modeling_outputs import (
SequenceClassifierOutput, SequenceClassifierOutput,
TokenClassifierOutput, TokenClassifierOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import ALL_LAYERNORM_LAYERS from ....pytorch_utils import ALL_LAYERNORM_LAYERS
from ...utils import ( from ....utils import (
add_code_sample_docstrings, add_code_sample_docstrings,
add_start_docstrings, add_start_docstrings,
add_start_docstrings_to_model_forward, add_start_docstrings_to_model_forward,

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
_import_structure = {"configuration_nat": ["NatConfig"]} _import_structure = {"configuration_nat": ["NatConfig"]}

View File

@@ -14,9 +14,9 @@
# limitations under the License. # limitations under the License.
"""Neighborhood Attention Transformer model configuration""" """Neighborhood Attention Transformer model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices from ....utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -23,11 +23,11 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import BackboneOutput from ....modeling_outputs import BackboneOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
add_code_sample_docstrings, add_code_sample_docstrings,
@@ -38,7 +38,7 @@ from ...utils import (
replace_return_docstrings, replace_return_docstrings,
requires_backends, requires_backends,
) )
from ...utils.backbone_utils import BackboneMixin from ....utils.backbone_utils import BackboneMixin
from .configuration_nat import NatConfig from .configuration_nat import NatConfig

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -1,4 +1,4 @@
from ... import PretrainedConfig from .... import PretrainedConfig
class NezhaConfig(PretrainedConfig): class NezhaConfig(PretrainedConfig):

View File

@@ -25,8 +25,8 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions,
MaskedLMOutput, MaskedLMOutput,
@@ -36,9 +36,9 @@ from ...modeling_outputs import (
SequenceClassifierOutput, SequenceClassifierOutput,
TokenClassifierOutput, TokenClassifierOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
add_code_sample_docstrings, add_code_sample_docstrings,
add_start_docstrings, add_start_docstrings,

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
_import_structure = {"configuration_qdqbert": ["QDQBertConfig"]} _import_structure = {"configuration_qdqbert": ["QDQBertConfig"]}

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""QDQBERT model configuration""" """QDQBERT model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -25,8 +25,8 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions,
CausalLMOutputWithCrossAttentions, CausalLMOutputWithCrossAttentions,
@@ -37,9 +37,9 @@ from ...modeling_outputs import (
SequenceClassifierOutput, SequenceClassifierOutput,
TokenClassifierOutput, TokenClassifierOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import ( from ....utils import (
add_code_sample_docstrings, add_code_sample_docstrings,
add_start_docstrings, add_start_docstrings,
add_start_docstrings_to_model_forward, add_start_docstrings_to_model_forward,

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""REALM model configuration.""" """REALM model configuration."""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -23,16 +23,16 @@ import torch
from torch import nn from torch import nn
from torch.nn import CrossEntropyLoss from torch.nn import CrossEntropyLoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import ( from ....modeling_outputs import (
BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPastAndCrossAttentions,
BaseModelOutputWithPoolingAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions,
MaskedLMOutput, MaskedLMOutput,
ModelOutput, ModelOutput,
) )
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings from ....utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings
from .configuration_realm import RealmConfig from .configuration_realm import RealmConfig

View File

@@ -20,8 +20,8 @@ from typing import Optional, Union
import numpy as np import numpy as np
from huggingface_hub import hf_hub_download from huggingface_hub import hf_hub_download
from ... import AutoTokenizer from .... import AutoTokenizer
from ...utils import logging from ....utils import logging
_REALM_BLOCK_RECORDS_FILENAME = "block_records.npy" _REALM_BLOCK_RECORDS_FILENAME = "block_records.npy"

View File

@@ -19,9 +19,9 @@ import os
import unicodedata import unicodedata
from typing import List, Optional, Tuple from typing import List, Optional, Tuple
from ...tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace from ....tokenization_utils import PreTrainedTokenizer, _is_control, _is_punctuation, _is_whitespace
from ...tokenization_utils_base import BatchEncoding from ....tokenization_utils_base import BatchEncoding
from ...utils import PaddingStrategy, logging from ....utils import PaddingStrategy, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -19,9 +19,9 @@ from typing import List, Optional, Tuple
from tokenizers import normalizers from tokenizers import normalizers
from ...tokenization_utils_base import BatchEncoding from ....tokenization_utils_base import BatchEncoding
from ...tokenization_utils_fast import PreTrainedTokenizerFast from ....tokenization_utils_fast import PreTrainedTokenizerFast
from ...utils import PaddingStrategy, logging from ....utils import PaddingStrategy, logging
from .tokenization_realm import RealmTokenizer from .tokenization_realm import RealmTokenizer

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import ( from ....utils import (
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
_LazyModule, _LazyModule,
is_sentencepiece_available, is_sentencepiece_available,

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""Speech2Text model configuration""" """Speech2Text model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -22,11 +22,11 @@ import torch
from torch import nn from torch import nn
from torch.nn import CrossEntropyLoss from torch.nn import CrossEntropyLoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_causal_attention_mask from ....modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_causal_attention_mask
from ...modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions from ....modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, CausalLMOutputWithCrossAttentions
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import add_start_docstrings, logging, replace_return_docstrings from ....utils import add_start_docstrings, logging, replace_return_docstrings
from .configuration_speech_to_text_2 import Speech2Text2Config from .configuration_speech_to_text_2 import Speech2Text2Config

View File

@@ -19,7 +19,7 @@ Speech processor class for Speech2Text2
import warnings import warnings
from contextlib import contextmanager from contextlib import contextmanager
from ...processing_utils import ProcessorMixin from ....processing_utils import ProcessorMixin
class Speech2Text2Processor(ProcessorMixin): class Speech2Text2Processor(ProcessorMixin):

View File

@@ -18,8 +18,8 @@ import json
import os import os
from typing import Dict, List, Optional, Tuple from typing import Dict, List, Optional, Tuple
from ...tokenization_utils import PreTrainedTokenizer from ....tokenization_utils import PreTrainedTokenizer
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -17,7 +17,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import ( from ....utils import (
OptionalDependencyNotAvailable, OptionalDependencyNotAvailable,
_LazyModule, _LazyModule,
is_torch_available, is_torch_available,

View File

@@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
"""TVLT model configuration""" """TVLT model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -19,9 +19,9 @@ from typing import List, Optional, Union
import numpy as np import numpy as np
from ...audio_utils import mel_filter_bank, spectrogram, window_function from ....audio_utils import mel_filter_bank, spectrogram, window_function
from ...feature_extraction_sequence_utils import BatchFeature, SequenceFeatureExtractor from ....feature_extraction_sequence_utils import BatchFeature, SequenceFeatureExtractor
from ...utils import TensorType, logging from ....utils import TensorType, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -18,13 +18,13 @@ from typing import Dict, List, Optional, Union
import numpy as np import numpy as np
from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_transforms import ( from ....image_transforms import (
get_resize_output_image_size, get_resize_output_image_size,
resize, resize,
to_channel_dimension_format, to_channel_dimension_format,
) )
from ...image_utils import ( from ....image_utils import (
IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_MEAN,
IMAGENET_STANDARD_STD, IMAGENET_STANDARD_STD,
ChannelDimension, ChannelDimension,
@@ -38,7 +38,7 @@ from ...image_utils import (
validate_kwargs, validate_kwargs,
validate_preprocess_arguments, validate_preprocess_arguments,
) )
from ...utils import TensorType, logging from ....utils import TensorType, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -25,11 +25,11 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import BaseModelOutput, SequenceClassifierOutput from ....modeling_outputs import BaseModelOutput, SequenceClassifierOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
add_start_docstrings, add_start_docstrings,
add_start_docstrings_to_model_forward, add_start_docstrings_to_model_forward,

View File

@@ -16,7 +16,7 @@
Processor class for TVLT. Processor class for TVLT.
""" """
from ...processing_utils import ProcessorMixin from ....processing_utils import ProcessorMixin
class TvltProcessor(ProcessorMixin): class TvltProcessor(ProcessorMixin):

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
_import_structure = {"configuration_vit_hybrid": ["ViTHybridConfig"]} _import_structure = {"configuration_vit_hybrid": ["ViTHybridConfig"]}

View File

@@ -14,10 +14,10 @@
# limitations under the License. # limitations under the License.
"""ViT Hybrid model configuration""" """ViT Hybrid model configuration"""
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
from ..auto.configuration_auto import CONFIG_MAPPING from ...auto.configuration_auto import CONFIG_MAPPING
from ..bit import BitConfig from ...bit import BitConfig
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -18,14 +18,14 @@ from typing import Dict, List, Optional, Union
import numpy as np import numpy as np
from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict from ....image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
from ...image_transforms import ( from ....image_transforms import (
convert_to_rgb, convert_to_rgb,
get_resize_output_image_size, get_resize_output_image_size,
resize, resize,
to_channel_dimension_format, to_channel_dimension_format,
) )
from ...image_utils import ( from ....image_utils import (
OPENAI_CLIP_MEAN, OPENAI_CLIP_MEAN,
OPENAI_CLIP_STD, OPENAI_CLIP_STD,
ChannelDimension, ChannelDimension,
@@ -39,7 +39,7 @@ from ...image_utils import (
validate_kwargs, validate_kwargs,
validate_preprocess_arguments, validate_preprocess_arguments,
) )
from ...utils import TensorType, is_vision_available, logging from ....utils import TensorType, is_vision_available, logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -23,12 +23,12 @@ import torch.utils.checkpoint
from torch import nn from torch import nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput from ....modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer from ....pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging from ....utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
from ...utils.backbone_utils import load_backbone from ....utils.backbone_utils import load_backbone
from .configuration_vit_hybrid import ViTHybridConfig from .configuration_vit_hybrid import ViTHybridConfig

View File

@@ -13,7 +13,7 @@
# limitations under the License. # limitations under the License.
from typing import TYPE_CHECKING from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_sentencepiece_available, is_torch_available
_import_structure = { _import_structure = {

View File

@@ -16,8 +16,8 @@
from typing import Callable, Optional, Union from typing import Callable, Optional, Union
from ...configuration_utils import PretrainedConfig from ....configuration_utils import PretrainedConfig
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -25,10 +25,10 @@ import torch.utils.checkpoint
from torch import Tensor, nn from torch import Tensor, nn
from torch.nn import LayerNorm from torch.nn import LayerNorm
from ...activations import ACT2FN from ....activations import ACT2FN
from ...modeling_outputs import BaseModelOutput from ....modeling_outputs import BaseModelOutput
from ...modeling_utils import PreTrainedModel from ....modeling_utils import PreTrainedModel
from ...utils import ( from ....utils import (
ModelOutput, ModelOutput,
add_start_docstrings, add_start_docstrings,
add_start_docstrings_to_model_forward, add_start_docstrings_to_model_forward,

View File

@@ -18,8 +18,8 @@ import os
from shutil import copyfile from shutil import copyfile
from typing import Any, Dict, List, Optional, Tuple from typing import Any, Dict, List, Optional, Tuple
from ...tokenization_utils import PreTrainedTokenizer from ....tokenization_utils import PreTrainedTokenizer
from ...utils import logging from ....utils import logging
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)

View File

@@ -71,7 +71,6 @@ _IMAGE_CLASS_EXPECTED_OUTPUT = "tabby, tabby cat"
@dataclass @dataclass
# Copied from transformers.models.nat.modeling_nat.NatEncoderOutput with Nat->Dinat
class DinatEncoderOutput(ModelOutput): class DinatEncoderOutput(ModelOutput):
""" """
Dinat encoder's outputs, with potential hidden states and attentions. Dinat encoder's outputs, with potential hidden states and attentions.
@@ -105,7 +104,6 @@ class DinatEncoderOutput(ModelOutput):
@dataclass @dataclass
# Copied from transformers.models.nat.modeling_nat.NatModelOutput with Nat->Dinat
class DinatModelOutput(ModelOutput): class DinatModelOutput(ModelOutput):
""" """
Dinat model's outputs that also contains a pooling of the last hidden states. Dinat model's outputs that also contains a pooling of the last hidden states.
@@ -142,7 +140,6 @@ class DinatModelOutput(ModelOutput):
@dataclass @dataclass
# Copied from transformers.models.nat.modeling_nat.NatImageClassifierOutput with Nat->Dinat
class DinatImageClassifierOutput(ModelOutput): class DinatImageClassifierOutput(ModelOutput):
""" """
Dinat outputs for image classification. Dinat outputs for image classification.
@@ -178,7 +175,6 @@ class DinatImageClassifierOutput(ModelOutput):
reshaped_hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None reshaped_hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
# Copied from transformers.models.nat.modeling_nat.NatEmbeddings with Nat->Dinat
class DinatEmbeddings(nn.Module): class DinatEmbeddings(nn.Module):
""" """
Construct the patch and position embeddings. Construct the patch and position embeddings.
@@ -201,7 +197,6 @@ class DinatEmbeddings(nn.Module):
return embeddings return embeddings
# Copied from transformers.models.nat.modeling_nat.NatPatchEmbeddings with Nat->Dinat
class DinatPatchEmbeddings(nn.Module): class DinatPatchEmbeddings(nn.Module):
""" """
This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
@@ -238,7 +233,6 @@ class DinatPatchEmbeddings(nn.Module):
return embeddings return embeddings
# Copied from transformers.models.nat.modeling_nat.NatDownsampler with Nat->Dinat
class DinatDownsampler(nn.Module): class DinatDownsampler(nn.Module):
""" """
Convolutional Downsampling Layer. Convolutional Downsampling Layer.
@@ -321,7 +315,6 @@ class NeighborhoodAttention(nn.Module):
self.dropout = nn.Dropout(config.attention_probs_dropout_prob) self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttention.transpose_for_scores with Nat->Dinat
def transpose_for_scores(self, x): def transpose_for_scores(self, x):
new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
x = x.view(new_x_shape) x = x.view(new_x_shape)
@@ -361,7 +354,6 @@ class NeighborhoodAttention(nn.Module):
return outputs return outputs
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionOutput
class NeighborhoodAttentionOutput(nn.Module): class NeighborhoodAttentionOutput(nn.Module):
def __init__(self, config, dim): def __init__(self, config, dim):
super().__init__() super().__init__()
@@ -382,7 +374,6 @@ class NeighborhoodAttentionModule(nn.Module):
self.output = NeighborhoodAttentionOutput(config, dim) self.output = NeighborhoodAttentionOutput(config, dim)
self.pruned_heads = set() self.pruned_heads = set()
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.prune_heads
def prune_heads(self, heads): def prune_heads(self, heads):
if len(heads) == 0: if len(heads) == 0:
return return
@@ -401,7 +392,6 @@ class NeighborhoodAttentionModule(nn.Module):
self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
self.pruned_heads = self.pruned_heads.union(heads) self.pruned_heads = self.pruned_heads.union(heads)
# Copied from transformers.models.nat.modeling_nat.NeighborhoodAttentionModule.forward
def forward( def forward(
self, self,
hidden_states: torch.Tensor, hidden_states: torch.Tensor,
@@ -413,7 +403,6 @@ class NeighborhoodAttentionModule(nn.Module):
return outputs return outputs
# Copied from transformers.models.nat.modeling_nat.NatIntermediate with Nat->Dinat
class DinatIntermediate(nn.Module): class DinatIntermediate(nn.Module):
def __init__(self, config, dim): def __init__(self, config, dim):
super().__init__() super().__init__()
@@ -429,7 +418,6 @@ class DinatIntermediate(nn.Module):
return hidden_states return hidden_states
# Copied from transformers.models.nat.modeling_nat.NatOutput with Nat->Dinat
class DinatOutput(nn.Module): class DinatOutput(nn.Module):
def __init__(self, config, dim): def __init__(self, config, dim):
super().__init__() super().__init__()
@@ -539,7 +527,6 @@ class DinatStage(nn.Module):
self.pointing = False self.pointing = False
# Copied from transformers.models.nat.modeling_nat.NatStage.forward
def forward( def forward(
self, self,
hidden_states: torch.Tensor, hidden_states: torch.Tensor,
@@ -582,7 +569,6 @@ class DinatEncoder(nn.Module):
] ]
) )
# Copied from transformers.models.nat.modeling_nat.NatEncoder.forward with Nat->Dinat
def forward( def forward(
self, self,
hidden_states: torch.Tensor, hidden_states: torch.Tensor,
@@ -687,7 +673,6 @@ DINAT_INPUTS_DOCSTRING = r"""
"The bare Dinat Model transformer outputting raw hidden-states without any specific head on top.", "The bare Dinat Model transformer outputting raw hidden-states without any specific head on top.",
DINAT_START_DOCSTRING, DINAT_START_DOCSTRING,
) )
# Copied from transformers.models.nat.modeling_nat.NatModel with Nat->Dinat, NAT->DINAT
class DinatModel(DinatPreTrainedModel): class DinatModel(DinatPreTrainedModel):
def __init__(self, config, add_pooling_layer=True): def __init__(self, config, add_pooling_layer=True):
super().__init__(config) super().__init__(config)

File diff suppressed because it is too large Load Diff

View File

@@ -72,6 +72,13 @@ class ErnieMTokenizer(metaclass=DummyObject):
requires_backends(self, ["sentencepiece"]) requires_backends(self, ["sentencepiece"])
class XLMProphetNetTokenizer(metaclass=DummyObject):
_backends = ["sentencepiece"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["sentencepiece"])
class FNetTokenizer(metaclass=DummyObject): class FNetTokenizer(metaclass=DummyObject):
_backends = ["sentencepiece"] _backends = ["sentencepiece"]
@@ -233,13 +240,6 @@ class XGLMTokenizer(metaclass=DummyObject):
requires_backends(self, ["sentencepiece"]) requires_backends(self, ["sentencepiece"])
class XLMProphetNetTokenizer(metaclass=DummyObject):
_backends = ["sentencepiece"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["sentencepiece"])
class XLMRobertaTokenizer(metaclass=DummyObject): class XLMRobertaTokenizer(metaclass=DummyObject):
_backends = ["sentencepiece"] _backends = ["sentencepiece"]

View File

@@ -1038,6 +1038,34 @@ class TFDeiTPreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["tf"]) requires_backends(self, ["tf"])
class TFEfficientFormerForImageClassification(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerForImageClassificationWithTeacher(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerModel(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerPreTrainedModel(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFAdaptiveEmbedding(metaclass=DummyObject): class TFAdaptiveEmbedding(metaclass=DummyObject):
_backends = ["tf"] _backends = ["tf"]
@@ -1178,34 +1206,6 @@ class TFDPRReader(metaclass=DummyObject):
requires_backends(self, ["tf"]) requires_backends(self, ["tf"])
class TFEfficientFormerForImageClassification(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerForImageClassificationWithTeacher(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerModel(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFEfficientFormerPreTrainedModel(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFElectraForMaskedLM(metaclass=DummyObject): class TFElectraForMaskedLM(metaclass=DummyObject):
_backends = ["tf"] _backends = ["tf"]

View File

@@ -121,6 +121,13 @@ class DebertaV2TokenizerFast(metaclass=DummyObject):
requires_backends(self, ["tokenizers"]) requires_backends(self, ["tokenizers"])
class RealmTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tokenizers"])
class RetriBertTokenizerFast(metaclass=DummyObject): class RetriBertTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"] _backends = ["tokenizers"]
@@ -352,13 +359,6 @@ class Qwen2TokenizerFast(metaclass=DummyObject):
requires_backends(self, ["tokenizers"]) requires_backends(self, ["tokenizers"])
class RealmTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tokenizers"])
class ReformerTokenizerFast(metaclass=DummyObject): class ReformerTokenizerFast(metaclass=DummyObject):
_backends = ["tokenizers"] _backends = ["tokenizers"]

View File

@@ -142,6 +142,27 @@ class DetaImageProcessor(metaclass=DummyObject):
requires_backends(self, ["vision"]) requires_backends(self, ["vision"])
class EfficientFormerImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class TvltImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class ViTHybridImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class DetrFeatureExtractor(metaclass=DummyObject): class DetrFeatureExtractor(metaclass=DummyObject):
_backends = ["vision"] _backends = ["vision"]
@@ -184,13 +205,6 @@ class DPTImageProcessor(metaclass=DummyObject):
requires_backends(self, ["vision"]) requires_backends(self, ["vision"])
class EfficientFormerImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class EfficientNetImageProcessor(metaclass=DummyObject): class EfficientNetImageProcessor(metaclass=DummyObject):
_backends = ["vision"] _backends = ["vision"]
@@ -520,13 +534,6 @@ class Swin2SRImageProcessor(metaclass=DummyObject):
requires_backends(self, ["vision"]) requires_backends(self, ["vision"])
class TvltImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class TvpImageProcessor(metaclass=DummyObject): class TvpImageProcessor(metaclass=DummyObject):
_backends = ["vision"] _backends = ["vision"]
@@ -590,13 +597,6 @@ class ViTImageProcessor(metaclass=DummyObject):
requires_backends(self, ["vision"]) requires_backends(self, ["vision"])
class ViTHybridImageProcessor(metaclass=DummyObject):
_backends = ["vision"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
class VitMatteImageProcessor(metaclass=DummyObject): class VitMatteImageProcessor(metaclass=DummyObject):
_backends = ["vision"] _backends = ["vision"]

View File

@@ -1,535 +0,0 @@
# coding=utf-8
# Copyright 2022 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import pathlib
import unittest
from transformers.testing_utils import require_torch, require_vision, slow
from transformers.utils import is_torch_available, is_vision_available
from ...test_image_processing_common import AnnotationFormatTestMixin, ImageProcessingTestMixin, prepare_image_inputs
if is_torch_available():
import torch
if is_vision_available():
from PIL import Image
from transformers import DetaImageProcessor
class DetaImageProcessingTester(unittest.TestCase):
def __init__(
self,
parent,
batch_size=7,
num_channels=3,
min_resolution=30,
max_resolution=400,
do_resize=True,
size=None,
do_normalize=True,
image_mean=[0.5, 0.5, 0.5],
image_std=[0.5, 0.5, 0.5],
do_rescale=True,
rescale_factor=1 / 255,
do_pad=True,
):
# by setting size["longest_edge"] > max_resolution we're effectively not testing this :p
size = size if size is not None else {"shortest_edge": 18, "longest_edge": 1333}
self.parent = parent
self.batch_size = batch_size
self.num_channels = num_channels
self.min_resolution = min_resolution
self.max_resolution = max_resolution
self.do_resize = do_resize
self.size = size
self.do_normalize = do_normalize
self.image_mean = image_mean
self.image_std = image_std
self.do_rescale = do_rescale
self.rescale_factor = rescale_factor
self.do_pad = do_pad
def prepare_image_processor_dict(self):
return {
"do_resize": self.do_resize,
"size": self.size,
"do_normalize": self.do_normalize,
"image_mean": self.image_mean,
"image_std": self.image_std,
"do_rescale": self.do_rescale,
"rescale_factor": self.rescale_factor,
"do_pad": self.do_pad,
}
def get_expected_values(self, image_inputs, batched=False):
"""
This function computes the expected height and width when providing images to DetaImageProcessor,
assuming do_resize is set to True with a scalar size.
"""
if not batched:
image = image_inputs[0]
if isinstance(image, Image.Image):
w, h = image.size
else:
h, w = image.shape[1], image.shape[2]
if w < h:
expected_height = int(self.size["shortest_edge"] * h / w)
expected_width = self.size["shortest_edge"]
elif w > h:
expected_height = self.size["shortest_edge"]
expected_width = int(self.size["shortest_edge"] * w / h)
else:
expected_height = self.size["shortest_edge"]
expected_width = self.size["shortest_edge"]
else:
expected_values = []
for image in image_inputs:
expected_height, expected_width = self.get_expected_values([image])
expected_values.append((expected_height, expected_width))
expected_height = max(expected_values, key=lambda item: item[0])[0]
expected_width = max(expected_values, key=lambda item: item[1])[1]
return expected_height, expected_width
def expected_output_image_shape(self, images):
height, width = self.get_expected_values(images, batched=True)
return self.num_channels, height, width
def prepare_image_inputs(self, equal_resolution=False, numpify=False, torchify=False):
return prepare_image_inputs(
batch_size=self.batch_size,
num_channels=self.num_channels,
min_resolution=self.min_resolution,
max_resolution=self.max_resolution,
equal_resolution=equal_resolution,
numpify=numpify,
torchify=torchify,
)
@require_torch
@require_vision
class DetaImageProcessingTest(AnnotationFormatTestMixin, ImageProcessingTestMixin, unittest.TestCase):
image_processing_class = DetaImageProcessor if is_vision_available() else None
def setUp(self):
self.image_processor_tester = DetaImageProcessingTester(self)
@property
def image_processor_dict(self):
return self.image_processor_tester.prepare_image_processor_dict()
def test_image_processor_properties(self):
image_processing = self.image_processing_class(**self.image_processor_dict)
self.assertTrue(hasattr(image_processing, "image_mean"))
self.assertTrue(hasattr(image_processing, "image_std"))
self.assertTrue(hasattr(image_processing, "do_normalize"))
self.assertTrue(hasattr(image_processing, "do_resize"))
self.assertTrue(hasattr(image_processing, "do_rescale"))
self.assertTrue(hasattr(image_processing, "do_pad"))
self.assertTrue(hasattr(image_processing, "size"))
def test_image_processor_from_dict_with_kwargs(self):
image_processor = self.image_processing_class.from_dict(self.image_processor_dict)
self.assertEqual(image_processor.size, {"shortest_edge": 18, "longest_edge": 1333})
self.assertEqual(image_processor.do_pad, True)
@slow
def test_call_pytorch_with_coco_detection_annotations(self):
# prepare image and target
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
with open("./tests/fixtures/tests_samples/COCO/coco_annotations.txt", "r") as f:
target = json.loads(f.read())
target = {"image_id": 39769, "annotations": target}
# encode them
image_processing = DetaImageProcessor()
encoding = image_processing(images=image, annotations=target, return_tensors="pt")
# verify pixel values
expected_shape = torch.Size([1, 3, 800, 1066])
self.assertEqual(encoding["pixel_values"].shape, expected_shape)
expected_slice = torch.tensor([0.2796, 0.3138, 0.3481])
self.assertTrue(torch.allclose(encoding["pixel_values"][0, 0, 0, :3], expected_slice, atol=1e-4))
# verify area
expected_area = torch.tensor([5887.9600, 11250.2061, 489353.8438, 837122.7500, 147967.5156, 165732.3438])
self.assertTrue(torch.allclose(encoding["labels"][0]["area"], expected_area))
# verify boxes
expected_boxes_shape = torch.Size([6, 4])
self.assertEqual(encoding["labels"][0]["boxes"].shape, expected_boxes_shape)
expected_boxes_slice = torch.tensor([0.5503, 0.2765, 0.0604, 0.2215])
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"][0], expected_boxes_slice, atol=1e-3))
# verify image_id
expected_image_id = torch.tensor([39769])
self.assertTrue(torch.allclose(encoding["labels"][0]["image_id"], expected_image_id))
# verify is_crowd
expected_is_crowd = torch.tensor([0, 0, 0, 0, 0, 0])
self.assertTrue(torch.allclose(encoding["labels"][0]["iscrowd"], expected_is_crowd))
# verify class_labels
expected_class_labels = torch.tensor([75, 75, 63, 65, 17, 17])
self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
# verify orig_size
expected_orig_size = torch.tensor([480, 640])
self.assertTrue(torch.allclose(encoding["labels"][0]["orig_size"], expected_orig_size))
# verify size
expected_size = torch.tensor([800, 1066])
self.assertTrue(torch.allclose(encoding["labels"][0]["size"], expected_size))
@slow
def test_call_pytorch_with_coco_panoptic_annotations(self):
# prepare image, target and masks_path
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
with open("./tests/fixtures/tests_samples/COCO/coco_panoptic_annotations.txt", "r") as f:
target = json.loads(f.read())
target = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
masks_path = pathlib.Path("./tests/fixtures/tests_samples/COCO/coco_panoptic")
# encode them
image_processing = DetaImageProcessor(format="coco_panoptic")
encoding = image_processing(images=image, annotations=target, masks_path=masks_path, return_tensors="pt")
# verify pixel values
expected_shape = torch.Size([1, 3, 800, 1066])
self.assertEqual(encoding["pixel_values"].shape, expected_shape)
expected_slice = torch.tensor([0.2796, 0.3138, 0.3481])
self.assertTrue(torch.allclose(encoding["pixel_values"][0, 0, 0, :3], expected_slice, atol=1e-4))
# verify area
expected_area = torch.tensor([147979.6875, 165527.0469, 484638.5938, 11292.9375, 5879.6562, 7634.1147])
self.assertTrue(torch.allclose(encoding["labels"][0]["area"], expected_area))
# verify boxes
expected_boxes_shape = torch.Size([6, 4])
self.assertEqual(encoding["labels"][0]["boxes"].shape, expected_boxes_shape)
expected_boxes_slice = torch.tensor([0.2625, 0.5437, 0.4688, 0.8625])
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"][0], expected_boxes_slice, atol=1e-3))
# verify image_id
expected_image_id = torch.tensor([39769])
self.assertTrue(torch.allclose(encoding["labels"][0]["image_id"], expected_image_id))
# verify is_crowd
expected_is_crowd = torch.tensor([0, 0, 0, 0, 0, 0])
self.assertTrue(torch.allclose(encoding["labels"][0]["iscrowd"], expected_is_crowd))
# verify class_labels
expected_class_labels = torch.tensor([17, 17, 63, 75, 75, 93])
self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
# verify masks
expected_masks_sum = 822873
self.assertEqual(encoding["labels"][0]["masks"].sum().item(), expected_masks_sum)
# verify orig_size
expected_orig_size = torch.tensor([480, 640])
self.assertTrue(torch.allclose(encoding["labels"][0]["orig_size"], expected_orig_size))
# verify size
expected_size = torch.tensor([800, 1066])
self.assertTrue(torch.allclose(encoding["labels"][0]["size"], expected_size))
@slow
# Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_batched_coco_detection_annotations with Detr->Deta
def test_batched_coco_detection_annotations(self):
image_0 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
image_1 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png").resize((800, 800))
with open("./tests/fixtures/tests_samples/COCO/coco_annotations.txt", "r") as f:
target = json.loads(f.read())
annotations_0 = {"image_id": 39769, "annotations": target}
annotations_1 = {"image_id": 39769, "annotations": target}
# Adjust the bounding boxes for the resized image
w_0, h_0 = image_0.size
w_1, h_1 = image_1.size
for i in range(len(annotations_1["annotations"])):
coords = annotations_1["annotations"][i]["bbox"]
new_bbox = [
coords[0] * w_1 / w_0,
coords[1] * h_1 / h_0,
coords[2] * w_1 / w_0,
coords[3] * h_1 / h_0,
]
annotations_1["annotations"][i]["bbox"] = new_bbox
images = [image_0, image_1]
annotations = [annotations_0, annotations_1]
image_processing = DetaImageProcessor()
encoding = image_processing(
images=images,
annotations=annotations,
return_segmentation_masks=True,
return_tensors="pt", # do_convert_annotations=True
)
# Check the pixel values have been padded
postprocessed_height, postprocessed_width = 800, 1066
expected_shape = torch.Size([2, 3, postprocessed_height, postprocessed_width])
self.assertEqual(encoding["pixel_values"].shape, expected_shape)
# Check the bounding boxes have been adjusted for padded images
self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
expected_boxes_0 = torch.tensor(
[
[0.6879, 0.4609, 0.0755, 0.3691],
[0.2118, 0.3359, 0.2601, 0.1566],
[0.5011, 0.5000, 0.9979, 1.0000],
[0.5010, 0.5020, 0.9979, 0.9959],
[0.3284, 0.5944, 0.5884, 0.8112],
[0.8394, 0.5445, 0.3213, 0.9110],
]
)
expected_boxes_1 = torch.tensor(
[
[0.4130, 0.2765, 0.0453, 0.2215],
[0.1272, 0.2016, 0.1561, 0.0940],
[0.3757, 0.4933, 0.7488, 0.9865],
[0.3759, 0.5002, 0.7492, 0.9955],
[0.1971, 0.5456, 0.3532, 0.8646],
[0.5790, 0.4115, 0.3430, 0.7161],
]
)
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1e-3))
self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1e-3))
# Check the masks have also been padded
self.assertEqual(encoding["labels"][0]["masks"].shape, torch.Size([6, 800, 1066]))
self.assertEqual(encoding["labels"][1]["masks"].shape, torch.Size([6, 800, 1066]))
# Check if do_convert_annotations=False, then the annotations are not converted to centre_x, centre_y, width, height
# format and not in the range [0, 1]
encoding = image_processing(
images=images,
annotations=annotations,
return_segmentation_masks=True,
do_convert_annotations=False,
return_tensors="pt",
)
self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
# Convert to absolute coordinates
unnormalized_boxes_0 = torch.vstack(
[
expected_boxes_0[:, 0] * postprocessed_width,
expected_boxes_0[:, 1] * postprocessed_height,
expected_boxes_0[:, 2] * postprocessed_width,
expected_boxes_0[:, 3] * postprocessed_height,
]
).T
unnormalized_boxes_1 = torch.vstack(
[
expected_boxes_1[:, 0] * postprocessed_width,
expected_boxes_1[:, 1] * postprocessed_height,
expected_boxes_1[:, 2] * postprocessed_width,
expected_boxes_1[:, 3] * postprocessed_height,
]
).T
# Convert from centre_x, centre_y, width, height to x_min, y_min, x_max, y_max
expected_boxes_0 = torch.vstack(
[
unnormalized_boxes_0[:, 0] - unnormalized_boxes_0[:, 2] / 2,
unnormalized_boxes_0[:, 1] - unnormalized_boxes_0[:, 3] / 2,
unnormalized_boxes_0[:, 0] + unnormalized_boxes_0[:, 2] / 2,
unnormalized_boxes_0[:, 1] + unnormalized_boxes_0[:, 3] / 2,
]
).T
expected_boxes_1 = torch.vstack(
[
unnormalized_boxes_1[:, 0] - unnormalized_boxes_1[:, 2] / 2,
unnormalized_boxes_1[:, 1] - unnormalized_boxes_1[:, 3] / 2,
unnormalized_boxes_1[:, 0] + unnormalized_boxes_1[:, 2] / 2,
unnormalized_boxes_1[:, 1] + unnormalized_boxes_1[:, 3] / 2,
]
).T
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1))
self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1))
# Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_batched_coco_panoptic_annotations with Detr->Deta
def test_batched_coco_panoptic_annotations(self):
# prepare image, target and masks_path
image_0 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
image_1 = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png").resize((800, 800))
with open("./tests/fixtures/tests_samples/COCO/coco_panoptic_annotations.txt", "r") as f:
target = json.loads(f.read())
annotation_0 = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
annotation_1 = {"file_name": "000000039769.png", "image_id": 39769, "segments_info": target}
w_0, h_0 = image_0.size
w_1, h_1 = image_1.size
for i in range(len(annotation_1["segments_info"])):
coords = annotation_1["segments_info"][i]["bbox"]
new_bbox = [
coords[0] * w_1 / w_0,
coords[1] * h_1 / h_0,
coords[2] * w_1 / w_0,
coords[3] * h_1 / h_0,
]
annotation_1["segments_info"][i]["bbox"] = new_bbox
masks_path = pathlib.Path("./tests/fixtures/tests_samples/COCO/coco_panoptic")
images = [image_0, image_1]
annotations = [annotation_0, annotation_1]
# encode them
image_processing = DetaImageProcessor(format="coco_panoptic")
encoding = image_processing(
images=images,
annotations=annotations,
masks_path=masks_path,
return_tensors="pt",
return_segmentation_masks=True,
)
# Check the pixel values have been padded
postprocessed_height, postprocessed_width = 800, 1066
expected_shape = torch.Size([2, 3, postprocessed_height, postprocessed_width])
self.assertEqual(encoding["pixel_values"].shape, expected_shape)
# Check the bounding boxes have been adjusted for padded images
self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
expected_boxes_0 = torch.tensor(
[
[0.2625, 0.5437, 0.4688, 0.8625],
[0.7719, 0.4104, 0.4531, 0.7125],
[0.5000, 0.4927, 0.9969, 0.9854],
[0.1688, 0.2000, 0.2063, 0.0917],
[0.5492, 0.2760, 0.0578, 0.2187],
[0.4992, 0.4990, 0.9984, 0.9979],
]
)
expected_boxes_1 = torch.tensor(
[
[0.1576, 0.3262, 0.2814, 0.5175],
[0.4634, 0.2463, 0.2720, 0.4275],
[0.3002, 0.2956, 0.5985, 0.5913],
[0.1013, 0.1200, 0.1238, 0.0550],
[0.3297, 0.1656, 0.0347, 0.1312],
[0.2997, 0.2994, 0.5994, 0.5987],
]
)
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1e-3))
self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1e-3))
# Check the masks have also been padded
self.assertEqual(encoding["labels"][0]["masks"].shape, torch.Size([6, 800, 1066]))
self.assertEqual(encoding["labels"][1]["masks"].shape, torch.Size([6, 800, 1066]))
# Check if do_convert_annotations=False, then the annotations are not converted to centre_x, centre_y, width, height
# format and not in the range [0, 1]
encoding = image_processing(
images=images,
annotations=annotations,
masks_path=masks_path,
return_segmentation_masks=True,
do_convert_annotations=False,
return_tensors="pt",
)
self.assertEqual(encoding["labels"][0]["boxes"].shape, torch.Size([6, 4]))
self.assertEqual(encoding["labels"][1]["boxes"].shape, torch.Size([6, 4]))
# Convert to absolute coordinates
unnormalized_boxes_0 = torch.vstack(
[
expected_boxes_0[:, 0] * postprocessed_width,
expected_boxes_0[:, 1] * postprocessed_height,
expected_boxes_0[:, 2] * postprocessed_width,
expected_boxes_0[:, 3] * postprocessed_height,
]
).T
unnormalized_boxes_1 = torch.vstack(
[
expected_boxes_1[:, 0] * postprocessed_width,
expected_boxes_1[:, 1] * postprocessed_height,
expected_boxes_1[:, 2] * postprocessed_width,
expected_boxes_1[:, 3] * postprocessed_height,
]
).T
# Convert from centre_x, centre_y, width, height to x_min, y_min, x_max, y_max
expected_boxes_0 = torch.vstack(
[
unnormalized_boxes_0[:, 0] - unnormalized_boxes_0[:, 2] / 2,
unnormalized_boxes_0[:, 1] - unnormalized_boxes_0[:, 3] / 2,
unnormalized_boxes_0[:, 0] + unnormalized_boxes_0[:, 2] / 2,
unnormalized_boxes_0[:, 1] + unnormalized_boxes_0[:, 3] / 2,
]
).T
expected_boxes_1 = torch.vstack(
[
unnormalized_boxes_1[:, 0] - unnormalized_boxes_1[:, 2] / 2,
unnormalized_boxes_1[:, 1] - unnormalized_boxes_1[:, 3] / 2,
unnormalized_boxes_1[:, 0] + unnormalized_boxes_1[:, 2] / 2,
unnormalized_boxes_1[:, 1] + unnormalized_boxes_1[:, 3] / 2,
]
).T
self.assertTrue(torch.allclose(encoding["labels"][0]["boxes"], expected_boxes_0, rtol=1))
self.assertTrue(torch.allclose(encoding["labels"][1]["boxes"], expected_boxes_1, rtol=1))
# Copied from tests.models.detr.test_image_processing_detr.DetrImageProcessingTest.test_max_width_max_height_resizing_and_pad_strategy with Detr->Deta
def test_max_width_max_height_resizing_and_pad_strategy(self):
image_1 = torch.ones([200, 100, 3], dtype=torch.uint8)
# do_pad=False, max_height=100, max_width=100, image=200x100 -> 100x50
image_processor = DetaImageProcessor(
size={"max_height": 100, "max_width": 100},
do_pad=False,
)
inputs = image_processor(images=[image_1], return_tensors="pt")
self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 100, 50]))
# do_pad=False, max_height=300, max_width=100, image=200x100 -> 200x100
image_processor = DetaImageProcessor(
size={"max_height": 300, "max_width": 100},
do_pad=False,
)
inputs = image_processor(images=[image_1], return_tensors="pt")
# do_pad=True, max_height=100, max_width=100, image=200x100 -> 100x100
image_processor = DetaImageProcessor(
size={"max_height": 100, "max_width": 100}, do_pad=True, pad_size={"height": 100, "width": 100}
)
inputs = image_processor(images=[image_1], return_tensors="pt")
self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 100, 100]))
# do_pad=True, max_height=300, max_width=100, image=200x100 -> 300x100
image_processor = DetaImageProcessor(
size={"max_height": 300, "max_width": 100},
do_pad=True,
pad_size={"height": 301, "width": 101},
)
inputs = image_processor(images=[image_1], return_tensors="pt")
self.assertEqual(inputs["pixel_values"].shape, torch.Size([1, 3, 301, 101]))
### Check for batch
image_2 = torch.ones([100, 150, 3], dtype=torch.uint8)
# do_pad=True, max_height=150, max_width=100, images=[200x100, 100x150] -> 150x100
image_processor = DetaImageProcessor(
size={"max_height": 150, "max_width": 100},
do_pad=True,
pad_size={"height": 150, "width": 100},
)
inputs = image_processor(images=[image_1, image_2], return_tensors="pt")
self.assertEqual(inputs["pixel_values"].shape, torch.Size([2, 3, 150, 100]))

View File

@@ -1,671 +0,0 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Testing suite for the PyTorch DETA model."""
import collections
import inspect
import math
import re
import unittest
from transformers import DetaConfig, ResNetConfig, is_torch_available, is_torchvision_available, is_vision_available
from transformers.file_utils import cached_property
from transformers.testing_utils import require_torchvision, require_vision, slow, torch_device
from ...generation.test_utils import GenerationTesterMixin
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, _config_zero_init, floats_tensor
from ...test_pipeline_mixin import PipelineTesterMixin
if is_torch_available():
import torch
from transformers.pytorch_utils import id_tensor_storage
if is_torchvision_available():
from transformers import DetaForObjectDetection, DetaModel
if is_vision_available():
from PIL import Image
from transformers import AutoImageProcessor
class DetaModelTester:
def __init__(
self,
parent,
batch_size=8,
is_training=True,
use_labels=True,
hidden_size=32,
num_hidden_layers=2,
num_attention_heads=8,
intermediate_size=4,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
num_queries=12,
two_stage_num_proposals=12,
num_channels=3,
image_size=224,
n_targets=8,
num_labels=91,
num_feature_levels=4,
encoder_n_points=2,
decoder_n_points=6,
two_stage=True,
assign_first_stage=True,
assign_second_stage=True,
):
self.parent = parent
self.batch_size = batch_size
self.is_training = is_training
self.use_labels = use_labels
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.num_queries = num_queries
self.two_stage_num_proposals = two_stage_num_proposals
self.num_channels = num_channels
self.image_size = image_size
self.n_targets = n_targets
self.num_labels = num_labels
self.num_feature_levels = num_feature_levels
self.encoder_n_points = encoder_n_points
self.decoder_n_points = decoder_n_points
self.two_stage = two_stage
self.assign_first_stage = assign_first_stage
self.assign_second_stage = assign_second_stage
# we also set the expected seq length for both encoder and decoder
self.encoder_seq_length = (
math.ceil(self.image_size / 8) ** 2
+ math.ceil(self.image_size / 16) ** 2
+ math.ceil(self.image_size / 32) ** 2
+ math.ceil(self.image_size / 64) ** 2
)
self.decoder_seq_length = self.num_queries
def prepare_config_and_inputs(self, model_class_name):
pixel_values = floats_tensor([self.batch_size, self.num_channels, self.image_size, self.image_size])
pixel_mask = torch.ones([self.batch_size, self.image_size, self.image_size], device=torch_device)
labels = None
if self.use_labels:
# labels is a list of Dict (each Dict being the labels for a given example in the batch)
labels = []
for i in range(self.batch_size):
target = {}
target["class_labels"] = torch.randint(
high=self.num_labels, size=(self.n_targets,), device=torch_device
)
target["boxes"] = torch.rand(self.n_targets, 4, device=torch_device)
target["masks"] = torch.rand(self.n_targets, self.image_size, self.image_size, device=torch_device)
labels.append(target)
config = self.get_config(model_class_name)
return config, pixel_values, pixel_mask, labels
def get_config(self, model_class_name):
resnet_config = ResNetConfig(
num_channels=3,
embeddings_size=10,
hidden_sizes=[10, 20, 30, 40],
depths=[1, 1, 2, 1],
hidden_act="relu",
num_labels=3,
out_features=["stage2", "stage3", "stage4"],
out_indices=[2, 3, 4],
)
two_stage = model_class_name == "DetaForObjectDetection"
assign_first_stage = model_class_name == "DetaForObjectDetection"
assign_second_stage = model_class_name == "DetaForObjectDetection"
return DetaConfig(
d_model=self.hidden_size,
encoder_layers=self.num_hidden_layers,
decoder_layers=self.num_hidden_layers,
encoder_attention_heads=self.num_attention_heads,
decoder_attention_heads=self.num_attention_heads,
encoder_ffn_dim=self.intermediate_size,
decoder_ffn_dim=self.intermediate_size,
dropout=self.hidden_dropout_prob,
attention_dropout=self.attention_probs_dropout_prob,
num_queries=self.num_queries,
two_stage_num_proposals=self.two_stage_num_proposals,
num_labels=self.num_labels,
num_feature_levels=self.num_feature_levels,
encoder_n_points=self.encoder_n_points,
decoder_n_points=self.decoder_n_points,
two_stage=two_stage,
assign_first_stage=assign_first_stage,
assign_second_stage=assign_second_stage,
backbone_config=resnet_config,
backbone=None,
)
def prepare_config_and_inputs_for_common(self, model_class_name="DetaModel"):
config, pixel_values, pixel_mask, labels = self.prepare_config_and_inputs(model_class_name)
inputs_dict = {"pixel_values": pixel_values, "pixel_mask": pixel_mask}
return config, inputs_dict
def create_and_check_deta_model(self, config, pixel_values, pixel_mask, labels):
model = DetaModel(config=config)
model.to(torch_device)
model.eval()
result = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
result = model(pixel_values)
self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.num_queries, self.hidden_size))
def create_and_check_deta_freeze_backbone(self, config, pixel_values, pixel_mask, labels):
model = DetaModel(config=config)
model.to(torch_device)
model.eval()
model.freeze_backbone()
for _, param in model.backbone.model.named_parameters():
self.parent.assertEqual(False, param.requires_grad)
def create_and_check_deta_unfreeze_backbone(self, config, pixel_values, pixel_mask, labels):
model = DetaModel(config=config)
model.to(torch_device)
model.eval()
model.unfreeze_backbone()
for _, param in model.backbone.model.named_parameters():
self.parent.assertEqual(True, param.requires_grad)
def create_and_check_deta_object_detection_head_model(self, config, pixel_values, pixel_mask, labels):
model = DetaForObjectDetection(config=config)
model.to(torch_device)
model.eval()
result = model(pixel_values=pixel_values, pixel_mask=pixel_mask)
result = model(pixel_values)
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.two_stage_num_proposals, self.num_labels))
self.parent.assertEqual(result.pred_boxes.shape, (self.batch_size, self.two_stage_num_proposals, 4))
result = model(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
self.parent.assertEqual(result.loss.shape, ())
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.two_stage_num_proposals, self.num_labels))
self.parent.assertEqual(result.pred_boxes.shape, (self.batch_size, self.two_stage_num_proposals, 4))
@require_torchvision
class DetaModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase):
all_model_classes = (DetaModel, DetaForObjectDetection) if is_torchvision_available() else ()
pipeline_model_mapping = (
{"image-feature-extraction": DetaModel, "object-detection": DetaForObjectDetection}
if is_torchvision_available()
else {}
)
is_encoder_decoder = True
test_torchscript = False
test_pruning = False
test_head_masking = False
test_missing_keys = False
# TODO: Fix the failed tests when this model gets more usage
def is_pipeline_test_to_skip(
self, pipeline_test_casse_name, config_class, model_architecture, tokenizer_name, processor_name
):
if pipeline_test_casse_name == "ObjectDetectionPipelineTests":
return True
return False
@unittest.skip("Skip for now. PR #22437 causes some loading issue. See (not merged) #22656 for some discussions.")
def test_can_use_safetensors(self):
super().test_can_use_safetensors()
# special case for head models
def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
if return_labels:
if model_class.__name__ == "DetaForObjectDetection":
labels = []
for i in range(self.model_tester.batch_size):
target = {}
target["class_labels"] = torch.ones(
size=(self.model_tester.n_targets,), device=torch_device, dtype=torch.long
)
target["boxes"] = torch.ones(
self.model_tester.n_targets, 4, device=torch_device, dtype=torch.float
)
target["masks"] = torch.ones(
self.model_tester.n_targets,
self.model_tester.image_size,
self.model_tester.image_size,
device=torch_device,
dtype=torch.float,
)
labels.append(target)
inputs_dict["labels"] = labels
return inputs_dict
def setUp(self):
self.model_tester = DetaModelTester(self)
self.config_tester = ConfigTester(self, config_class=DetaConfig, has_text_modality=False)
def test_config(self):
# we don't test common_properties and arguments_init as these don't apply for DETA
self.config_tester.create_and_test_config_to_json_string()
self.config_tester.create_and_test_config_to_json_file()
self.config_tester.create_and_test_config_from_and_save_pretrained()
self.config_tester.create_and_test_config_with_num_labels()
self.config_tester.check_config_can_be_init_without_params()
def test_deta_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
self.model_tester.create_and_check_deta_model(*config_and_inputs)
def test_deta_freeze_backbone(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
self.model_tester.create_and_check_deta_freeze_backbone(*config_and_inputs)
def test_deta_unfreeze_backbone(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaModel")
self.model_tester.create_and_check_deta_unfreeze_backbone(*config_and_inputs)
def test_deta_object_detection_head_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs(model_class_name="DetaForObjectDetection")
self.model_tester.create_and_check_deta_object_detection_head_model(*config_and_inputs)
@unittest.skip(reason="DETA does not use inputs_embeds")
def test_inputs_embeds(self):
pass
@unittest.skip(reason="DETA does not use inputs_embeds")
def test_inputs_embeds_matches_input_ids(self):
pass
@unittest.skip(reason="DETA does not have a get_input_embeddings method")
def test_model_common_attributes(self):
pass
@unittest.skip(reason="DETA is not a generative model")
def test_generate_without_input_ids(self):
pass
@unittest.skip(reason="DETA does not use token embeddings")
def test_resize_tokens_embeddings(self):
pass
@unittest.skip(reason="Feed forward chunking is not implemented")
def test_feed_forward_chunking(self):
pass
def test_attention_outputs(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.return_dict = True
for model_class in self.all_model_classes:
inputs_dict["output_attentions"] = True
inputs_dict["output_hidden_states"] = False
config.return_dict = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions
self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
# check that output_attentions also work using config
del inputs_dict["output_attentions"]
config.output_attentions = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions
self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(attentions[0].shape[-3:]),
[
self.model_tester.num_attention_heads,
self.model_tester.num_feature_levels,
self.model_tester.encoder_n_points,
],
)
out_len = len(outputs)
correct_outlen = 8
# loss is at first position
if "labels" in inputs_dict:
correct_outlen += 1 # loss is added to beginning
# Object Detection model returns pred_logits and pred_boxes
if model_class.__name__ == "DetaForObjectDetection":
correct_outlen += 2
self.assertEqual(out_len, correct_outlen)
# decoder attentions
decoder_attentions = outputs.decoder_attentions
self.assertIsInstance(decoder_attentions, (list, tuple))
self.assertEqual(len(decoder_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(decoder_attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, self.model_tester.num_queries, self.model_tester.num_queries],
)
# cross attentions
cross_attentions = outputs.cross_attentions
self.assertIsInstance(cross_attentions, (list, tuple))
self.assertEqual(len(cross_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(cross_attentions[0].shape[-3:]),
[
self.model_tester.num_attention_heads,
self.model_tester.num_feature_levels,
self.model_tester.decoder_n_points,
],
)
# Check attention is always last and order is fine
inputs_dict["output_attentions"] = True
inputs_dict["output_hidden_states"] = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
if hasattr(self.model_tester, "num_hidden_states_types"):
added_hidden_states = self.model_tester.num_hidden_states_types
elif self.is_encoder_decoder:
added_hidden_states = 2
else:
added_hidden_states = 1
self.assertEqual(out_len + added_hidden_states, len(outputs))
self_attentions = outputs.encoder_attentions
self.assertEqual(len(self_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(self_attentions[0].shape[-3:]),
[
self.model_tester.num_attention_heads,
self.model_tester.num_feature_levels,
self.model_tester.encoder_n_points,
],
)
# removed retain_grad and grad on decoder_hidden_states, as queries don't require grad
def test_retain_grad_hidden_states_attentions(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.output_hidden_states = True
config.output_attentions = True
# no need to test all models as different heads yield the same functionality
model_class = self.all_model_classes[0]
model = model_class(config)
model.to(torch_device)
inputs = self._prepare_for_class(inputs_dict, model_class)
outputs = model(**inputs)
# we take the second output since last_hidden_state is the second item
output = outputs[1]
encoder_hidden_states = outputs.encoder_hidden_states[0]
encoder_attentions = outputs.encoder_attentions[0]
encoder_hidden_states.retain_grad()
encoder_attentions.retain_grad()
decoder_attentions = outputs.decoder_attentions[0]
decoder_attentions.retain_grad()
cross_attentions = outputs.cross_attentions[0]
cross_attentions.retain_grad()
output.flatten()[0].backward(retain_graph=True)
self.assertIsNotNone(encoder_hidden_states.grad)
self.assertIsNotNone(encoder_attentions.grad)
self.assertIsNotNone(decoder_attentions.grad)
self.assertIsNotNone(cross_attentions.grad)
def test_forward_auxiliary_loss(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.auxiliary_loss = True
# only test for object detection and segmentation model
for model_class in self.all_model_classes[1:]:
model = model_class(config)
model.to(torch_device)
inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
outputs = model(**inputs)
self.assertIsNotNone(outputs.auxiliary_outputs)
self.assertEqual(len(outputs.auxiliary_outputs), self.model_tester.num_hidden_layers - 1)
def test_forward_signature(self):
config, _ = self.model_tester.prepare_config_and_inputs_for_common()
for model_class in self.all_model_classes:
model = model_class(config)
signature = inspect.signature(model.forward)
# signature.parameters is an OrderedDict => so arg_names order is deterministic
arg_names = [*signature.parameters.keys()]
if model.config.is_encoder_decoder:
expected_arg_names = ["pixel_values", "pixel_mask"]
expected_arg_names.extend(
["head_mask", "decoder_head_mask", "encoder_outputs"]
if "head_mask" and "decoder_head_mask" in arg_names
else []
)
self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names)
else:
expected_arg_names = ["pixel_values", "pixel_mask"]
self.assertListEqual(arg_names[:1], expected_arg_names)
@unittest.skip(reason="Model doesn't use tied weights")
def test_tied_model_weights_key_ignore(self):
pass
def test_initialization(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
configs_no_init = _config_zero_init(config)
for model_class in self.all_model_classes:
model = model_class(config=configs_no_init)
# Skip the check for the backbone
for name, module in model.named_modules():
if module.__class__.__name__ == "DetaBackboneWithPositionalEncodings":
backbone_params = [f"{name}.{key}" for key in module.state_dict().keys()]
break
for name, param in model.named_parameters():
if param.requires_grad:
if (
"level_embed" in name
or "sampling_offsets.bias" in name
or "value_proj" in name
or "output_proj" in name
or "reference_points" in name
or name in backbone_params
):
continue
self.assertIn(
((param.data.mean() * 1e9).round() / 1e9).item(),
[0.0, 1.0],
msg=f"Parameter {name} of model {model_class} seems not properly initialized",
)
@unittest.skip("No support for low_cpu_mem_usage=True.")
def test_save_load_low_cpu_mem_usage(self):
pass
@unittest.skip("No support for low_cpu_mem_usage=True.")
def test_save_load_low_cpu_mem_usage_checkpoints(self):
pass
@unittest.skip("No support for low_cpu_mem_usage=True.")
def test_save_load_low_cpu_mem_usage_no_safetensors(self):
pass
# Inspired by tests.test_modeling_common.ModelTesterMixin.test_tied_weights_keys
def test_tied_weights_keys(self):
for model_class in self.all_model_classes:
# We need to pass model class name to correctly initialize the config.
# If we don't pass it, the config for `DetaForObjectDetection`` will be initialized
# with `two_stage=False` and the test will fail because for that case `class_embed`
# weights are not tied.
config, _ = self.model_tester.prepare_config_and_inputs_for_common(model_class_name=model_class.__name__)
config.tie_word_embeddings = True
model_tied = model_class(config)
ptrs = collections.defaultdict(list)
for name, tensor in model_tied.state_dict().items():
ptrs[id_tensor_storage(tensor)].append(name)
# These are all the pointers of shared tensors.
tied_params = [names for _, names in ptrs.items() if len(names) > 1]
tied_weight_keys = model_tied._tied_weights_keys if model_tied._tied_weights_keys is not None else []
# Detect we get a hit for each key
for key in tied_weight_keys:
is_tied_key = any(re.search(key, p) for group in tied_params for p in group)
self.assertTrue(is_tied_key, f"{key} is not a tied weight key for {model_class}.")
# Removed tied weights found from tied params -> there should only be one left after
for key in tied_weight_keys:
for i in range(len(tied_params)):
tied_params[i] = [p for p in tied_params[i] if re.search(key, p) is None]
tied_params = [group for group in tied_params if len(group) > 1]
self.assertListEqual(
tied_params,
[],
f"Missing `_tied_weights_keys` for {model_class}: add all of {tied_params} except one.",
)
TOLERANCE = 1e-4
# We will verify our results on an image of cute cats
def prepare_img():
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
return image
@require_torchvision
@require_vision
@slow
class DetaModelIntegrationTests(unittest.TestCase):
@cached_property
def default_image_processor(self):
return AutoImageProcessor.from_pretrained("jozhang97/deta-resnet-50") if is_vision_available() else None
def test_inference_object_detection_head(self):
model = DetaForObjectDetection.from_pretrained("jozhang97/deta-resnet-50").to(torch_device)
image_processor = self.default_image_processor
image = prepare_img()
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
with torch.no_grad():
outputs = model(**inputs)
expected_shape_logits = torch.Size((1, 300, model.config.num_labels))
self.assertEqual(outputs.logits.shape, expected_shape_logits)
expected_logits = torch.tensor(
[[-7.3978, -2.5406, -4.1668], [-8.2684, -3.9933, -3.8096], [-7.0515, -3.7973, -5.8516]]
).to(torch_device)
expected_boxes = torch.tensor(
[[0.5043, 0.4973, 0.9998], [0.2542, 0.5489, 0.4748], [0.5490, 0.2765, 0.0570]]
).to(torch_device)
self.assertTrue(torch.allclose(outputs.logits[0, :3, :3], expected_logits, atol=1e-4))
expected_shape_boxes = torch.Size((1, 300, 4))
self.assertEqual(outputs.pred_boxes.shape, expected_shape_boxes)
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
# verify postprocessing
results = image_processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
expected_scores = torch.tensor([0.6392, 0.6276, 0.5546, 0.5260, 0.4706], device=torch_device)
expected_labels = [75, 17, 17, 75, 63]
expected_slice_boxes = torch.tensor([40.5866, 73.2107, 176.1421, 117.1751], device=torch_device)
self.assertTrue(torch.allclose(results["scores"], expected_scores, atol=1e-4))
self.assertSequenceEqual(results["labels"].tolist(), expected_labels)
self.assertTrue(torch.allclose(results["boxes"][0, :], expected_slice_boxes))
def test_inference_object_detection_head_swin_backbone(self):
model = DetaForObjectDetection.from_pretrained("jozhang97/deta-swin-large").to(torch_device)
image_processor = self.default_image_processor
image = prepare_img()
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
with torch.no_grad():
outputs = model(**inputs)
expected_shape_logits = torch.Size((1, 300, model.config.num_labels))
self.assertEqual(outputs.logits.shape, expected_shape_logits)
expected_logits = torch.tensor(
[[-7.6308, -2.8485, -5.3737], [-7.2037, -4.5505, -4.8027], [-7.2943, -4.2611, -4.6617]]
).to(torch_device)
expected_boxes = torch.tensor(
[[0.4987, 0.4969, 0.9999], [0.2549, 0.5498, 0.4805], [0.5498, 0.2757, 0.0569]]
).to(torch_device)
self.assertTrue(torch.allclose(outputs.logits[0, :3, :3], expected_logits, atol=1e-4))
expected_shape_boxes = torch.Size((1, 300, 4))
self.assertEqual(outputs.pred_boxes.shape, expected_shape_boxes)
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
# verify postprocessing
results = image_processor.post_process_object_detection(
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
)[0]
expected_scores = torch.tensor([0.6831, 0.6826, 0.5684, 0.5464, 0.4392], device=torch_device)
expected_labels = [17, 17, 75, 75, 63]
expected_slice_boxes = torch.tensor([345.8478, 23.6754, 639.8562, 372.8265], device=torch_device)
self.assertTrue(torch.allclose(results["scores"], expected_scores, atol=1e-4))
self.assertSequenceEqual(results["labels"].tolist(), expected_labels)
self.assertTrue(torch.allclose(results["boxes"][0, :], expected_slice_boxes))

View File

@@ -1,99 +0,0 @@
# coding=utf-8
# Copyright 2021 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
from transformers.testing_utils import require_torch, require_vision
from transformers.utils import is_vision_available
from ...test_image_processing_common import ImageProcessingTestMixin, prepare_image_inputs
if is_vision_available():
from transformers import ViTImageProcessor
class EfficientFormerImageProcessorTester(unittest.TestCase):
def __init__(
self,
parent,
batch_size=13,
num_channels=3,
image_size=224,
min_resolution=30,
max_resolution=400,
do_resize=True,
size=None,
do_normalize=True,
image_mean=[0.5, 0.5, 0.5],
image_std=[0.5, 0.5, 0.5],
):
size = size if size is not None else {"height": 18, "width": 18}
self.parent = parent
self.batch_size = batch_size
self.num_channels = num_channels
self.image_size = image_size
self.min_resolution = min_resolution
self.max_resolution = max_resolution
self.do_resize = do_resize
self.size = size
self.do_normalize = do_normalize
self.image_mean = image_mean
self.image_std = image_std
def prepare_image_processor_dict(self):
return {
"image_mean": self.image_mean,
"image_std": self.image_std,
"do_normalize": self.do_normalize,
"do_resize": self.do_resize,
"size": self.size,
}
def expected_output_image_shape(self, images):
return self.num_channels, self.size["height"], self.size["width"]
def prepare_image_inputs(self, equal_resolution=False, numpify=False, torchify=False):
return prepare_image_inputs(
batch_size=self.batch_size,
num_channels=self.num_channels,
min_resolution=self.min_resolution,
max_resolution=self.max_resolution,
equal_resolution=equal_resolution,
numpify=numpify,
torchify=torchify,
)
@require_torch
@require_vision
class EfficientFormerImageProcessorTest(ImageProcessingTestMixin, unittest.TestCase):
image_processing_class = ViTImageProcessor if is_vision_available() else None
def setUp(self):
self.image_processor_tester = EfficientFormerImageProcessorTester(self)
@property
def image_processor_dict(self):
return self.image_processor_tester.prepare_image_processor_dict()
def test_image_proc_properties(self):
image_processor = self.image_processing_class(**self.image_processor_dict)
self.assertTrue(hasattr(image_processor, "image_mean"))
self.assertTrue(hasattr(image_processor, "image_std"))
self.assertTrue(hasattr(image_processor, "do_normalize"))
self.assertTrue(hasattr(image_processor, "do_resize"))
self.assertTrue(hasattr(image_processor, "size"))

View File

@@ -1,478 +0,0 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Testing suite for the PyTorch EfficientFormer model."""
import unittest
import warnings
from typing import List
from transformers import EfficientFormerConfig
from transformers.testing_utils import require_torch, require_vision, slow, torch_device
from transformers.utils import cached_property, is_torch_available, is_vision_available
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor
from ...test_pipeline_mixin import PipelineTesterMixin
if is_torch_available():
import torch
from transformers import (
EfficientFormerForImageClassification,
EfficientFormerForImageClassificationWithTeacher,
EfficientFormerModel,
)
from transformers.models.auto.modeling_auto import (
MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES,
MODEL_MAPPING_NAMES,
)
if is_vision_available():
from PIL import Image
from transformers import EfficientFormerImageProcessor
class EfficientFormerModelTester:
def __init__(
self,
parent,
batch_size: int = 13,
image_size: int = 64,
patch_size: int = 2,
embed_dim: int = 3,
num_channels: int = 3,
is_training: bool = True,
use_labels: bool = True,
hidden_size: int = 128,
hidden_sizes=[16, 32, 64, 128],
num_hidden_layers: int = 7,
num_attention_heads: int = 4,
intermediate_size: int = 37,
hidden_act: str = "gelu",
hidden_dropout_prob: float = 0.1,
attention_probs_dropout_prob: float = 0.1,
type_sequence_label_size: int = 10,
initializer_range: float = 0.02,
encoder_stride: int = 2,
num_attention_outputs: int = 1,
dim: int = 128,
depths: List[int] = [2, 2, 2, 2],
resolution: int = 2,
mlp_expansion_ratio: int = 2,
):
self.parent = parent
self.batch_size = batch_size
self.image_size = image_size
self.patch_size = patch_size
self.num_channels = num_channels
self.is_training = is_training
self.use_labels = use_labels
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.type_sequence_label_size = type_sequence_label_size
self.initializer_range = initializer_range
self.encoder_stride = encoder_stride
self.num_attention_outputs = num_attention_outputs
self.embed_dim = embed_dim
self.seq_length = embed_dim + 1
self.resolution = resolution
self.depths = depths
self.hidden_sizes = hidden_sizes
self.dim = dim
self.mlp_expansion_ratio = mlp_expansion_ratio
def prepare_config_and_inputs(self):
pixel_values = floats_tensor([self.batch_size, self.num_channels, self.image_size, self.image_size])
labels = None
if self.use_labels:
labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
config = self.get_config()
return config, pixel_values, labels
def get_config(self):
return EfficientFormerConfig(
image_size=self.image_size,
patch_size=self.patch_size,
num_channels=self.num_channels,
hidden_size=self.hidden_size,
num_hidden_layers=self.num_hidden_layers,
num_attention_heads=self.num_attention_heads,
intermediate_size=self.intermediate_size,
hidden_act=self.hidden_act,
hidden_dropout_prob=self.hidden_dropout_prob,
attention_probs_dropout_prob=self.attention_probs_dropout_prob,
is_decoder=False,
initializer_range=self.initializer_range,
encoder_stride=self.encoder_stride,
resolution=self.resolution,
depths=self.depths,
hidden_sizes=self.hidden_sizes,
dim=self.dim,
mlp_expansion_ratio=self.mlp_expansion_ratio,
)
def create_and_check_model(self, config, pixel_values, labels):
model = EfficientFormerModel(config=config)
model.to(torch_device)
model.eval()
result = model(pixel_values)
self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
def create_and_check_for_image_classification(self, config, pixel_values, labels):
config.num_labels = self.type_sequence_label_size
model = EfficientFormerForImageClassification(config)
model.to(torch_device)
model.eval()
result = model(pixel_values, labels=labels)
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.type_sequence_label_size))
# test greyscale images
config.num_channels = 1
model = EfficientFormerForImageClassification(config)
model.to(torch_device)
model.eval()
pixel_values = floats_tensor([self.batch_size, 1, self.image_size, self.image_size])
result = model(pixel_values)
self.parent.assertEqual(result.logits.shape, (self.batch_size, self.type_sequence_label_size))
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
config,
pixel_values,
labels,
) = config_and_inputs
inputs_dict = {"pixel_values": pixel_values}
return config, inputs_dict
@require_torch
class EfficientFormerModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
"""
Here we also overwrite some of the tests of test_modeling_common.py, as EfficientFormer does not use input_ids, inputs_embeds,
attention_mask and seq_length.
"""
all_model_classes = (
(
EfficientFormerModel,
EfficientFormerForImageClassificationWithTeacher,
EfficientFormerForImageClassification,
)
if is_torch_available()
else ()
)
pipeline_model_mapping = (
{
"image-feature-extraction": EfficientFormerModel,
"image-classification": (
EfficientFormerForImageClassification,
EfficientFormerForImageClassificationWithTeacher,
),
}
if is_torch_available()
else {}
)
fx_compatible = False
test_pruning = False
test_resize_embeddings = False
test_head_masking = False
def setUp(self):
self.model_tester = EfficientFormerModelTester(self)
self.config_tester = ConfigTester(
self, config_class=EfficientFormerConfig, has_text_modality=False, hidden_size=37
)
def test_config(self):
self.config_tester.run_common_tests()
@unittest.skip(reason="EfficientFormer does not use inputs_embeds")
def test_inputs_embeds(self):
pass
@unittest.skip(reason="EfficientFormer does not support input and output embeddings")
def test_model_common_attributes(self):
pass
def test_hidden_states_output(self):
def check_hidden_states_output(inputs_dict, config, model_class):
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
hidden_states = outputs.encoder_hidden_states if config.is_encoder_decoder else outputs.hidden_states
expected_num_layers = getattr(
self.model_tester, "expected_num_hidden_layers", self.model_tester.num_hidden_layers + 1
)
self.assertEqual(len(hidden_states), expected_num_layers)
if hasattr(self.model_tester, "encoder_seq_length"):
seq_length = self.model_tester.encoder_seq_length
if hasattr(self.model_tester, "chunk_length") and self.model_tester.chunk_length > 1:
seq_length = seq_length * self.model_tester.chunk_length
else:
seq_length = self.model_tester.seq_length
self.assertListEqual(
list(hidden_states[-1].shape[-2:]),
[seq_length, self.model_tester.hidden_size],
)
if config.is_encoder_decoder:
hidden_states = outputs.decoder_hidden_states
self.assertIsInstance(hidden_states, (list, tuple))
self.assertEqual(len(hidden_states), expected_num_layers)
seq_len = getattr(self.model_tester, "seq_length", None)
decoder_seq_length = getattr(self.model_tester, "decoder_seq_length", seq_len)
self.assertListEqual(
list(hidden_states[-1].shape[-2:]),
[decoder_seq_length, self.model_tester.hidden_size],
)
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
for model_class in self.all_model_classes:
inputs_dict["output_hidden_states"] = True
check_hidden_states_output(inputs_dict, config, model_class)
# check that output_hidden_states also work using config
del inputs_dict["output_hidden_states"]
config.output_hidden_states = True
check_hidden_states_output(inputs_dict, config, model_class)
def _prepare_for_class(self, inputs_dict, model_class, return_labels=False):
inputs_dict = super()._prepare_for_class(inputs_dict, model_class, return_labels=return_labels)
if return_labels:
if model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher":
del inputs_dict["labels"]
return inputs_dict
def test_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_model(*config_and_inputs)
@unittest.skip(reason="EfficientFormer does not implement masked image modeling yet")
def test_for_masked_image_modeling(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_masked_image_modeling(*config_and_inputs)
def test_for_image_classification(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_image_classification(*config_and_inputs)
# special case for EfficientFormerForImageClassificationWithTeacher model
def test_training(self):
if not self.model_tester.is_training:
return
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.return_dict = True
for model_class in self.all_model_classes:
# EfficientFormerForImageClassificationWithTeacher supports inference-only
if (
model_class.__name__ in MODEL_MAPPING_NAMES.values()
or model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher"
):
continue
model = model_class(config)
model.to(torch_device)
model.train()
inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
loss = model(**inputs).loss
loss.backward()
def test_problem_types(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
problem_types = [
{"title": "multi_label_classification", "num_labels": 2, "dtype": torch.float},
{"title": "single_label_classification", "num_labels": 1, "dtype": torch.long},
{"title": "regression", "num_labels": 1, "dtype": torch.float},
]
for model_class in self.all_model_classes:
if (
model_class.__name__
not in [
*MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES.values(),
]
or model_class.__name__ == "EfficientFormerForImageClassificationWithTeacher"
):
continue
for problem_type in problem_types:
with self.subTest(msg=f"Testing {model_class} with {problem_type['title']}"):
config.problem_type = problem_type["title"]
config.num_labels = problem_type["num_labels"]
model = model_class(config)
model.to(torch_device)
model.train()
inputs = self._prepare_for_class(inputs_dict, model_class, return_labels=True)
if problem_type["num_labels"] > 1:
inputs["labels"] = inputs["labels"].unsqueeze(1).repeat(1, problem_type["num_labels"])
inputs["labels"] = inputs["labels"].to(problem_type["dtype"])
# This tests that we do not trigger the warning form PyTorch "Using a target size that is different
# to the input size. This will likely lead to incorrect results due to broadcasting. Please ensure
# they have the same size." which is a symptom something in wrong for the regression problem.
# See https://github.com/huggingface/transformers/issues/11780
with warnings.catch_warnings(record=True) as warning_list:
loss = model(**inputs).loss
for w in warning_list:
if "Using a target size that is different to the input size" in str(w.message):
raise ValueError(
f"Something is going wrong in the regression problem: intercepted {w.message}"
)
loss.backward()
@slow
def test_model_from_pretrained(self):
model_name = "snap-research/efficientformer-l1-300"
model = EfficientFormerModel.from_pretrained(model_name)
self.assertIsNotNone(model)
def test_attention_outputs(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.return_dict = True
seq_len = getattr(self.model_tester, "seq_length", None)
encoder_seq_length = getattr(self.model_tester, "encoder_seq_length", seq_len)
encoder_key_length = getattr(self.model_tester, "key_length", encoder_seq_length)
chunk_length = getattr(self.model_tester, "chunk_length", None)
if chunk_length is not None and hasattr(self.model_tester, "num_hashes"):
encoder_seq_length = encoder_seq_length * self.model_tester.num_hashes
for model_class in self.all_model_classes:
inputs_dict["output_attentions"] = True
inputs_dict["output_hidden_states"] = False
config.return_dict = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
self.assertEqual(len(attentions), self.model_tester.num_attention_outputs)
# check that output_attentions also work using config
del inputs_dict["output_attentions"]
config.output_attentions = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
self.assertEqual(len(attentions), self.model_tester.num_attention_outputs)
if chunk_length is not None:
self.assertListEqual(
list(attentions[0].shape[-4:]),
[self.model_tester.num_attention_heads, encoder_seq_length, chunk_length, encoder_key_length],
)
else:
self.assertListEqual(
list(attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, encoder_seq_length, encoder_key_length],
)
# We will verify our results on an image of cute cats
def prepare_img():
image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
return image
@require_torch
@require_vision
class EfficientFormerModelIntegrationTest(unittest.TestCase):
@cached_property
def default_image_processor(self):
return (
EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
if is_vision_available()
else None
)
@slow
def test_inference_image_classification_head(self):
model = EfficientFormerForImageClassification.from_pretrained("snap-research/efficientformer-l1-300").to(
torch_device
)
image_processor = self.default_image_processor
image = prepare_img()
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
outputs = model(**inputs)
# verify the logits
expected_shape = (1, 1000)
self.assertEqual(outputs.logits.shape, expected_shape)
expected_slice = torch.tensor([-0.0555, 0.4825, -0.0852]).to(torch_device)
self.assertTrue(torch.allclose(outputs.logits[0][:3], expected_slice, atol=1e-4))
@slow
def test_inference_image_classification_head_with_teacher(self):
model = EfficientFormerForImageClassificationWithTeacher.from_pretrained(
"snap-research/efficientformer-l1-300"
).to(torch_device)
image_processor = self.default_image_processor
image = prepare_img()
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
# forward pass
with torch.no_grad():
outputs = model(**inputs)
# verify the logits
expected_shape = (1, 1000)
self.assertEqual(outputs.logits.shape, expected_shape)
expected_slice = torch.tensor([-0.1312, 0.4353, -1.0499]).to(torch_device)
self.assertTrue(torch.allclose(outputs.logits[0][:3], expected_slice, atol=1e-4))

Some files were not shown because too many files have changed in this diff Show More