Update old existing feature extractor references (#24552)
* Update old existing feature extractor references
* Typo
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Address comments from review - update 'feature extractor'

Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
@@ -354,12 +354,12 @@ Next, take a look at the image with the 🤗 Datasets feature [Image] (h
|
|||||||
|
|
||||||
### Feature extractor
|
### Feature extractor
|
||||||
|
|
||||||
Load the feature extractor with [`AutoFeatureExtractor.from_pretrained`]:
|
Load the feature extractor with [`AutoImageProcessor.from_pretrained`]:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
>>> from transformers import AutoFeatureExtractor
|
>>> from transformers import AutoImageProcessor
|
||||||
|
|
||||||
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
|
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
|
||||||
```
|
```
|
||||||
|
|
||||||
### Data augmentation
|
### Data augmentation
|
||||||
@@ -371,9 +371,9 @@ In image processing tasks it is common, as part of preprocessing the images, to
|
|||||||
```py
|
```py
|
||||||
>>> from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
|
>>> from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
|
||||||
|
|
||||||
>>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
|
>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
|
||||||
>>> _transforms = Compose(
|
>>> _transforms = Compose(
|
||||||
... [RandomResizedCrop(feature_extractor.size), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
|
... [RandomResizedCrop(image_processor.size["height"]), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
|
||||||
... )
|
... )
|
||||||
```
|
```
|
||||||
|
|
||||||
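The hunk above swaps `feature_extractor.size` for `image_processor.size["height"]`, which suggests the image processor exposes `size` as a dictionary rather than a plain integer. A minimal sketch of a helper that tolerates both shapes follows; `crop_size` is a hypothetical name, and the `"shortest_edge"` branch is an assumption about processors that describe their size that way.

```python
def crop_size(size):
    """Return a value usable as the `size` argument of RandomResizedCrop.

    Hypothetical helper: `size` may be a legacy integer, a dict with
    "height"/"width" keys, or (assumed) a dict with a "shortest_edge" key.
    """
    if isinstance(size, int):
        # Legacy feature-extractor style: a single integer.
        return size
    if "shortest_edge" in size:
        return size["shortest_edge"]
    return (size["height"], size["width"])


print(crop_size({"height": 224, "width": 224}))
print(crop_size({"shortest_edge": 256}))
print(crop_size(224))
```

With such a helper, the transform would read `RandomResizedCrop(crop_size(image_processor.size))` instead of indexing `"height"` directly.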
|
|||||||
@@ -263,7 +263,7 @@ To use, create an image processor associated with the model you're using. For ex
|
|||||||
ViTImageProcessor {
|
ViTImageProcessor {
|
||||||
"do_normalize": true,
|
"do_normalize": true,
|
||||||
"do_resize": true,
|
"do_resize": true,
|
||||||
"feature_extractor_type": "ViTImageProcessor",
|
"image_processor_type": "ViTImageProcessor",
|
||||||
"image_mean": [
|
"image_mean": [
|
||||||
0.5,
|
0.5,
|
||||||
0.5,
|
0.5,
|
||||||
@@ -295,7 +295,7 @@ Modify any of the [`ViTImageProcessor`] parameters to create your custom image p
|
|||||||
ViTImageProcessor {
|
ViTImageProcessor {
|
||||||
"do_normalize": false,
|
"do_normalize": false,
|
||||||
"do_resize": true,
|
"do_resize": true,
|
||||||
"feature_extractor_type": "ViTImageProcessor",
|
"image_processor_type": "ViTImageProcessor",
|
||||||
"image_mean": [
|
"image_mean": [
|
||||||
0.3,
|
0.3,
|
||||||
0.3,
|
0.3,
|
||||||
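The two hunks above rename the serialized key `feature_extractor_type` to `image_processor_type` in the printed configuration. For saved configuration dictionaries that still carry the old key, a rename could look like the sketch below. `migrate_preprocessor_config` is a hypothetical helper written for illustration and is not part of the library.

```python
import json


def migrate_preprocessor_config(config: dict) -> dict:
    """Return a copy of `config` with the legacy type key renamed.

    Hypothetical helper: the input is left untouched, and an existing
    "image_processor_type" entry is never overwritten.
    """
    migrated = dict(config)
    if "feature_extractor_type" in migrated and "image_processor_type" not in migrated:
        migrated["image_processor_type"] = migrated.pop("feature_extractor_type")
    return migrated


legacy = {
    "do_normalize": True,
    "do_resize": True,
    "feature_extractor_type": "ViTImageProcessor",
}
print(json.dumps(migrate_preprocessor_config(legacy), indent=2, sort_keys=True))
```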
|
|||||||
@@ -50,10 +50,10 @@ product between the projected image and text features is then used as a similar
|
|||||||
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
|
To feed images to the Transformer encoder, each image is split into a sequence of fixed-size non-overlapping patches,
|
||||||
which are then linearly embedded. A [CLS] token is added to serve as a representation of an entire image. The authors
|
which are then linearly embedded. A [CLS] token is added to serve as a representation of an entire image. The authors
|
||||||
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
|
also add absolute position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder.
|
||||||
The [`CLIPFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model.
|
The [`CLIPImageProcessor`] can be used to resize (or rescale) and normalize images for the model.
|
||||||
|
|
||||||
The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
|
The [`CLIPTokenizer`] is used to encode the text. The [`CLIPProcessor`] wraps
|
||||||
[`CLIPFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both
|
[`CLIPImageProcessor`] and [`CLIPTokenizer`] into a single instance to both
|
||||||
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
|
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
|
||||||
[`CLIPProcessor`] and [`CLIPModel`].
|
[`CLIPProcessor`] and [`CLIPModel`].
|
||||||
|
|
||||||
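The CLIP text above describes scoring an image against candidate texts by comparing their projected features. The toy sketch below illustrates that idea in plain Python with made-up three-dimensional vectors. The feature values and the `logit_scale` factor are assumptions for illustration, and nothing here calls the real model.

```python
import math


def cosine_similarity(a, b):
    """Dot product of the two vectors after L2 normalization."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up projected features; a real model would produce these.
image_features = [0.9, 0.1, 0.0]
text_features = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
logit_scale = 100.0  # assumed scaling factor, chosen for illustration

scores = [logit_scale * cosine_similarity(image_features, t) for t in text_features.values()]
probs = softmax(scores)
for caption, prob in zip(text_features, probs):
    print(f"{caption}: {prob:.3f}")
```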
|
|||||||
@@ -46,9 +46,9 @@ Tips:
|
|||||||
Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
|
Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
|
||||||
[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.
|
[`~generation.GenerationMixin.generate`] to autoregressively generate text given the input image.
|
||||||
|
|
||||||
The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
|
The [`DonutImageProcessor`] class is responsible for preprocessing the input image and
|
||||||
[`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
|
[`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
|
||||||
[`DonutProcessor`] wraps [`DonutFeatureExtractor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
|
[`DonutProcessor`] wraps [`DonutImageProcessor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
|
||||||
into a single instance to both extract the input features and decode the predicted token ids.
|
into a single instance to both extract the input features and decode the predicted token ids.
|
||||||
|
|
||||||
- Step-by-step Document Image Classification
|
- Step-by-step Document Image Classification
|
||||||
|
|||||||
@@ -150,23 +150,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
|
|||||||
## Usage: LayoutLMv2Processor
|
## Usage: LayoutLMv2Processor
|
||||||
|
|
||||||
The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
|
The easiest way to prepare data for the model is to use [`LayoutLMv2Processor`], which internally
|
||||||
combines a feature extractor ([`LayoutLMv2FeatureExtractor`]) and a tokenizer
|
combines an image processor ([`LayoutLMv2ImageProcessor`]) and a tokenizer
|
||||||
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The feature extractor
|
([`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]). The image processor
|
||||||
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
|
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
|
||||||
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
|
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
|
||||||
modality.
|
modality.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
|
from transformers import LayoutLMv2ImageProcessor, LayoutLMv2TokenizerFast, LayoutLMv2Processor
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor() # apply_ocr is set to True by default
|
image_processor = LayoutLMv2ImageProcessor() # apply_ocr is set to True by default
|
||||||
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
|
tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
|
||||||
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
|
processor = LayoutLMv2Processor(image_processor, tokenizer)
|
||||||
```
|
```
|
||||||
|
|
||||||
In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
|
In short, one can provide a document image (and possibly additional data) to [`LayoutLMv2Processor`],
|
||||||
and it will create the inputs expected by the model. Internally, the processor first uses
|
and it will create the inputs expected by the model. Internally, the processor first uses
|
||||||
[`LayoutLMv2FeatureExtractor`] to apply OCR on the image to get a list of words and normalized
|
[`LayoutLMv2ImageProcessor`] to apply OCR on the image to get a list of words and normalized
|
||||||
bounding boxes, as well as to resize the image to a given size in order to get the `image` input. The words and
|
bounding boxes, as well as to resize the image to a given size in order to get the `image` input. The words and
|
||||||
normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
|
normalized bounding boxes are then provided to [`LayoutLMv2Tokenizer`] or
|
||||||
[`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,
|
[`LayoutLMv2TokenizerFast`], which converts them to token-level `input_ids`,
|
||||||
@@ -176,7 +176,7 @@ which are turned into token-level `labels`.
|
|||||||
[`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
|
[`LayoutLMv2Processor`] uses [PyTesseract](https://pypi.org/project/pytesseract/), a Python
|
||||||
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
|
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
|
||||||
choice, and provide the words and normalized boxes yourself. This requires initializing
|
choice, and provide the words and normalized boxes yourself. This requires initializing
|
||||||
[`LayoutLMv2FeatureExtractor`] with `apply_ocr` set to `False`.
|
[`LayoutLMv2ImageProcessor`] with `apply_ocr` set to `False`.
|
||||||
|
|
||||||
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
|
In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
|
||||||
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).
|
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).
|
||||||
@@ -184,7 +184,7 @@ use cases works for both batched and non-batched inputs (we illustrate them for n
|
|||||||
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
|
**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
|
||||||
True**
|
True**
|
||||||
|
|
||||||
This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
|
This is the simplest case, in which the processor (actually the image processor) will perform OCR on the image to get
|
||||||
the words and normalized bounding boxes.
|
the words and normalized bounding boxes.
|
||||||
|
|
||||||
```python
|
```python
|
||||||
@@ -205,7 +205,7 @@ print(encoding.keys())
|
|||||||
|
|
||||||
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
|
**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**
|
||||||
|
|
||||||
In case one wants to do OCR themselves, one can initialize the feature extractor with `apply_ocr` set to
|
In case one wants to do OCR themselves, one can initialize the image processor with `apply_ocr` set to
|
||||||
`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
|
`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
|
||||||
the processor.
|
the processor.
|
||||||
|
|
||||||
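Use case 2 above expects the caller to supply words together with normalized bounding boxes when `apply_ocr` is `False`. The sketch below shows one way to normalize pixel boxes, assuming the 0-1000 coordinate scale used for LayoutLM-style models. The word list, pixel boxes and page size are made-up values.

```python
def normalize_bbox(bbox, width, height):
    """Scale an (x0, y0, x1, y1) pixel box to an assumed 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Made-up output of an external OCR engine for a 500x400 pixel page.
words = ["hello", "world"]
pixel_boxes = [(50, 100, 150, 140), (160, 100, 300, 140)]
boxes = [normalize_bbox(box, width=500, height=400) for box in pixel_boxes]
print(list(zip(words, boxes)))
```

The resulting `words` and `boxes` are the kind of values one would then pass to the processor alongside the image.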
|
|||||||
@@ -31,7 +31,7 @@ Tips:
|
|||||||
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
|
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
|
||||||
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
|
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
|
||||||
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
|
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
|
||||||
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
|
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3ImageProcessor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
|
||||||
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
|
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
|
||||||
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
|
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
|
||||||
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
|
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
|
||||||
|
|||||||
@@ -52,7 +52,7 @@ tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
|
|||||||
```
|
```
|
||||||
|
|
||||||
Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
|
Similar to LayoutLMv2, you can use [`LayoutXLMProcessor`] (which internally applies
|
||||||
[`LayoutLMv2FeatureExtractor`] and
|
[`LayoutLMv2ImageProcessor`] and
|
||||||
[`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
|
[`LayoutXLMTokenizer`]/[`LayoutXLMTokenizerFast`] in sequence) to prepare all
|
||||||
data for the model.
|
data for the model.
|
||||||
|
|
||||||
|
|||||||
@@ -28,7 +28,7 @@ The abstract from the paper is the following:
|
|||||||
|
|
||||||
OWL-ViT is a zero-shot text-conditioned object detection model. OWL-ViT uses [CLIP](clip) as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection.
|
OWL-ViT is a zero-shot text-conditioned object detection model. OWL-ViT uses [CLIP](clip) as its multi-modal backbone, with a ViT-like Transformer to get visual features and a causal language model to get the text features. To use CLIP for detection, OWL-ViT removes the final token pooling layer of the vision model and attaches a lightweight classification and box head to each transformer output token. Open-vocabulary classification is enabled by replacing the fixed classification layer weights with the class-name embeddings obtained from the text model. The authors first train CLIP from scratch and fine-tune it end-to-end with the classification and box heads on standard detection datasets using a bipartite matching loss. One or multiple text queries per image can be used to perform zero-shot text-conditioned object detection.
|
||||||
|
|
||||||
[`OwlViTFeatureExtractor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTFeatureExtractor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
|
[`OwlViTImageProcessor`] can be used to resize (or rescale) and normalize images for the model and [`CLIPTokenizer`] is used to encode the text. [`OwlViTProcessor`] wraps [`OwlViTImageProcessor`] and [`CLIPTokenizer`] into a single instance to both encode the text and prepare the images. The following example shows how to perform object detection using [`OwlViTProcessor`] and [`OwlViTForObjectDetection`].
|
||||||
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
|
|||||||
@@ -39,7 +39,7 @@ Tips:
|
|||||||
- The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
|
- The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
|
||||||
(which showcase both inference and fine-tuning on custom data).
|
(which showcase both inference and fine-tuning on custom data).
|
||||||
- ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
|
- ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
|
||||||
This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
|
This processor wraps an image processor (for the image modality) and a tokenizer (for the language modality) into one.
|
||||||
- ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
|
- ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
|
||||||
under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
|
under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
|
||||||
which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
|
which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
|
||||||
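The ViLT tips above describe two preprocessing ideas: resizing by the shorter edge while bounding the longer edge, and a `pixel_mask` that marks real pixels versus padding. The sketch below illustrates both in plain Python. The function names are hypothetical, the longer edge is capped at exactly 640 rather than strictly under it, and any further constraints the real processor applies are ignored.

```python
def resized_shape(height, width, shorter=384, longer=640):
    """Scale so the shorter edge is `shorter`, unless the longer edge would exceed `longer`."""
    scale = shorter / min(height, width)
    if max(height, width) * scale > longer:
        scale = longer / max(height, width)
    return round(height * scale), round(width * scale)


def pixel_masks(shapes):
    """Build one mask per image: 1 marks a real pixel, 0 marks padding.

    Every mask has the size of the largest height and width in the batch.
    """
    max_h = max(h for h, _ in shapes)
    max_w = max(w for _, w in shapes)
    return [
        [[1 if (row < h and col < w) else 0 for col in range(max_w)] for row in range(max_h)]
        for h, w in shapes
    ]


print(resized_shape(480, 640))
print(resized_shape(300, 1200))

# Tiny shapes keep the printed masks readable.
masks = pixel_masks([(2, 3), (3, 2)])
for mask in masks:
    print(mask)
```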
|
|||||||
@@ -462,9 +462,9 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
|
|||||||
|
|
||||||
|
|
||||||
>>> class CocoDetection(torchvision.datasets.CocoDetection):
|
>>> class CocoDetection(torchvision.datasets.CocoDetection):
|
||||||
... def __init__(self, img_folder, feature_extractor, ann_file):
|
... def __init__(self, img_folder, image_processor, ann_file):
|
||||||
... super().__init__(img_folder, ann_file)
|
... super().__init__(img_folder, ann_file)
|
||||||
... self.feature_extractor = feature_extractor
|
... self.image_processor = image_processor
|
||||||
|
|
||||||
... def __getitem__(self, idx):
|
... def __getitem__(self, idx):
|
||||||
... # read in PIL image and target in COCO format
|
... # read in PIL image and target in COCO format
|
||||||
@@ -474,7 +474,7 @@ Next, prepare an instance of a `CocoDetection` class that can be used with `coco
|
|||||||
... # resizing + normalization of both image and target)
|
... # resizing + normalization of both image and target)
|
||||||
... image_id = self.ids[idx]
|
... image_id = self.ids[idx]
|
||||||
... target = {"image_id": image_id, "annotations": target}
|
... target = {"image_id": image_id, "annotations": target}
|
||||||
... encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
|
... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
|
||||||
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
|
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
|
||||||
... target = encoding["labels"][0] # remove batch dimension
|
... target = encoding["labels"][0] # remove batch dimension
|
||||||
|
|
||||||
@@ -591,4 +591,3 @@ Let's plot the result:
|
|||||||
<div class="flex justify-center">
|
<div class="flex justify-center">
|
||||||
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
|
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|||||||
@@ -73,12 +73,12 @@ Each food class - or label - corresponds to a number; `79` indicates a cos
|
|||||||
|
|
||||||
## Preprocess
|
## Preprocess
|
||||||
|
|
||||||
Load the ViT feature extractor to process the image into a tensor:
|
Load the ViT image processor to process the image into a tensor:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
>>> from transformers import AutoFeatureExtractor
|
>>> from transformers import AutoImageProcessor
|
||||||
|
|
||||||
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224-in21k")
|
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
|
||||||
```
|
```
|
||||||
|
|
||||||
Apply several image transformations to the dataset to make the model more robust against overfitting. Here you will use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module. Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:
|
Apply several image transformations to the dataset to make the model more robust against overfitting. Here you will use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module. Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:
|
||||||
@@ -86,8 +86,8 @@ Apply several image transformations to the dataset to make the model more ro
|
|||||||
```py
|
```py
|
||||||
>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
|
>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor
|
||||||
|
|
||||||
>>> normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
|
>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
|
||||||
>>> _transforms = Compose([RandomResizedCrop(feature_extractor.size), ToTensor(), normalize])
|
>>> _transforms = Compose([RandomResizedCrop(image_processor.size["height"]), ToTensor(), normalize])
|
||||||
```
|
```
|
||||||
|
|
||||||
Create a preprocessing function that applies the transforms and returns the `pixel_values` - the inputs to the model - of the image:
|
Create a preprocessing function that applies the transforms and returns the `pixel_values` - the inputs to the model - of the image:
|
||||||
@@ -160,7 +160,7 @@ At this point, only three steps remain:
|
|||||||
... data_collator=data_collator,
|
... data_collator=data_collator,
|
||||||
... train_dataset=food["train"],
|
... train_dataset=food["train"],
|
||||||
... eval_dataset=food["test"],
|
... eval_dataset=food["test"],
|
||||||
... tokenizer=feature_extractor,
|
... tokenizer=image_processor,
|
||||||
... )
|
... )
|
||||||
|
|
||||||
>>> trainer.train()
|
>>> trainer.train()
|
||||||
|
|||||||
@@ -454,9 +454,9 @@ The API that builds the COCO dataset [needs] the data in a specific format
|
|||||||
|
|
||||||
|
|
||||||
>>> class CocoDetection(torchvision.datasets.CocoDetection):
|
>>> class CocoDetection(torchvision.datasets.CocoDetection):
|
||||||
... def __init__(self, img_folder, feature_extractor, ann_file):
|
... def __init__(self, img_folder, image_processor, ann_file):
|
||||||
... super().__init__(img_folder, ann_file)
|
... super().__init__(img_folder, ann_file)
|
||||||
... self.feature_extractor = feature_extractor
|
... self.image_processor = image_processor
|
||||||
|
|
||||||
... def __getitem__(self, idx):
|
... def __getitem__(self, idx):
|
||||||
... # read in PIL image and target in COCO format
|
... # read in PIL image and target in COCO format
|
||||||
@@ -466,7 +466,7 @@ The API that builds the COCO dataset [needs] the data in a specific format
|
|||||||
... # resizing + normalization of both image and target)
|
... # resizing + normalization of both image and target)
|
||||||
... image_id = self.ids[idx]
|
... image_id = self.ids[idx]
|
||||||
... target = {"image_id": image_id, "annotations": target}
|
... target = {"image_id": image_id, "annotations": target}
|
||||||
... encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
|
... encoding = self.image_processor(images=img, annotations=target, return_tensors="pt")
|
||||||
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
|
... pixel_values = encoding["pixel_values"].squeeze() # remove batch dimension
|
||||||
... target = encoding["labels"][0] # remove batch dimension
|
... target = encoding["labels"][0] # remove batch dimension
|
||||||
|
|
||||||
@@ -586,4 +586,3 @@ Detected Mask with confidence 0.584 at location [2449.06, 823.19, 3256.43, 1413.
|
|||||||
<div class="flex justify-center">
|
<div class="flex justify-center">
|
||||||
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
|
<img src="https://i.imgur.com/4QZnf9A.png" alt="Object detection result on a new image"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|||||||
@@ -354,12 +354,12 @@ def convert_align_checkpoint(checkpoint_path, pytorch_dump_folder_path, save_mod
|
|||||||
# Create folder to save model
|
# Create folder to save model
|
||||||
if not os.path.isdir(pytorch_dump_folder_path):
|
if not os.path.isdir(pytorch_dump_folder_path):
|
||||||
os.mkdir(pytorch_dump_folder_path)
|
os.mkdir(pytorch_dump_folder_path)
|
||||||
# Save converted model and feature extractor
|
# Save converted model and image processor
|
||||||
hf_model.save_pretrained(pytorch_dump_folder_path)
|
hf_model.save_pretrained(pytorch_dump_folder_path)
|
||||||
processor.save_pretrained(pytorch_dump_folder_path)
|
processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
# Push model and feature extractor to hub
|
# Push model and image processor to hub
|
||||||
print("Pushing converted ALIGN to the hub...")
|
print("Pushing converted ALIGN to the hub...")
|
||||||
processor.push_to_hub("align-base")
|
processor.push_to_hub("align-base")
|
||||||
hf_model.push_to_hub("align-base")
|
hf_model.push_to_hub("align-base")
|
||||||
@@ -381,7 +381,7 @@ if __name__ == "__main__":
|
|||||||
help="Path to the output PyTorch model directory.",
|
help="Path to the output PyTorch model directory.",
|
||||||
)
|
)
|
||||||
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
||||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
convert_align_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
convert_align_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
||||||
|
|||||||
@@ -27,10 +27,10 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
BeitConfig,
|
BeitConfig,
|
||||||
BeitFeatureExtractor,
|
|
||||||
BeitForImageClassification,
|
BeitForImageClassification,
|
||||||
BeitForMaskedImageModeling,
|
BeitForMaskedImageModeling,
|
||||||
BeitForSemanticSegmentation,
|
BeitForSemanticSegmentation,
|
||||||
|
BeitImageProcessor,
|
||||||
)
|
)
|
||||||
from transformers.image_utils import PILImageResampling
|
from transformers.image_utils import PILImageResampling
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
@@ -266,16 +266,16 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
|
|
||||||
# Check outputs on an image
|
# Check outputs on an image
|
||||||
if is_semantic:
|
if is_semantic:
|
||||||
feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
|
image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
|
||||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||||
image = Image.open(ds[0]["file"])
|
image = Image.open(ds[0]["file"])
|
||||||
else:
|
else:
|
||||||
feature_extractor = BeitFeatureExtractor(
|
image_processor = BeitImageProcessor(
|
||||||
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
|
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
|
||||||
)
|
)
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
|
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
encoding = image_processor(images=image, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
@@ -353,8 +353,8 @@ def convert_beit_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -468,7 +468,7 @@ class ChineseCLIPOnnxConfig(OnnxConfig):
|
|||||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||||
)
|
)
|
||||||
image_input_dict = super().generate_dummy_inputs(
|
image_input_dict = super().generate_dummy_inputs(
|
||||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
processor.image_processor, batch_size=batch_size, framework=framework
|
||||||
)
|
)
|
||||||
return {**text_input_dict, **image_input_dict}
|
return {**text_input_dict, **image_input_dict}
|
||||||
|
|
||||||
|
|||||||
@@ -449,7 +449,7 @@ class CLIPOnnxConfig(OnnxConfig):
|
|||||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||||
)
|
)
|
||||||
image_input_dict = super().generate_dummy_inputs(
|
image_input_dict = super().generate_dummy_inputs(
|
||||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
processor.image_processor, batch_size=batch_size, framework=framework
|
||||||
)
|
)
|
||||||
return {**text_input_dict, **image_input_dict}
|
return {**text_input_dict, **image_input_dict}
|
||||||
|
|
||||||
|
|||||||
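Both ONNX configs above build dummy inputs separately for the tokenizer and the image processor, then merge the two dictionaries with dict unpacking. A small sketch of that merge, with plain nested lists standing in for tensors; the key names and the helper `merge_dummy_inputs` are assumptions chosen for illustration:

```python
def merge_dummy_inputs(text_inputs, image_inputs):
    """Combine per-modality input dicts; on a key clash the image value wins."""
    return {**text_inputs, **image_inputs}


# Plain lists stand in for framework tensors in this sketch.
text_inputs = {"input_ids": [[0, 1, 2]], "attention_mask": [[1, 1, 1]]}
image_inputs = {"pixel_values": [[[[0.0]]]]}

merged = merge_dummy_inputs(text_inputs, image_inputs)
merged_keys = list(merged)
```

Because the text and image dictionaries use disjoint keys, the order of unpacking only affects key order, not the values.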
@@ -28,7 +28,7 @@ from transformers import (
|
|||||||
CLIPSegTextConfig,
|
CLIPSegTextConfig,
|
||||||
CLIPSegVisionConfig,
|
CLIPSegVisionConfig,
|
||||||
CLIPTokenizer,
|
CLIPTokenizer,
|
||||||
ViTFeatureExtractor,
|
ViTImageProcessor,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -185,9 +185,9 @@ def convert_clipseg_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_
|
|||||||
if unexpected_keys != ["decoder.reduce.weight", "decoder.reduce.bias"]:
|
if unexpected_keys != ["decoder.reduce.weight", "decoder.reduce.bias"]:
|
||||||
raise ValueError(f"Unexpected keys: {unexpected_keys}")
|
raise ValueError(f"Unexpected keys: {unexpected_keys}")
|
||||||
|
|
||||||
feature_extractor = ViTFeatureExtractor(size=352)
|
image_processor = ViTImageProcessor(size=352)
|
||||||
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
||||||
processor = CLIPSegProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = CLIPSegProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
text = ["a glass", "something to fill", "wood", "a jar"]
|
text = ["a glass", "something to fill", "wood", "a jar"]
|
||||||
|
|||||||
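The CLIPSeg hunk above changes the keyword passed to the processor from `feature_extractor` to `image_processor`. A minimal sketch of how a constructor could accept both spellings during a transition, assuming a warn-and-fall-back approach; `ToyProcessor` is a hypothetical class, and the strings passed in are placeholders for real objects:

```python
import warnings


class ToyProcessor:
    """Hypothetical processor wrapping an image processor and a tokenizer."""

    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
        # Accept the legacy keyword, but steer callers to the new one.
        legacy = kwargs.pop("feature_extractor", None)
        if legacy is not None:
            warnings.warn(
                "The `feature_extractor` argument is deprecated; use `image_processor`.",
                FutureWarning,
            )
        if kwargs:
            raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")

        # Prefer the new keyword; fall back to the legacy one.
        self.image_processor = image_processor if image_processor is not None else legacy
        if self.image_processor is None:
            raise ValueError("You need to specify an `image_processor`.")
        if tokenizer is None:
            raise ValueError("You need to specify a `tokenizer`.")
        self.tokenizer = tokenizer


new_style = ToyProcessor(image_processor="img-proc", tokenizer="tok")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ToyProcessor(feature_extractor="img-proc", tokenizer="tok")
```

Both calls end up with the same `image_processor` attribute; only the legacy spelling triggers a warning.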
@@ -27,9 +27,9 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
ConditionalDetrConfig,
|
ConditionalDetrConfig,
|
||||||
ConditionalDetrFeatureExtractor,
|
|
||||||
ConditionalDetrForObjectDetection,
|
ConditionalDetrForObjectDetection,
|
||||||
ConditionalDetrForSegmentation,
|
ConditionalDetrForSegmentation,
|
||||||
|
ConditionalDetrImageProcessor,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -244,13 +244,13 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
|
|||||||
config.id2label = id2label
|
config.id2label = id2label
|
||||||
config.label2id = {v: k for k, v in id2label.items()}
|
config.label2id = {v: k for k, v in id2label.items()}
|
||||||
|
|
||||||
# load feature extractor
|
# load image processor
|
||||||
format = "coco_panoptic" if is_panoptic else "coco_detection"
|
format = "coco_panoptic" if is_panoptic else "coco_detection"
|
||||||
feature_extractor = ConditionalDetrFeatureExtractor(format=format)
|
image_processor = ConditionalDetrImageProcessor(format=format)
|
||||||
|
|
||||||
# prepare image
|
# prepare image
|
||||||
img = prepare_img()
|
img = prepare_img()
|
||||||
encoding = feature_extractor(images=img, return_tensors="pt")
|
encoding = image_processor(images=img, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
logger.info(f"Converting model {model_name}...")
|
logger.info(f"Converting model {model_name}...")
|
||||||
@@ -302,11 +302,11 @@ def convert_conditional_detr_checkpoint(model_name, pytorch_dump_folder_path):
|
|||||||
if is_panoptic:
|
if is_panoptic:
|
||||||
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
|
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
|
||||||
|
|
||||||
# Save model and feature extractor
|
# Save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -26,7 +26,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ConvNextConfig, ConvNextFeatureExtractor, ConvNextForImageClassification
|
from transformers import ConvNextConfig, ConvNextForImageClassification, ConvNextImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -144,10 +144,10 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
# Check outputs on an image, prepared by ConvNextFeatureExtractor
|
# Check outputs on an image, prepared by ConvNextImageProcessor
|
||||||
size = 224 if "224" in checkpoint_url else 384
|
size = 224 if "224" in checkpoint_url else 384
|
||||||
feature_extractor = ConvNextFeatureExtractor(size=size)
|
image_processor = ConvNextImageProcessor(size=size)
|
||||||
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
|
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||||
|
|
||||||
logits = model(pixel_values).logits
|
logits = model(pixel_values).logits
|
||||||
|
|
||||||
@@ -191,8 +191,8 @@ def convert_convnext_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print("Pushing model to the hub...")
|
print("Pushing model to the hub...")
|
||||||
model_name = "convnext"
|
model_name = "convnext"
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ from collections import OrderedDict
|
|||||||
import torch
|
import torch
|
||||||
from huggingface_hub import cached_download, hf_hub_url
|
from huggingface_hub import cached_download, hf_hub_url
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, CvtConfig, CvtForImageClassification
|
from transformers import AutoImageProcessor, CvtConfig, CvtForImageClassification
|
||||||
|
|
||||||
|
|
||||||
def embeddings(idx):
|
def embeddings(idx):
|
||||||
@@ -307,8 +307,8 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
|
|||||||
config.embed_dim = [192, 768, 1024]
|
config.embed_dim = [192, 768, 1024]
|
||||||
|
|
||||||
model = CvtForImageClassification(config)
|
model = CvtForImageClassification(config)
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||||
feature_extractor.size["shortest_edge"] = image_size
|
image_processor.size["shortest_edge"] = image_size
|
||||||
original_weights = torch.load(cvt_file_name, map_location=torch.device("cpu"))
|
original_weights = torch.load(cvt_file_name, map_location=torch.device("cpu"))
|
||||||
|
|
||||||
huggingface_weights = OrderedDict()
|
huggingface_weights = OrderedDict()
|
||||||
@@ -329,7 +329,7 @@ def convert_cvt_checkpoint(cvt_model, image_size, cvt_file_name, pytorch_dump_fo
|
|||||||
|
|
||||||
model.load_state_dict(huggingface_weights)
|
model.load_state_dict(huggingface_weights)
|
||||||
model.save_pretrained(pytorch_dump_folder)
|
model.save_pretrained(pytorch_dump_folder)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder)
|
image_processor.save_pretrained(pytorch_dump_folder)
|
||||||
|
|
||||||
|
|
||||||
# Download the weights from zoo: https://1drv.ms/u/s!AhIXJn_J-blW9RzF3rMW7SsLHa8h?e=blQ0Al
|
# Download the weights from zoo: https://1drv.ms/u/s!AhIXJn_J-blW9RzF3rMW7SsLHa8h?e=blQ0Al
|
||||||
|
|||||||
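The CvT script assigns to `image_processor.size["shortest_edge"]`, which treats `size` as a dictionary rather than a bare integer. A minimal sketch of normalizing an integer or a pair into such a dictionary; `to_size_dict` and its `default_to_square` flag are hypothetical, written for illustration, and are not the library's own helper:

```python
def to_size_dict(size, default_to_square=True):
    """Normalize an int, (height, width) pair, or dict into a size dict."""
    if isinstance(size, dict):
        return dict(size)  # copy so callers can mutate the result safely
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size specification: {size!r}")


square = to_size_dict(352)
shortest = to_size_dict(224, default_to_square=False)
pair = to_size_dict((480, 640))

# Mirrors the in-place update in the script: override one entry of the dict.
shortest["shortest_edge"] = 384
```

Keeping `size` as a dictionary makes the resize semantics explicit, which is why the script can set `"shortest_edge"` directly.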
@@ -8,7 +8,7 @@ from PIL import Image
|
|||||||
from timm.models import create_model
|
from timm.models import create_model
|
||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
BeitFeatureExtractor,
|
BeitImageProcessor,
|
||||||
Data2VecVisionConfig,
|
Data2VecVisionConfig,
|
||||||
Data2VecVisionForImageClassification,
|
Data2VecVisionForImageClassification,
|
||||||
Data2VecVisionModel,
|
Data2VecVisionModel,
|
||||||
@@ -304,9 +304,9 @@ def main():
|
|||||||
orig_model.eval()
|
orig_model.eval()
|
||||||
|
|
||||||
# 3. Forward Beit model
|
# 3. Forward Beit model
|
||||||
feature_extractor = BeitFeatureExtractor(size=config.image_size, do_center_crop=False)
|
image_processor = BeitImageProcessor(size=config.image_size, do_center_crop=False)
|
||||||
image = Image.open("../../../../tests/fixtures/tests_samples/COCO/000000039769.png")
|
image = Image.open("../../../../tests/fixtures/tests_samples/COCO/000000039769.png")
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
encoding = image_processor(images=image, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
orig_args = (pixel_values,) if is_finetuned else (pixel_values, None)
|
orig_args = (pixel_values,) if is_finetuned else (pixel_values, None)
|
||||||
@@ -354,7 +354,7 @@ def main():
|
|||||||
# 7. Save
|
# 7. Save
|
||||||
print(f"Saving to {args.hf_checkpoint_name}")
|
print(f"Saving to {args.hf_checkpoint_name}")
|
||||||
hf_model.save_pretrained(args.hf_checkpoint_name)
|
hf_model.save_pretrained(args.hf_checkpoint_name)
|
||||||
feature_extractor.save_pretrained(args.hf_checkpoint_name)
|
image_processor.save_pretrained(args.hf_checkpoint_name)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import cached_download, hf_hub_url
|
from huggingface_hub import cached_download, hf_hub_url
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DeformableDetrConfig, DeformableDetrFeatureExtractor, DeformableDetrForObjectDetection
|
from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection, DeformableDetrImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -115,12 +115,12 @@ def convert_deformable_detr_checkpoint(
|
|||||||
config.id2label = id2label
|
config.id2label = id2label
|
||||||
config.label2id = {v: k for k, v in id2label.items()}
|
config.label2id = {v: k for k, v in id2label.items()}
|
||||||
|
|
||||||
# load feature extractor
|
# load image processor
|
||||||
feature_extractor = DeformableDetrFeatureExtractor(format="coco_detection")
|
image_processor = DeformableDetrImageProcessor(format="coco_detection")
|
||||||
|
|
||||||
# prepare image
|
# prepare image
|
||||||
img = prepare_img()
|
img = prepare_img()
|
||||||
encoding = feature_extractor(images=img, return_tensors="pt")
|
encoding = image_processor(images=img, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
logger.info("Converting model...")
|
logger.info("Converting model...")
|
||||||
@@ -185,11 +185,11 @@ def convert_deformable_detr_checkpoint(
|
|||||||
|
|
||||||
print("Everything ok!")
|
print("Everything ok!")
|
||||||
|
|
||||||
# Save model and feature extractor
|
# Save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
# Push to hub
|
# Push to hub
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
|
|||||||
@@ -25,7 +25,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DeiTConfig, DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher
|
from transformers import DeiTConfig, DeiTForImageClassificationWithTeacher, DeiTImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -182,12 +182,12 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
|
|||||||
model = DeiTForImageClassificationWithTeacher(config).eval()
|
model = DeiTForImageClassificationWithTeacher(config).eval()
|
||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by DeiTFeatureExtractor
|
# Check outputs on an image, prepared by DeiTImageProcessor
|
||||||
size = int(
|
size = int(
|
||||||
(256 / 224) * config.image_size
|
(256 / 224) * config.image_size
|
||||||
) # to maintain same ratio w.r.t. 224 images, see https://github.com/facebookresearch/deit/blob/ab5715372db8c6cad5740714b2216d55aeae052e/datasets.py#L103
|
) # to maintain same ratio w.r.t. 224 images, see https://github.com/facebookresearch/deit/blob/ab5715372db8c6cad5740714b2216d55aeae052e/datasets.py#L103
|
||||||
feature_extractor = DeiTFeatureExtractor(size=size, crop_size=config.image_size)
|
image_processor = DeiTImageProcessor(size=size, crop_size=config.image_size)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
|
|
||||||
@@ -198,8 +198,8 @@ def convert_deit_checkpoint(deit_name, pytorch_dump_folder_path):
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {deit_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {deit_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
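The DeiT hunk above derives the resize target from the crop size so that the 256:224 ratio used for 224-pixel images is preserved at other resolutions. A short worked example of that arithmetic; the function name `deit_resize_size` is an assumption introduced here for illustration:

```python
def deit_resize_size(image_size):
    """Resize target that keeps the 256:224 resize-to-crop ratio."""
    # int() truncates toward zero, matching the expression in the script.
    return int((256 / 224) * image_size)


sizes = {image_size: deit_resize_size(image_size) for image_size in (224, 384)}
```

For a 224-pixel crop this gives 256, and for a 384-pixel crop it gives 438 (438.857... truncated).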
@@ -25,7 +25,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DetrConfig, DetrFeatureExtractor, DetrForObjectDetection, DetrForSegmentation
|
from transformers import DetrConfig, DetrForObjectDetection, DetrForSegmentation, DetrImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -201,13 +201,13 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
|
|||||||
config.id2label = id2label
|
config.id2label = id2label
|
||||||
config.label2id = {v: k for k, v in id2label.items()}
|
config.label2id = {v: k for k, v in id2label.items()}
|
||||||
|
|
||||||
# load feature extractor
|
# load image processor
|
||||||
format = "coco_panoptic" if is_panoptic else "coco_detection"
|
format = "coco_panoptic" if is_panoptic else "coco_detection"
|
||||||
feature_extractor = DetrFeatureExtractor(format=format)
|
image_processor = DetrImageProcessor(format=format)
|
||||||
|
|
||||||
# prepare image
|
# prepare image
|
||||||
img = prepare_img()
|
img = prepare_img()
|
||||||
encoding = feature_extractor(images=img, return_tensors="pt")
|
encoding = image_processor(images=img, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
logger.info(f"Converting model {model_name}...")
|
logger.info(f"Converting model {model_name}...")
|
||||||
@@ -258,11 +258,11 @@ def convert_detr_checkpoint(model_name, pytorch_dump_folder_path):
|
|||||||
if is_panoptic:
|
if is_panoptic:
|
||||||
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
|
assert torch.allclose(outputs.pred_masks, original_outputs["pred_masks"], atol=1e-4)
|
||||||
|
|
||||||
# Save model and feature extractor
|
# Save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -1341,8 +1341,7 @@ class DetrImageProcessor(BaseImageProcessor):
|
|||||||
|
|
||||||
Args:
|
Args:
|
||||||
results (`List[Dict]`):
|
results (`List[Dict]`):
|
||||||
Results list obtained by [`~DetrFeatureExtractor.post_process`], to which "masks" results will be
|
Results list obtained by [`~DetrImageProcessor.post_process`], to which "masks" results will be added.
|
||||||
added.
|
|
||||||
outputs ([`DetrSegmentationOutput`]):
|
outputs ([`DetrSegmentationOutput`]):
|
||||||
Raw outputs of the model.
|
Raw outputs of the model.
|
||||||
orig_target_sizes (`torch.Tensor` of shape `(batch_size, 2)`):
|
orig_target_sizes (`torch.Tensor` of shape `(batch_size, 2)`):
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import BeitConfig, BeitFeatureExtractor, BeitForImageClassification, BeitForMaskedImageModeling
|
from transformers import BeitConfig, BeitForImageClassification, BeitForMaskedImageModeling, BeitImageProcessor
|
||||||
from transformers.image_utils import PILImageResampling
|
from transformers.image_utils import PILImageResampling
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -171,12 +171,12 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image
|
# Check outputs on an image
|
||||||
feature_extractor = BeitFeatureExtractor(
|
image_processor = BeitImageProcessor(
|
||||||
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
|
size=config.image_size, resample=PILImageResampling.BILINEAR, do_center_crop=False
|
||||||
)
|
)
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
|
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
encoding = image_processor(images=image, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
|
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
@@ -189,18 +189,18 @@ def convert_dit_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
if has_lm_head:
|
if has_lm_head:
|
||||||
model_name = "dit-base" if "base" in checkpoint_url else "dit-large"
|
model_name = "dit-base" if "base" in checkpoint_url else "dit-large"
|
||||||
else:
|
else:
|
||||||
model_name = "dit-base-finetuned-rvlcdip" if "dit-b" in checkpoint_url else "dit-large-finetuned-rvlcdip"
|
model_name = "dit-base-finetuned-rvlcdip" if "dit-b" in checkpoint_url else "dit-large-finetuned-rvlcdip"
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
||||||
organization="nielsr",
|
organization="nielsr",
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
model.push_to_hub(
|
model.push_to_hub(
|
||||||
|
|||||||
@@ -21,7 +21,7 @@ from datasets import load_dataset
|
|||||||
from donut import DonutModel
|
from donut import DonutModel
|
||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
DonutFeatureExtractor,
|
DonutImageProcessor,
|
||||||
DonutProcessor,
|
DonutProcessor,
|
||||||
DonutSwinConfig,
|
DonutSwinConfig,
|
||||||
DonutSwinModel,
|
DonutSwinModel,
|
||||||
@@ -152,10 +152,10 @@ def convert_donut_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
|
|||||||
image = dataset["test"][0]["image"].convert("RGB")
|
image = dataset["test"][0]["image"].convert("RGB")
|
||||||
|
|
||||||
tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)
|
tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)
|
||||||
feature_extractor = DonutFeatureExtractor(
|
image_processor = DonutImageProcessor(
|
||||||
do_align_long_axis=original_model.config.align_long_axis, size=original_model.config.input_size[::-1]
|
do_align_long_axis=original_model.config.align_long_axis, size=original_model.config.input_size[::-1]
|
||||||
)
|
)
|
||||||
processor = DonutProcessor(feature_extractor, tokenizer)
|
processor = DonutProcessor(image_processor, tokenizer)
|
||||||
pixel_values = processor(image, return_tensors="pt").pixel_values
|
pixel_values = processor(image, return_tensors="pt").pixel_values
|
||||||
|
|
||||||
if model_name == "naver-clova-ix/donut-base-finetuned-docvqa":
|
if model_name == "naver-clova-ix/donut-base-finetuned-docvqa":
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import cached_download, hf_hub_url
|
from huggingface_hub import cached_download, hf_hub_url
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
|
from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
|
|
||||||
# Check outputs on an image
|
# Check outputs on an image
|
||||||
size = 480 if "ade" in checkpoint_url else 384
|
size = 480 if "ade" in checkpoint_url else 384
|
||||||
feature_extractor = DPTFeatureExtractor(size=size)
|
image_processor = DPTImageProcessor(size=size)
|
||||||
|
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(image, return_tensors="pt")
|
encoding = image_processor(image, return_tensors="pt")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
|
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
|
||||||
@@ -271,12 +271,12 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
model.push_to_hub("ybelkada/dpt-hybrid-midas")
|
model.push_to_hub("ybelkada/dpt-hybrid-midas")
|
||||||
feature_extractor.push_to_hub("ybelkada/dpt-hybrid-midas")
|
image_processor.push_to_hub("ybelkada/dpt-hybrid-midas")
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import cached_download, hf_hub_url
|
from huggingface_hub import cached_download, hf_hub_url
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DPTConfig, DPTFeatureExtractor, DPTForDepthEstimation, DPTForSemanticSegmentation
|
from transformers import DPTConfig, DPTForDepthEstimation, DPTForSemanticSegmentation, DPTImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -211,10 +211,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
|
|
||||||
# Check outputs on an image
|
# Check outputs on an image
|
||||||
size = 480 if "ade" in checkpoint_url else 384
|
size = 480 if "ade" in checkpoint_url else 384
|
||||||
feature_extractor = DPTFeatureExtractor(size=size)
|
image_processor = DPTImageProcessor(size=size)
|
||||||
|
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(image, return_tensors="pt")
|
encoding = image_processor(image, return_tensors="pt")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
|
outputs = model(**encoding).logits if "ade" in checkpoint_url else model(**encoding).predicted_depth
|
||||||
@@ -233,8 +233,8 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print("Pushing model to hub...")
|
print("Pushing model to hub...")
|
||||||
@@ -244,10 +244,10 @@ def convert_dpt_checkpoint(checkpoint_url, pytorch_dump_folder_path, push_to_hub
|
|||||||
commit_message="Add model",
|
commit_message="Add model",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
||||||
organization="nielsr",
|
organization="nielsr",
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -208,7 +208,7 @@ def convert_efficientformer_checkpoint(
|
|||||||
)
|
)
|
||||||
processor.push_to_hub(
|
processor.push_to_hub(
|
||||||
repo_id=f"Bearnardd/{pytorch_dump_path}",
|
repo_id=f"Bearnardd/{pytorch_dump_path}",
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -234,12 +234,12 @@ if __name__ == "__main__":
|
|||||||
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
|
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
|
||||||
)
|
)
|
||||||
|
|
||||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--no-push_to_hub",
|
"--no-push_to_hub",
|
||||||
dest="push_to_hub",
|
dest="push_to_hub",
|
||||||
action="store_false",
|
action="store_false",
|
||||||
help="Do not push model and feature extractor to the hub",
|
help="Do not push model and image processor to the hub",
|
||||||
)
|
)
|
||||||
parser.set_defaults(push_to_hub=True)
|
parser.set_defaults(push_to_hub=True)
|
||||||
|
|
||||||
|
|||||||
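The EfficientFormer script above pairs `--push_to_hub` with `--no-push_to_hub` on the same destination and sets the default to `True`. A self-contained sketch of that `argparse` pattern, showing how the three possible invocations resolve; `build_parser` is a name introduced for this example:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--push_to_hub", action="store_true", help="Push model and image processor to the hub"
    )
    parser.add_argument(
        "--no-push_to_hub",
        dest="push_to_hub",
        action="store_false",
        help="Do not push model and image processor to the hub",
    )
    # Parser-level defaults take precedence over the per-argument defaults.
    parser.set_defaults(push_to_hub=True)
    return parser


parser = build_parser()
default_value = parser.parse_args([]).push_to_hub
disabled_value = parser.parse_args(["--no-push_to_hub"]).push_to_hub
enabled_value = parser.parse_args(["--push_to_hub"]).push_to_hub
```

With this setup pushing is opt-out: only an explicit `--no-push_to_hub` turns it off.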
@@ -537,8 +537,8 @@ EFFICIENTFORMER_START_DOCSTRING = r"""
|
|||||||
EFFICIENTFORMER_INPUTS_DOCSTRING = r"""
|
EFFICIENTFORMER_INPUTS_DOCSTRING = r"""
|
||||||
Args:
|
Args:
|
||||||
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
|
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
|
||||||
Pixel values. Pixel values can be obtained using [`ViTFeatureExtractor`]. See
|
Pixel values. Pixel values can be obtained using [`ViTImageProcessor`]. See
|
||||||
[`ViTFeatureExtractor.__call__`] for details.
|
[`ViTImageProcessor.preprocess`] for details.
|
||||||
output_attentions (`bool`, *optional*):
|
output_attentions (`bool`, *optional*):
|
||||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
|
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
|
||||||
tensors for more detail.
|
tensors for more detail.
|
||||||
|
|||||||
@@ -305,12 +305,12 @@ def convert_efficientnet_checkpoint(model_name, pytorch_dump_folder_path, save_m
|
|||||||
# Create folder to save model
|
# Create folder to save model
|
||||||
if not os.path.isdir(pytorch_dump_folder_path):
|
if not os.path.isdir(pytorch_dump_folder_path):
|
||||||
os.mkdir(pytorch_dump_folder_path)
|
os.mkdir(pytorch_dump_folder_path)
|
||||||
# Save converted model and feature extractor
|
# Save converted model and image processor
|
||||||
hf_model.save_pretrained(pytorch_dump_folder_path)
|
hf_model.save_pretrained(pytorch_dump_folder_path)
|
||||||
preprocessor.save_pretrained(pytorch_dump_folder_path)
|
preprocessor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
# Push model and feature extractor to hub
|
# Push model and image processor to hub
|
||||||
print(f"Pushing converted {model_name} to the hub...")
|
print(f"Pushing converted {model_name} to the hub...")
|
||||||
model_name = f"efficientnet-{model_name}"
|
model_name = f"efficientnet-{model_name}"
|
||||||
preprocessor.push_to_hub(model_name)
|
preprocessor.push_to_hub(model_name)
|
||||||
@@ -333,7 +333,7 @@ if __name__ == "__main__":
|
|||||||
help="Path to the output PyTorch model directory.",
|
help="Path to the output PyTorch model directory.",
|
||||||
)
|
)
|
||||||
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
parser.add_argument("--save_model", action="store_true", help="Save model to local")
|
||||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
convert_efficientnet_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
convert_efficientnet_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.save_model, args.push_to_hub)
|
||||||
|
|||||||
@@ -23,7 +23,7 @@ import requests
|
|||||||
import torch
|
import torch
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import GLPNConfig, GLPNFeatureExtractor, GLPNForDepthEstimation
|
from transformers import GLPNConfig, GLPNForDepthEstimation, GLPNImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -131,12 +131,12 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
|
|||||||
# load GLPN configuration (Segformer-B4 size)
|
# load GLPN configuration (Segformer-B4 size)
|
||||||
config = GLPNConfig(hidden_sizes=[64, 128, 320, 512], decoder_hidden_size=64, depths=[3, 8, 27, 3])
|
config = GLPNConfig(hidden_sizes=[64, 128, 320, 512], decoder_hidden_size=64, depths=[3, 8, 27, 3])
|
||||||
|
|
||||||
# load feature extractor (only resize + rescale)
|
# load image processor (only resize + rescale)
|
||||||
feature_extractor = GLPNFeatureExtractor()
|
image_processor = GLPNImageProcessor()
|
||||||
|
|
||||||
# prepare image
|
# prepare image
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||||
|
|
||||||
logger.info("Converting model...")
|
logger.info("Converting model...")
|
||||||
|
|
||||||
@@ -179,17 +179,17 @@ def convert_glpn_checkpoint(checkpoint_path, pytorch_dump_folder_path, push_to_h
|
|||||||
|
|
||||||
# finally, push to hub if required
|
# finally, push to hub if required
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
logger.info("Pushing model and feature extractor to the hub...")
|
logger.info("Pushing model and image processor to the hub...")
|
||||||
model.push_to_hub(
|
model.push_to_hub(
|
||||||
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
||||||
organization="nielsr",
|
organization="nielsr",
|
||||||
commit_message="Add model",
|
commit_message="Add model",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
repo_path_or_name=Path(pytorch_dump_folder_path, model_name),
|
||||||
organization="nielsr",
|
organization="nielsr",
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -458,7 +458,7 @@ class GroupViTOnnxConfig(OnnxConfig):
|
|||||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||||
)
|
)
|
||||||
image_input_dict = super().generate_dummy_inputs(
|
image_input_dict = super().generate_dummy_inputs(
|
||||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
processor.image_processor, batch_size=batch_size, framework=framework
|
||||||
)
|
)
|
||||||
return {**text_input_dict, **image_input_dict}
|
return {**text_input_dict, **image_input_dict}
|
||||||
|
|
||||||
|
|||||||
@@ -81,7 +81,7 @@ class ImageGPTImageProcessor(BaseImageProcessor):
|
|||||||
|
|
||||||
def __init__(
|
def __init__(
|
||||||
self,
|
self,
|
||||||
# clusters is a first argument to maintain backwards compatibility with the old ImageGPTFeatureExtractor
|
# clusters is a first argument to maintain backwards compatibility with the old ImageGPTImageProcessor
|
||||||
clusters: Optional[Union[List[List[int]], np.ndarray]] = None,
|
clusters: Optional[Union[List[List[int]], np.ndarray]] = None,
|
||||||
do_resize: bool = True,
|
do_resize: bool = True,
|
||||||
size: Dict[str, int] = None,
|
size: Dict[str, int] = None,
|
||||||
|
|||||||
@@ -260,7 +260,7 @@ class LayoutLMv3OnnxConfig(OnnxConfig):
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
# A dummy image is used so OCR should not be applied
|
# A dummy image is used so OCR should not be applied
|
||||||
setattr(processor.feature_extractor, "apply_ocr", False)
|
setattr(processor.image_processor, "apply_ocr", False)
|
||||||
|
|
||||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||||
batch_size = compute_effective_axis_dimension(
|
batch_size = compute_effective_axis_dimension(
|
||||||
|
|||||||
@@ -15,6 +15,7 @@
|
|||||||
"""
|
"""
|
||||||
Processor class for LayoutXLM.
|
Processor class for LayoutXLM.
|
||||||
"""
|
"""
|
||||||
|
import warnings
|
||||||
from typing import List, Optional, Union
|
from typing import List, Optional, Union
|
||||||
|
|
||||||
from ...processing_utils import ProcessorMixin
|
from ...processing_utils import ProcessorMixin
|
||||||
@@ -24,26 +25,45 @@ from ...utils import TensorType
|
|||||||
|
|
||||||
class LayoutXLMProcessor(ProcessorMixin):
|
class LayoutXLMProcessor(ProcessorMixin):
|
||||||
r"""
|
r"""
|
||||||
Constructs a LayoutXLM processor which combines a LayoutXLM feature extractor and a LayoutXLM tokenizer into a
|
Constructs a LayoutXLM processor which combines a LayoutXLM image processor and a LayoutXLM tokenizer into a single
|
||||||
single processor.
|
processor.
|
||||||
|
|
||||||
[`LayoutXLMProcessor`] offers all the functionalities you need to prepare data for the model.
|
[`LayoutXLMProcessor`] offers all the functionalities you need to prepare data for the model.
|
||||||
|
|
||||||
It first uses [`LayoutLMv2FeatureExtractor`] to resize document images to a fixed size, and optionally applies OCR
|
It first uses [`LayoutLMv2ImageProcessor`] to resize document images to a fixed size, and optionally applies OCR to
|
||||||
to get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
|
get words and normalized bounding boxes. These are then provided to [`LayoutXLMTokenizer`] or
|
||||||
[`LayoutXLMTokenizerFast`], which turns the words and bounding boxes into token-level `input_ids`,
|
[`LayoutXLMTokenizerFast`], which turns the words and bounding boxes into token-level `input_ids`,
|
||||||
`attention_mask`, `token_type_ids`, `bbox`. Optionally, one can provide integer `word_labels`, which are turned
|
`attention_mask`, `token_type_ids`, `bbox`. Optionally, one can provide integer `word_labels`, which are turned
|
||||||
into token-level `labels` for token classification tasks (such as FUNSD, CORD).
|
into token-level `labels` for token classification tasks (such as FUNSD, CORD).
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
feature_extractor (`LayoutLMv2FeatureExtractor`):
|
image_processor (`LayoutLMv2ImageProcessor`):
|
||||||
An instance of [`LayoutLMv2FeatureExtractor`]. The feature extractor is a required input.
|
An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
|
||||||
tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
|
tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
|
||||||
An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
|
An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
|
||||||
"""
|
"""
|
||||||
feature_extractor_class = "LayoutLMv2FeatureExtractor"
|
|
||||||
|
attributes = ["image_processor", "tokenizer"]
|
||||||
|
image_processor_class = "LayoutLMv2ImageProcessor"
|
||||||
tokenizer_class = ("LayoutXLMTokenizer", "LayoutXLMTokenizerFast")
|
tokenizer_class = ("LayoutXLMTokenizer", "LayoutXLMTokenizerFast")
|
||||||
|
|
||||||
|
def __init__(self, image_processor=None, tokenizer=None, **kwargs):
|
||||||
|
if "feature_extractor" in kwargs:
|
||||||
|
warnings.warn(
|
||||||
|
"The `feature_extractor` argument is deprecated and will be removed in v5, use `image_processor`"
|
||||||
|
" instead.",
|
||||||
|
FutureWarning,
|
||||||
|
)
|
||||||
|
feature_extractor = kwargs.pop("feature_extractor")
|
||||||
|
|
||||||
|
image_processor = image_processor if image_processor is not None else feature_extractor
|
||||||
|
if image_processor is None:
|
||||||
|
raise ValueError("You need to specify an `image_processor`.")
|
||||||
|
if tokenizer is None:
|
||||||
|
raise ValueError("You need to specify a `tokenizer`.")
|
||||||
|
|
||||||
|
super().__init__(image_processor, tokenizer)
|
||||||
|
|
||||||
def __call__(
|
def __call__(
|
||||||
self,
|
self,
|
||||||
images,
|
images,
|
||||||
@@ -68,37 +88,37 @@ class LayoutXLMProcessor(ProcessorMixin):
|
|||||||
**kwargs,
|
**kwargs,
|
||||||
) -> BatchEncoding:
|
) -> BatchEncoding:
|
||||||
"""
|
"""
|
||||||
This method first forwards the `images` argument to [`~LayoutLMv2FeatureExtractor.__call__`]. In case
|
This method first forwards the `images` argument to [`~LayoutLMv2ImageProcessor.__call__`]. In case
|
||||||
[`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
|
[`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to `True`, it passes the obtained words and
|
||||||
bounding boxes along with the additional arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output,
|
bounding boxes along with the additional arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output,
|
||||||
together with resized `images`. In case [`LayoutLMv2FeatureExtractor`] was initialized with `apply_ocr` set to
|
together with resized `images`. In case [`LayoutLMv2ImageProcessor`] was initialized with `apply_ocr` set to
|
||||||
`False`, it passes the words (`text`/`text_pair`) and `boxes` specified by the user along with the additional
|
`False`, it passes the words (`text`/`text_pair`) and `boxes` specified by the user along with the additional
|
||||||
arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output, together with resized `images`.
|
arguments to [`~LayoutXLMTokenizer.__call__`] and returns the output, together with resized `images`.
|
||||||
|
|
||||||
Please refer to the docstring of the above two methods for more information.
|
Please refer to the docstring of the above two methods for more information.
|
||||||
"""
|
"""
|
||||||
# verify input
|
# verify input
|
||||||
if self.feature_extractor.apply_ocr and (boxes is not None):
|
if self.image_processor.apply_ocr and (boxes is not None):
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
"You cannot provide bounding boxes "
|
"You cannot provide bounding boxes "
|
||||||
"if you initialized the feature extractor with apply_ocr set to True."
|
"if you initialized the image processor with apply_ocr set to True."
|
||||||
)
|
)
|
||||||
|
|
||||||
if self.feature_extractor.apply_ocr and (word_labels is not None):
|
if self.image_processor.apply_ocr and (word_labels is not None):
|
||||||
raise ValueError(
|
raise ValueError(
|
||||||
"You cannot provide word labels if you initialized the feature extractor with apply_ocr set to True."
|
"You cannot provide word labels if you initialized the image processor with apply_ocr set to True."
|
||||||
)
|
)
|
||||||
|
|
||||||
if return_overflowing_tokens is True and return_offsets_mapping is False:
|
if return_overflowing_tokens is True and return_offsets_mapping is False:
|
||||||
raise ValueError("You cannot return overflowing tokens without returning the offsets mapping.")
|
raise ValueError("You cannot return overflowing tokens without returning the offsets mapping.")
|
||||||
|
|
||||||
# first, apply the feature extractor
|
# first, apply the image processor
|
||||||
features = self.feature_extractor(images=images, return_tensors=return_tensors)
|
features = self.image_processor(images=images, return_tensors=return_tensors)
|
||||||
|
|
||||||
# second, apply the tokenizer
|
# second, apply the tokenizer
|
||||||
if text is not None and self.feature_extractor.apply_ocr and text_pair is None:
|
if text is not None and self.image_processor.apply_ocr and text_pair is None:
|
||||||
if isinstance(text, str):
|
if isinstance(text, str):
|
||||||
text = [text] # add batch dimension (as the feature extractor always adds a batch dimension)
|
text = [text] # add batch dimension (as the image processor always adds a batch dimension)
|
||||||
text_pair = features["words"]
|
text_pair = features["words"]
|
||||||
|
|
||||||
encoded_inputs = self.tokenizer(
|
encoded_inputs = self.tokenizer(
|
||||||
@@ -162,3 +182,19 @@ class LayoutXLMProcessor(ProcessorMixin):
|
|||||||
@property
|
@property
|
||||||
def model_input_names(self):
|
def model_input_names(self):
|
||||||
return ["input_ids", "bbox", "attention_mask", "image"]
|
return ["input_ids", "bbox", "attention_mask", "image"]
|
||||||
|
|
||||||
|
@property
|
||||||
|
def feature_extractor_class(self):
|
||||||
|
warnings.warn(
|
||||||
|
"`feature_extractor_class` is deprecated and will be removed in v5. Use `image_processor_class` instead.",
|
||||||
|
FutureWarning,
|
||||||
|
)
|
||||||
|
return self.image_processor_class
|
||||||
|
|
||||||
|
@property
|
||||||
|
def feature_extractor(self):
|
||||||
|
warnings.warn(
|
||||||
|
"`feature_extractor` is deprecated and will be removed in v5. Use `image_processor` instead.",
|
||||||
|
FutureWarning,
|
||||||
|
)
|
||||||
|
return self.image_processor
|
||||||
|
|||||||
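
The `LayoutXLMProcessor` changes above follow a common deprecation pattern: accept the old `feature_extractor` keyword with a `FutureWarning`, route it to `image_processor`, and keep a warning property for old attribute access. The sketch below reproduces that pattern on a hypothetical `SketchProcessor` class with no `transformers` dependency; it is not the library implementation. One assumption is added here that the diff does not show: `feature_extractor` is initialised to `None`, so that omitting both arguments reaches the `ValueError` rather than failing on an undefined name.

```python
import warnings


class SketchProcessor:
    """Hypothetical stand-in for a processor class; not the transformers implementation."""

    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
        # Assumption added in this sketch: default the legacy value to None so the
        # fallback below is always defined, even without the deprecated keyword.
        feature_extractor = None
        if "feature_extractor" in kwargs:
            warnings.warn(
                "The `feature_extractor` argument is deprecated and will be removed in v5, use `image_processor`"
                " instead.",
                FutureWarning,
            )
            feature_extractor = kwargs.pop("feature_extractor")

        # Prefer the new argument; fall back to the deprecated one.
        image_processor = image_processor if image_processor is not None else feature_extractor
        if image_processor is None:
            raise ValueError("You need to specify an `image_processor`.")
        if tokenizer is None:
            raise ValueError("You need to specify a `tokenizer`.")

        self.image_processor = image_processor
        self.tokenizer = tokenizer

    @property
    def feature_extractor(self):
        # Old attribute name still works, but warns and forwards to the new one.
        warnings.warn(
            "`feature_extractor` is deprecated and will be removed in v5. Use `image_processor` instead.",
            FutureWarning,
        )
        return self.image_processor


# Exercise both deprecated paths and record the warnings they emit.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    processor = SketchProcessor(feature_extractor="old-style", tokenizer="tok")
    legacy_value = processor.feature_extractor
```

Callers that still pass or read `feature_extractor` keep working during the deprecation window, while new code only needs `image_processor`.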
@@ -25,7 +25,7 @@ import timm
|
|||||||
import torch
|
import torch
|
||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
|
|
||||||
from transformers import LevitConfig, LevitFeatureExtractor, LevitForImageClassificationWithTeacher
|
from transformers import LevitConfig, LevitForImageClassificationWithTeacher, LevitImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -74,8 +74,8 @@ def convert_weight_and_push(
|
|||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
our_model.save_pretrained(save_directory / checkpoint_name)
|
our_model.save_pretrained(save_directory / checkpoint_name)
|
||||||
feature_extractor = LevitFeatureExtractor()
|
image_processor = LevitImageProcessor()
|
||||||
feature_extractor.save_pretrained(save_directory / checkpoint_name)
|
image_processor.save_pretrained(save_directory / checkpoint_name)
|
||||||
|
|
||||||
print(f"Pushed {checkpoint_name}")
|
print(f"Pushed {checkpoint_name}")
|
||||||
|
|
||||||
@@ -167,12 +167,12 @@ if __name__ == "__main__":
|
|||||||
required=False,
|
required=False,
|
||||||
help="Path to the output PyTorch model directory.",
|
help="Path to the output PyTorch model directory.",
|
||||||
)
|
)
|
||||||
parser.add_argument("--push_to_hub", action="store_true", help="Push model and feature extractor to the hub")
|
parser.add_argument("--push_to_hub", action="store_true", help="Push model and image processor to the hub")
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--no-push_to_hub",
|
"--no-push_to_hub",
|
||||||
dest="push_to_hub",
|
dest="push_to_hub",
|
||||||
action="store_false",
|
action="store_false",
|
||||||
help="Do not push model and feature extractor to the hub",
|
help="Do not push model and image processor to the hub",
|
||||||
)
|
)
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -192,7 +192,7 @@ class OriginalMask2FormerConfigToOursConverter:
|
|||||||
return config
|
return config
|
||||||
|
|
||||||
|
|
||||||
class OriginalMask2FormerConfigToFeatureExtractorConverter:
|
class OriginalMask2FormerConfigToImageProcessorConverter:
|
||||||
def __call__(self, original_config: object) -> Mask2FormerImageProcessor:
|
def __call__(self, original_config: object) -> Mask2FormerImageProcessor:
|
||||||
model = original_config.MODEL
|
model = original_config.MODEL
|
||||||
model_input = original_config.INPUT
|
model_input = original_config.INPUT
|
||||||
@@ -846,7 +846,7 @@ class OriginalMask2FormerCheckpointToOursConverter:
|
|||||||
def test(
|
def test(
|
||||||
original_model,
|
original_model,
|
||||||
our_model: Mask2FormerForUniversalSegmentation,
|
our_model: Mask2FormerForUniversalSegmentation,
|
||||||
feature_extractor: Mask2FormerImageProcessor,
|
image_processor: Mask2FormerImageProcessor,
|
||||||
tolerance: float,
|
tolerance: float,
|
||||||
):
|
):
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
@@ -854,7 +854,7 @@ def test(
|
|||||||
our_model = our_model.eval()
|
our_model = our_model.eval()
|
||||||
|
|
||||||
im = prepare_img()
|
im = prepare_img()
|
||||||
x = feature_extractor(images=im, return_tensors="pt")["pixel_values"]
|
x = image_processor(images=im, return_tensors="pt")["pixel_values"]
|
||||||
|
|
||||||
original_model_backbone_features = original_model.backbone(x.clone())
|
original_model_backbone_features = original_model.backbone(x.clone())
|
||||||
our_model_output: Mask2FormerModelOutput = our_model.model(x.clone(), output_hidden_states=True)
|
our_model_output: Mask2FormerModelOutput = our_model.model(x.clone(), output_hidden_states=True)
|
||||||
@@ -979,10 +979,10 @@ if __name__ == "__main__":
|
|||||||
checkpoints_dir, config_dir
|
checkpoints_dir, config_dir
|
||||||
):
|
):
|
||||||
model_name = get_model_name(checkpoint_file)
|
model_name = get_model_name(checkpoint_file)
|
||||||
feature_extractor = OriginalMask2FormerConfigToFeatureExtractorConverter()(
|
image_processor = OriginalMask2FormerConfigToImageProcessorConverter()(
|
||||||
setup_cfg(Args(config_file=config_file))
|
setup_cfg(Args(config_file=config_file))
|
||||||
)
|
)
|
||||||
feature_extractor.size = {"height": 384, "width": 384}
|
image_processor.size = {"height": 384, "width": 384}
|
||||||
|
|
||||||
original_config = setup_cfg(Args(config_file=config_file))
|
original_config = setup_cfg(Args(config_file=config_file))
|
||||||
mask2former_kwargs = OriginalMask2Former.from_config(original_config)
|
mask2former_kwargs = OriginalMask2Former.from_config(original_config)
|
||||||
@@ -1012,8 +1012,8 @@ if __name__ == "__main__":
|
|||||||
tolerance = 3e-1
|
tolerance = 3e-1
|
||||||
|
|
||||||
logger.info(f"🪄 Testing {model_name}...")
|
logger.info(f"🪄 Testing {model_name}...")
|
||||||
test(original_model, mask2former_for_segmentation, feature_extractor, tolerance)
|
test(original_model, mask2former_for_segmentation, image_processor, tolerance)
|
||||||
logger.info(f"🪄 Pushing {model_name} to hub...")
|
logger.info(f"🪄 Pushing {model_name} to hub...")
|
||||||
|
|
||||||
feature_extractor.push_to_hub(model_name)
|
image_processor.push_to_hub(model_name)
|
||||||
mask2former_for_segmentation.push_to_hub(model_name)
|
mask2former_for_segmentation.push_to_hub(model_name)
|
||||||
|
|||||||
@@ -2106,8 +2106,8 @@ MASK2FORMER_START_DOCSTRING = r"""
|
|||||||
MASK2FORMER_INPUTS_DOCSTRING = r"""
|
MASK2FORMER_INPUTS_DOCSTRING = r"""
|
||||||
Args:
|
Args:
|
||||||
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
|
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
|
||||||
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
|
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
|
||||||
[`AutoFeatureExtractor.__call__`] for details.
|
[`AutoImageProcessor.preprocess`] for details.
|
||||||
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
|
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
|
||||||
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:
|
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:
|
||||||
|
|
||||||
|
|||||||
@@ -29,7 +29,7 @@ from detectron2.projects.deeplab import add_deeplab_config
|
|||||||
from PIL import Image
|
from PIL import Image
|
||||||
from torch import Tensor, nn
|
from torch import Tensor, nn
|
||||||
|
|
||||||
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerFeatureExtractor
|
from transformers.models.maskformer.feature_extraction_maskformer import MaskFormerImageProcessor
|
||||||
from transformers.models.maskformer.modeling_maskformer import (
|
from transformers.models.maskformer.modeling_maskformer import (
|
||||||
MaskFormerConfig,
|
MaskFormerConfig,
|
||||||
MaskFormerForInstanceSegmentation,
|
MaskFormerForInstanceSegmentation,
|
||||||
@@ -164,13 +164,13 @@ class OriginalMaskFormerConfigToOursConverter:
|
|||||||
return config
|
return config
|
||||||
|
|
||||||
|
|
||||||
class OriginalMaskFormerConfigToFeatureExtractorConverter:
|
class OriginalMaskFormerConfigToImageProcessorConverter:
|
||||||
def __call__(self, original_config: object) -> MaskFormerFeatureExtractor:
|
def __call__(self, original_config: object) -> MaskFormerImageProcessor:
|
||||||
model = original_config.MODEL
|
model = original_config.MODEL
|
||||||
model_input = original_config.INPUT
|
model_input = original_config.INPUT
|
||||||
dataset_catalog = MetadataCatalog.get(original_config.DATASETS.TEST[0])
|
dataset_catalog = MetadataCatalog.get(original_config.DATASETS.TEST[0])
|
||||||
|
|
||||||
return MaskFormerFeatureExtractor(
|
return MaskFormerImageProcessor(
|
||||||
image_mean=(torch.tensor(model.PIXEL_MEAN) / 255).tolist(),
|
image_mean=(torch.tensor(model.PIXEL_MEAN) / 255).tolist(),
|
||||||
image_std=(torch.tensor(model.PIXEL_STD) / 255).tolist(),
|
image_std=(torch.tensor(model.PIXEL_STD) / 255).tolist(),
|
||||||
size=model_input.MIN_SIZE_TEST,
|
size=model_input.MIN_SIZE_TEST,
|
||||||
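
The converter above divides the original config's `PIXEL_MEAN` and `PIXEL_STD` by 255 before passing them on as `image_mean` and `image_std`, which moves the statistics from the 0-255 pixel range to the 0-1 range. A dependency-free sketch of that rescaling follows; the numeric values are illustrative assumptions, not read from any config in this diff, and plain lists stand in for the `torch.tensor(...)` call.

```python
# Illustrative per-channel statistics on the 0-255 scale (assumed values;
# the real ones come from the original model config).
pixel_mean = [123.675, 116.28, 103.53]
pixel_std = [58.395, 57.12, 57.375]

# Rescale to the 0-1 range, mirroring `(torch.tensor(x) / 255).tolist()`.
image_mean = [value / 255 for value in pixel_mean]
image_std = [value / 255 for value in pixel_std]
```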
@@ -554,7 +554,7 @@ class OriginalMaskFormerCheckpointToOursConverter:
|
|||||||
yield config, checkpoint
|
yield config, checkpoint
|
||||||
|
|
||||||
|
|
||||||
def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_extractor: MaskFormerFeatureExtractor):
|
def test(original_model, our_model: MaskFormerForInstanceSegmentation, image_processor: MaskFormerImageProcessor):
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
original_model = original_model.eval()
|
original_model = original_model.eval()
|
||||||
our_model = our_model.eval()
|
our_model = our_model.eval()
|
||||||
@@ -600,7 +600,7 @@ def test(original_model, our_model: MaskFormerForInstanceSegmentation, feature_e
|
|||||||
|
|
||||||
our_model_out: MaskFormerForInstanceSegmentationOutput = our_model(x)
|
our_model_out: MaskFormerForInstanceSegmentationOutput = our_model(x)
|
||||||
|
|
||||||
our_segmentation = feature_extractor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
our_segmentation = image_processor.post_process_segmentation(our_model_out, target_size=(384, 384))
|
||||||
|
|
||||||
assert torch.allclose(
|
assert torch.allclose(
|
||||||
original_segmentation, our_segmentation, atol=1e-3
|
original_segmentation, our_segmentation, atol=1e-3
|
||||||
@@ -686,9 +686,7 @@ if __name__ == "__main__":
|
|||||||
for config_file, checkpoint_file in OriginalMaskFormerCheckpointToOursConverter.using_dirs(
|
for config_file, checkpoint_file in OriginalMaskFormerCheckpointToOursConverter.using_dirs(
|
||||||
checkpoints_dir, config_dir
|
checkpoints_dir, config_dir
|
||||||
):
|
):
|
||||||
feature_extractor = OriginalMaskFormerConfigToFeatureExtractorConverter()(
|
image_processor = OriginalMaskFormerConfigToImageProcessorConverter()(setup_cfg(Args(config_file=config_file)))
|
||||||
setup_cfg(Args(config_file=config_file))
|
|
||||||
)
|
|
||||||
|
|
||||||
original_config = setup_cfg(Args(config_file=config_file))
|
original_config = setup_cfg(Args(config_file=config_file))
|
||||||
mask_former_kwargs = OriginalMaskFormer.from_config(original_config)
|
mask_former_kwargs = OriginalMaskFormer.from_config(original_config)
|
||||||
@@ -712,15 +710,15 @@ if __name__ == "__main__":
|
|||||||
mask_former_for_instance_segmentation
|
mask_former_for_instance_segmentation
|
||||||
)
|
)
|
||||||
|
|
||||||
test(original_model, mask_former_for_instance_segmentation, feature_extractor)
|
test(original_model, mask_former_for_instance_segmentation, image_processor)
|
||||||
|
|
||||||
model_name = get_name(checkpoint_file)
|
model_name = get_name(checkpoint_file)
|
||||||
logger.info(f"🪄 Saving {model_name}")
|
logger.info(f"🪄 Saving {model_name}")
|
||||||
|
|
||||||
feature_extractor.save_pretrained(save_directory / model_name)
|
image_processor.save_pretrained(save_directory / model_name)
|
||||||
mask_former_for_instance_segmentation.save_pretrained(save_directory / model_name)
|
mask_former_for_instance_segmentation.save_pretrained(save_directory / model_name)
|
||||||
|
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=save_directory / model_name,
|
repo_path_or_name=save_directory / model_name,
|
||||||
commit_message="Add model",
|
commit_message="Add model",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
|
|||||||
@@ -26,7 +26,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, ResNetConfig
|
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, ResNetConfig
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -297,9 +297,9 @@ def convert_maskformer_checkpoint(
|
|||||||
else:
|
else:
|
||||||
ignore_index = 255
|
ignore_index = 255
|
||||||
reduce_labels = True if "ade" in model_name else False
|
reduce_labels = True if "ade" in model_name else False
|
||||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||||
|
|
||||||
inputs = feature_extractor(image, return_tensors="pt")
|
inputs = image_processor(image, return_tensors="pt")
|
||||||
|
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|
||||||
@@ -340,15 +340,15 @@ def convert_maskformer_checkpoint(
|
|||||||
print("Looks ok!")
|
print("Looks ok!")
|
||||||
|
|
||||||
if pytorch_dump_folder_path is not None:
|
if pytorch_dump_folder_path is not None:
|
||||||
print(f"Saving model and feature extractor of {model_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model and image processor of {model_name} to {pytorch_dump_folder_path}")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print(f"Pushing model and feature extractor of {model_name} to the hub...")
|
print(f"Pushing model and image processor of {model_name} to the hub...")
|
||||||
model.push_to_hub(f"facebook/{model_name}")
|
model.push_to_hub(f"facebook/{model_name}")
|
||||||
feature_extractor.push_to_hub(f"facebook/{model_name}")
|
image_processor.push_to_hub(f"facebook/{model_name}")
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -26,7 +26,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import MaskFormerConfig, MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation, SwinConfig
|
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, MaskFormerImageProcessor, SwinConfig
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -278,9 +278,9 @@ def convert_maskformer_checkpoint(
|
|||||||
else:
|
else:
|
||||||
ignore_index = 255
|
ignore_index = 255
|
||||||
reduce_labels = True if "ade" in model_name else False
|
reduce_labels = True if "ade" in model_name else False
|
||||||
feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
image_processor = MaskFormerImageProcessor(ignore_index=ignore_index, reduce_labels=reduce_labels)
|
||||||
|
|
||||||
inputs = feature_extractor(image, return_tensors="pt")
|
inputs = image_processor(image, return_tensors="pt")
|
||||||
|
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|
||||||
@@ -294,15 +294,15 @@ def convert_maskformer_checkpoint(
|
|||||||
print("Looks ok!")
|
print("Looks ok!")
|
||||||
|
|
||||||
if pytorch_dump_folder_path is not None:
|
if pytorch_dump_folder_path is not None:
|
||||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print("Pushing model and feature extractor to the hub...")
|
print("Pushing model and image processor to the hub...")
|
||||||
model.push_to_hub(f"nielsr/{model_name}")
|
model.push_to_hub(f"nielsr/{model_name}")
|
||||||
feature_extractor.push_to_hub(f"nielsr/{model_name}")
|
image_processor.push_to_hub(f"nielsr/{model_name}")
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
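Note on the pattern in these hunks: every conversion script gets the same mechanical rename (class `XxxFeatureExtractor` to `XxxImageProcessor`, variable and attribute `feature_extractor` to `image_processor`, and the phrase "feature extractor" in messages to "image processor"). The sketch below only illustrates that rename as a text rewrite; it is not the tooling used for this commit. The helper name `rename_feature_extractor` is hypothetical. The sketch also assumes it is run on vision-only files, since it would wrongly rename audio feature extractors, and it does not re-sort imports the way the diff does.

```python
import re

# Matches class names such as MaskFormerFeatureExtractor or AutoFeatureExtractor.
# Assumption: only vision files are processed; audio classes would also match.
_CLASS_RE = re.compile(r"\b([A-Z]\w*?)FeatureExtractor\b")
_VAR_RE = re.compile(r"\bfeature_extractor\b")


def rename_feature_extractor(source: str) -> str:
    """Hypothetical helper: rewrite feature-extractor references to image-processor ones."""
    source = _CLASS_RE.sub(r"\1ImageProcessor", source)  # class names
    source = _VAR_RE.sub("image_processor", source)  # variables, attributes, kwargs
    return source.replace("feature extractor", "image processor")  # comments, messages


old_line = "feature_extractor = MaskFormerFeatureExtractor(ignore_index=ignore_index)"
print(rename_feature_extractor(old_line))
# image_processor = MaskFormerImageProcessor(ignore_index=ignore_index)
```

A rewrite like this covers the one-line changes shown here. Hunks where the constructor arguments also differ between the two classes would still need manual review.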
@@ -27,8 +27,8 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
MobileNetV1Config,
|
MobileNetV1Config,
|
||||||
MobileNetV1FeatureExtractor,
|
|
||||||
MobileNetV1ForImageClassification,
|
MobileNetV1ForImageClassification,
|
||||||
|
MobileNetV1ImageProcessor,
|
||||||
load_tf_weights_in_mobilenet_v1,
|
load_tf_weights_in_mobilenet_v1,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
@@ -83,12 +83,12 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
# Load weights from TensorFlow checkpoint
|
# Load weights from TensorFlow checkpoint
|
||||||
load_tf_weights_in_mobilenet_v1(model, config, checkpoint_path)
|
load_tf_weights_in_mobilenet_v1(model, config, checkpoint_path)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by MobileNetV1FeatureExtractor
|
# Check outputs on an image, prepared by MobileNetV1ImageProcessor
|
||||||
feature_extractor = MobileNetV1FeatureExtractor(
|
image_processor = MobileNetV1ImageProcessor(
|
||||||
crop_size={"width": config.image_size, "height": config.image_size},
|
crop_size={"width": config.image_size, "height": config.image_size},
|
||||||
size={"shortest_edge": config.image_size + 32},
|
size={"shortest_edge": config.image_size + 32},
|
||||||
)
|
)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
logits = outputs.logits
|
logits = outputs.logits
|
||||||
|
|
||||||
@@ -107,13 +107,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print("Pushing to the hub...")
|
print("Pushing to the hub...")
|
||||||
repo_id = "google/" + model_name
|
repo_id = "google/" + model_name
|
||||||
feature_extractor.push_to_hub(repo_id)
|
image_processor.push_to_hub(repo_id)
|
||||||
model.push_to_hub(repo_id)
|
model.push_to_hub(repo_id)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -99,11 +99,11 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
load_tf_weights_in_mobilenet_v2(model, config, checkpoint_path)
|
load_tf_weights_in_mobilenet_v2(model, config, checkpoint_path)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by MobileNetV2ImageProcessor
|
# Check outputs on an image, prepared by MobileNetV2ImageProcessor
|
||||||
feature_extractor = MobileNetV2ImageProcessor(
|
image_processor = MobileNetV2ImageProcessor(
|
||||||
crop_size={"width": config.image_size, "height": config.image_size},
|
crop_size={"width": config.image_size, "height": config.image_size},
|
||||||
size={"shortest_edge": config.image_size + 32},
|
size={"shortest_edge": config.image_size + 32},
|
||||||
)
|
)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
logits = outputs.logits
|
logits = outputs.logits
|
||||||
|
|
||||||
@@ -143,13 +143,13 @@ def convert_movilevit_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print("Pushing to the hub...")
|
print("Pushing to the hub...")
|
||||||
repo_id = "google/" + model_name
|
repo_id = "google/" + model_name
|
||||||
feature_extractor.push_to_hub(repo_id)
|
image_processor.push_to_hub(repo_id)
|
||||||
model.push_to_hub(repo_id)
|
model.push_to_hub(repo_id)
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -26,9 +26,9 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
MobileViTConfig,
|
MobileViTConfig,
|
||||||
MobileViTFeatureExtractor,
|
|
||||||
MobileViTForImageClassification,
|
MobileViTForImageClassification,
|
||||||
MobileViTForSemanticSegmentation,
|
MobileViTForSemanticSegmentation,
|
||||||
|
MobileViTImageProcessor,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -211,9 +211,9 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
|||||||
new_state_dict = convert_state_dict(state_dict, model)
|
new_state_dict = convert_state_dict(state_dict, model)
|
||||||
model.load_state_dict(new_state_dict)
|
model.load_state_dict(new_state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by MobileViTFeatureExtractor
|
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||||
feature_extractor = MobileViTFeatureExtractor(crop_size=config.image_size, size=config.image_size + 32)
|
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
logits = outputs.logits
|
logits = outputs.logits
|
||||||
|
|
||||||
@@ -265,8 +265,8 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {mobilevit_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {mobilevit_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
model_mapping = {
|
model_mapping = {
|
||||||
@@ -280,7 +280,7 @@ def convert_movilevit_checkpoint(mobilevit_name, checkpoint_path, pytorch_dump_f
|
|||||||
|
|
||||||
print("Pushing to the hub...")
|
print("Pushing to the hub...")
|
||||||
model_name = model_mapping[mobilevit_name]
|
model_name = model_mapping[mobilevit_name]
|
||||||
feature_extractor.push_to_hub(model_name, organization="apple")
|
image_processor.push_to_hub(model_name, organization="apple")
|
||||||
model.push_to_hub(model_name, organization="apple")
|
model.push_to_hub(model_name, organization="apple")
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -259,8 +259,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by MobileViTImageProcessor
|
# Check outputs on an image, prepared by MobileViTImageProcessor
|
||||||
feature_extractor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
image_processor = MobileViTImageProcessor(crop_size=config.image_size, size=config.image_size + 32)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
|
|
||||||
# verify classification model
|
# verify classification model
|
||||||
@@ -276,8 +276,8 @@ def convert_mobilevitv2_checkpoint(task_name, checkpoint_path, orig_config_path,
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {task_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {task_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -383,7 +383,7 @@ class OwlViTOnnxConfig(OnnxConfig):
|
|||||||
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
processor.tokenizer, batch_size=batch_size, seq_length=seq_length, framework=framework
|
||||||
)
|
)
|
||||||
image_input_dict = super().generate_dummy_inputs(
|
image_input_dict = super().generate_dummy_inputs(
|
||||||
processor.feature_extractor, batch_size=batch_size, framework=framework
|
processor.image_processor, batch_size=batch_size, framework=framework
|
||||||
)
|
)
|
||||||
return {**text_input_dict, **image_input_dict}
|
return {**text_input_dict, **image_input_dict}
|
||||||
|
|
||||||
|
|||||||
@@ -29,8 +29,8 @@ from huggingface_hub import Repository
|
|||||||
from transformers import (
|
from transformers import (
|
||||||
CLIPTokenizer,
|
CLIPTokenizer,
|
||||||
OwlViTConfig,
|
OwlViTConfig,
|
||||||
OwlViTFeatureExtractor,
|
|
||||||
OwlViTForObjectDetection,
|
OwlViTForObjectDetection,
|
||||||
|
OwlViTImageProcessor,
|
||||||
OwlViTModel,
|
OwlViTModel,
|
||||||
OwlViTProcessor,
|
OwlViTProcessor,
|
||||||
)
|
)
|
||||||
@@ -350,16 +350,16 @@ def convert_owlvit_checkpoint(pt_backbone, flax_params, attn_params, pytorch_dum
|
|||||||
# Save HF model
|
# Save HF model
|
||||||
hf_model.save_pretrained(repo.local_dir)
|
hf_model.save_pretrained(repo.local_dir)
|
||||||
|
|
||||||
# Initialize feature extractor
|
# Initialize image processor
|
||||||
feature_extractor = OwlViTFeatureExtractor(
|
image_processor = OwlViTImageProcessor(
|
||||||
size=config.vision_config.image_size, crop_size=config.vision_config.image_size
|
size=config.vision_config.image_size, crop_size=config.vision_config.image_size
|
||||||
)
|
)
|
||||||
# Initialize tokenizer
|
# Initialize tokenizer
|
||||||
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", pad_token="!", model_max_length=16)
|
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32", pad_token="!", model_max_length=16)
|
||||||
|
|
||||||
# Initialize processor
|
# Initialize processor
|
||||||
processor = OwlViTProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = OwlViTProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
feature_extractor.save_pretrained(repo.local_dir)
|
image_processor.save_pretrained(repo.local_dir)
|
||||||
processor.save_pretrained(repo.local_dir)
|
processor.save_pretrained(repo.local_dir)
|
||||||
|
|
||||||
repo.git_add()
|
repo.git_add()
|
||||||
|
|||||||
@@ -29,13 +29,13 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
PerceiverConfig,
|
PerceiverConfig,
|
||||||
PerceiverFeatureExtractor,
|
|
||||||
PerceiverForImageClassificationConvProcessing,
|
PerceiverForImageClassificationConvProcessing,
|
||||||
PerceiverForImageClassificationFourier,
|
PerceiverForImageClassificationFourier,
|
||||||
PerceiverForImageClassificationLearned,
|
PerceiverForImageClassificationLearned,
|
||||||
PerceiverForMaskedLM,
|
PerceiverForMaskedLM,
|
||||||
PerceiverForMultimodalAutoencoding,
|
PerceiverForMultimodalAutoencoding,
|
||||||
PerceiverForOpticalFlow,
|
PerceiverForOpticalFlow,
|
||||||
|
PerceiverImageProcessor,
|
||||||
PerceiverTokenizer,
|
PerceiverTokenizer,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
@@ -389,9 +389,9 @@ def convert_perceiver_checkpoint(pickle_file, pytorch_dump_folder_path, architec
|
|||||||
inputs = encoding.input_ids
|
inputs = encoding.input_ids
|
||||||
input_mask = encoding.attention_mask
|
input_mask = encoding.attention_mask
|
||||||
elif architecture in ["image_classification", "image_classification_fourier", "image_classification_conv"]:
|
elif architecture in ["image_classification", "image_classification_fourier", "image_classification_conv"]:
|
||||||
feature_extractor = PerceiverFeatureExtractor()
|
image_processor = PerceiverImageProcessor()
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(image, return_tensors="pt")
|
encoding = image_processor(image, return_tensors="pt")
|
||||||
inputs = encoding.pixel_values
|
inputs = encoding.pixel_values
|
||||||
elif architecture == "optical_flow":
|
elif architecture == "optical_flow":
|
||||||
inputs = torch.randn(1, 2, 27, 368, 496)
|
inputs = torch.randn(1, 2, 27, 368, 496)
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import PoolFormerConfig, PoolFormerFeatureExtractor, PoolFormerForImageClassification
|
from transformers import PoolFormerConfig, PoolFormerForImageClassification, PoolFormerImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -141,12 +141,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
|||||||
else:
|
else:
|
||||||
raise ValueError(f"Size {size} not supported")
|
raise ValueError(f"Size {size} not supported")
|
||||||
|
|
||||||
# load feature extractor
|
# load image processor
|
||||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||||
|
|
||||||
# Prepare image
|
# Prepare image
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||||
|
|
||||||
logger.info(f"Converting model {model_name}...")
|
logger.info(f"Converting model {model_name}...")
|
||||||
|
|
||||||
@@ -161,9 +161,9 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
# Define feature extractor
|
# Define image processor
|
||||||
feature_extractor = PoolFormerFeatureExtractor(crop_pct=crop_pct)
|
image_processor = PoolFormerImageProcessor(crop_pct=crop_pct)
|
||||||
pixel_values = feature_extractor(images=prepare_img(), return_tensors="pt").pixel_values
|
pixel_values = image_processor(images=prepare_img(), return_tensors="pt").pixel_values
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
@@ -187,12 +187,12 @@ def convert_poolformer_checkpoint(model_name, checkpoint_path, pytorch_dump_fold
|
|||||||
assert logits.shape == expected_shape
|
assert logits.shape == expected_shape
|
||||||
assert torch.allclose(logits[0, :3], expected_slice, atol=1e-2)
|
assert torch.allclose(logits[0, :3], expected_slice, atol=1e-2)
|
||||||
|
|
||||||
# finally, save model and feature extractor
|
# finally, save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -34,7 +34,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
|||||||
from torch import Tensor
|
from torch import Tensor
|
||||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||||
from transformers.modeling_utils import PreTrainedModel
|
from transformers.modeling_utils import PreTrainedModel
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -262,10 +262,10 @@ def convert_weights_and_push(save_directory: Path, model_name: str = None, push_
|
|||||||
)
|
)
|
||||||
size = 384
|
size = 384
|
||||||
# we can use the convnext one
|
# we can use the convnext one
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=save_directory / model_name,
|
repo_path_or_name=save_directory / model_name,
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
output_dir=save_directory / model_name,
|
output_dir=save_directory / model_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -294,7 +294,7 @@ if __name__ == "__main__":
|
|||||||
default=True,
|
default=True,
|
||||||
type=bool,
|
type=bool,
|
||||||
required=False,
|
required=False,
|
||||||
help="If True, push model and feature extractor to the hub.",
|
help="If True, push model and image processor to the hub.",
|
||||||
)
|
)
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -30,7 +30,7 @@ from huggingface_hub import cached_download, hf_hub_url
|
|||||||
from torch import Tensor
|
from torch import Tensor
|
||||||
from vissl.models.model_helpers import get_trunk_forward_outputs
|
from vissl.models.model_helpers import get_trunk_forward_outputs
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
from transformers import AutoImageProcessor, RegNetConfig, RegNetForImageClassification, RegNetModel
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -209,10 +209,10 @@ def convert_weight_and_push(
|
|||||||
|
|
||||||
size = 224 if "seer" not in name else 384
|
size = 224 if "seer" not in name else 384
|
||||||
# we can use the convnext one
|
# we can use the convnext one
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k", size=size)
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=save_directory / name,
|
repo_path_or_name=save_directory / name,
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -449,7 +449,7 @@ if __name__ == "__main__":
|
|||||||
default=True,
|
default=True,
|
||||||
type=bool,
|
type=bool,
|
||||||
required=False,
|
required=False,
|
||||||
help="If True, push model and feature extractor to the hub.",
|
help="If True, push model and image processor to the hub.",
|
||||||
)
|
)
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -28,7 +28,7 @@ import torch.nn as nn
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from torch import Tensor
|
from torch import Tensor
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, ResNetConfig, ResNetForImageClassification
|
from transformers import AutoImageProcessor, ResNetConfig, ResNetForImageClassification
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -113,10 +113,10 @@ def convert_weight_and_push(name: str, config: ResNetConfig, save_directory: Pat
|
|||||||
)
|
)
|
||||||
|
|
||||||
# we can use the convnext one
|
# we can use the convnext one
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=save_directory / checkpoint_name,
|
repo_path_or_name=save_directory / checkpoint_name,
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -191,7 +191,7 @@ if __name__ == "__main__":
|
|||||||
default=True,
|
default=True,
|
||||||
type=bool,
|
type=bool,
|
||||||
required=False,
|
required=False,
|
||||||
help="If True, push model and feature extractor to the hub.",
|
help="If True, push model and image processor to the hub.",
|
||||||
)
|
)
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|||||||
@@ -27,9 +27,9 @@ from PIL import Image
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
SegformerConfig,
|
SegformerConfig,
|
||||||
SegformerFeatureExtractor,
|
|
||||||
SegformerForImageClassification,
|
SegformerForImageClassification,
|
||||||
SegformerForSemanticSegmentation,
|
SegformerForSemanticSegmentation,
|
||||||
|
SegformerImageProcessor,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -179,14 +179,14 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
else:
|
else:
|
||||||
raise ValueError(f"Size {size} not supported")
|
raise ValueError(f"Size {size} not supported")
|
||||||
|
|
||||||
# load feature extractor (only resize + normalize)
|
# load image processor (only resize + normalize)
|
||||||
feature_extractor = SegformerFeatureExtractor(
|
image_processor = SegformerImageProcessor(
|
||||||
image_scale=(512, 512), keep_ratio=False, align=False, do_random_crop=False
|
image_scale=(512, 512), keep_ratio=False, align=False, do_random_crop=False
|
||||||
)
|
)
|
||||||
|
|
||||||
# prepare image
|
# prepare image
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
|
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
|
||||||
|
|
||||||
logger.info(f"Converting model {model_name}...")
|
logger.info(f"Converting model {model_name}...")
|
||||||
|
|
||||||
@@ -362,11 +362,11 @@ def convert_segformer_checkpoint(model_name, checkpoint_path, pytorch_dump_folde
|
|||||||
assert logits.shape == expected_shape
|
assert logits.shape == expected_shape
|
||||||
assert torch.allclose(logits[0, :3, :3, :3], expected_slice, atol=1e-2)
|
assert torch.allclose(logits[0, :3, :3, :3], expected_slice, atol=1e-2)
|
||||||
|
|
||||||
# finally, save model and feature extractor
|
# finally, save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ import requests
|
|||||||
import torch
|
import torch
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTFeatureExtractor
|
from transformers import SwinConfig, SwinForMaskedImageModeling, ViTImageProcessor
|
||||||
|
|
||||||
|
|
||||||
def get_swin_config(model_name):
|
def get_swin_config(model_name):
|
||||||
@@ -132,9 +132,9 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
|||||||
|
|
||||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||||
|
|
||||||
feature_extractor = ViTFeatureExtractor(size={"height": 192, "width": 192})
|
image_processor = ViTImageProcessor(size={"height": 192, "width": 192})
|
||||||
image = Image.open(requests.get(url, stream=True).raw)
|
image = Image.open(requests.get(url, stream=True).raw)
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
inputs = image_processor(images=image, return_tensors="pt")
|
||||||
|
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
outputs = model(**inputs).logits
|
outputs = model(**inputs).logits
|
||||||
@@ -146,13 +146,13 @@ def convert_swin_checkpoint(model_name, checkpoint_path, pytorch_dump_folder_pat
|
|||||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
print(f"Pushing model and feature extractor for {model_name} to hub")
|
print(f"Pushing model and image processor for {model_name} to hub")
|
||||||
model.push_to_hub(f"microsoft/{model_name}")
|
model.push_to_hub(f"microsoft/{model_name}")
|
||||||
feature_extractor.push_to_hub(f"microsoft/{model_name}")
|
image_processor.push_to_hub(f"microsoft/{model_name}")
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, SwinConfig, SwinForImageClassification
|
from transformers import AutoImageProcessor, SwinConfig, SwinForImageClassification
|
||||||
|
|
||||||
|
|
||||||
def get_swin_config(swin_name):
|
def get_swin_config(swin_name):
|
||||||
@@ -140,9 +140,9 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
|||||||
|
|
||||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||||
|
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swin_name.replace("_", "-")))
|
||||||
image = Image.open(requests.get(url, stream=True).raw)
|
image = Image.open(requests.get(url, stream=True).raw)
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
inputs = image_processor(images=image, return_tensors="pt")
|
||||||
|
|
||||||
timm_outs = timm_model(inputs["pixel_values"])
|
timm_outs = timm_model(inputs["pixel_values"])
|
||||||
hf_outs = model(**inputs).logits
|
hf_outs = model(**inputs).logits
|
||||||
@@ -152,8 +152,8 @@ def convert_swin_checkpoint(swin_name, pytorch_dump_folder_path):
|
|||||||
print(f"Saving model {swin_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {swin_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, Swinv2Config, Swinv2ForImageClassification
|
from transformers import AutoImageProcessor, Swinv2Config, Swinv2ForImageClassification
|
||||||
|
|
||||||
|
|
||||||
def get_swinv2_config(swinv2_name):
|
def get_swinv2_config(swinv2_name):
|
||||||
@@ -180,9 +180,9 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
|||||||
|
|
||||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||||
|
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
image_processor = AutoImageProcessor.from_pretrained("microsoft/{}".format(swinv2_name.replace("_", "-")))
|
||||||
image = Image.open(requests.get(url, stream=True).raw)
|
image = Image.open(requests.get(url, stream=True).raw)
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
inputs = image_processor(images=image, return_tensors="pt")
|
||||||
|
|
||||||
timm_outs = timm_model(inputs["pixel_values"])
|
timm_outs = timm_model(inputs["pixel_values"])
|
||||||
hf_outs = model(**inputs).logits
|
hf_outs = model(**inputs).logits
|
||||||
@@ -192,8 +192,8 @@ def convert_swinv2_checkpoint(swinv2_name, pytorch_dump_folder_path):
|
|||||||
print(f"Saving model {swinv2_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {swinv2_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
model.push_to_hub(
|
model.push_to_hub(
|
||||||
repo_path_or_name=Path(pytorch_dump_folder_path, swinv2_name),
|
repo_path_or_name=Path(pytorch_dump_folder_path, swinv2_name),
|
||||||
|
|||||||
@@ -27,7 +27,7 @@ from huggingface_hub import hf_hub_download
|
|||||||
from PIL import Image
|
from PIL import Image
|
||||||
from torchvision.transforms import functional as F
|
from torchvision.transforms import functional as F
|
||||||
|
|
||||||
from transformers import DetrFeatureExtractor, TableTransformerConfig, TableTransformerForObjectDetection
|
from transformers import DetrImageProcessor, TableTransformerConfig, TableTransformerForObjectDetection
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -242,7 +242,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
|||||||
config.id2label = id2label
|
config.id2label = id2label
|
||||||
config.label2id = {v: k for k, v in id2label.items()}
|
config.label2id = {v: k for k, v in id2label.items()}
|
||||||
|
|
||||||
feature_extractor = DetrFeatureExtractor(
|
image_processor = DetrImageProcessor(
|
||||||
format="coco_detection", max_size=800 if "detection" in checkpoint_url else 1000
|
format="coco_detection", max_size=800 if "detection" in checkpoint_url else 1000
|
||||||
)
|
)
|
||||||
model = TableTransformerForObjectDetection(config)
|
model = TableTransformerForObjectDetection(config)
|
||||||
@@ -277,11 +277,11 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
|||||||
print("Looks ok!")
|
print("Looks ok!")
|
||||||
|
|
||||||
if pytorch_dump_folder_path is not None:
|
if pytorch_dump_folder_path is not None:
|
||||||
# Save model and feature extractor
|
# Save model and image processor
|
||||||
logger.info(f"Saving PyTorch model and feature extractor to {pytorch_dump_folder_path}...")
|
logger.info(f"Saving PyTorch model and image processor to {pytorch_dump_folder_path}...")
|
||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
# Push model to HF hub
|
# Push model to HF hub
|
||||||
@@ -292,7 +292,7 @@ def convert_table_transformer_checkpoint(checkpoint_url, pytorch_dump_folder_pat
|
|||||||
else "microsoft/table-transformer-structure-recognition"
|
else "microsoft/table-transformer-structure-recognition"
|
||||||
)
|
)
|
||||||
model.push_to_hub(model_name)
|
model.push_to_hub(model_name)
|
||||||
feature_extractor.push_to_hub(model_name)
|
image_processor.push_to_hub(model_name)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ import numpy as np
|
|||||||
import torch
|
import torch
|
||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
|
|
||||||
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEFeatureExtractor
|
from transformers import TimesformerConfig, TimesformerForVideoClassification, VideoMAEImageProcessor
|
||||||
|
|
||||||
|
|
||||||
def get_timesformer_config(model_name):
|
def get_timesformer_config(model_name):
|
||||||
@@ -156,9 +156,9 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
|||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
# verify model on basic input
|
# verify model on basic input
|
||||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||||
video = prepare_video()
|
video = prepare_video()
|
||||||
inputs = feature_extractor(video[:8], return_tensors="pt")
|
inputs = image_processor(video[:8], return_tensors="pt")
|
||||||
|
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
logits = outputs.logits
|
logits = outputs.logits
|
||||||
@@ -215,8 +215,8 @@ def convert_timesformer_checkpoint(checkpoint_url, pytorch_dump_folder_path, mod
|
|||||||
print("Logits ok!")
|
print("Logits ok!")
|
||||||
|
|
||||||
if pytorch_dump_folder_path is not None:
|
if pytorch_dump_folder_path is not None:
|
||||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
|
|||||||
@@ -513,8 +513,8 @@ TIMESFORMER_START_DOCSTRING = r"""
|
|||||||
TIMESFORMER_INPUTS_DOCSTRING = r"""
|
TIMESFORMER_INPUTS_DOCSTRING = r"""
|
||||||
Args:
|
Args:
|
||||||
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
|
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
|
||||||
Pixel values. Pixel values can be obtained using [`AutoFeatureExtractor`]. See
|
Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
|
||||||
[`VideoMAEFeatureExtractor.__call__`] for details.
|
[`VideoMAEImageProcessor.preprocess`] for details.
|
||||||
|
|
||||||
output_attentions (`bool`, *optional*):
|
output_attentions (`bool`, *optional*):
|
||||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
|
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
|
||||||
|
|||||||
@@ -29,7 +29,7 @@ from transformers import (
|
|||||||
TrOCRProcessor,
|
TrOCRProcessor,
|
||||||
VisionEncoderDecoderModel,
|
VisionEncoderDecoderModel,
|
||||||
ViTConfig,
|
ViTConfig,
|
||||||
ViTFeatureExtractor,
|
ViTImageProcessor,
|
||||||
ViTModel,
|
ViTModel,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
@@ -182,9 +182,9 @@ def convert_tr_ocr_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image
|
# Check outputs on an image
|
||||||
feature_extractor = ViTFeatureExtractor(size=encoder_config.image_size)
|
image_processor = ViTImageProcessor(size=encoder_config.image_size)
|
||||||
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
|
tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
|
||||||
processor = TrOCRProcessor(feature_extractor, tokenizer)
|
processor = TrOCRProcessor(image_processor, tokenizer)
|
||||||
|
|
||||||
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values
|
pixel_values = processor(images=prepare_img(checkpoint_url), return_tensors="pt").pixel_values
|
||||||
|
|
||||||
|
|||||||
@@ -30,7 +30,7 @@ import torch.nn as nn
|
|||||||
from huggingface_hub import cached_download, hf_hub_download
|
from huggingface_hub import cached_download, hf_hub_download
|
||||||
from torch import Tensor
|
from torch import Tensor
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor, VanConfig, VanForImageClassification
|
from transformers import AutoImageProcessor, VanConfig, VanForImageClassification
|
||||||
from transformers.models.van.modeling_van import VanLayerScaling
|
from transformers.models.van.modeling_van import VanLayerScaling
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
@@ -154,10 +154,10 @@ def convert_weight_and_push(
|
|||||||
)
|
)
|
||||||
|
|
||||||
# we can use the convnext one
|
# we can use the convnext one
|
||||||
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
image_processor = AutoImageProcessor.from_pretrained("facebook/convnext-base-224-22k-1k")
|
||||||
feature_extractor.push_to_hub(
|
image_processor.push_to_hub(
|
||||||
repo_path_or_name=save_directory / checkpoint_name,
|
repo_path_or_name=save_directory / checkpoint_name,
|
||||||
commit_message="Add feature extractor",
|
commit_message="Add image processor",
|
||||||
use_temp_dir=True,
|
use_temp_dir=True,
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -277,7 +277,7 @@ if __name__ == "__main__":
|
|||||||
default=True,
|
default=True,
|
||||||
type=bool,
|
type=bool,
|
||||||
required=False,
|
required=False,
|
||||||
help="If True, push model and feature extractor to the hub.",
|
help="If True, push model and image processor to the hub.",
|
||||||
)
|
)
|
||||||
|
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
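Reviewer note on the hunk above: the help string is updated, but the argument keeps `type=bool`. With argparse, `type=bool` calls `bool()` on the raw string, so any non-empty value (including `"False"`) parses as `True`. The sketch below illustrates the behavior and one possible explicit converter. The flag name `--push_to_hub` and the helper `str_to_bool` are assumptions for illustration; the diff does not show the flag name, and this is not part of the commit.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse common true/false spellings explicitly."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# With type=bool, argparse evaluates bool("False"), which is True
# because the string is non-empty.
naive = argparse.ArgumentParser()
naive.add_argument("--push_to_hub", default=True, type=bool, required=False)

# Same (assumed) flag, parsed with the explicit converter above.
strict = argparse.ArgumentParser()
strict.add_argument("--push_to_hub", default=True, type=str_to_bool, required=False)

naive_args = naive.parse_args(["--push_to_hub", "False"])
strict_args = strict.parse_args(["--push_to_hub", "False"])
print(naive_args.push_to_hub, strict_args.push_to_hub)
```

Whether the script should change is a separate question from the rename; the sketch only shows why passing `False` on the command line would not disable the push as written.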
|
|||||||
@@ -24,9 +24,9 @@ from huggingface_hub import hf_hub_download
|
|||||||
|
|
||||||
from transformers import (
|
from transformers import (
|
||||||
VideoMAEConfig,
|
VideoMAEConfig,
|
||||||
VideoMAEFeatureExtractor,
|
|
||||||
VideoMAEForPreTraining,
|
VideoMAEForPreTraining,
|
||||||
VideoMAEForVideoClassification,
|
VideoMAEForVideoClassification,
|
||||||
|
VideoMAEImageProcessor,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@@ -198,9 +198,9 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
|||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
# verify model on basic input
|
# verify model on basic input
|
||||||
feature_extractor = VideoMAEFeatureExtractor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
image_processor = VideoMAEImageProcessor(image_mean=[0.5, 0.5, 0.5], image_std=[0.5, 0.5, 0.5])
|
||||||
video = prepare_video()
|
video = prepare_video()
|
||||||
inputs = feature_extractor(video, return_tensors="pt")
|
inputs = image_processor(video, return_tensors="pt")
|
||||||
|
|
||||||
if "finetuned" not in model_name:
|
if "finetuned" not in model_name:
|
||||||
local_path = hf_hub_download(repo_id="hf-internal-testing/bool-masked-pos", filename="bool_masked_pos.pt")
|
local_path = hf_hub_download(repo_id="hf-internal-testing/bool-masked-pos", filename="bool_masked_pos.pt")
|
||||||
@@ -288,8 +288,8 @@ def convert_videomae_checkpoint(checkpoint_url, pytorch_dump_folder_path, model_
|
|||||||
print("Loss ok!")
|
print("Loss ok!")
|
||||||
|
|
||||||
if pytorch_dump_folder_path is not None:
|
if pytorch_dump_folder_path is not None:
|
||||||
print(f"Saving model and feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving model and image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
|
|||||||
@@ -27,11 +27,11 @@ from PIL import Image
|
|||||||
from transformers import (
|
from transformers import (
|
||||||
BertTokenizer,
|
BertTokenizer,
|
||||||
ViltConfig,
|
ViltConfig,
|
||||||
ViltFeatureExtractor,
|
|
||||||
ViltForImageAndTextRetrieval,
|
ViltForImageAndTextRetrieval,
|
||||||
ViltForImagesAndTextClassification,
|
ViltForImagesAndTextClassification,
|
||||||
ViltForMaskedLM,
|
ViltForMaskedLM,
|
||||||
ViltForQuestionAnswering,
|
ViltForQuestionAnswering,
|
||||||
|
ViltImageProcessor,
|
||||||
ViltProcessor,
|
ViltProcessor,
|
||||||
)
|
)
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
@@ -223,9 +223,9 @@ def convert_vilt_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Define processor
|
# Define processor
|
||||||
feature_extractor = ViltFeatureExtractor(size=384)
|
image_processor = ViltImageProcessor(size=384)
|
||||||
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||||
processor = ViltProcessor(feature_extractor, tokenizer)
|
processor = ViltProcessor(image_processor, tokenizer)
|
||||||
|
|
||||||
# Forward pass on example inputs (image + text)
|
# Forward pass on example inputs (image + text)
|
||||||
if nlvr_model:
|
if nlvr_model:
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -175,9 +175,9 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
|||||||
model = ViTForImageClassification(config).eval()
|
model = ViTForImageClassification(config).eval()
|
||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by ViTFeatureExtractor
|
# Check outputs on an image, prepared by ViTImageProcessor
|
||||||
feature_extractor = ViTFeatureExtractor()
|
image_processor = ViTImageProcessor()
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
|
|
||||||
@@ -192,8 +192,8 @@ def convert_vit_checkpoint(model_name, pytorch_dump_folder_path, base_model=True
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {model_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -25,7 +25,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DeiTFeatureExtractor, ViTConfig, ViTFeatureExtractor, ViTForImageClassification, ViTModel
|
from transformers import DeiTImageProcessor, ViTConfig, ViTForImageClassification, ViTImageProcessor, ViTModel
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -208,12 +208,12 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
|||||||
model = ViTForImageClassification(config).eval()
|
model = ViTForImageClassification(config).eval()
|
||||||
model.load_state_dict(state_dict)
|
model.load_state_dict(state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by ViTFeatureExtractor/DeiTFeatureExtractor
|
# Check outputs on an image, prepared by ViTImageProcessor/DeiTImageProcessor
|
||||||
if "deit" in vit_name:
|
if "deit" in vit_name:
|
||||||
feature_extractor = DeiTFeatureExtractor(size=config.image_size)
|
image_processor = DeiTImageProcessor(size=config.image_size)
|
||||||
else:
|
else:
|
||||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
image_processor = ViTImageProcessor(size=config.image_size)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
outputs = model(pixel_values)
|
outputs = model(pixel_values)
|
||||||
|
|
||||||
@@ -229,8 +229,8 @@ def convert_vit_checkpoint(vit_name, pytorch_dump_folder_path):
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {vit_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {vit_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -20,7 +20,7 @@ import requests
|
|||||||
import torch
|
import torch
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ViTMAEConfig, ViTMAEFeatureExtractor, ViTMAEForPreTraining
|
from transformers import ViTMAEConfig, ViTMAEForPreTraining, ViTMAEImageProcessor
|
||||||
|
|
||||||
|
|
||||||
def rename_key(name):
|
def rename_key(name):
|
||||||
@@ -120,7 +120,7 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
|
|
||||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["model"]
|
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["model"]
|
||||||
|
|
||||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||||
|
|
||||||
new_state_dict = convert_state_dict(state_dict, config)
|
new_state_dict = convert_state_dict(state_dict, config)
|
||||||
|
|
||||||
@@ -130,8 +130,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
url = "https://user-images.githubusercontent.com/11435359/147738734-196fd92f-9260-48d5-ba7e-bf103d29364d.jpg"
|
url = "https://user-images.githubusercontent.com/11435359/147738734-196fd92f-9260-48d5-ba7e-bf103d29364d.jpg"
|
||||||
|
|
||||||
image = Image.open(requests.get(url, stream=True).raw)
|
image = Image.open(requests.get(url, stream=True).raw)
|
||||||
feature_extractor = ViTMAEFeatureExtractor(size=config.image_size)
|
image_processor = ViTMAEImageProcessor(size=config.image_size)
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
inputs = image_processor(images=image, return_tensors="pt")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
torch.manual_seed(2)
|
torch.manual_seed(2)
|
||||||
@@ -157,8 +157,8 @@ def convert_vit_mae_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -22,7 +22,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ViTFeatureExtractor, ViTMSNConfig, ViTMSNModel
|
from transformers import ViTImageProcessor, ViTMSNConfig, ViTMSNModel
|
||||||
from transformers.image_utils import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
|
from transformers.image_utils import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
|
||||||
|
|
||||||
|
|
||||||
@@ -180,7 +180,7 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
|
|
||||||
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["target_encoder"]
|
state_dict = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")["target_encoder"]
|
||||||
|
|
||||||
feature_extractor = ViTFeatureExtractor(size=config.image_size)
|
image_processor = ViTImageProcessor(size=config.image_size)
|
||||||
|
|
||||||
remove_projection_head(state_dict)
|
remove_projection_head(state_dict)
|
||||||
rename_keys = create_rename_keys(config, base_model=True)
|
rename_keys = create_rename_keys(config, base_model=True)
|
||||||
@@ -195,10 +195,10 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||||
|
|
||||||
image = Image.open(requests.get(url, stream=True).raw)
|
image = Image.open(requests.get(url, stream=True).raw)
|
||||||
feature_extractor = ViTFeatureExtractor(
|
image_processor = ViTImageProcessor(
|
||||||
size=config.image_size, image_mean=IMAGENET_DEFAULT_MEAN, image_std=IMAGENET_DEFAULT_STD
|
size=config.image_size, image_mean=IMAGENET_DEFAULT_MEAN, image_std=IMAGENET_DEFAULT_STD
|
||||||
)
|
)
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt")
|
inputs = image_processor(images=image, return_tensors="pt")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
torch.manual_seed(2)
|
torch.manual_seed(2)
|
||||||
@@ -224,8 +224,8 @@ def convert_vit_msn_checkpoint(checkpoint_url, pytorch_dump_folder_path):
|
|||||||
print(f"Saving model to {pytorch_dump_folder_path}")
|
print(f"Saving model to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
@@ -23,7 +23,7 @@ from huggingface_hub import hf_hub_download
|
|||||||
from transformers import (
|
from transformers import (
|
||||||
CLIPTokenizer,
|
CLIPTokenizer,
|
||||||
CLIPTokenizerFast,
|
CLIPTokenizerFast,
|
||||||
VideoMAEFeatureExtractor,
|
VideoMAEImageProcessor,
|
||||||
XCLIPConfig,
|
XCLIPConfig,
|
||||||
XCLIPModel,
|
XCLIPModel,
|
||||||
XCLIPProcessor,
|
XCLIPProcessor,
|
||||||
@@ -291,10 +291,10 @@ def convert_xclip_checkpoint(model_name, pytorch_dump_folder_path=None, push_to_
|
|||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
size = 336 if model_name == "xclip-large-patch14-16-frames" else 224
|
size = 336 if model_name == "xclip-large-patch14-16-frames" else 224
|
||||||
feature_extractor = VideoMAEFeatureExtractor(size=size)
|
image_processor = VideoMAEImageProcessor(size=size)
|
||||||
slow_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
slow_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
|
||||||
fast_tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
|
fast_tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-base-patch32")
|
||||||
processor = XCLIPProcessor(feature_extractor=feature_extractor, tokenizer=fast_tokenizer)
|
processor = XCLIPProcessor(image_processor=image_processor, tokenizer=fast_tokenizer)
|
||||||
|
|
||||||
video = prepare_video(num_frames)
|
video = prepare_video(num_frames)
|
||||||
inputs = processor(
|
inputs = processor(
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ import torch
|
|||||||
from huggingface_hub import hf_hub_download
|
from huggingface_hub import hf_hub_download
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import YolosConfig, YolosFeatureExtractor, YolosForObjectDetection
|
from transformers import YolosConfig, YolosForObjectDetection, YolosImageProcessor
|
||||||
from transformers.utils import logging
|
from transformers.utils import logging
|
||||||
|
|
||||||
|
|
||||||
@@ -172,10 +172,10 @@ def convert_yolos_checkpoint(
|
|||||||
new_state_dict = convert_state_dict(state_dict, model)
|
new_state_dict = convert_state_dict(state_dict, model)
|
||||||
model.load_state_dict(new_state_dict)
|
model.load_state_dict(new_state_dict)
|
||||||
|
|
||||||
# Check outputs on an image, prepared by YolosFeatureExtractor
|
# Check outputs on an image, prepared by YolosImageProcessor
|
||||||
size = 800 if yolos_name != "yolos_ti" else 512
|
size = 800 if yolos_name != "yolos_ti" else 512
|
||||||
feature_extractor = YolosFeatureExtractor(format="coco_detection", size=size)
|
image_processor = YolosImageProcessor(format="coco_detection", size=size)
|
||||||
encoding = feature_extractor(images=prepare_img(), return_tensors="pt")
|
encoding = image_processor(images=prepare_img(), return_tensors="pt")
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
logits, pred_boxes = outputs.logits, outputs.pred_boxes
|
logits, pred_boxes = outputs.logits, outputs.pred_boxes
|
||||||
|
|
||||||
@@ -224,8 +224,8 @@ def convert_yolos_checkpoint(
|
|||||||
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
|
||||||
print(f"Saving model {yolos_name} to {pytorch_dump_folder_path}")
|
print(f"Saving model {yolos_name} to {pytorch_dump_folder_path}")
|
||||||
model.save_pretrained(pytorch_dump_folder_path)
|
model.save_pretrained(pytorch_dump_folder_path)
|
||||||
print(f"Saving feature extractor to {pytorch_dump_folder_path}")
|
print(f"Saving image processor to {pytorch_dump_folder_path}")
|
||||||
feature_extractor.save_pretrained(pytorch_dump_folder_path)
|
image_processor.save_pretrained(pytorch_dump_folder_path)
|
||||||
|
|
||||||
if push_to_hub:
|
if push_to_hub:
|
||||||
model_mapping = {
|
model_mapping = {
|
||||||
@@ -238,7 +238,7 @@ def convert_yolos_checkpoint(
|
|||||||
|
|
||||||
print("Pushing to the hub...")
|
print("Pushing to the hub...")
|
||||||
model_name = model_mapping[yolos_name]
|
model_name = model_mapping[yolos_name]
|
||||||
feature_extractor.push_to_hub(model_name, organization="hustvl")
|
image_processor.push_to_hub(model_name, organization="hustvl")
|
||||||
model.push_to_hub(model_name, organization="hustvl")
|
model.push_to_hub(model_name, organization="hustvl")
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -19,7 +19,7 @@ from pathlib import Path
|
|||||||
|
|
||||||
from packaging import version
|
from packaging import version
|
||||||
|
|
||||||
from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer
|
from .. import AutoFeatureExtractor, AutoImageProcessor, AutoProcessor, AutoTokenizer
|
||||||
from ..utils import logging
|
from ..utils import logging
|
||||||
from ..utils.import_utils import is_optimum_available
|
from ..utils.import_utils import is_optimum_available
|
||||||
from .convert import export, validate_model_outputs
|
from .convert import export, validate_model_outputs
|
||||||
@@ -145,6 +145,8 @@ def export_with_transformers(args):
|
|||||||
preprocessor = get_preprocessor(args.model)
|
preprocessor = get_preprocessor(args.model)
|
||||||
elif args.preprocessor == "tokenizer":
|
elif args.preprocessor == "tokenizer":
|
||||||
preprocessor = AutoTokenizer.from_pretrained(args.model)
|
preprocessor = AutoTokenizer.from_pretrained(args.model)
|
||||||
|
elif args.preprocessor == "image_processor":
|
||||||
|
preprocessor = AutoImageProcessor.from_pretrained(args.model)
|
||||||
elif args.preprocessor == "feature_extractor":
|
elif args.preprocessor == "feature_extractor":
|
||||||
preprocessor = AutoFeatureExtractor.from_pretrained(args.model)
|
preprocessor = AutoFeatureExtractor.from_pretrained(args.model)
|
||||||
elif args.preprocessor == "processor":
|
elif args.preprocessor == "processor":
|
||||||
@@ -213,7 +215,7 @@ def main():
|
|||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--preprocessor",
|
"--preprocessor",
|
||||||
type=str,
|
type=str,
|
||||||
choices=["auto", "tokenizer", "feature_extractor", "processor"],
|
choices=["auto", "tokenizer", "feature_extractor", "image_processor", "processor"],
|
||||||
default="auto",
|
default="auto",
|
||||||
help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
|
help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
|
||||||
)
|
)
|
||||||
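Reviewer note on the two hunks above: the new `image_processor` value has to appear both in the `choices` list and in the `if`/`elif` chain, otherwise argparse would accept it without any branch handling it (or the reverse). A minimal sketch of keeping the two in sync from a single table follows. `PREPROCESSOR_LOADERS`, `build_parser`, and `resolve_loader` are hypothetical names, and the table stores class names as strings so the example runs without `transformers` installed; the actual script uses the explicit chain shown in the diff.

```python
import argparse

# Hypothetical stand-in for the if/elif chain in export_with_transformers:
# maps each explicit --preprocessor choice to the Auto class that loads it.
PREPROCESSOR_LOADERS = {
    "tokenizer": "AutoTokenizer",
    "feature_extractor": "AutoFeatureExtractor",
    "image_processor": "AutoImageProcessor",
    "processor": "AutoProcessor",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--preprocessor",
        type=str,
        # The choices are derived from the table, so they cannot drift apart.
        choices=["auto", *PREPROCESSOR_LOADERS],
        default="auto",
        help="Which type of preprocessor to use. 'auto' tries to automatically detect it.",
    )
    return parser


def resolve_loader(choice: str) -> str:
    """Return the Auto class name for an explicit choice, or 'auto' for detection."""
    if choice == "auto":
        # The branch visible in the diff calls get_preprocessor(args.model) here.
        return "auto"
    return PREPROCESSOR_LOADERS[choice]


args = build_parser().parse_args(["--preprocessor", "image_processor"])
print(resolve_loader(args.preprocessor))
```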
|
|||||||
@@ -49,7 +49,7 @@ if is_vision_available():
|
|||||||
import PIL
|
import PIL
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import BeitFeatureExtractor
|
from transformers import BeitImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class BeitModelTester:
|
class BeitModelTester:
|
||||||
@@ -342,18 +342,16 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class BeitModelIntegrationTest(unittest.TestCase):
|
class BeitModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
|
||||||
)
|
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_masked_image_modeling_head(self):
|
def test_inference_masked_image_modeling_head(self):
|
||||||
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k").to(torch_device)
|
model = BeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k").to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
|
||||||
|
|
||||||
# prepare bool_masked_pos
|
# prepare bool_masked_pos
|
||||||
bool_masked_pos = torch.ones((1, 196), dtype=torch.bool).to(torch_device)
|
bool_masked_pos = torch.ones((1, 196), dtype=torch.bool).to(torch_device)
|
||||||
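Reviewer note on the hunk above: the renamed `default_image_processor` keeps the `@cached_property` decorator, so every test in the class that reads it shares one loaded processor per test instance. The sketch below shows that caching behavior using the standard library's `functools.cached_property` as a stand-in for the decorator the test file imports. `load_image_processor`, `LOAD_CALLS`, and the stubbed `is_vision_available` are assumptions for illustration, not library code.

```python
from functools import cached_property

# Records every (simulated) load so the caching can be observed.
LOAD_CALLS = []


def is_vision_available() -> bool:
    """Hypothetical stub; the real helper checks whether PIL is installed."""
    return True


def load_image_processor(checkpoint: str) -> dict:
    """Hypothetical stand-in for BeitImageProcessor.from_pretrained."""
    LOAD_CALLS.append(checkpoint)
    return {"checkpoint": checkpoint}


class IntegrationTestSketch:
    @cached_property
    def default_image_processor(self):
        # Same guard as in the diff: return None when vision deps are missing.
        return load_image_processor("microsoft/beit-base-patch16-224") if is_vision_available() else None


case = IntegrationTestSketch()
first = case.default_image_processor
second = case.default_image_processor
print(first is second, len(LOAD_CALLS))
```

The second access returns the cached object without calling the loader again, which is why the rename only needs to touch the property name and its call sites.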
@@ -377,9 +375,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
|||||||
def test_inference_image_classification_head_imagenet_1k(self):
|
def test_inference_image_classification_head_imagenet_1k(self):
|
||||||
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224").to(torch_device)
|
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224").to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
@@ -403,9 +401,9 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
|||||||
torch_device
|
torch_device
|
||||||
)
|
)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
@@ -428,11 +426,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
|||||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||||
model = model.to(torch_device)
|
model = model.to(torch_device)
|
||||||
|
|
||||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||||
|
|
||||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||||
image = Image.open(ds[0]["file"])
|
image = Image.open(ds[0]["file"])
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
@@ -471,11 +469,11 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
|||||||
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
model = BeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-finetuned-ade-640-640")
|
||||||
model = model.to(torch_device)
|
model = model.to(torch_device)
|
||||||
|
|
||||||
feature_extractor = BeitFeatureExtractor(do_resize=True, size=640, do_center_crop=False)
|
image_processor = BeitImageProcessor(do_resize=True, size=640, do_center_crop=False)
|
||||||
|
|
||||||
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
ds = load_dataset("hf-internal-testing/fixtures_ade20k", split="test")
|
||||||
image = Image.open(ds[0]["file"])
|
image = Image.open(ds[0]["file"])
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
@@ -483,10 +481,10 @@ class BeitModelIntegrationTest(unittest.TestCase):
|
|||||||
|
|
||||||
outputs.logits = outputs.logits.detach().cpu()
|
outputs.logits = outputs.logits.detach().cpu()
|
||||||
|
|
||||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
|
||||||
expected_shape = torch.Size((500, 300))
|
expected_shape = torch.Size((500, 300))
|
||||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||||
|
|
||||||
segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
|
segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
|
||||||
expected_shape = torch.Size((160, 160))
|
expected_shape = torch.Size((160, 160))
|
||||||
self.assertEqual(segmentation[0].shape, expected_shape)
|
self.assertEqual(segmentation[0].shape, expected_shape)
|
||||||
|
|||||||
@@ -33,7 +33,7 @@ if is_flax_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import BeitFeatureExtractor
|
from transformers import BeitImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class FlaxBeitModelTester(unittest.TestCase):
|
class FlaxBeitModelTester(unittest.TestCase):
|
||||||
@@ -219,18 +219,16 @@ def prepare_img():
|
|||||||
@require_flax
|
@require_flax
|
||||||
class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
||||||
BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224") if is_vision_available() else None
|
|
||||||
)
|
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_masked_image_modeling_head(self):
|
def test_inference_masked_image_modeling_head(self):
|
||||||
model = FlaxBeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
|
model = FlaxBeitForMaskedImageModeling.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
pixel_values = feature_extractor(images=image, return_tensors="np").pixel_values
|
pixel_values = image_processor(images=image, return_tensors="np").pixel_values
|
||||||
|
|
||||||
# prepare bool_masked_pos
|
# prepare bool_masked_pos
|
||||||
bool_masked_pos = np.ones((1, 196), dtype=bool)
|
bool_masked_pos = np.ones((1, 196), dtype=bool)
|
||||||
@@ -253,9 +251,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
|||||||
def test_inference_image_classification_head_imagenet_1k(self):
|
def test_inference_image_classification_head_imagenet_1k(self):
|
||||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
|
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="np")
|
inputs = image_processor(images=image, return_tensors="np")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
@@ -276,9 +274,9 @@ class FlaxBeitModelIntegrationTest(unittest.TestCase):
|
|||||||
def test_inference_image_classification_head_imagenet_22k(self):
|
def test_inference_image_classification_head_imagenet_22k(self):
|
||||||
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-large-patch16-224-pt22k-ft22k")
|
model = FlaxBeitForImageClassification.from_pretrained("microsoft/beit-large-patch16-224-pt22k-ft22k")
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="np")
|
inputs = image_processor(images=image, return_tensors="np")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|||||||
@@ -297,7 +297,7 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class BitModelIntegrationTest(unittest.TestCase):
|
class BitModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return (
|
||||||
BitImageProcessor.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]) if is_vision_available() else None
|
BitImageProcessor.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]) if is_vision_available() else None
|
||||||
)
|
)
|
||||||
@@ -306,9 +306,9 @@ class BitModelIntegrationTest(unittest.TestCase):
|
|||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
model = BitForImageClassification.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
model = BitForImageClassification.from_pretrained(BIT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
|
|||||||
@@ -145,7 +145,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
pass
|
pass
|
||||||
|
|
||||||
def test_call_pil(self):
|
def test_call_pil(self):
|
||||||
# Initialize feature_extractor
|
# Initialize image processor
|
||||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||||
# create random PIL images
|
# create random PIL images
|
||||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False)
|
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False)
|
||||||
@@ -176,7 +176,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
)
|
)
|
||||||
|
|
||||||
def test_call_numpy(self):
|
def test_call_numpy(self):
|
||||||
# Initialize feature_extractor
|
# Initialize image processor
|
||||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||||
# create random numpy tensors
|
# create random numpy tensors
|
||||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, numpify=True)
|
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, numpify=True)
|
||||||
@@ -207,7 +207,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
)
|
)
|
||||||
|
|
||||||
def test_call_pytorch(self):
|
def test_call_pytorch(self):
|
||||||
# Initialize feature_extractor
|
# Initialize image processor
|
||||||
image_processing = self.image_processing_class(**self.image_processor_dict)
|
image_processing = self.image_processing_class(**self.image_processor_dict)
|
||||||
# create random PyTorch tensors
|
# create random PyTorch tensors
|
||||||
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, torchify=True)
|
image_inputs = prepare_image_inputs(self.image_processor_tester, equal_resolution=False, torchify=True)
|
||||||
@@ -238,7 +238,7 @@ class BridgeTowerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
)
|
)
|
||||||
|
|
||||||
def test_equivalence_pad_and_create_pixel_mask(self):
|
def test_equivalence_pad_and_create_pixel_mask(self):
|
||||||
# Initialize feature_extractors
|
# Initialize image processors
|
||||||
image_processing_1 = self.image_processing_class(**self.image_processor_dict)
|
image_processing_1 = self.image_processing_class(**self.image_processor_dict)
|
||||||
image_processing_2 = self.image_processing_class(do_resize=False, do_normalize=False, do_rescale=False)
|
image_processing_2 = self.image_processing_class(do_resize=False, do_normalize=False, do_rescale=False)
|
||||||
# create random PyTorch tensors
|
# create random PyTorch tensors
|
||||||
|
|||||||
@@ -43,7 +43,7 @@ if is_timm_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ConditionalDetrFeatureExtractor
|
from transformers import ConditionalDetrImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class ConditionalDetrModelTester:
|
class ConditionalDetrModelTester:
|
||||||
@@ -493,9 +493,9 @@ def prepare_img():
|
|||||||
@slow
|
@slow
|
||||||
class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return (
|
||||||
ConditionalDetrFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
ConditionalDetrImageProcessor.from_pretrained("microsoft/conditional-detr-resnet-50")
|
||||||
if is_vision_available()
|
if is_vision_available()
|
||||||
else None
|
else None
|
||||||
)
|
)
|
||||||
@@ -503,9 +503,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
def test_inference_no_head(self):
|
def test_inference_no_head(self):
|
||||||
model = ConditionalDetrModel.from_pretrained("microsoft/conditional-detr-resnet-50").to(torch_device)
|
model = ConditionalDetrModel.from_pretrained("microsoft/conditional-detr-resnet-50").to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
outputs = model(**encoding)
|
outputs = model(**encoding)
|
||||||
@@ -522,9 +522,9 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
torch_device
|
torch_device
|
||||||
)
|
)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||||
|
|
||||||
@@ -547,7 +547,7 @@ class ConditionalDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
|
||||||
|
|
||||||
# verify postprocessing
|
# verify postprocessing
|
||||||
results = feature_extractor.post_process_object_detection(
|
results = image_processor.post_process_object_detection(
|
||||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||||
)[0]
|
)[0]
|
||||||
expected_scores = torch.tensor([0.8330, 0.8313, 0.8039, 0.6829, 0.5355]).to(torch_device)
|
expected_scores = torch.tensor([0.8330, 0.8313, 0.8039, 0.6829, 0.5355]).to(torch_device)
|
||||||
|
|||||||
@@ -38,7 +38,7 @@ if is_torch_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor
|
from transformers import AutoImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class ConvNextModelTester:
|
class ConvNextModelTester:
|
||||||
@@ -285,16 +285,16 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class ConvNextModelIntegrationTest(unittest.TestCase):
|
class ConvNextModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return AutoFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
return AutoImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224").to(torch_device)
|
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224").to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
|
|||||||
@@ -38,7 +38,7 @@ if is_tf_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import ConvNextFeatureExtractor
|
from transformers import ConvNextImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class TFConvNextModelTester:
|
class TFConvNextModelTester:
|
||||||
@@ -279,18 +279,16 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class TFConvNextModelIntegrationTest(unittest.TestCase):
|
class TFConvNextModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return ConvNextImageProcessor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
||||||
ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224") if is_vision_available() else None
|
|
||||||
)
|
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
model = TFConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")
|
model = TFConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
inputs = image_processor(images=image, return_tensors="tf")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|||||||
@@ -38,7 +38,7 @@ if is_torch_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor
|
from transformers import AutoImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class CvtConfigTester(ConfigTester):
|
class CvtConfigTester(ConfigTester):
|
||||||
@@ -264,16 +264,16 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class CvtModelIntegrationTest(unittest.TestCase):
|
class CvtModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return AutoFeatureExtractor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
return AutoImageProcessor.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
model = CvtForImageClassification.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
model = CvtForImageClassification.from_pretrained(CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
|
|||||||
@@ -28,7 +28,7 @@ if is_tf_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor
|
from transformers import AutoImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class TFCvtConfigTester(ConfigTester):
|
class TFCvtConfigTester(ConfigTester):
|
||||||
@@ -265,16 +265,16 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class TFCvtModelIntegrationTest(unittest.TestCase):
|
class TFCvtModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return AutoFeatureExtractor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
return AutoImageProcessor.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
model = TFCvtForImageClassification.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
model = TFCvtForImageClassification.from_pretrained(TF_CVT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
inputs = image_processor(images=image, return_tensors="tf")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|||||||
@@ -44,7 +44,7 @@ if is_torch_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import BeitFeatureExtractor
|
from transformers import BeitImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class Data2VecVisionModelTester:
|
class Data2VecVisionModelTester:
|
||||||
@@ -327,11 +327,9 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return (
|
||||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||||
if is_vision_available()
|
|
||||||
else None
|
|
||||||
)
|
)
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
@@ -340,9 +338,9 @@ class Data2VecVisionModelIntegrationTest(unittest.TestCase):
|
|||||||
torch_device
|
torch_device
|
||||||
)
|
)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
|
|||||||
@@ -46,7 +46,7 @@ if is_tf_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import BeitFeatureExtractor
|
from transformers import BeitImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class TFData2VecVisionModelTester:
|
class TFData2VecVisionModelTester:
|
||||||
@@ -469,20 +469,18 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class TFData2VecVisionModelIntegrationTest(unittest.TestCase):
|
class TFData2VecVisionModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return (
|
||||||
BeitFeatureExtractor.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
BeitImageProcessor.from_pretrained("facebook/data2vec-vision-base-ft1k") if is_vision_available() else None
|
||||||
if is_vision_available()
|
|
||||||
else None
|
|
||||||
)
|
)
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head_imagenet_1k(self):
|
def test_inference_image_classification_head_imagenet_1k(self):
|
||||||
model = TFData2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
model = TFData2VecVisionForImageClassification.from_pretrained("facebook/data2vec-vision-base-ft1k")
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="tf")
|
inputs = image_processor(images=image, return_tensors="tf")
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
outputs = model(**inputs)
|
outputs = model(**inputs)
|
||||||
|
|||||||
@@ -39,7 +39,7 @@ if is_timm_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import AutoFeatureExtractor
|
from transformers import AutoImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class DeformableDetrModelTester:
|
class DeformableDetrModelTester:
|
||||||
@@ -563,15 +563,15 @@ def prepare_img():
|
|||||||
@slow
|
@slow
|
||||||
class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
return AutoImageProcessor.from_pretrained("SenseTime/deformable-detr") if is_vision_available() else None
|
||||||
|
|
||||||
def test_inference_object_detection_head(self):
|
def test_inference_object_detection_head(self):
|
||||||
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr").to(torch_device)
|
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr").to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||||
|
|
||||||
@@ -595,7 +595,7 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
|
self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_boxes, atol=1e-4))
|
||||||
|
|
||||||
# verify postprocessing
|
# verify postprocessing
|
||||||
results = feature_extractor.post_process_object_detection(
|
results = image_processor.post_process_object_detection(
|
||||||
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
outputs, threshold=0.3, target_sizes=[image.size[::-1]]
|
||||||
)[0]
|
)[0]
|
||||||
expected_scores = torch.tensor([0.7999, 0.7894, 0.6331, 0.4720, 0.4382]).to(torch_device)
|
expected_scores = torch.tensor([0.7999, 0.7894, 0.6331, 0.4720, 0.4382]).to(torch_device)
|
||||||
@@ -612,9 +612,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
"SenseTime/deformable-detr-with-box-refine-two-stage"
|
"SenseTime/deformable-detr-with-box-refine-two-stage"
|
||||||
).to(torch_device)
|
).to(torch_device)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
pixel_values = encoding["pixel_values"].to(torch_device)
|
pixel_values = encoding["pixel_values"].to(torch_device)
|
||||||
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
pixel_mask = encoding["pixel_mask"].to(torch_device)
|
||||||
|
|
||||||
@@ -639,9 +639,9 @@ class DeformableDetrModelIntegrationTests(unittest.TestCase):
|
|||||||
|
|
||||||
@require_torch_gpu
|
@require_torch_gpu
|
||||||
def test_inference_object_detection_head_equivalence_cpu_gpu(self):
|
def test_inference_object_detection_head_equivalence_cpu_gpu(self):
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
encoding = feature_extractor(images=image, return_tensors="pt")
|
encoding = image_processor(images=image, return_tensors="pt")
|
||||||
pixel_values = encoding["pixel_values"]
|
pixel_values = encoding["pixel_values"]
|
||||||
pixel_mask = encoding["pixel_mask"]
|
pixel_mask = encoding["pixel_mask"]
|
||||||
|
|
||||||
|
|||||||
@@ -55,7 +55,7 @@ if is_torch_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import DeiTFeatureExtractor
|
from transformers import DeiTImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class DeiTModelTester:
|
class DeiTModelTester:
|
||||||
@@ -381,9 +381,9 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class DeiTModelIntegrationTest(unittest.TestCase):
|
class DeiTModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return (
|
return (
|
||||||
DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
|
||||||
if is_vision_available()
|
if is_vision_available()
|
||||||
else None
|
else None
|
||||||
)
|
)
|
||||||
@@ -394,9 +394,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
             torch_device
         )
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
@@ -420,10 +420,10 @@ class DeiTModelIntegrationTest(unittest.TestCase):
         model = DeiTModel.from_pretrained(
             "facebook/deit-base-distilled-patch16-224", torch_dtype=torch.float16, device_map="auto"
         )
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt")
+        inputs = image_processor(images=image, return_tensors="pt")
         pixel_values = inputs.pixel_values.to(torch_device)
 
         # forward pass to make sure inference works in fp16
|
|||||||
@@ -46,7 +46,7 @@ if is_tf_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import DeiTFeatureExtractor
+    from transformers import DeiTImageProcessor
 
 
 class TFDeiTModelTester:
@@ -266,9 +266,9 @@ def prepare_img():
 @require_vision
 class DeiTModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
+    def default_image_processor(self):
         return (
-            DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
+            DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
             if is_vision_available()
             else None
         )
@@ -277,9 +277,9 @@ class DeiTModelIntegrationTest(unittest.TestCase):
     def test_inference_image_classification_head(self):
         model = TFDeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="tf")
+        inputs = image_processor(images=image, return_tensors="tf")
 
         # forward pass
         outputs = model(**inputs)
|
|||||||
@@ -38,7 +38,7 @@ if is_timm_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import DetrFeatureExtractor
+    from transformers import DetrImageProcessor
 
 
 class DetrModelTester:
@@ -512,15 +512,15 @@ def prepare_img():
 @slow
 class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
-        return DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
+    def default_image_processor(self):
+        return DetrImageProcessor.from_pretrained("facebook/detr-resnet-50") if is_vision_available() else None
 
     def test_inference_no_head(self):
         model = DetrModel.from_pretrained("facebook/detr-resnet-50").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         with torch.no_grad():
             outputs = model(**encoding)
@@ -535,9 +535,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
     def test_inference_object_detection_head(self):
         model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
         pixel_values = encoding["pixel_values"].to(torch_device)
         pixel_mask = encoding["pixel_mask"].to(torch_device)
 
@@ -560,7 +560,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
         self.assertTrue(torch.allclose(outputs.pred_boxes[0, :3, :3], expected_slice_boxes, atol=1e-4))
 
         # verify postprocessing
-        results = feature_extractor.post_process_object_detection(
+        results = image_processor.post_process_object_detection(
             outputs, threshold=0.3, target_sizes=[image.size[::-1]]
         )[0]
         expected_scores = torch.tensor([0.9982, 0.9960, 0.9955, 0.9988, 0.9987]).to(torch_device)
@@ -575,9 +575,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
     def test_inference_panoptic_segmentation_head(self):
         model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50-panoptic").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
         pixel_values = encoding["pixel_values"].to(torch_device)
         pixel_mask = encoding["pixel_mask"].to(torch_device)
 
@@ -607,7 +607,7 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
         self.assertTrue(torch.allclose(outputs.pred_masks[0, 0, :3, :3], expected_slice_masks, atol=1e-3))
 
         # verify postprocessing
-        results = feature_extractor.post_process_panoptic_segmentation(
+        results = image_processor.post_process_panoptic_segmentation(
             outputs, threshold=0.3, target_sizes=[image.size[::-1]]
         )[0]
 
@@ -633,9 +633,9 @@ class DetrModelIntegrationTestsTimmBackbone(unittest.TestCase):
 @slow
 class DetrModelIntegrationTests(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
+    def default_image_processor(self):
         return (
-            DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
+            DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
             if is_vision_available()
             else None
         )
@@ -643,9 +643,9 @@ class DetrModelIntegrationTests(unittest.TestCase):
     def test_inference_no_head(self):
         model = DetrModel.from_pretrained("facebook/detr-resnet-50", revision="no_timm").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        encoding = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        encoding = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         with torch.no_grad():
             outputs = model(**encoding)
|
|||||||
@@ -367,16 +367,16 @@ class DinatModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
 @require_torch
 class DinatModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
+    def default_image_processor(self):
         return AutoImageProcessor.from_pretrained("shi-labs/dinat-mini-in1k-224") if is_vision_available() else None
 
     @slow
     def test_inference_image_classification_head(self):
         model = DinatForImageClassification.from_pretrained("shi-labs/dinat-mini-in1k-224").to(torch_device)
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
 
         image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -25,7 +25,7 @@ if is_torch_available():
     from transformers import AutoModelForImageClassification
 
 if is_vision_available():
-    from transformers import AutoFeatureExtractor
+    from transformers import AutoImageProcessor
 
 
 @require_torch
@@ -33,7 +33,7 @@ if is_vision_available():
 class DiTIntegrationTest(unittest.TestCase):
     @slow
     def test_for_image_classification(self):
-        feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
+        image_processor = AutoImageProcessor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
         model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
         model.to(torch_device)
 
@@ -43,7 +43,7 @@ class DiTIntegrationTest(unittest.TestCase):
 
         image = dataset["train"][0]["image"].convert("RGB")
 
-        inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -39,7 +39,7 @@ if is_torch_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import DPTFeatureExtractor
+    from transformers import DPTImageProcessor
 
 
 class DPTModelTester:
@@ -293,11 +293,11 @@ def prepare_img():
 @slow
 class DPTModelIntegrationTest(unittest.TestCase):
     def test_inference_depth_estimation(self):
-        feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
+        image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
         model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large").to(torch_device)
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
@@ -315,11 +315,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
         self.assertTrue(torch.allclose(outputs.predicted_depth[0, :3, :3], expected_slice, atol=1e-4))
 
     def test_inference_semantic_segmentation(self):
-        feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
+        image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
         model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
@@ -336,11 +336,11 @@ class DPTModelIntegrationTest(unittest.TestCase):
         self.assertTrue(torch.allclose(outputs.logits[0, 0, :3, :3], expected_slice, atol=1e-4))
 
     def test_post_processing_semantic_segmentation(self):
-        feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
+        image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large-ade")
         model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade").to(torch_device)
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
@@ -348,10 +348,10 @@ class DPTModelIntegrationTest(unittest.TestCase):
 
         outputs.logits = outputs.logits.detach().cpu()
 
-        segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
+        segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs, target_sizes=[(500, 300)])
         expected_shape = torch.Size((500, 300))
         self.assertEqual(segmentation[0].shape, expected_shape)
 
-        segmentation = feature_extractor.post_process_semantic_segmentation(outputs=outputs)
+        segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
         expected_shape = torch.Size((480, 480))
         self.assertEqual(segmentation[0].shape, expected_shape)
|
|||||||
@@ -39,7 +39,7 @@ if is_torch_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import DPTFeatureExtractor
+    from transformers import DPTImageProcessor
 
 
 class DPTModelTester:
@@ -314,11 +314,11 @@ def prepare_img():
 @slow
 class DPTModelIntegrationTest(unittest.TestCase):
     def test_inference_depth_estimation(self):
-        feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
+        image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
         model = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to(torch_device)
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -444,7 +444,7 @@ def prepare_img():
 @require_vision
 class EfficientFormerModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
+    def default_image_processor(self):
         return (
             EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
             if is_vision_available()
@@ -457,9 +457,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
             torch_device
         )
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
@@ -478,9 +478,9 @@ class EfficientFormerModelIntegrationTest(unittest.TestCase):
             "snap-research/efficientformer-l1-300"
         ).to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -37,7 +37,7 @@ if is_torch_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import GLPNFeatureExtractor
+    from transformers import GLPNImageProcessor
 
 
 class GLPNConfigTester(ConfigTester):
@@ -337,11 +337,11 @@ def prepare_img():
 class GLPNModelIntegrationTest(unittest.TestCase):
     @slow
     def test_inference_depth_estimation(self):
-        feature_extractor = GLPNFeatureExtractor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
+        image_processor = GLPNImageProcessor.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0])
         model = GLPNForDepthEstimation.from_pretrained(GLPN_PRETRAINED_MODEL_ARCHIVE_LIST[0]).to(torch_device)
 
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -49,7 +49,7 @@ if is_torch_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import ImageGPTFeatureExtractor
+    from transformers import ImageGPTImageProcessor
 
 
 class ImageGPTModelTester:
@@ -535,16 +535,16 @@ def prepare_img():
 @require_vision
 class ImageGPTModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
-        return ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
+    def default_image_processor(self):
+        return ImageGPTImageProcessor.from_pretrained("openai/imagegpt-small") if is_vision_available() else None
 
     @slow
     def test_inference_causal_lm_head(self):
         model = ImageGPTForCausalImageModeling.from_pretrained("openai/imagegpt-small").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
+        inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
 
         # forward pass
         with torch.no_grad():
|
|||||||
@@ -45,7 +45,7 @@ if is_torch_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import LayoutLMv3FeatureExtractor
+    from transformers import LayoutLMv3ImageProcessor
 
 
 class LayoutLMv3ModelTester:
@@ -382,16 +382,16 @@ def prepare_img():
 @require_torch
 class LayoutLMv3ModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
-        return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
+    def default_image_processor(self):
+        return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
 
     @slow
     def test_inference_no_head(self):
         model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base").to(torch_device)
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values.to(torch_device)
+        pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(torch_device)
 
         input_ids = torch.tensor([[1, 2]])
         bbox = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]]).unsqueeze(0)
|
|||||||
@@ -51,7 +51,7 @@ if is_tf_available():
 if is_vision_available():
     from PIL import Image
 
-    from transformers import LayoutLMv3FeatureExtractor
+    from transformers import LayoutLMv3ImageProcessor
 
 
 class TFLayoutLMv3ModelTester:
@@ -482,16 +482,16 @@ def prepare_img():
 @require_tf
 class TFLayoutLMv3ModelIntegrationTest(unittest.TestCase):
     @cached_property
-    def default_feature_extractor(self):
-        return LayoutLMv3FeatureExtractor(apply_ocr=False) if is_vision_available() else None
+    def default_image_processor(self):
+        return LayoutLMv3ImageProcessor(apply_ocr=False) if is_vision_available() else None
 
     @slow
     def test_inference_no_head(self):
         model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")
 
-        feature_extractor = self.default_feature_extractor
+        image_processor = self.default_image_processor
         image = prepare_img()
-        pixel_values = feature_extractor(images=image, return_tensors="tf").pixel_values
+        pixel_values = image_processor(images=image, return_tensors="tf").pixel_values
 
         input_ids = tf.constant([[1, 2]])
         bbox = tf.expand_dims(tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]]), axis=0)
|
|||||||
@@ -36,7 +36,7 @@ from transformers.utils import FEATURE_EXTRACTOR_NAME, cached_property, is_pytes
 if is_pytesseract_available():
     from PIL import Image
 
-    from transformers import LayoutLMv2FeatureExtractor, LayoutXLMProcessor
+    from transformers import LayoutLMv2ImageProcessor, LayoutXLMProcessor
 
 
 @require_pytesseract
@@ -47,7 +47,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
     rust_tokenizer_class = LayoutXLMTokenizerFast
 
     def setUp(self):
-        feature_extractor_map = {
+        image_processor_map = {
             "do_resize": True,
             "size": 224,
             "apply_ocr": True,
@@ -56,7 +56,7 @@ class LayoutXLMProcessorTest(unittest.TestCase):
         self.tmpdirname = tempfile.mkdtemp()
         self.feature_extraction_file = os.path.join(self.tmpdirname, FEATURE_EXTRACTOR_NAME)
         with open(self.feature_extraction_file, "w", encoding="utf-8") as fp:
-            fp.write(json.dumps(feature_extractor_map) + "\n")
+            fp.write(json.dumps(image_processor_map) + "\n")
 
         # taken from `test_tokenization_layoutxlm.LayoutXLMTokenizationTest.test_save_pretrained`
         self.tokenizer_pretrained_name = "hf-internal-testing/tiny-random-layoutxlm"
@@ -70,8 +70,8 @@ class LayoutXLMProcessorTest(unittest.TestCase):
     def get_tokenizers(self, **kwargs) -> List[PreTrainedTokenizerBase]:
         return [self.get_tokenizer(**kwargs), self.get_rust_tokenizer(**kwargs)]
 
-    def get_feature_extractor(self, **kwargs):
-        return LayoutLMv2FeatureExtractor.from_pretrained(self.tmpdirname, **kwargs)
+    def get_image_processor(self, **kwargs):
+        return LayoutLMv2ImageProcessor.from_pretrained(self.tmpdirname, **kwargs)
 
     def tearDown(self):
         shutil.rmtree(self.tmpdirname)
@@ -88,10 +88,10 @@ class LayoutXLMProcessorTest(unittest.TestCase):
         return image_inputs
 
     def test_save_load_pretrained_default(self):
-        feature_extractor = self.get_feature_extractor()
+        image_processor = self.get_image_processor()
         tokenizers = self.get_tokenizers()
         for tokenizer in tokenizers:
-            processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
+            processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
 
             processor.save_pretrained(self.tmpdirname)
             processor = LayoutXLMProcessor.from_pretrained(self.tmpdirname)
@@ -99,16 +99,16 @@ class LayoutXLMProcessorTest(unittest.TestCase):
             self.assertEqual(processor.tokenizer.get_vocab(), tokenizer.get_vocab())
             self.assertIsInstance(processor.tokenizer, (LayoutXLMTokenizer, LayoutXLMTokenizerFast))
 
-            self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor.to_json_string())
-            self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
+            self.assertEqual(processor.image_processor.to_json_string(), image_processor.to_json_string())
+            self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
 
     def test_save_load_pretrained_additional_features(self):
-        processor = LayoutXLMProcessor(feature_extractor=self.get_feature_extractor(), tokenizer=self.get_tokenizer())
+        processor = LayoutXLMProcessor(image_processor=self.get_image_processor(), tokenizer=self.get_tokenizer())
         processor.save_pretrained(self.tmpdirname)
 
         # slow tokenizer
         tokenizer_add_kwargs = self.get_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
-        feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
+        image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
 
         processor = LayoutXLMProcessor.from_pretrained(
             self.tmpdirname,
@@ -122,12 +122,12 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
|||||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizer)
|
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizer)
|
||||||
|
|
||||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||||
|
|
||||||
# fast tokenizer
|
# fast tokenizer
|
||||||
tokenizer_add_kwargs = self.get_rust_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
tokenizer_add_kwargs = self.get_rust_tokenizer(bos_token="(BOS)", eos_token="(EOS)")
|
||||||
feature_extractor_add_kwargs = self.get_feature_extractor(do_resize=False, size=30)
|
image_processor_add_kwargs = self.get_image_processor(do_resize=False, size=30)
|
||||||
|
|
||||||
processor = LayoutXLMProcessor.from_pretrained(
|
processor = LayoutXLMProcessor.from_pretrained(
|
||||||
self.tmpdirname, use_xlm=True, bos_token="(BOS)", eos_token="(EOS)", do_resize=False, size=30
|
self.tmpdirname, use_xlm=True, bos_token="(BOS)", eos_token="(EOS)", do_resize=False, size=30
|
||||||
@@ -136,14 +136,14 @@ class LayoutXLMProcessorTest(unittest.TestCase):
|
|||||||
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
self.assertEqual(processor.tokenizer.get_vocab(), tokenizer_add_kwargs.get_vocab())
|
||||||
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizerFast)
|
self.assertIsInstance(processor.tokenizer, LayoutXLMTokenizerFast)
|
||||||
|
|
||||||
self.assertEqual(processor.feature_extractor.to_json_string(), feature_extractor_add_kwargs.to_json_string())
|
self.assertEqual(processor.image_processor.to_json_string(), image_processor_add_kwargs.to_json_string())
|
||||||
self.assertIsInstance(processor.feature_extractor, LayoutLMv2FeatureExtractor)
|
self.assertIsInstance(processor.image_processor, LayoutLMv2ImageProcessor)
|
||||||
|
|
||||||
def test_model_input_names(self):
|
def test_model_input_names(self):
|
||||||
feature_extractor = self.get_feature_extractor()
|
image_processor = self.get_image_processor()
|
||||||
tokenizer = self.get_tokenizer()
|
tokenizer = self.get_tokenizer()
|
||||||
|
|
||||||
processor = LayoutXLMProcessor(tokenizer=tokenizer, feature_extractor=feature_extractor)
|
processor = LayoutXLMProcessor(tokenizer=tokenizer, image_processor=image_processor)
|
||||||
|
|
||||||
input_str = "lower newer"
|
input_str = "lower newer"
|
||||||
image_input = self.prepare_image_inputs()
|
image_input = self.prepare_image_inputs()
|
||||||
@@ -215,15 +215,15 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
def test_processor_case_1(self):
|
def test_processor_case_1(self):
|
||||||
# case 1: document image classification (training, inference) + token classification (inference), apply_ocr = True
|
# case 1: document image classification (training, inference) + token classification (inference), apply_ocr = True
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
image_processor = LayoutLMv2ImageProcessor()
|
||||||
tokenizers = self.get_tokenizers
|
tokenizers = self.get_tokenizers
|
||||||
images = self.get_images
|
images = self.get_images
|
||||||
|
|
||||||
for tokenizer in tokenizers:
|
for tokenizer in tokenizers:
|
||||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
# not batched
|
# not batched
|
||||||
input_feat_extract = feature_extractor(images[0], return_tensors="pt")
|
input_feat_extract = image_processor(images[0], return_tensors="pt")
|
||||||
input_processor = processor(images[0], return_tensors="pt")
|
input_processor = processor(images[0], return_tensors="pt")
|
||||||
|
|
||||||
# verify keys
|
# verify keys
|
||||||
@@ -245,7 +245,7 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
self.assertSequenceEqual(decoding, expected_decoding)
|
self.assertSequenceEqual(decoding, expected_decoding)
|
||||||
|
|
||||||
# batched
|
# batched
|
||||||
input_feat_extract = feature_extractor(images, return_tensors="pt")
|
input_feat_extract = image_processor(images, return_tensors="pt")
|
||||||
input_processor = processor(images, padding=True, return_tensors="pt")
|
input_processor = processor(images, padding=True, return_tensors="pt")
|
||||||
|
|
||||||
# verify keys
|
# verify keys
|
||||||
@@ -270,12 +270,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
def test_processor_case_2(self):
|
def test_processor_case_2(self):
|
||||||
# case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False
|
# case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||||
tokenizers = self.get_tokenizers
|
tokenizers = self.get_tokenizers
|
||||||
images = self.get_images
|
images = self.get_images
|
||||||
|
|
||||||
for tokenizer in tokenizers:
|
for tokenizer in tokenizers:
|
||||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
# not batched
|
# not batched
|
||||||
words = ["hello", "world"]
|
words = ["hello", "world"]
|
||||||
@@ -324,12 +324,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
def test_processor_case_3(self):
|
def test_processor_case_3(self):
|
||||||
# case 3: token classification (training), apply_ocr=False
|
# case 3: token classification (training), apply_ocr=False
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||||
tokenizers = self.get_tokenizers
|
tokenizers = self.get_tokenizers
|
||||||
images = self.get_images
|
images = self.get_images
|
||||||
|
|
||||||
for tokenizer in tokenizers:
|
for tokenizer in tokenizers:
|
||||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
# not batched
|
# not batched
|
||||||
words = ["weirdly", "world"]
|
words = ["weirdly", "world"]
|
||||||
@@ -389,12 +389,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
def test_processor_case_4(self):
|
def test_processor_case_4(self):
|
||||||
# case 4: visual question answering (inference), apply_ocr=True
|
# case 4: visual question answering (inference), apply_ocr=True
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor()
|
image_processor = LayoutLMv2ImageProcessor()
|
||||||
tokenizers = self.get_tokenizers
|
tokenizers = self.get_tokenizers
|
||||||
images = self.get_images
|
images = self.get_images
|
||||||
|
|
||||||
for tokenizer in tokenizers:
|
for tokenizer in tokenizers:
|
||||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
# not batched
|
# not batched
|
||||||
question = "What's his name?"
|
question = "What's his name?"
|
||||||
@@ -440,12 +440,12 @@ class LayoutXLMProcessorIntegrationTests(unittest.TestCase):
|
|||||||
def test_processor_case_5(self):
|
def test_processor_case_5(self):
|
||||||
# case 5: visual question answering (inference), apply_ocr=False
|
# case 5: visual question answering (inference), apply_ocr=False
|
||||||
|
|
||||||
feature_extractor = LayoutLMv2FeatureExtractor(apply_ocr=False)
|
image_processor = LayoutLMv2ImageProcessor(apply_ocr=False)
|
||||||
tokenizers = self.get_tokenizers
|
tokenizers = self.get_tokenizers
|
||||||
images = self.get_images
|
images = self.get_images
|
||||||
|
|
||||||
for tokenizer in tokenizers:
|
for tokenizer in tokenizers:
|
||||||
processor = LayoutXLMProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
|
processor = LayoutXLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
|
||||||
|
|
||||||
# not batched
|
# not batched
|
||||||
question = "What's his name?"
|
question = "What's his name?"
|
||||||
|
|||||||
@@ -46,7 +46,7 @@ if is_torch_available():
|
|||||||
if is_vision_available():
|
if is_vision_available():
|
||||||
from PIL import Image
|
from PIL import Image
|
||||||
|
|
||||||
from transformers import LevitFeatureExtractor
|
from transformers import LevitImageProcessor
|
||||||
|
|
||||||
|
|
||||||
class LevitConfigTester(ConfigTester):
|
class LevitConfigTester(ConfigTester):
|
||||||
@@ -409,8 +409,8 @@ def prepare_img():
|
|||||||
@require_vision
|
@require_vision
|
||||||
class LevitModelIntegrationTest(unittest.TestCase):
|
class LevitModelIntegrationTest(unittest.TestCase):
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return LevitFeatureExtractor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
return LevitImageProcessor.from_pretrained(LEVIT_PRETRAINED_MODEL_ARCHIVE_LIST[0])
|
||||||
|
|
||||||
@slow
|
@slow
|
||||||
def test_inference_image_classification_head(self):
|
def test_inference_image_classification_head(self):
|
||||||
@@ -418,9 +418,9 @@ class LevitModelIntegrationTest(unittest.TestCase):
|
|||||||
torch_device
|
torch_device
|
||||||
)
|
)
|
||||||
|
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(images=image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(images=image, return_tensors="pt").to(torch_device)
|
||||||
|
|
||||||
# forward pass
|
# forward pass
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
|
|||||||
@@ -545,9 +545,9 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
self.assertEqual(segmentation[0].shape, target_sizes[0])
|
self.assertEqual(segmentation[0].shape, target_sizes[0])
|
||||||
|
|
||||||
def test_post_process_instance_segmentation(self):
|
def test_post_process_instance_segmentation(self):
|
||||||
feature_extractor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
image_processor = self.image_processing_class(num_labels=self.image_processor_tester.num_classes)
|
||||||
outputs = self.image_processor_tester.get_fake_mask2former_outputs()
|
outputs = self.image_processor_tester.get_fake_mask2former_outputs()
|
||||||
segmentation = feature_extractor.post_process_instance_segmentation(outputs, threshold=0)
|
segmentation = image_processor.post_process_instance_segmentation(outputs, threshold=0)
|
||||||
|
|
||||||
self.assertTrue(len(segmentation) == self.image_processor_tester.batch_size)
|
self.assertTrue(len(segmentation) == self.image_processor_tester.batch_size)
|
||||||
for el in segmentation:
|
for el in segmentation:
|
||||||
@@ -556,7 +556,7 @@ class Mask2FormerImageProcessingTest(ImageProcessingSavingTestMixin, unittest.Te
|
|||||||
self.assertEqual(type(el["segments_info"]), list)
|
self.assertEqual(type(el["segments_info"]), list)
|
||||||
self.assertEqual(el["segmentation"].shape, (384, 384))
|
self.assertEqual(el["segmentation"].shape, (384, 384))
|
||||||
|
|
||||||
segmentation = feature_extractor.post_process_instance_segmentation(
|
segmentation = image_processor.post_process_instance_segmentation(
|
||||||
outputs, threshold=0, return_binary_maps=True
|
outputs, threshold=0, return_binary_maps=True
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@@ -325,14 +325,14 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
|||||||
return "facebook/mask2former-swin-small-coco-instance"
|
return "facebook/mask2former-swin-small-coco-instance"
|
||||||
|
|
||||||
@cached_property
|
@cached_property
|
||||||
def default_feature_extractor(self):
|
def default_image_processor(self):
|
||||||
return Mask2FormerImageProcessor.from_pretrained(self.model_checkpoints) if is_vision_available() else None
|
return Mask2FormerImageProcessor.from_pretrained(self.model_checkpoints) if is_vision_available() else None
|
||||||
|
|
||||||
def test_inference_no_head(self):
|
def test_inference_no_head(self):
|
||||||
model = Mask2FormerModel.from_pretrained(self.model_checkpoints).to(torch_device)
|
model = Mask2FormerModel.from_pretrained(self.model_checkpoints).to(torch_device)
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||||
inputs_shape = inputs["pixel_values"].shape
|
inputs_shape = inputs["pixel_values"].shape
|
||||||
# check size is divisible by 32
|
# check size is divisible by 32
|
||||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||||
@@ -371,9 +371,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
|||||||
|
|
||||||
def test_inference_universal_segmentation_head(self):
|
def test_inference_universal_segmentation_head(self):
|
||||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
image = prepare_img()
|
image = prepare_img()
|
||||||
inputs = feature_extractor(image, return_tensors="pt").to(torch_device)
|
inputs = image_processor(image, return_tensors="pt").to(torch_device)
|
||||||
inputs_shape = inputs["pixel_values"].shape
|
inputs_shape = inputs["pixel_values"].shape
|
||||||
# check size is divisible by 32
|
# check size is divisible by 32
|
||||||
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
self.assertTrue((inputs_shape[-1] % 32) == 0 and (inputs_shape[-2] % 32) == 0)
|
||||||
@@ -408,9 +408,9 @@ class Mask2FormerModelIntegrationTest(unittest.TestCase):
|
|||||||
|
|
||||||
def test_with_segmentation_maps_and_loss(self):
|
def test_with_segmentation_maps_and_loss(self):
|
||||||
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
model = Mask2FormerForUniversalSegmentation.from_pretrained(self.model_checkpoints).to(torch_device).eval()
|
||||||
feature_extractor = self.default_feature_extractor
|
image_processor = self.default_image_processor
|
||||||
|
|
||||||
inputs = feature_extractor(
|
inputs = image_processor(
|
||||||
[np.zeros((3, 800, 1333)), np.zeros((3, 800, 1333))],
|
[np.zeros((3, 800, 1333)), np.zeros((3, 800, 1333))],
|
||||||
segmentation_maps=[np.zeros((384, 384)).astype(np.float32), np.zeros((384, 384)).astype(np.float32)],
|
segmentation_maps=[np.zeros((384, 384)).astype(np.float32), np.zeros((384, 384)).astype(np.float32)],
|
||||||
return_tensors="pt",
|
return_tensors="pt",
|
||||||
|
|||||||
Some files were not shown because too many files have changed in this diff.