Add Image Processor Fast RT-DETR (#34354)
* add fast image processor rtdetr
* add gpu/cpu test and fix docstring
* remove prints
* add to doc
* nit docstring
* avoid iterating over images/annotations several times
* change torch typing
* Add image processor fast documentation
@@ -18,6 +18,49 @@ rendered properly in your Markdown viewer.
An image processor is in charge of preparing input features for vision models and post-processing their outputs. This includes transformations such as resizing, normalization, and conversion to PyTorch, TensorFlow, Flax and NumPy tensors. It may also include model-specific post-processing such as converting logits to segmentation masks.
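
The normalization step described above, together with the channels-first layout that PyTorch vision models expect, can be illustrated with a small NumPy sketch. It is not the library's implementation: `rescale_and_normalize` is a hypothetical name, resizing is left out to keep it dependency-free, and the rescale factor of 1/255 and the ImageNet mean and standard deviation are assumed values; each checkpoint's image processor configuration defines its own.

```python
import numpy as np

# Assumed values: common ImageNet statistics; real checkpoints define their own.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def rescale_and_normalize(image: np.ndarray, rescale_factor: float = 1 / 255) -> np.ndarray:
    """Hypothetical helper: uint8 (H, W, 3) image -> normalized float32 (3, H, W) array."""
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected a (height, width, 3) RGB image")
    # Rescale pixel values from [0, 255] to [0, 1].
    pixels = image.astype(np.float32) * rescale_factor
    # Normalize each channel with the assumed mean and standard deviation.
    pixels = (pixels - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first (HWC -> CHW), the layout PyTorch vision models expect.
    return pixels.transpose(2, 0, 1)


# Usage: a dummy 4x6 all-white image.
dummy = np.full((4, 6, 3), 255, dtype=np.uint8)
out = rescale_and_normalize(dummy)
print(out.shape)  # (3, 4, 6)
```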

Fast image processors are available for a few models and more will be added in the future. They are based on the [torchvision](https://pytorch.org/vision/stable/index.html) library and provide a significant speed-up, especially when processing on GPU.
They have the same API as the base image processors and can be used as drop-in replacements.
To use a fast image processor, you need to install the `torchvision` library and set the `use_fast` argument to `True` when instantiating the image processor:

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
```

When using a fast image processor, you can also set the `device` argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.

```python
from torchvision.io import read_image
from transformers import DetrImageProcessorFast

images = read_image("image.jpg")
processor = DetrImageProcessorFast.from_pretrained("facebook/detr-resnet-50")
images_processed = processor(images, return_tensors="pt", device="cuda")
```
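
The default device choice described above can be summarized in a few lines of plain Python. This is a hypothetical sketch, not the library's code: `resolve_processing_device` and `FakeTensor` are illustrative names, and checking for a `.device` attribute is an assumed stand-in for "the inputs are tensors" so that the example runs without PyTorch.

```python
from typing import Any, Optional, Sequence


def resolve_processing_device(images: Sequence[Any], device: Optional[str] = None) -> str:
    """Hypothetical helper illustrating the default device rule for fast image processors."""
    # 1. An explicit `device` argument always wins.
    if device is not None:
        return device
    # 2. Tensor inputs: process on the device they already live on.
    #    Assumption: anything exposing a `.device` attribute counts as a tensor here.
    if images and hasattr(images[0], "device"):
        return str(images[0].device)
    # 3. Anything else (for example PIL images): fall back to the CPU.
    return "cpu"


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without PyTorch."""

    def __init__(self, device: str) -> None:
        self.device = device


print(resolve_processing_device([FakeTensor("cuda:0")]))  # cuda:0
print(resolve_processing_device([object()]))  # cpu
print(resolve_processing_device([object()], device="cuda"))  # cuda
```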

Here are some speed comparisons between the base and fast image processors for the `DETR` and `RT-DETR` models, and how they impact overall inference time:

<div class="flex">
<div>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_detr_fast_padded.png" />
</div>
<div>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_detr_fast_batched_compiled.png" />
</div>
</div>

<div class="flex">
<div>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_rt_detr_fast_single.png" />
</div>
<div>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/benchmark_results_full_pipeline_rt_detr_fast_batched.png" />
</div>
</div>

These benchmarks were run on an [AWS EC2 g5.2xlarge instance](https://aws.amazon.com/ec2/instance-types/g5/), which uses an NVIDIA A10G Tensor Core GPU.
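
A comparable measurement on your own hardware only needs a small timing helper. The sketch below is an illustration under stated assumptions, not the script behind the published charts: `benchmark` is a hypothetical name, it reports the median wall-clock time over several repeats after a few warm-up calls, and it accepts an optional `sync` callable because CUDA work runs asynchronously, so passing `torch.cuda.synchronize` keeps the GPU work inside the timed region. The processor names in the usage comment are also hypothetical.

```python
import statistics
import time
from typing import Any, Callable, Optional


def benchmark(
    fn: Callable[[], Any],
    repeats: int = 20,
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of `fn()` in seconds (hypothetical helper)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls absorb one-off costs such as lazy initialization or CUDA context setup.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # Wait for asynchronous work (e.g. CUDA kernels) before stopping the clock.
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Usage with a dummy workload. With real processors you might time, for example:
#   benchmark(lambda: fast_processor(images, return_tensors="pt", device="cuda"),
#             sync=torch.cuda.synchronize)
median_s = benchmark(lambda: sum(range(10_000)), repeats=5)
print(f"median: {median_s * 1e3:.3f} ms")
```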

## ImageProcessingMixin

@@ -46,7 +46,7 @@ Initially, an image is processed using a pre-trained convolutional neural networ
>>> from PIL import Image
>>> from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

>>> url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> image_processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")

@@ -95,6 +95,12 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- preprocess
- post_process_object_detection

## RTDetrImageProcessorFast

[[autodoc]] RTDetrImageProcessorFast
- preprocess
- post_process_object_detection
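
Object-detection post-processing typically turns the model's normalized `(center_x, center_y, width, height)` boxes into absolute `(x_min, y_min, x_max, y_max)` pixel coordinates and drops low-scoring detections. The NumPy sketch below illustrates only those two steps under stated assumptions; it is a simplification, not the implementation of `post_process_object_detection` (which, among other things, also turns the raw logits into scores), and `boxes_to_pixels` and `filter_by_score` are hypothetical names.

```python
from typing import Tuple

import numpy as np


def boxes_to_pixels(boxes: np.ndarray, image_height: int, image_width: int) -> np.ndarray:
    """Hypothetical helper: normalized (cx, cy, w, h) -> absolute (x0, y0, x1, y1)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    corners = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    # Scale x coordinates by the image width and y coordinates by the image height.
    scale = np.array([image_width, image_height, image_width, image_height], dtype=corners.dtype)
    return corners * scale


def filter_by_score(
    scores: np.ndarray, boxes: np.ndarray, threshold: float = 0.5
) -> Tuple[np.ndarray, np.ndarray]:
    """Keep only detections whose score is strictly above `threshold` (assumed semantics)."""
    keep = scores > threshold
    return scores[keep], boxes[keep]


# Two dummy predictions on a 480x640 (height x width) image.
pred_boxes = np.array([[0.5, 0.5, 0.25, 0.5], [0.1, 0.1, 0.1, 0.1]])
pred_scores = np.array([0.9, 0.3])
kept_scores, kept_boxes = filter_by_score(pred_scores, boxes_to_pixels(pred_boxes, 480, 640))
print(kept_boxes)  # [[240. 120. 400. 360.]]
```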

## RTDetrModel

[[autodoc]] RTDetrModel