From 743bb5f52e29d83e5d3fd3db4d83146bd4edce28 Mon Sep 17 00:00:00 2001
From: Arpon Kapuria
Date: Thu, 7 Aug 2025 01:45:14 +0600
Subject: [PATCH] chore: update Deformable_Detr model card (#39902)

* chore: update Deformable_Detr model card
* fix: added pipeline, automodel examples and checkpoints link
* Update deformable_detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/deformable_detr.md | 86 +++++++++++++++------
 1 file changed, 63 insertions(+), 23 deletions(-)

diff --git a/docs/source/en/model_doc/deformable_detr.md b/docs/source/en/model_doc/deformable_detr.md
index a260bbdb8e..84c8de5496 100644
--- a/docs/source/en/model_doc/deformable_detr.md
+++ b/docs/source/en/model_doc/deformable_detr.md
@@ -14,43 +14,83 @@ rendered properly in your Markdown viewer.
 -->
-# Deformable DETR
-
-
-PyTorch +
+
+ PyTorch +
-## Overview
+# Deformable DETR
-The Deformable DETR model was proposed in [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https://huggingface.co/papers/2010.04159) by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.
-Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original [DETR](detr) by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.
-
-The abstract from the paper is the following:
-
-*DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach.*
+[Deformable DETR](https://huggingface.co/papers/2010.04159) improves on the original [DETR](./detr) by using a deformable attention module. This mechanism selectively attends to a small set of key sampling points around a reference. It improves training speed and improves accuracy.
 drawing
 Deformable DETR architecture. Taken from the original paper.
-This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/fundamentalvision/Deformable-DETR).
+You can find all the available Deformable DETR checkpoints under the [SenseTime](https://huggingface.co/SenseTime) organization.
-## Usage tips
+> [!TIP]
+> This model was contributed by [nielsr](https://huggingface.co/nielsr).
+>
+> Click on the Deformable DETR models in the right sidebar for more examples of how to apply Deformable DETR to different object detection and segmentation tasks.
-- Training Deformable DETR is equivalent to training the original [DETR](detr) model. See the [resources](#resources) section below for demo notebooks.
+The example below demonstrates how to perform object detection with the [`Pipeline`] and the [`AutoModel`] class.
+
+
+
+```python
+from transformers import pipeline
+import torch
+
+pipeline = pipeline(
+    "object-detection",
+    model="SenseTime/deformable-detr",
+    torch_dtype=torch.float16,
+    device_map=0
+)
+
+pipeline("http://images.cocodataset.org/val2017/000000039769.jpg")
+```
+
+
+
+
+```python
+from transformers import AutoImageProcessor, AutoModelForObjectDetection
+from PIL import Image
+import requests
+import torch
+
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+
+image_processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")
+model = AutoModelForObjectDetection.from_pretrained("SenseTime/deformable-detr")
+
+# prepare image for the model
+inputs = image_processor(images=image, return_tensors="pt")
+
+with torch.no_grad():
+    outputs = model(**inputs)
+
+results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3)
+
+for result in results:
+    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
+        score, label = score.item(), label_id.item()
+        box = [round(i, 2) for i in box.tolist()]
+        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
+```
+
+
+
 ## Resources
-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Deformable DETR.
-
-
-
-- Demo notebooks regarding inference + fine-tuning on a custom dataset for [`DeformableDetrForObjectDetection`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Deformable-DETR).
-- Scripts for finetuning [`DeformableDetrForObjectDetection`] with [`Trainer`] or [Accelerate](https://huggingface.co/docs/accelerate/index) can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection).
-- See also: [Object detection task guide](../tasks/object_detection).
-
-If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+- Refer to this set of [notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Deformable-DETR) for inference and fine-tuning [`DeformableDetrForObjectDetection`] on a custom dataset.
 ## DeformableDetrImageProcessor
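
Note on the model card text in this patch: it describes deformable attention as attending "to a small set of key sampling points around a reference". A minimal pure-Python sketch of that idea follows, under loud assumptions. It uses a single head, a single feature scale, scalar features, and pixel coordinates. The sampling offsets and attention logits are passed in as plain arguments, whereas the model predicts them from each query. The function names (`bilinear_sample`, `softmax`, `deformable_attention`) are hypothetical and are not part of the `transformers` API.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D grid of floats at continuous pixel coords (x, y).

    Reads that fall outside the grid contribute 0 (zero padding).
    """
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # Blend the four integer neighbours, weighted by proximity to (x, y).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature_map[yi][xi]
    return value


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature_map, reference, offsets, logits):
    """Weighted sum of a few samples taken around one reference point.

    Assumption: `offsets` and `logits` are given here; in the model they
    would be predicted from the query embedding.
    """
    weights = softmax(logits)
    ref_x, ref_y = reference
    return sum(
        w * bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )


# Toy 3x3 feature map; four sampling points around the centre pixel.
features = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
sample_offsets = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
print(deformable_attention(features, (1.0, 1.0), sample_offsets, [0.0] * 4))  # prints 4.0
```

Each query reads only `len(offsets)` locations rather than every pixel, so the cost per query does not grow with the size of the feature map. That is the property the sketch is meant to illustrate.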