Update SuperPoint model card (#38896)

* docs: first draft to more standard SuperPoint documentation
* Apply suggestions from code review
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs: reverted changes on Auto classes
* docs: addressed the rest of the comments
* docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation
* Update superpoint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 19:13:06 +02:00
parent 2f50230c59
commit f171e7e884
1 changed files with 80 additions and 80 deletions
--- a/docs/source/en/model_doc/superpoint.md
+++ b/docs/source/en/model_doc/superpoint.md
@@ -10,48 +10,35 @@ specific language governing permissions and limitations under the License.
 ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
 rendered properly in your Markdown viewer.
 -->
 <div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white" >
    </div>
 </div>
 # SuperPoint
-<div class="flex flex-wrap space-x-1">
+[SuperPoint](https://huggingface.co/papers/1712.07629) is the result of self-supervised training of a fully-convolutional network for interest point detection and description. The model detects interest points that are repeatable under homographic transformations and provides a descriptor for each point. Usage on its own is limited, but it can be used as a feature extractor for other tasks such as homography estimation and image matching.
 <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
 </div>
 ## Overview
 The SuperPoint model was proposed
 in [SuperPoint: Self-Supervised Interest Point Detection and Description](https://huggingface.co/papers/1712.07629) by Daniel
 DeTone, Tomasz Malisiewicz and Andrew Rabinovich.
 This model is the result of a self-supervised training of a fully-convolutional network for interest point detection and
 description. The model is able to detect interest points that are repeatable under homographic transformations and
 provide a descriptor for each point. The use of the model in its own is limited, but it can be used as a feature
 extractor for other tasks such as homography estimation, image matching, etc.
 The abstract from the paper is the following:
 *This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a
 large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our
 fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and
 associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography
 approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g.,
 synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able
 to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other
 traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches
 when compared to LIFT, SIFT and ORB.*
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/superpoint_architecture.png"
 alt="drawing" width="500"/>
-<small> SuperPoint overview. Taken from the <a href="https://huggingface.co/papers/1712.07629v4">original paper.</a> </small>
+You can find all the original SuperPoint checkpoints under the [Magic Leap Community](https://huggingface.co/magic-leap-community) organization.
-## Usage tips
+> [!TIP]
 > This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
 >
 > Click on the SuperPoint models in the right sidebar for more examples of how to apply SuperPoint to different computer vision tasks.
 Here is a quick example of using the model to detect interest points in an image:
-```python
+
 The example below demonstrates how to detect interest points in an image with the [`AutoModel`] class.
 <hfoptions id="usage">
 <hfoption id="AutoModel">
 ```py
 from transformers import AutoImageProcessor, SuperPointForKeypointDetection
 import torch
 from PIL import Image
@@ -64,67 +51,76 @@ processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint"
 model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
 inputs = processor(image, return_tensors="pt")
-outputs = model(**inputs)
+with torch.no_grad():
    outputs = model(**inputs)
 # Post-process to get keypoints, scores, and descriptors
 image_size = (image.height, image.width)
 processed_outputs = processor.post_process_keypoint_detection(outputs, [image_size])
 ```
-The outputs contain the list of keypoint coordinates with their respective score and description (a 256-long vector).
+</hfoption>
 </hfoptions>
-You can also feed multiple images to the model. Due to the nature of SuperPoint, to output a dynamic number of keypoints,
+## Notes
 you will need to use the mask attribute to retrieve the respective information :
-```python
+- SuperPoint outputs a dynamic number of keypoints per image, which makes it suitable for tasks requiring variable-length feature representations.
 from transformers import AutoImageProcessor, SuperPointForKeypointDetection
 import torch
 from PIL import Image
 import requests
-url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
+    ```py
-image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
+    from transformers import AutoImageProcessor, SuperPointForKeypointDetection
-url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
+    import torch
-image_2 = Image.open(requests.get(url_image_2, stream=True).raw)
+    from PIL import Image
    import requests
    processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
    model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
    url_image_1 = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image_1 = Image.open(requests.get(url_image_1, stream=True).raw)
    url_image_2 = "http://images.cocodataset.org/test-stuff2017/000000000568.jpg"
    image_2 = Image.open(requests.get(url_image_2, stream=True).raw)
    images = [image_1, image_2]
    inputs = processor(images, return_tensors="pt")
    # Example of handling dynamic keypoint output
    outputs = model(**inputs)
    keypoints = outputs.keypoints  # Shape varies per image
    scores = outputs.scores        # Confidence scores for each keypoint
    descriptors = outputs.descriptors  # 256-dimensional descriptors
    mask = outputs.mask # Value of 1 corresponds to a keypoint detection
    ```
-images = [image_1, image_2]
+- The model provides both keypoint coordinates and their corresponding descriptors (256-dimensional vectors) in a single forward pass.
 - For batch processing with multiple images, you need to use the mask attribute to retrieve the respective information for each image. You can use the `post_process_keypoint_detection` method of `SuperPointImageProcessor` to retrieve each image's information.
-processor = AutoImageProcessor.from_pretrained("magic-leap-community/superpoint")
+    ```py
-model = SuperPointForKeypointDetection.from_pretrained("magic-leap-community/superpoint")
+    # Batch processing example
    images = [image_1, image_2]  # PIL images loaded as in the previous example
    inputs = processor(images, return_tensors="pt")
    outputs = model(**inputs)
    image_sizes = [(img.height, img.width) for img in images]
    processed_outputs = processor.post_process_keypoint_detection(outputs, image_sizes)
    ```
-inputs = processor(images, return_tensors="pt")
+- You can then print the keypoints on the image of your choice to visualize the result:
-outputs = model(**inputs)
+    ```py
-image_sizes = [(image.height, image.width) for image in images]
+    import matplotlib.pyplot as plt
-outputs = processor.post_process_keypoint_detection(outputs, image_sizes)
+    plt.axis("off")
    plt.imshow(image_1)
    plt.scatter(
        processed_outputs[0]["keypoints"][:, 0],
        processed_outputs[0]["keypoints"][:, 1],
        c=processed_outputs[0]["scores"] * 100,
        s=processed_outputs[0]["scores"] * 50,
        alpha=0.8
    )
    plt.savefig("output_image.png")
    ```
-for output in outputs:
+<div class="flex justify-center">
-    for keypoints, scores, descriptors in zip(output["keypoints"], output["scores"], output["descriptors"]):
+    <img src="https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/ZtFmphEhx8tcbEQqOolyE.png">
-        print(f"Keypoints: {keypoints}")
+</div>
        print(f"Scores: {scores}")
        print(f"Descriptors: {descriptors}")
 ```
 You can then print the keypoints on the image of your choice to visualize the result:
 ```python
 import matplotlib.pyplot as plt
 plt.axis("off")
 plt.imshow(image_1)
 plt.scatter(
    outputs[0]["keypoints"][:, 0],
    outputs[0]["keypoints"][:, 1],
    c=outputs[0]["scores"] * 100,
    s=outputs[0]["scores"] * 50,
    alpha=0.8
 )
 plt.savefig(f"output_image.png")
 ```
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/ZtFmphEhx8tcbEQqOolyE.png)
 This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
 The original code can be found [here](https://github.com/magicleap/SuperPointPretrainedNetwork).
 ## Resources
-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SuperPoint. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+- Refer to this [notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SuperPoint/Inference_with_SuperPoint_to_detect_interest_points_in_an_image.ipynb) for an inference and visualization example.
 - A notebook showcasing inference and visualization with SuperPoint can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SuperPoint/Inference_with_SuperPoint_to_detect_interest_points_in_an_image.ipynb). 🌎
 ## SuperPointConfig
@@ -137,8 +133,12 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 - preprocess
 - post_process_keypoint_detection
 <frameworkcontent>
 <pt>
 ## SuperPointForKeypointDetection
 [[autodoc]] SuperPointForKeypointDetection
 - forward
 </pt>
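
Reviewer sketch (not part of this commit): the Notes section says the `mask` attribute is needed to recover per-image results from a padded batch, with a value of 1 marking a real keypoint detection. The snippet below illustrates that idea only. It is a framework-free sketch in which plain Python lists stand in for the padded `keypoints`, `scores`, and `mask` tensors, the toy values are made up, and the helper name `select_valid_keypoints` is hypothetical rather than part of the Transformers API. In practice, `post_process_keypoint_detection` is the documented way to get per-image results.

```python
# Framework-free sketch: plain lists stand in for the padded batch tensors
# (`keypoints`, `scores`, `mask`) described in the Notes section.
# `select_valid_keypoints` is a hypothetical helper, not a library function.

def select_valid_keypoints(keypoints, scores, mask):
    """Return one dict per image, keeping only the slots where mask == 1."""
    results = []
    for image_keypoints, image_scores, image_mask in zip(keypoints, scores, mask):
        # Indices of real detections; every other slot is padding.
        valid = [i for i, flag in enumerate(image_mask) if flag == 1]
        results.append({
            "keypoints": [image_keypoints[i] for i in valid],
            "scores": [image_scores[i] for i in valid],
        })
    return results


# Toy batch of two images, each padded to three keypoint slots.
keypoints = [
    [[0.1, 0.2], [0.5, 0.5], [0.0, 0.0]],  # third slot is padding
    [[0.3, 0.7], [0.0, 0.0], [0.0, 0.0]],  # second and third slots are padding
]
scores = [[0.9, 0.8, 0.0], [0.7, 0.0, 0.0]]
mask = [[1, 1, 0], [1, 0, 0]]

per_image = select_valid_keypoints(keypoints, scores, mask)
for index, result in enumerate(per_image):
    print(f"image {index}: {len(result['keypoints'])} keypoints")
```

With real model outputs, the same selection would typically be written as boolean indexing on tensors, which this sketch does not attempt to reproduce.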