Grounding DINO Processor standardization (#34853)
* Add input ids to model output * Add text preprocessing for processor * Fix snippet * Add test for equivalence * Add type checking guard * Fixing typehint * Fix test for added `input_ids` in output * Add deprecations and "text_labels" to output * Adjust tests * Fix test * Update code examples * Minor docs and code improvement * Remove one-liner functions and rename class to CamelCase * Update docstring * Fixup
This commit is contained in:
committed by
GitHub
parent
42b2857b01
commit
099d93d2e9
@@ -56,25 +56,26 @@ Here's how to use the model for zero-shot object detection:
|
||||
>>> image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
|
||||
>>> image = Image.open(requests.get(image_url, stream=True).raw)
|
||||
>>> # Check for cats and remote controls
|
||||
>>> text = "a cat. a remote control."
|
||||
>>> text_labels = [["a cat", "a remote control"]]
|
||||
|
||||
>>> inputs = processor(images=image, text=text, return_tensors="pt").to(device)
|
||||
>>> inputs = processor(images=image, text=text_labels, return_tensors="pt").to(device)
|
||||
>>> with torch.no_grad():
|
||||
... outputs = model(**inputs)
|
||||
|
||||
>>> results = processor.post_process_grounded_object_detection(
|
||||
... outputs,
|
||||
... inputs.input_ids,
|
||||
... box_threshold=0.4,
|
||||
... threshold=0.4,
|
||||
... text_threshold=0.3,
|
||||
... target_sizes=[image.size[::-1]]
|
||||
... target_sizes=[(image.height, image.width)]
|
||||
... )
|
||||
>>> print(results)
|
||||
[{'boxes': tensor([[344.6959, 23.1090, 637.1833, 374.2751],
|
||||
[ 12.2666, 51.9145, 316.8582, 472.4392],
|
||||
[ 38.5742, 70.0015, 176.7838, 118.1806]], device='cuda:0'),
|
||||
'labels': ['a cat', 'a cat', 'a remote control'],
|
||||
'scores': tensor([0.4785, 0.4381, 0.4776], device='cuda:0')}]
|
||||
>>> # Retrieve the first image result
|
||||
>>> result = results[0]
|
||||
>>> for box, score, text_label in zip(result["boxes"], result["scores"], result["text_labels"]):
|
||||
... box = [round(x, 2) for x in box.tolist()]
|
||||
... print(f"Detected {text_label} with confidence {round(score.item(), 3)} at location {box}")
|
||||
Detected a cat with confidence 0.479 at location [344.7, 23.11, 637.18, 374.28]
|
||||
Detected a cat with confidence 0.438 at location [12.27, 51.91, 316.86, 472.44]
|
||||
Detected a remote control with confidence 0.478 at location [38.57, 70.0, 176.78, 118.18]
|
||||
```
|
||||
|
||||
## Grounded SAM
|
||||
|
||||
Reference in New Issue
Block a user