Fix incomplete sentence in Zero-shot object detection documentation (#33430)

Rephrase sentence in zero-shot object detection docs
2024-09-12 11:25:44 +02:00
parent e0ff4321d1
commit 516ee6adc2
1 changed file with 2 additions and 2 deletions
--- a/docs/source/en/tasks/zero_shot_object_detection.md
+++ b/docs/source/en/tasks/zero_shot_object_detection.md
@@ -26,8 +26,8 @@ is an open-vocabulary object detector. It means that it can detect objects in im
 the need to fine-tune the model on labeled datasets.
 OWL-ViT leverages multi-modal representations to perform open-vocabulary detection. It combines [CLIP](../model_doc/clip) with
-lightweight object classification and localization heads. Open-vocabulary detection is achieved by embedding free-text queries with the text encoder of CLIP and using them as input to the object classification and localization heads.
+lightweight object classification and localization heads. Open-vocabulary detection is achieved by embedding free-text queries with the text encoder of CLIP and using them as input to the object classification and localization heads,
-associate images and their corresponding textual descriptions, and ViT processes image patches as inputs. The authors
+which associate images with their corresponding textual descriptions, while ViT processes image patches as inputs. The authors
 of OWL-ViT first trained CLIP from scratch and then fine-tuned OWL-ViT end to end on standard object detection datasets using
 a bipartite matching loss.
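
Note on the paragraph touched by this change: the open-vocabulary step it describes (text queries are embedded, then compared against image patch representations by the classification head) can be illustrated with a toy sketch. This is not the OWL-ViT implementation. The vectors are made-up stand-ins for real CLIP and ViT embeddings, the helper names `cosine_similarity` and `classify_patches` are hypothetical, and plain cosine similarity is assumed in place of the model's learned classification head.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_patches(patch_embeddings, query_embeddings):
    """For each patch, pick the text query whose embedding is most similar.

    Hypothetical helper: a real detector would also predict a box per patch
    and apply a score threshold before reporting a detection.
    """
    results = []
    for patch in patch_embeddings:
        scores = {
            label: cosine_similarity(patch, embedding)
            for label, embedding in query_embeddings.items()
        }
        best_label = max(scores, key=scores.get)
        results.append((best_label, scores[best_label]))
    return results


# Toy stand-ins for text-encoder outputs, one per free-text query (assumed values).
queries = {
    "cat": [1.0, 0.0, 0.0],
    "remote control": [0.0, 1.0, 0.0],
}

# Toy stand-ins for per-patch image embeddings (assumed values).
patches = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
]

predictions = classify_patches(patches, queries)
for label, score in predictions:
    print(f"{label}: {score:.3f}")
```

Because the labels are supplied as text at inference time, changing the detectable classes only means changing the keys and embeddings in `queries`; nothing is retrained in this sketch.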