Fix incomplete sentence in Zero-shot object detection documentation (#33430)
Rephrase sentence in zero-shot object detection docs
committed by GitHub
parent e0ff4321d1
commit 516ee6adc2
```diff
@@ -26,8 +26,8 @@ is an open-vocabulary object detector. It means that it can detect objects in im
 the need to fine-tune the model on labeled datasets.
 
 OWL-ViT leverages multi-modal representations to perform open-vocabulary detection. It combines [CLIP](../model_doc/clip) with
-lightweight object classification and localization heads. Open-vocabulary detection is achieved by embedding free-text queries with the text encoder of CLIP and using them as input to the object classification and localization heads.
-associate images and their corresponding textual descriptions, and ViT processes image patches as inputs. The authors
+lightweight object classification and localization heads. Open-vocabulary detection is achieved by embedding free-text queries with the text encoder of CLIP and using them as input to the object classification and localization heads,
+which associate images with their corresponding textual descriptions, while ViT processes image patches as inputs. The authors
 of OWL-ViT first trained CLIP from scratch and then fine-tuned OWL-ViT end to end on standard object detection datasets using
 a bipartite matching loss.
 
```
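The changed sentence says free-text queries are embedded with CLIP's text encoder and fed to the classification head. As a rough illustration only, and not the OWL-ViT or Transformers implementation, the following Python sketch assumes each image patch and each text query is already an embedding vector, scores every patch against every query by cosine similarity, and keeps a patch's best-matching query only if it clears a hypothetical 0.5 threshold. The real model's learned projections, logit shift and scale, and box head are left out, and the function names and toy vectors are invented for this example.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def query_logits(patch_embeds: Sequence[Vector],
                 query_embeds: Sequence[Vector]) -> List[List[float]]:
    """Return a (num_patches x num_queries) matrix of cosine similarities.

    Hypothetical simplification: no learned projection, shift, or scale.
    """
    patches = [l2_normalize(p) for p in patch_embeds]
    queries = [l2_normalize(q) for q in query_embeds]
    return [[sum(a * b for a, b in zip(p, q)) for q in queries] for p in patches]


def best_query_per_patch(logits: List[List[float]],
                         threshold: float = 0.5) -> List[Optional[int]]:
    """Index of the best query for each patch, or None if below threshold."""
    best: List[Optional[int]] = []
    for row in logits:
        idx = max(range(len(row)), key=row.__getitem__)
        best.append(idx if row[idx] >= threshold else None)
    return best


# Toy 3-d vectors standing in for CLIP outputs (made-up numbers).
patches = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.2], [0.1, 0.0, 1.0]]
queries = [[1.0, 0.0, 0.0],  # stand-in for a query like "a photo of a cat"
           [0.0, 1.0, 0.0]]  # stand-in for a query like "a photo of a dog"

print(best_query_per_patch(query_logits(patches, queries)))  # -> [0, 1, None]
```

Because the queries are just vectors, supporting a new label means embedding new text rather than retraining a fixed set of class weights, which is the sense in which the detector is open-vocabulary.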