[Docs] Model_doc structure/clarity improvements (#26876)
* first batch of structure improvements for model_docs * second batch of structure improvements for model_docs * more structure improvements for model_docs * more structure improvements for model_docs * structure improvements for cv model_docs * more structural refactoring * addressed feedback about image processors
This commit is contained in:
@@ -40,7 +40,9 @@ for any dataset specific training. For instance, we match the accuracy of the or
|
||||
without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained
|
||||
model weights at this https URL.*
|
||||
|
||||
## Usage
|
||||
This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).
|
||||
|
||||
## Usage tips and example
|
||||
|
||||
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image
|
||||
classification. CLIP uses a ViT like transformer to get visual features and a causal language model to get the text
|
||||
@@ -77,8 +79,6 @@ encode the text and prepare the images. The following example shows how to get t
|
||||
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
|
||||
```
|
||||
|
||||
This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).
|
||||
|
||||
## Resources
|
||||
|
||||
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
|
||||
@@ -142,6 +142,9 @@ The resource should ideally demonstrate something new instead of duplicating an
|
||||
|
||||
[[autodoc]] CLIPProcessor
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
## CLIPModel
|
||||
|
||||
[[autodoc]] CLIPModel
|
||||
@@ -164,12 +167,14 @@ The resource should ideally demonstrate something new instead of duplicating an
|
||||
[[autodoc]] CLIPVisionModelWithProjection
|
||||
- forward
|
||||
|
||||
|
||||
## CLIPVisionModel
|
||||
|
||||
[[autodoc]] CLIPVisionModel
|
||||
- forward
|
||||
|
||||
</pt>
|
||||
<tf>
|
||||
|
||||
## TFCLIPModel
|
||||
|
||||
[[autodoc]] TFCLIPModel
|
||||
@@ -187,6 +192,9 @@ The resource should ideally demonstrate something new instead of duplicating an
|
||||
[[autodoc]] TFCLIPVisionModel
|
||||
- call
|
||||
|
||||
</tf>
|
||||
<jax>
|
||||
|
||||
## FlaxCLIPModel
|
||||
|
||||
[[autodoc]] FlaxCLIPModel
|
||||
@@ -208,3 +216,6 @@ The resource should ideally demonstrate something new instead of duplicating an
|
||||
|
||||
[[autodoc]] FlaxCLIPVisionModel
|
||||
- __call__
|
||||
|
||||
</jax>
|
||||
</frameworkcontent>
|
||||
|
||||
Reference in New Issue
Block a user