[Docs] Model_doc structure/clarity improvements (#26876)
* first batch of structure improvements for model_docs
* second batch of structure improvements for model_docs
* more structure improvements for model_docs
* more structure improvements for model_docs
* structure improvements for cv model_docs
* more structural refactoring
* addressed feedback about image processors
@@ -34,7 +34,14 @@ Vision-and-Language Transformer (ViLT), monolithic in the sense that the process
simplified to just the same convolution-free manner that we process textual inputs. We show that ViLT is up to tens of
times faster than previous VLP models, yet with competitive or better downstream task performance.*

Tips:
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/vilt_architecture.jpg"
alt="drawing" width="600"/>

<small> ViLT architecture. Taken from the <a href="https://arxiv.org/abs/2102.03334">original paper</a>. </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/dandelin/ViLT).

## Usage tips

- The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
(which showcase both inference and fine-tuning on custom data).
@@ -45,17 +52,6 @@ Tips:
which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
- The design of ViLT is very similar to that of a standard Vision Transformer (ViT). The only difference is that the model includes
additional embedding layers for the language modality.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/vilt_architecture.jpg"
alt="drawing" width="600"/>

<small> ViLT architecture. Taken from the <a href="https://arxiv.org/abs/2102.03334">original paper</a>. </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/dandelin/ViLT).


Tips:

- The PyTorch version of this model is only available in torch 1.10 and higher.

## ViltConfig
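
The usage tip in the second hunk mentions a mask that marks which pixel values are real and which are padding, and says `ViltProcessor` creates it automatically. The sketch below is only an illustration of that idea in plain Python, not the library's implementation: the helper name `pad_batch`, the single-channel nested-list image layout, and the zero pad value are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption for this sketch: an image is a single-channel grid of floats,
# stored as a list of equally long rows.
Image = List[List[float]]
Mask = List[List[int]]


def pad_batch(images: List[Image], pad_value: float = 0.0) -> Tuple[List[Image], List[Mask]]:
    """Pad every image to the largest height and width in the batch.

    Returns the padded images and one mask per image, where 1 marks a
    real pixel and 0 marks padding. `pad_batch` is a hypothetical helper.
    """
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)

    padded: List[Image] = []
    masks: List[Mask] = []
    for img in images:
        h, w = len(img), len(img[0])
        # Extend each row to the right, then append fully padded rows below.
        padded_img = [row + [pad_value] * (max_w - w) for row in img]
        padded_img += [[pad_value] * max_w for _ in range(max_h - h)]
        # Build the mask with the same layout as the padded image.
        mask = [[1] * w + [0] * (max_w - w) for _ in range(h)]
        mask += [[0] * max_w for _ in range(max_h - h)]
        padded.append(padded_img)
        masks.append(mask)
    return padded, masks


small = [[1.0, 2.0]]                          # 1 x 2 image
large = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]    # 2 x 3 image
pixel_values, pixel_mask = pad_batch([small, large])
print(pixel_mask[0])  # [[1, 1, 0], [0, 0, 0]]
```

With both images padded to 2 x 3, the batch can be stacked into one array, and the mask lets a model ignore the padded positions of the smaller image.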