From c008afea3cbf771f56a3f4bd9fb073a4b19f3581 Mon Sep 17 00:00:00 2001
From: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Date: Tue, 1 Mar 2022 17:44:20 +0100
Subject: [PATCH] Add link to notebooks (#15791)

Co-authored-by: Niels Rogge
---
 docs/source/model_doc/vilt.mdx | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/model_doc/vilt.mdx b/docs/source/model_doc/vilt.mdx
index 9170d84ead..34397e7b3c 100644
--- a/docs/source/model_doc/vilt.mdx
+++ b/docs/source/model_doc/vilt.mdx
@@ -32,6 +32,8 @@ times faster than previous VLP models, yet with competitive or better downstream
 
 Tips:
 
+- The quickest way to get started with ViLT is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ViLT)
+  (which showcase both inference and fine-tuning on custom data).
 - ViLT is a model that takes both `pixel_values` and `input_ids` as input. One can use [`ViltProcessor`] to prepare data for the model.
   This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
 - ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to