Add video links to the documentation (#12162)

2021-06-15 06:37:37 -04:00
parent 040283170c
commit a55dc157e3
7 changed files with 167 additions and 26 deletions
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
@@ -28,6 +28,12 @@ Each one of the models in the library falls into one of the following categories
  * :ref:`multimodal-models`
  * :ref:`retrieval-based-models`

+.. raw:: html
+
+   <iframe width="560" height="315" src="https://www.youtube.com/embed/H39Z_720T5s" title="YouTube video player"
+   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
+   picture-in-picture" allowfullscreen></iframe>
+
 Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
 previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
 sentence so that the attention heads can only see what was before in the text, and not what’s after. Although those
@@ -54,12 +60,18 @@ Multimodal models mix text inputs with other kinds (e.g. images) and are more sp

 .. _autoregressive-models:

-Autoregressive models
+Decoders or autoregressive models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
 that, at each position, the attention heads can only look at the tokens that come before it.

+.. raw:: html
+
+   <iframe width="560" height="315" src="https://www.youtube.com/embed/d_ixlCubqQw" title="YouTube video player"
+   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
+   picture-in-picture" allowfullscreen></iframe>
+
 Original GPT
 -----------------------------------------------------------------------------------------------------------------------

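Note on the hunk above: the attention mask described for decoder (autoregressive) models can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the library's implementation. The names `causal_mask` and `masked_softmax` are hypothetical, and the sketch assumes the common convention that a position may attend to itself as well as to earlier positions.

```python
import math


def causal_mask(seq_len):
    """Build a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True when position i is allowed to attend to position j,
    i.e. when j is not in the future relative to i.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over raw attention scores.

    Masked-out positions receive exactly zero weight. The diagonal is always
    allowed by causal_mask, so each row has at least one nonzero term.
    """
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        exps = [math.exp(s) if allowed else 0.0
                for s, allowed in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


mask = causal_mask(4)
scores = [[0.0] * 4 for _ in range(4)]  # uniform scores, for illustration only
weights = masked_softmax(scores, mask)

for row in weights:
    print(row)
```

With uniform scores, each row spreads its weight evenly over the positions it is allowed to see, so the first position attends only to itself and the last attends to all four.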
@@ -215,13 +227,19 @@ multiple choice classification and question answering.

 .. _autoencoding-models:

-Autoencoding models
+Encoders or autoencoding models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
 look at all the tokens in the attention heads. For pretraining, targets are the original sentences and inputs are their
 corrupted versions.

+.. raw:: html
+
+   <iframe width="560" height="315" src="https://www.youtube.com/embed/MUqNwgPjJvQ" title="YouTube video player"
+   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
+   picture-in-picture" allowfullscreen></iframe>
+
 BERT
 -----------------------------------------------------------------------------------------------------------------------

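Note on the hunk above: the encoder (autoencoding) setup it describes, with no attention mask and corrupted inputs paired with the original sentence as the target, can be sketched as follows. This is a simplified illustration only. The `[MASK]` placeholder, the `corrupt` and `full_attention_mask` helpers, and the 30% corruption rate are assumptions made for the example; real models use their own tokenizers and more elaborate corruption schemes.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder symbol; real tokenizers define their own


def corrupt(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK_TOKEN independently with probability mask_prob."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [MASK_TOKEN if rng.random() < mask_prob else tok for tok in tokens]


def full_attention_mask(seq_len):
    """No masking: every position may attend to every other position."""
    return [[True] * seq_len for _ in range(seq_len)]


tokens = "the quick brown fox jumps over the lazy dog".split()
inputs = corrupt(tokens, mask_prob=0.3)   # corrupted version fed to the encoder
targets = list(tokens)                    # the original sentence is the target
mask = full_attention_mask(len(tokens))

print(inputs)
print(targets)
```

The contrast with the decoder case is in the mask: here every entry is `True`, so each token sees context on both sides when the corrupted positions are reconstructed.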
@@ -526,6 +544,12 @@ Sequence-to-sequence models

 As mentioned before, these models keep both the encoder and the decoder of the original transformer.

+.. raw:: html
+
+   <iframe width="560" height="315" src="https://www.youtube.com/embed/0_4KEb08xrE" title="YouTube video player"
+   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
+   picture-in-picture" allowfullscreen></iframe>
+
 BART
 -----------------------------------------------------------------------------------------------------------------------