Add video links to the documentation (#12162)

This commit is contained in:
Sylvain Gugger
2021-06-15 06:37:37 -04:00
committed by GitHub
parent 040283170c
commit a55dc157e3
7 changed files with 167 additions and 26 deletions

View File

@@ -28,6 +28,12 @@ Each one of the models in the library falls into one of the following categories
* :ref:`multimodal-models`
* :ref:`retrieval-based-models`
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/H39Z_720T5s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
sentence so that the attention heads can only see what was before in the text, and not whats after. Although those
@@ -54,12 +60,18 @@ Multimodal models mix text inputs with other kinds (e.g. images) and are more sp
.. _autoregressive-models:
Autoregressive models
Decoders or autoregressive models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
that at each position, the model can only look at the tokens before the attention heads.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/d_ixlCubqQw" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Original GPT
-----------------------------------------------------------------------------------------------------------------------
@@ -215,13 +227,19 @@ multiple choice classification and question answering.
.. _autoencoding-models:
Autoencoding models
Encoders or autoencoding models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
look at all the tokens in the attention heads. For pretraining, targets are the original sentences and inputs are their
corrupted versions.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/MUqNwgPjJvQ" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
BERT
-----------------------------------------------------------------------------------------------------------------------
@@ -526,6 +544,12 @@ Sequence-to-sequence models
As mentioned before, these models keep both the encoder and the decoder of the original transformer.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0_4KEb08xrE" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
BART
-----------------------------------------------------------------------------------------------------------------------