Add video links to the documentation (#12162)
This commit is contained in:
@@ -55,6 +55,12 @@ Input IDs
|
||||
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
|
||||
numerical representations of tokens building the sequences that will be used as input by the model*.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
|
||||
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
|
||||
|
||||
@@ -120,8 +126,15 @@ because this is the way a :class:`~transformers.BertModel` is going to expect it
|
||||
Attention mask
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The attention mask is an optional argument used when batching sequences together. This argument indicates to the model
|
||||
which tokens should be attended to, and which should not.
|
||||
The attention mask is an optional argument used when batching sequences together.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/M6adb1j2jPI" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
This argument indicates to the model which tokens should be attended to, and which should not.
|
||||
|
||||
For example, consider these two sequences:
|
||||
|
||||
@@ -175,10 +188,17 @@ in the dictionary returned by the tokenizer under the key "attention_mask":
|
||||
Token Type IDs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Some models' purpose is to do sequence classification or question answering. These require two different sequences to
|
||||
be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the
|
||||
classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT model builds its two sequence input as
|
||||
such:
|
||||
Some models' purpose is to do classification on pairs of sentences or question answering.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
|
||||
help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT
|
||||
model builds its two sequence input as such:
|
||||
|
||||
.. code-block::
|
||||
|
||||
|
||||
@@ -16,6 +16,12 @@ Model sharing and uploading
|
||||
In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
|
||||
the `model hub <https://huggingface.co/models>`__.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/XvSGPZFEjDY" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
.. note::
|
||||
|
||||
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
|
||||
@@ -77,6 +83,12 @@ token that you can just copy.
|
||||
Directly push your model to the hub
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/Z1-XMy-GNLQ" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Once you have an API token (either stored in the cache or copied and pasted in your notebook), you can directly push a
|
||||
finetuned model you saved in :obj:`save_drectory` by calling:
|
||||
|
||||
@@ -152,6 +164,12 @@ or
|
||||
Use your terminal and git
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/rkCly_cbMBk" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Basic steps
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
||||
@@ -28,6 +28,12 @@ Each one of the models in the library falls into one of the following categories
|
||||
* :ref:`multimodal-models`
|
||||
* :ref:`retrieval-based-models`
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/H39Z_720T5s" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
|
||||
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
|
||||
sentence so that the attention heads can only see what was before in the text, and not what’s after. Although those
|
||||
@@ -54,12 +60,18 @@ Multimodal models mix text inputs with other kinds (e.g. images) and are more sp
|
||||
|
||||
.. _autoregressive-models:
|
||||
|
||||
Autoregressive models
|
||||
Decoders or autoregressive models
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
|
||||
that at each position, the model can only look at the tokens before the attention heads.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/d_ixlCubqQw" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Original GPT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
@@ -215,13 +227,19 @@ multiple choice classification and question answering.
|
||||
|
||||
.. _autoencoding-models:
|
||||
|
||||
Autoencoding models
|
||||
Encoders or autoencoding models
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
|
||||
look at all the tokens in the attention heads. For pretraining, targets are the original sentences and inputs are their
|
||||
corrupted versions.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/MUqNwgPjJvQ" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
BERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
@@ -526,6 +544,12 @@ Sequence-to-sequence models
|
||||
|
||||
As mentioned before, these models keep both the encoder and the decoder of the original transformer.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/0_4KEb08xrE" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
BART
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
|
||||
@@ -39,6 +39,12 @@ To automatically download the vocab used during pretraining or fine-tuning a giv
|
||||
Base use
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/Yffk5aydLzg" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
A :class:`~transformers.PreTrainedTokenizer` has many methods, but the only one you need to remember for preprocessing
|
||||
is its ``__call__``: you just need to feed your sentence to your tokenizer object.
|
||||
|
||||
@@ -138,6 +144,12 @@ can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer
|
||||
Preprocessing pairs of sentences
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Sometimes you need to feed a pair of sentences to your model. For instance, if you want to classify if two sentences in
|
||||
a pair are similar, or for question-answering models, which take a context and a question. For BERT models, the input
|
||||
is then represented like this: :obj:`[CLS] Sequence A [SEP] Sequence B [SEP]`
|
||||
|
||||
@@ -28,8 +28,15 @@ will dig a little bit more and see how the library gives you access to those mod
|
||||
Getting started on a task with a pipeline
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`. 🤗 Transformers
|
||||
provides the following tasks out of the box:
|
||||
The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/tiZFewofSLM" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
🤗 Transformers provides the following tasks out of the box:
|
||||
|
||||
- Sentiment analysis: is a text positive or negative?
|
||||
- Text generation (in English): provide a prompt and the model will generate what follows.
|
||||
@@ -137,8 +144,15 @@ to share your fine-tuned model on the hub with the community, using :doc:`this t
|
||||
Under the hood: pretrained models
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Let's now see what happens beneath the hood when using those pipelines. As we saw, the model and tokenizer are created
|
||||
using the :obj:`from_pretrained` method:
|
||||
Let's now see what happens beneath the hood when using those pipelines.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/AhChOFRegn4" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
As we saw, the model and tokenizer are created using the :obj:`from_pretrained` method:
|
||||
|
||||
.. code-block::
|
||||
|
||||
|
||||
@@ -13,12 +13,20 @@
|
||||
Summary of the tokenizers
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
On this page, we will have a closer look at tokenization. As we saw in :doc:`the preprocessing tutorial
|
||||
<preprocessing>`, tokenizing a text is splitting it into words or subwords, which then are converted to ids through a
|
||||
look-up table. Converting words or subwords to ids is straightforward, so in this summary, we will focus on splitting a
|
||||
text into words or subwords (i.e. tokenizing a text). More specifically, we will look at the three main types of
|
||||
tokenizers used in 🤗 Transformers: :ref:`Byte-Pair Encoding (BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>`,
|
||||
and :ref:`SentencePiece <sentencepiece>`, and show examples of which tokenizer type is used by which model.
|
||||
On this page, we will have a closer look at tokenization.
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
As we saw in :doc:`the preprocessing tutorial <preprocessing>`, tokenizing a text is splitting it into words or
|
||||
subwords, which then are converted to ids through a look-up table. Converting words or subwords to ids is
|
||||
straightforward, so in this summary, we will focus on splitting a text into words or subwords (i.e. tokenizing a text).
|
||||
More specifically, we will look at the three main types of tokenizers used in 🤗 Transformers: :ref:`Byte-Pair Encoding
|
||||
(BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>`, and :ref:`SentencePiece <sentencepiece>`, and show examples
|
||||
of which tokenizer type is used by which model.
|
||||
|
||||
Note that on each model page, you can look at the documentation of the associated tokenizer to know which tokenizer
|
||||
type was used by the pretrained model. For instance, if we look at :class:`~transformers.BertTokenizer`, we can see
|
||||
@@ -28,8 +36,15 @@ Introduction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Splitting a text into smaller chunks is a task that is harder than it looks, and there are multiple ways of doing so.
|
||||
For instance, let's look at the sentence ``"Don't you love 🤗 Transformers? We sure do."`` A simple way of tokenizing
|
||||
this text is to split it by spaces, which would give:
|
||||
For instance, let's look at the sentence ``"Don't you love 🤗 Transformers? We sure do."``
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/nhJxYji1aho" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
A simple way of tokenizing this text is to split it by spaces, which would give:
|
||||
|
||||
.. code-block::
|
||||
|
||||
@@ -69,16 +84,30 @@ Such a big vocabulary size forces the model to have an enormous embedding matrix
|
||||
causes both an increased memory and time complexity. In general, transformers models rarely have a vocabulary size
|
||||
greater than 50,000, especially if they are pretrained only on a single language.
|
||||
|
||||
So if simple space and punctuation tokenization is unsatisfactory, why not simply tokenize on characters? While
|
||||
character tokenization is very simple and would greatly reduce memory and time complexity it makes it much harder for
|
||||
the model to learn meaningful input representations. *E.g.* learning a meaningful context-independent representation
|
||||
for the letter ``"t"`` is much harder than learning a context-independent representation for the word ``"today"``.
|
||||
Therefore, character tokenization is often accompanied by a loss of performance. So to get the best of both worlds,
|
||||
transformers models use a hybrid between word-level and character-level tokenization called **subword** tokenization.
|
||||
So if simple space and punctuation tokenization is unsatisfactory, why not simply tokenize on characters?
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/ssLq_EK2jLE" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
While character tokenization is very simple and would greatly reduce memory and time complexity it makes it much harder
|
||||
for the model to learn meaningful input representations. *E.g.* learning a meaningful context-independent
|
||||
representation for the letter ``"t"`` is much harder than learning a context-independent representation for the word
|
||||
``"today"``. Therefore, character tokenization is often accompanied by a loss of performance. So to get the best of
|
||||
both worlds, transformers models use a hybrid between word-level and character-level tokenization called **subword**
|
||||
tokenization.
|
||||
|
||||
Subword tokenization
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/zHvTiHr506c" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Subword tokenization algorithms rely on the principle that frequently used words should not be split into smaller
|
||||
subwords, but rare words should be decomposed into meaningful subwords. For instance ``"annoyingly"`` might be
|
||||
considered a rare word and could be decomposed into ``"annoying"`` and ``"ly"``. Both ``"annoying"`` and ``"ly"`` as
|
||||
|
||||
@@ -27,6 +27,12 @@ negative. For examples of other tasks, refer to the :ref:`additional-resources`
|
||||
Preparing the datasets
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/_BZearw7f0w" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
We will use the `🤗 Datasets <https:/github.com/huggingface/datasets/>`__ library to download and preprocess the IMDB
|
||||
datasets. We will go over this part pretty quickly. Since the focus of this tutorial is on training, you should refer
|
||||
to the 🤗 Datasets `documentation <https://huggingface.co/docs/datasets/>`__ or the :doc:`preprocessing` tutorial for
|
||||
@@ -95,6 +101,12 @@ them by their `full` equivalent to train or evaluate on the full dataset.
|
||||
Fine-tuning in PyTorch with the Trainer API
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/nvBXf7s7vTI" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Since PyTorch does not provide a training loop, the 🤗 Transformers library provides a :class:`~transformers.Trainer`
|
||||
API that is optimized for 🤗 Transformers models, with a wide range of training options and with built-in features like
|
||||
logging, gradient accumulation, and mixed precision.
|
||||
@@ -200,6 +212,12 @@ See the documentation of :class:`~transformers.TrainingArguments` for more optio
|
||||
Fine-tuning with Keras
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/rnTGBy2ax1c" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
Models can also be trained natively in TensorFlow using the Keras API. First, let's define our model:
|
||||
|
||||
.. code-block:: python
|
||||
@@ -257,6 +275,12 @@ as a PyTorch model (or vice-versa):
|
||||
Fine-tuning in native PyTorch
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/Dh9CL8fyG80" title="YouTube video player"
|
||||
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
|
||||
picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
You might need to restart your notebook at this stage to free some memory, or excute the following code:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
Reference in New Issue
Block a user