Fix t5 doc typos (#3978)
* Fix typo in into and add line under
* Add missing blank line under
* Correct types under
@@ -20,13 +20,14 @@ Training
|
|||||||
~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
|
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
|
||||||
This means that for training we always need an input sequence and a target sequence.
|
This means that for training we always need an input sequence and a target sequence.
|
||||||
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* perprended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``lm_labels``. The PAD token is hereby used as the start-sequence token.
|
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* prepended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``lm_labels``. The PAD token is hereby used as the start-sequence token.
|
||||||
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
|
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
|
||||||
|
|
||||||
- Unsupervised denoising training
|
- Unsupervised denoising training
|
||||||
|
|
||||||
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens)
|
In this setup spans of the input sequence are masked by so-called sentinel tokens (*a.k.a* unique mask tokens)
|
||||||
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
|
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
|
||||||
Each sentinel tokens represents a unique mask token for this sentence and should start with ``<extra_id_1>``, ``<extrac_id_2>``, ... up to ``<extra_id_100>``. As a default 100 sentinel tokens are available in ``T5Tokenizer``.
|
Each sentinel token represents a unique mask token for this sentence and should start with ``<extra_id_1>``, ``<extra_id_2>``, ... up to ``<extra_id_100>``. As a default 100 sentinel tokens are available in ``T5Tokenizer``.
|
||||||
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows:
|
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows:
|
||||||
|
|
||||||
::
|
::
|
||||||
@@ -37,6 +38,7 @@ T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
|
|||||||
model(input_ids=input_ids, lm_labels=lm_labels)
|
model(input_ids=input_ids, lm_labels=lm_labels)
|
||||||
|
|
||||||
- Supervised training
|
- Supervised training
|
||||||
|
|
||||||
In this setup the input sequence and output sequence are standard sequence to sequence input output mapping.
|
In this setup the input sequence and output sequence are standard sequence to sequence input output mapping.
|
||||||
In translation, *e.g.* the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
|
In translation, *e.g.* the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
|
||||||
be processed as follows:
|
be processed as follows:
|
||||||
|
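Note on the teacher-forcing paragraph touched by this diff: the relationship between the labels and ``decoder_input_ids`` can be sketched in plain Python. This is a minimal illustration only. The helper name ``shift_right`` and the token ids (``PAD_ID``, ``EOS_ID``, and the three target ids) are assumptions made for the example and are not taken from the commit.

```python
def shift_right(label_ids, start_id):
    """Prepend start_id and drop the last id, so the length stays the same."""
    return [start_id] + label_ids[:-1]


# Hypothetical ids, chosen only for illustration.
PAD_ID = 0  # the text says the PAD token doubles as the start-sequence token
EOS_ID = 1

target_ids = [71, 72, 73]                 # tokenized target sequence (made up)
lm_labels = target_ids + [EOS_ID]         # target with EOS appended
decoder_input_ids = shift_right(lm_labels, PAD_ID)  # PAD prepended to the target

print(lm_labels)
print(decoder_input_ids)
```

At every position the decoder sees the previous gold token and is trained to predict the current one, which is what the paragraph means by teacher forcing.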
|||||||
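Note on the denoising paragraph touched by this diff: the sentinel masking it describes can be sketched on whitespace-split words. This is a hedged illustration, not the tokenizer's own logic. The function ``mask_spans`` and its ``first_id`` parameter are hypothetical, and the sentinel numbering simply follows the text above; the numbering used by a given ``T5Tokenizer`` version should be checked against its vocabulary.

```python
def mask_spans(words, spans, first_id=1):
    """Replace each (start, end) word span with a sentinel token.

    Returns the masked input string and the target string built from
    the same sentinels followed by the words they replaced.
    """
    inputs, targets = [], []
    cursor = 0
    sentinel = first_id
    for start, end in sorted(spans):
        inputs.extend(words[cursor:start])   # keep unmasked words
        token = f"<extra_id_{sentinel}>"
        inputs.append(token)                 # sentinel stands in for the span
        targets.append(token)                # same sentinel opens the span in the target
        targets.extend(words[start:end])     # followed by the real masked words
        cursor = end
        sentinel += 1
    inputs.extend(words[cursor:])
    return " ".join(inputs), " ".join(targets)


words = "The cute dog walks in the park".split()
# Mask "cute dog" (words 1..2) and "the" (word 5); spans are end-exclusive.
masked_input, target = mask_spans(words, [(1, 3), (5, 6)])
print(masked_input)
print(target)
```

The two strings would then be encoded with the tokenizer and passed as ``input_ids`` and ``lm_labels`` respectively.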