diff --git a/docs/source/model_doc/t5.rst b/docs/source/model_doc/t5.rst
index 07592ff347..0ff96d0a42 100644
--- a/docs/source/model_doc/t5.rst
+++ b/docs/source/model_doc/t5.rst
@@ -44,9 +44,9 @@ Tips:
   For more information about which prefix to use, it is easiest to look into Appendix D of the `paper `__.
 - For sequence-to-sequence generation, it is recommended to use
-  :obj:`T5ForConditionalGeneration.generate()``. This method takes care of feeding the encoded input via
-  cross-attention layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar
-  embeddings. Encoder input padding can be done on the left and on the right.
+  :obj:`T5ForConditionalGeneration.generate()`. This method takes care of feeding the encoded input via cross-attention
+  layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar embeddings.
+  Encoder input padding can be done on the left and on the right.
 
 The original code can be found `here `__.
 
@@ -55,7 +55,7 @@ Training
 
 T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
 forcing. This means that for training we always need an input sequence and a target sequence. The input sequence is fed
-to the model using :obj:`input_ids``. The target sequence is shifted to the right, i.e., prepended by a start-sequence
+to the model using :obj:`input_ids`. The target sequence is shifted to the right, i.e., prepended by a start-sequence
 token and fed to the decoder using the :obj:`decoder_input_ids`. In teacher-forcing style, the target sequence is then
 appended by the EOS token and corresponds to the :obj:`labels`. The PAD token is hereby used as the start-sequence
 token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
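
The Training paragraph touched by the second hunk describes how :obj:`labels` and :obj:`decoder_input_ids` relate under teacher forcing. The following is a minimal pure-Python sketch of that relationship, not the library's implementation. The helper names ``build_labels`` and ``shift_right`` are hypothetical, the token ids in ``target`` are made up, and the values ``PAD_ID = 0`` and ``EOS_ID = 1`` are assumptions that should be checked against the tokenizer in use.

```python
from typing import List

PAD_ID = 0  # assumption: PAD token id, reused as the decoder start-sequence token
EOS_ID = 1  # assumption: EOS token id


def build_labels(target_ids: List[int], eos_id: int = EOS_ID) -> List[int]:
    """Append the EOS token to the target sequence to obtain the labels."""
    return list(target_ids) + [eos_id]


def shift_right(labels: List[int], start_id: int = PAD_ID) -> List[int]:
    """Prepend the start token and drop the last label to obtain decoder inputs."""
    return [start_id] + list(labels[:-1])


# Hypothetical token ids standing in for a tokenized target sentence.
target = [37, 423, 8]

labels = build_labels(target)            # target followed by EOS
decoder_input_ids = shift_right(labels)  # PAD as start token, then the target

# At step i the decoder sees decoder_input_ids[i] and is trained to predict labels[i].
for step, (seen, expected) in enumerate(zip(decoder_input_ids, labels)):
    print(f"step {step}: input={seen} -> label={expected}")
```

Both sequences have the same length, so each decoder position is paired with exactly one label, and the last label is the EOS token.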