Improve T5 docs (#13240)

* Remove disclaimer
* First draft
* Fix rebase
* Improve docs some more
* Add inference section
* Improve example scripts section
* Improve code examples of modeling files
* Add docs regarding task prefix
* Address @craffel's comments
* Apply suggestions from @patrickvonplaten's review
* Add suggestions from code review
* Apply @sgugger's suggestions
* Fix Flax code examples
* Fix index.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-09-01 15:05:40 +02:00
parent ba1b3db709
commit 4766e009b0
9 changed files with 363 additions and 66 deletions
--- a/docs/source/model_doc/mt5.rst
+++ b/docs/source/model_doc/mt5.rst
@@ -28,6 +28,23 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
 the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
 benchmarks. All of the code and model checkpoints*

+Note: mT5 was only pre-trained on `mC4 <https://huggingface.co/datasets/mc4>`__, excluding any supervised training.
+Therefore, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model.
+Since mT5 was pre-trained without supervision, there's no real advantage to using a task prefix during single-task
+fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
+
+Google has released the following variants:
+
+- `google/mt5-small <https://huggingface.co/google/mt5-small>`__
+
+- `google/mt5-base <https://huggingface.co/google/mt5-base>`__
+
+- `google/mt5-large <https://huggingface.co/google/mt5-large>`__
+
+- `google/mt5-xl <https://huggingface.co/google/mt5-xl>`__
+
+- `google/mt5-xxl <https://huggingface.co/google/mt5-xxl>`__
+
 This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
 found `here <https://github.com/google-research/multilingual-t5>`__.
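
The note on task prefixes added in this hunk can be illustrated with a small, dependency-free sketch. It only shows how input strings might be prepared before tokenization; the prefix strings, the ``TASK_PREFIXES`` mapping, and the ``build_input`` helper are hypothetical names chosen for illustration and are not part of the mT5 checkpoints or the library.

```python
# Hypothetical task prefixes: the exact strings are an assumption.
# Any consistent, distinct string per task would serve the same purpose.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def build_input(task, text, multi_task=True):
    """Return the source string to feed to the tokenizer.

    The prefix is prepended only for multi-task fine-tuning, where the
    model needs a signal telling it which task an example belongs to.
    """
    if not multi_task:
        # Single-task fine-tuning: the prefix adds no information.
        return text
    return TASK_PREFIXES[task] + text


examples = [
    ("translate_en_de", "The house is wonderful."),
    ("summarize", "mT5 is a multilingual variant of T5."),
]

# Multi-task setup: every example carries its task prefix.
multi_task_inputs = [build_input(task, text) for task, text in examples]

# Single-task setup: the raw text is used unchanged.
single_task_input = build_input("summarize", "Some article text.", multi_task=False)
```

The resulting strings would then be tokenized and used as encoder inputs during fine-tuning, with the same prefixes applied again at inference time.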