[Docs] Model_doc structure/clarity improvements (#26876)

* first batch of structure improvements for model_docs
* second batch of structure improvements for model_docs
* more structure improvements for model_docs
* more structure improvements for model_docs
* structure improvements for cv model_docs
* more structural refactoring
* addressed feedback about image processors
2023-11-03 10:57:03 -04:00
parent ad8ff96224
commit 5964f820db
223 changed files with 1796 additions and 1116 deletions
--- a/docs/source/en/model_doc/t5.md
+++ b/docs/source/en/model_doc/t5.md
@@ -45,7 +45,11 @@ with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-
 summarization, question answering, text classification, and more. To facilitate future work on transfer learning for
 NLP, we release our dataset, pre-trained models, and code.*

-Tips:
+All checkpoints can be found on the [hub](https://huggingface.co/models?search=t5).
+
+This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/text-to-text-transfer-transformer).
+
+## Usage tips

 - T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which
 each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a
@@ -91,12 +95,6 @@ Based on the original T5 model, Google has released some follow-up works:
 - **UMT5**: UmT5 is a multilingual T5 model trained on an improved and refreshed mC4 multilingual corpus, 29 trillion characters across 107 languages, using a new sampling method, UniMax. Refer to
 the documentation of UMT5 which can be found [here](umt5).

-All checkpoints can be found on the [hub](https://huggingface.co/models?search=t5).
-
-This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/text-to-text-transfer-transformer).
-
-<a id='training'></a>
-
 ## Training

 T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
@@ -249,8 +247,6 @@ batches to the longest example is not recommended on TPU as it triggers a recomp
 encountered during training, thus significantly slowing down the training. In other words, padding only up to the longest example in a
 batch leads to very slow training on TPU.

-<a id='inference'></a>
-
 ## Inference

 At inference time, it is recommended to use [`~generation.GenerationMixin.generate`]. This
@@ -316,9 +312,6 @@ The predicted tokens will then be placed between the sentinel tokens.
 ['<pad><extra_id_0> park offers<extra_id_1> the<extra_id_2> park.</s>']
 ```

-
-<a id='scripts'></a>
-
 ## Performance

 If you'd like faster training and inference performance, install [apex](https://github.com/NVIDIA/apex#quick-start) and then the model will automatically use `apex.normalization.FusedRMSNorm` instead of `T5LayerNorm`. The former uses an optimized fused kernel which is several times faster than the latter.
@@ -386,6 +379,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h

 [[autodoc]] T5TokenizerFast

+<frameworkcontent>
+<pt>
+
 ## T5Model

 [[autodoc]] T5Model
@@ -411,6 +407,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 [[autodoc]] T5ForQuestionAnswering
    - forward

+</pt>
+<tf>
+
 ## TFT5Model

 [[autodoc]] TFT5Model
@@ -426,6 +425,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 [[autodoc]] TFT5EncoderModel
    - call

+</tf>
+<jax>
+
 ## FlaxT5Model

 [[autodoc]] FlaxT5Model
@@ -444,3 +446,6 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h

 [[autodoc]] FlaxT5EncoderModel
    - __call__
+
+</jax>
+</frameworkcontent>
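
The Inference hunk above keeps the note that predicted tokens are placed between the sentinel tokens, with `'<pad><extra_id_0> park offers<extra_id_1> the<extra_id_2> park.</s>'` as the decoded example. As a minimal sketch of how such a string might be split back into per-sentinel spans, the snippet below uses only the standard library. The helper name `split_sentinel_spans` is hypothetical and not part of the Transformers API; the input string is copied from the diff.

```python
import re
from typing import Dict

# T5 sentinel tokens have the form <extra_id_0>, <extra_id_1>, ...
SENTINEL = re.compile(r"<extra_id_(\d+)>")


def split_sentinel_spans(decoded: str) -> Dict[int, str]:
    """Map each sentinel index to the text that follows it (hypothetical helper)."""
    # Drop the special tokens that frame the decoded sequence.
    text = decoded.replace("<pad>", "").replace("</s>", "")
    # With one capture group, re.split returns
    # [prefix, id_0, span_0, id_1, span_1, ...].
    parts = SENTINEL.split(text)
    return {int(idx): span.strip() for idx, span in zip(parts[1::2], parts[2::2])}


decoded = "<pad><extra_id_0> park offers<extra_id_1> the<extra_id_2> park.</s>"
spans = split_sentinel_spans(decoded)
print(spans)
```

Each key is the sentinel index and each value is the span predicted after that sentinel, so the spans can be substituted back into the masked input in order. This assumes the output was decoded with special tokens kept, as in the example string.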