[Docs] Model_doc structure/clarity improvements (#26876)

* first batch of structure improvements for model_docs

* second batch of structure improvements for model_docs

* more structure improvements for model_docs

* more structure improvements for model_docs

* structure improvements for cv model_docs

* more structural refactoring

* addressed feedback about image processors
This commit is contained in:
Maria Khalusova
2023-11-03 10:57:03 -04:00
committed by GitHub
parent ad8ff96224
commit 5964f820db
223 changed files with 1796 additions and 1116 deletions

View File

@@ -45,7 +45,11 @@ with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-
summarization, question answering, text classification, and more. To facilitate future work on transfer learning for
NLP, we release our dataset, pre-trained models, and code.*
Tips:
All checkpoints can be found on the [hub](https://huggingface.co/models?search=t5).
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/text-to-text-transfer-transformer).
## Usage tips
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which
each task is converted into a text-to-text format. T5 works well on a variety of tasks out-of-the-box by prepending a
@@ -91,12 +95,6 @@ Based on the original T5 model, Google has released some follow-up works:
- **UMT5**: UmT5 is a multilingual T5 model trained on an improved and refreshed mC4 multilingual corpus, 29 trillion characters across 107 language, using a new sampling method, UniMax. Refer to
the documentation of mT5 which can be found [here](umt5).
All checkpoints can be found on the [hub](https://huggingface.co/models?search=t5).
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/text-to-text-transfer-transformer).
<a id='training'></a>
## Training
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher
@@ -249,8 +247,6 @@ batches to the longest example is not recommended on TPU as it triggers a recomp
encountered during training thus significantly slowing down the training. only padding up to the longest example in a
batch) leads to very slow training on TPU.
<a id='inference'></a>
## Inference
At inference time, it is recommended to use [`~generation.GenerationMixin.generate`]. This
@@ -316,9 +312,6 @@ The predicted tokens will then be placed between the sentinel tokens.
['<pad><extra_id_0> park offers<extra_id_1> the<extra_id_2> park.</s>']
```
<a id='scripts'></a>
## Performance
If you'd like a faster training and inference performance, install [apex](https://github.com/NVIDIA/apex#quick-start) and then the model will automatically use `apex.normalization.FusedRMSNorm` instead of `T5LayerNorm`. The former uses an optimized fused kernel which is several times faster than the latter.
@@ -386,6 +379,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] T5TokenizerFast
<frameworkcontent>
<pt>
## T5Model
[[autodoc]] T5Model
@@ -411,6 +407,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] T5ForQuestionAnswering
- forward
</pt>
<tf>
## TFT5Model
[[autodoc]] TFT5Model
@@ -426,6 +425,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] TFT5EncoderModel
- call
</tf>
<jax>
## FlaxT5Model
[[autodoc]] FlaxT5Model
@@ -444,3 +446,6 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] FlaxT5EncoderModel
- __call__
</jax>
</frameworkcontent>