[Docs] Model_doc structure/clarity improvements (#26876)

* first batch of structure improvements for model_docs
* second batch of structure improvements for model_docs
* more structure improvements for model_docs
* more structure improvements for model_docs
* structure improvements for cv model_docs
* more structural refactoring
* addressed feedback about image processors
2023-11-03 10:57:03 -04:00
parent ad8ff96224
commit 5964f820db
223 changed files with 1796 additions and 1116 deletions
--- a/docs/source/en/model_doc/wav2vec2_phoneme.md
+++ b/docs/source/en/model_doc/wav2vec2_phoneme.md
@@ -31,7 +31,13 @@ mapping phonemes of the training languages to the target language using articula
 this simple method significantly outperforms prior work which introduced task-specific architectures and used only part
 of a monolingually pretrained model.*

-Tips:
+Relevant checkpoints can be found under https://huggingface.co/models?other=phoneme-recognition.
+
+This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten)
+
+The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/fairseq/models/wav2vec).
+
+## Usage tips

 - Wav2Vec2Phoneme uses the exact same architecture as Wav2Vec2
 - Wav2Vec2Phoneme is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
@@ -39,17 +45,16 @@ Tips:
  decoded using [`Wav2Vec2PhonemeCTCTokenizer`].
 - Wav2Vec2Phoneme can be fine-tuned on multiple languages at once and decode unseen languages in a single forward pass
  to a sequence of phonemes
-- By default the model outputs a sequence of phonemes. In order to transform the phonemes to a sequence of words one
+- By default, the model outputs a sequence of phonemes. In order to transform the phonemes to a sequence of words one
  should make use of a dictionary and language model.

-Relevant checkpoints can be found under https://huggingface.co/models?other=phoneme-recognition.

-This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten)
+<Tip>

-The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/fairseq/models/wav2vec).
-
-Wav2Vec2Phoneme's architecture is based on the Wav2Vec2 model, so one can refer to [`Wav2Vec2`]'s documentation page except for the tokenizer.
+Wav2Vec2Phoneme's architecture is based on the Wav2Vec2 model. For API reference, check out the [`Wav2Vec2`](wav2vec2) documentation page,
+except for the tokenizer.

+</Tip>

 ## Wav2Vec2PhonemeCTCTokenizer