[Docs] Model_doc structure/clarity improvements (#26876)

* first batch of structure improvements for model_docs

* second batch of structure improvements for model_docs

* more structure improvements for model_docs

* more structure improvements for model_docs

* structure improvements for cv model_docs

* more structural refactoring

* addressed feedback about image processors
This commit is contained in:
Maria Khalusova
2023-11-03 10:57:03 -04:00
committed by GitHub
parent ad8ff96224
commit 5964f820db
223 changed files with 1796 additions and 1116 deletions

View File

@@ -31,7 +31,13 @@ mapping phonemes of the training languages to the target language using articula
this simple method significantly outperforms prior work which introduced task-specific architectures and used only part
of a monolingually pretrained model.*
Tips:
Relevant checkpoints can be found under https://huggingface.co/models?other=phoneme-recognition.
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten)
The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/fairseq/models/wav2vec).
## Usage tips
- Wav2Vec2Phoneme uses the exact same architecture as Wav2Vec2
- Wav2Vec2Phoneme is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
@@ -39,17 +45,16 @@ Tips:
decoded using [`Wav2Vec2PhonemeCTCTokenizer`].
- Wav2Vec2Phoneme can be fine-tuned on multiple language at once and decode unseen languages in a single forward pass
to a sequence of phonemes
- By default the model outputs a sequence of phonemes. In order to transform the phonemes to a sequence of words one
- By default, the model outputs a sequence of phonemes. In order to transform the phonemes to a sequence of words one
should make use of a dictionary and language model.
Relevant checkpoints can be found under https://huggingface.co/models?other=phoneme-recognition.
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten)
<Tip>
The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/fairseq/models/wav2vec).
Wav2Vec2Phoneme's architecture is based on the Wav2Vec2 model, so one can refer to [`Wav2Vec2`]'s documentation page except for the tokenizer.
Wav2Vec2Phoneme's architecture is based on the Wav2Vec2 model, for API reference, check out [`Wav2Vec2`](wav2vec2)'s documentation page
except for the tokenizer.
</Tip>
## Wav2Vec2PhonemeCTCTokenizer