Add Text-To-Speech pipeline (#24952)

* add AutoModelForTextToSpeech class * add TTS pipeline and tessting * add docstrings to text_to_speech pipeline * fix torch dependency * corrector 'processor is None' case in Pipeline * correct repo id * modify text-to-speech -> text-to-audio * remove processor * rename text_to_speech pipelines files to text_audio * add textToWaveform and textToSpectrogram instead of textToAudio classes * update TTS pipeline to the bare minimum * update tests TTS pipeline * make style and erase useless import torch in TTS pipeline tests * modify how to check if generate or forward in TTS pipeline * remove unnecessary extra new lines * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * refactor input_texts -> text_inputs * correct docstrings of TTS.__call__ * correct the shape of generated waveform * take care of Bark tokenizer special case * correct run_pipeline_test TTS * make style * update TTS docstrings * address Sylvain nit refactors * make style * refactor into one liners * correct squeeze * correct way to test if forward or generate * Update output audio waveform shape * make style * correct import * modify how the TTS pipeline test if a model can generate * align shape output of TTS pipeline with consistent shape --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-08-17 18:34:47 +02:00
parent c4c0ceff09
commit b8f69d0d10
11 changed files with 425 additions and 0 deletions
--- a/utils/update_metadata.py
+++ b/utils/update_metadata.py
@@ -115,6 +115,7 @@ PIPELINE_TAGS_AND_AUTO_MODELS = [
    ("depth-estimation", "MODEL_FOR_DEPTH_ESTIMATION_MAPPING_NAMES", "AutoModelForDepthEstimation"),
    ("video-classification", "MODEL_FOR_VIDEO_CLASSIFICATION_MAPPING_NAMES", "AutoModelForVideoClassification"),
    ("mask-generation", "MODEL_FOR_MASK_GENERATION_MAPPING_NAMES", "AutoModelForMaskGeneration"),
+    ("text-to-audio", "MODEL_FOR_TEXT_TO_WAVEFORM_NAMES", "AutoModelForTextToWaveform"),
 ]