Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename
* add relevant code from FastSpeech2 draft, have tests pass
* make it an actual conformer, demo ex.
* matching inference with original repo, includes debug code
* refactor nn.Sequentials, start more desc. var names
* more renaming
* more renaming
* vocoder scratchwork
* matching vocoder outputs
* hifigan vocoder conversion script
* convert model script, rename some config vars
* replace postnet with speecht5's implementation
* passing common tests, file cleanup
* expand testing, add output hidden states and attention
* tokenizer + passing tokenizer tests
* variety of updates and tests
* g2p_en pckg setup
* import structure edits
* docstrings and cleanup
* repo consistency
* deps
* small cleanup
* forward signature param order
* address comments except for masks and labels
* address comments on attention_mask and labels
* address second round of comments
* remove old unneeded line
* address comments part 1
* address comments pt 2
* rename auto mapping
* fixes for failing tests
* address comments part 3 (bart-like, train loss)
* make style
* pass config where possible
* add forward method + tests to WithHifiGan model
* make style
* address arg passing and generate_speech comments
* address Arthur comments
* address Arthur comments pt2
* lint changes
* Sanchit comment
* add g2p-en to doctest deps
* move up self.encoder
* onnx compatible tensor method
* fix is symbolic
* fix paper url
* move models to espnet org
* make style
* make fix-copies
* update docstring
* Arthur comments
* update docstring w/ new updates
* add model architecture images
* header size
* md wording update
* make style
@@ -123,6 +123,7 @@ SPECIAL_CASES_TO_ALLOW.update(
 "DinatConfig": True,
 "DonutSwinConfig": True,
 "EfficientFormerConfig": True,
+"FastSpeech2ConformerConfig": True,
 "FSMTConfig": True,
 "JukeboxConfig": True,
 "LayoutLMv2Config": True,

@@ -90,6 +90,8 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
 "UMT5EncoderModel", # Building part of bigger (tested) model.
 "Blip2QFormerModel", # Building part of bigger (tested) model.
 "ErnieMForInformationExtraction",
+"FastSpeech2ConformerHifiGan", # Already tested by SpeechT5HifiGan (# Copied from)
+"FastSpeech2ConformerWithHifiGan", # Built with two smaller (tested) models.
 "GraphormerDecoderHead", # Building part of bigger (tested) model.
 "JukeboxVQVAE", # Building part of bigger (tested) model.
 "JukeboxPrior", # Building part of bigger (tested) model.
@@ -159,6 +161,8 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
 "Blip2QFormerModel",
 "Blip2VisionModel",
 "ErnieMForInformationExtraction",
+"FastSpeech2ConformerHifiGan",
+"FastSpeech2ConformerWithHifiGan",
 "GitVisionModel",
 "GraphormerModel",
 "GraphormerForGraphClassification",
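The hunks above add the two vocoder classes to ignore lists. As a rough illustration of how such an exemption list might be consulted by a consistency check, here is a minimal sketch. The function `find_untested_models`, the placeholder contents of `PRIVATE_MODELS`, and the name `HypotheticalNewModel` are assumptions made for this example. They are not taken from the repository's actual check scripts, which may work differently.

```python
# Simplified stand-ins for the lists touched by this commit.
# PRIVATE_MODELS contents are hypothetical; the real list is not shown in the diff.
PRIVATE_MODELS = ["ExamplePrivateModel"]

IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
    "FastSpeech2ConformerHifiGan",  # Already tested by SpeechT5HifiGan (# Copied from)
    "FastSpeech2ConformerWithHifiGan",  # Built with two smaller (tested) models.
]


def find_untested_models(defined_models, tested_models, ignore=IGNORE_NON_TESTED):
    """Return model names that are neither tested nor explicitly exempted.

    Hypothetical helper: preserves the order of `defined_models`.
    """
    tested = set(tested_models)
    ignored = set(ignore)
    return [
        name
        for name in defined_models
        if name not in tested and name not in ignored
    ]


# Illustrative inputs only.
defined = [
    "FastSpeech2ConformerModel",
    "FastSpeech2ConformerHifiGan",
    "FastSpeech2ConformerWithHifiGan",
    "HypotheticalNewModel",
]
tested = ["FastSpeech2ConformerModel"]

untested = find_untested_models(defined, tested)
print(untested)
```

Under these assumptions, only the model that is neither tested nor listed in the ignore list is reported, which is why the commit records a justification comment next to each exempted class.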