Matthijs Hollemans
4ece3b9433
add VITS model (#24085)
* add VITS model
* let's vits
* finish TextEncoder (mostly)
* rename VITS to Vits
* add StochasticDurationPredictor
* ads flow model
* add generator
* correctly set vocab size
* add tokenizer
* remove processor & feature extractor
* add PosteriorEncoder
* add missing weights to SDP
* also convert LJSpeech and VCTK checkpoints
* add training stuff in forward
* add placeholder tests for tokenizer
* add placeholder tests for model
* starting cleanup
* let the great renaming begin!
* use config
* global_conditioning
* more cleaning
* renaming variables
* more renaming
* more renaming
* it never ends
* reticulating the splines
* more renaming
* HiFi-GAN
* doc strings for main model
* fixup
* fix-copies
* don't make it a PreTrainedModel
* fixup
* rename config options
* remove training logic from forward pass
* simplify relative position
* use actual checkpoint
* style
* PR review fixes
* more review changes
* fixup
* more unit tests
* fixup
* fix doc test
* add integration test
* improve tokenizer tests
* add tokenizer integration test
* fix tests on GPU (gave OOM)
* conversion script can handle repos from hub
* add conversion script for all MMS-TTS checkpoints
* automatically create a README for the converted checkpoint
* small changes to config
* push README to hub
* only show uroman note for checkpoints that need it
* remove conversion script because code formatting breaks the readme
* make WaveNet layers configurable
* rename variables
* simplifying the math
* output attentions and hidden states
* remove VitsFlip in flow model
* also got rid of the other flip
* fix tests
* rename more variables
* rename tokenizer, add phonemization
* raise error when phonemizer missing
* re-order config docstrings to match method
* change config naming
* remove redundant str -> list
* fix copyright: vits authors -> kakao enterprise
* (mean, log_variances) -> (prior_mean, prior_log_variances)
* if return dict -> if not return dict
* speed -> speaking rate
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* update fused tanh sigmoid
* reduce dims in tester
* audio -> output_values
* audio -> output_values in tuple out
* fix return type
* fix return type
* make _unconstrained_rational_quadratic_spline a function
* all nn's to accept a config
* add spectro to output
* move {speaking rate, noise scale, noise scale duration} to config
* path -> attn_path
* idxs -> valid idxs -> padded idxs
* output values -> waveform
* use config for attention
* make generation work
* harden integration test
* add spectrogram to dict output
* tokenizer refactor
* make style
* remove 'fake' padding token
* harden tokenizer tests
* ron norm test
* fprop / save tests deterministic
* move uroman to tokenizer as much as possible
* better logger message
* fix vivit imports
* add uroman integration test
* make style
* up
* matthijs -> sanchit-gandhi
* fix tokenizer test
* make fix-copies
* fix dict comprehension
* fix config tests
* fix model tests
* make outputs consistent with reverse/not reverse
* fix key concat
* more model details
* add author
* return dict
* speaker error
* labels error
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/vits/convert_original_checkpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* remove uromanize
* add docstrings
* add docstrings for tokenizer
* upper-case skip messages
* fix return dict
* style
* finish tests
* update checkpoints
* make style
* remove doctest file
* revert
* fix docstring
* fix tokenizer
* remove uroman integration test
* add sampling rate
* fix docs / docstrings
* style
* add sr to model output
* fix outputs
* style / copies
* fix docstring
* fix copies
* remove sr from model outputs
* Update utils/documentation_tests.txt
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add sr as allowed attr
---------
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-09-01 10:50:06 +01:00
..
2022-11-08 19:54:41 +00:00
2021-02-15 07:55:10 -05:00
2023-04-06 18:08:14 +02:00
2023-05-18 14:14:43 -04:00
2023-09-01 10:50:06 +01:00
2023-07-23 21:17:26 -07:00
2023-08-10 10:53:22 +02:00
2023-08-10 10:53:22 +02:00
2023-08-21 11:08:38 +02:00
2023-08-10 10:53:22 +02:00
2023-08-10 10:53:22 +02:00
2023-03-13 19:11:19 +01:00
2023-08-29 11:05:27 +01:00
2023-06-06 18:17:41 +02:00
2023-08-17 07:58:35 +02:00
2023-08-17 07:58:35 +02:00
2021-02-15 07:55:10 -05:00
2023-08-04 15:13:14 +02:00
2023-08-17 07:58:35 +02:00
2023-09-01 10:50:06 +01:00
2021-10-07 12:44:23 +05:30
2023-02-28 17:12:44 +01:00
2023-04-19 19:27:37 +02:00
2023-02-28 17:12:44 +01:00
2023-02-03 12:57:02 -05:00
2023-04-21 20:36:35 +02:00
2023-03-01 17:53:29 +01:00
2023-08-29 16:15:05 +01:00
2023-06-20 18:07:47 -04:00
2023-06-12 21:27:10 +02:00
2023-03-30 21:06:35 +02:00
2022-06-02 10:24:16 +02:00
2023-08-17 07:58:35 +02:00
2023-08-29 16:15:05 +01:00
2023-08-17 07:58:35 +02:00
2023-08-29 16:15:05 +01:00
2023-08-18 22:01:35 +02:00
2023-04-06 22:52:59 +02:00