Add SeamlessM4T v2 (#27779)

* add working conversion script

* first non-working version of modeling code

* update modeling code (working)

* make style

* make fix-copies

* add config docstrings

* add config to ignore docstring formatting due to unconventional markdown

* fix copies

* fix generation num_return_sequences

* enrich docs

* add and fix tests besides integration tests

* update integration tests

* update repo id

* add tie weights and make style

* correct naming in .md

* fix imports and so on

* correct docstrings

* fix fp16 speech forward

* fix speechencoder attention

* make style

* fix copied from

* rename SeamlessM4Tv2-v2 to SeamlessM4Tv2

* Apply suggestions on configuration

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove useless public models

* fix private models + better naming for T2U models

* clean speech encoder relative position embeddings

* refactor chunk attention

* add docstrings to chunk attention method

* improve naming and docstrings

* rename some attention variables + add temperature sampling in T2U model

* rename DOCSTRINGS variable names

* make style + remove 2 useless config parameters

* enrich model card

* remove any attention_head reference + fix temperature in T2U

* new fmt and make style

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rename spkr_id->speaker_id and change docstrings of get_char_input_ids

* simplify v2attention

* make style

* Update seamless_m4t_v2.md

* update code and tests with last update

* update repo ids

* fill article name, abstract and authors

* update not_doctested and slow_doc tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Yoach Lacombe
2023-11-30 19:24:43 +00:00
committed by GitHub
parent 510270af34
commit 29f1aee3b6
31 changed files with 7179 additions and 216 deletions

@@ -35,7 +35,7 @@ The task illustrated in this tutorial is supported by the following model archit
<!--This tip is automatically generated by `make fix-copies`, do not fill manually!-->
-[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [GPTSAN-japanese](../model_doc/gptsan-japanese), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [NLLB-MOE](../model_doc/nllb-moe), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SeamlessM4T](../model_doc/seamless_m4t), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [UMT5](../model_doc/umt5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)
+[BART](../model_doc/bart), [BigBird-Pegasus](../model_doc/bigbird_pegasus), [Blenderbot](../model_doc/blenderbot), [BlenderbotSmall](../model_doc/blenderbot-small), [Encoder decoder](../model_doc/encoder-decoder), [FairSeq Machine-Translation](../model_doc/fsmt), [GPTSAN-japanese](../model_doc/gptsan-japanese), [LED](../model_doc/led), [LongT5](../model_doc/longt5), [M2M100](../model_doc/m2m_100), [Marian](../model_doc/marian), [mBART](../model_doc/mbart), [MT5](../model_doc/mt5), [MVP](../model_doc/mvp), [NLLB](../model_doc/nllb), [NLLB-MOE](../model_doc/nllb-moe), [Pegasus](../model_doc/pegasus), [PEGASUS-X](../model_doc/pegasus_x), [PLBart](../model_doc/plbart), [ProphetNet](../model_doc/prophetnet), [SeamlessM4T](../model_doc/seamless_m4t), [SeamlessM4Tv2](../model_doc/seamless_m4t_v2), [SwitchTransformers](../model_doc/switch_transformers), [T5](../model_doc/t5), [UMT5](../model_doc/umt5), [XLM-ProphetNet](../model_doc/xlm-prophetnet)
<!--End of the generated tip-->