* Add data2vec model cloned from roberta
* Add checkpoint conversion script
* Fix copies
* Update docs
* Add checkpoint conversion script
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update documentation
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* add inputs to logits to data2vec'
* correct autio models
* correct config auto
* correct tok auto
* Update utils/tests_fetcher.py
* delete unnecessary files
* delete unnecessary files
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix copies
* Update docs
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update documentation
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* add inputs to logits to data2vec'
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* correct autio models
* correct config auto
* correct tok auto
* delete unnecessary files
* delete unnecessary files
* Update utils/tests_fetcher.py
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Move data2vec tests to new structure
* Fix test imports for text tests
* Remove fairseq files
* Change paper link to arxiv
* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.
* Update text model checkpoint to be facebook/data2vec-text-base
* Add 'Copy from' statements and update paper links and docs
* fix copy from statements
* improve copied from
* correct more copied from statements
* finish copied from stuff
* make style
* add model to README
* add to master
Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>