[Data2Vec] Add data2vec vision (#16760)

* save intermediate
* add vision
* add vision
* save
* finish models
* finish models
* continue
* finish
* up
* up
* up
* tests all pass
* clean up
* up
* up
* fix bugs in beit
* correct docs
* finish
* finish docs
* make style
* up
* more fixes
* fix type hint
* make style
* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/data2vec/test_modeling_data2vec_vision.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix test

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-18 17:52:13 +02:00
parent 33cd4be576
commit 8d3f952adb
15 changed files with 2362 additions and 15 deletions
--- a/docs/source/en/model_doc/data2vec.mdx
+++ b/docs/source/en/model_doc/data2vec.mdx
@@ -33,10 +33,13 @@ Models and code are available at www.github.com/pytorch/fairseq/tree/master/exam

 Tips:

- Both Data2VecAudio and Data2VecText have been trained using the same self-supervised learning method.
-  In the case of Data2VecAudio, preprocessing is identical to [`RobertaModel`], including tokenization.
+- Data2VecAudio, Data2VecText, and Data2VecVision have all been trained using the same self-supervised learning method.
+- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction.
+- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
+- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
+
+This model was contributed by [edugp](https://huggingface.co/edugp) and [patrickvonplaten](https://huggingface.co/patrickvonplaten).

-This model was contributed by [edugp](https://huggingface.co/edugp).
 The original code can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/data2vec).


@@ -48,12 +51,16 @@ The original code can be found [here](https://github.com/pytorch/fairseq/tree/ma

 [[autodoc]] Data2VecAudioConfig

+## Data2VecVisionConfig
+
+[[autodoc]] Data2VecVisionConfig
+
+
 ## Data2VecAudioModel

 [[autodoc]] Data2VecAudioModel
    - forward

-
 ## Data2VecAudioForAudioFrameClassification

 [[autodoc]] Data2VecAudioForAudioFrameClassification
@@ -108,3 +115,18 @@ The original code can be found [here](https://github.com/pytorch/fairseq/tree/ma

 [[autodoc]] Data2VecTextForQuestionAnswering
    - forward
+
+## Data2VecVisionModel
+
+[[autodoc]] Data2VecVisionModel
+    - forward
+
+## Data2VecVisionForImageClassification
+
+[[autodoc]] Data2VecVisionForImageClassification
+    - forward
+
+## Data2VecVisionForSemanticSegmentation
+
+[[autodoc]] Data2VecVisionForSemanticSegmentation
+    - forward