Add MAE (#15120)
* First draft * More improvements * More improvements * More improvements * Fix embeddings * Add conversion script * Finish conversion script * More improvements * Fix forward pass * Remove print statements * Add weights initialization * Add initialization of decoder weights * Add support for other models in the conversion script * Fix patch_size for huge model * Fix most of the tests * Fix integration test * Fix docs * Fix archive_list * Apply suggestions from code review * Improve documentation * Apply more suggestions * Skip some tests due to non-deterministic behaviour * Fix test_initialization * Remove unneccessary initialization of nn.Embedding * Improve docs * Fix dummies * Remove ViTMAEFeatureExtractor from docs * Add model to README and table of contents * Delete inference file
This commit is contained in:
@@ -288,6 +288,8 @@
|
||||
title: Vision Text Dual Encoder
|
||||
- local: model_doc/vit
|
||||
title: Vision Transformer (ViT)
|
||||
- local: model_doc/vit_mae
|
||||
title: ViTMAE
|
||||
- local: model_doc/visual_bert
|
||||
title: VisualBERT
|
||||
- local: model_doc/wav2vec2
|
||||
|
||||
Reference in New Issue
Block a user