Add TF ViT MAE (#16255)
* ported TFViTMAEIntermediate and TFViTMAEOutput. * added TFViTMAEModel and TFViTMAEDecoder. * feat: added a noise argument in the implementation for reproducibility. * feat: vit mae models with an additional noise argument for reproducibility. Co-authored-by: ariG23498 <aritra.born2fly@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
This commit is contained in:
@@ -41,13 +41,16 @@ fine-tuning, one can directly plug in the weights into a [`ViTForImageClassifica
|
||||
- Note that the encoder of MAE is only used to encode the visual patches. The encoded patches are then concatenated with mask tokens, which the decoder (which also
|
||||
consists of Transformer blocks) takes as input. Each mask token is a shared, learned vector that indicates the presence of a missing patch to be predicted. Fixed
|
||||
sin/cos position embeddings are added both to the input of the encoder and the decoder.
|
||||
- For a visual understanding of how MAEs work you can check out this [post](https://keras.io/examples/vision/masked_image_modeling/).
|
||||
|
||||
<img src="https://user-images.githubusercontent.com/11435359/146857310-f258c86c-fde6-48e8-9cee-badd2b21bd2c.png"
|
||||
alt="drawing" width="600"/>
|
||||
|
||||
<small> MAE architecture. Taken from the <a href="https://arxiv.org/abs/2111.06377">original paper.</a> </small>
|
||||
|
||||
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/facebookresearch/mae).
|
||||
This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlow version of the model was contributed by [sayakpaul](https://github.com/sayakpaul) and
|
||||
[ariG23498](https://github.com/ariG23498) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/mae).
|
||||
|
||||
|
||||
## ViTMAEConfig
|
||||
|
||||
@@ -64,3 +67,15 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
|
||||
|
||||
[[autodoc]] transformers.ViTMAEForPreTraining
|
||||
- forward
|
||||
|
||||
|
||||
## TFViTMAEModel
|
||||
|
||||
[[autodoc]] TFViTMAEModel
|
||||
- call
|
||||
|
||||
|
||||
## TFViTMAEForPreTraining
|
||||
|
||||
[[autodoc]] transformers.TFViTMAEForPreTraining
|
||||
- call
|
||||
|
||||
Reference in New Issue
Block a user