Add InstructBLIP (#23460)

* Squash 88 commits * Use markdown * Remove mdx files due to bad rebase * Fix modeling files due to bad rebase * Fix style * Update comment * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-06-26 11:23:57 +02:00
parent 8e164c5400
commit 868363abb9
32 changed files with 3425 additions and 21 deletions
--- a/docs/source/en/index.md
+++ b/docs/source/en/index.md
@@ -139,6 +139,7 @@ The documentation is organized into five sections:
 1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
 1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
 1. **[Informer](model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
+1. **[InstructBLIP](model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
 1. **[Jukebox](model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
 1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
 1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
@@ -348,6 +349,7 @@ Flax), PyTorch, and/or TensorFlow.
 |            I-BERT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
 |           ImageGPT            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
 |           Informer            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+|         InstructBLIP          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
 |            Jukebox            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
 |           LayoutLM            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
 |          LayoutLMv2           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |