Add Mask2Former (#20792)
* Adds Mask2Former to transformers Co-authored-by: Shivalika Singh <shivalikasingh95@gmail.com> Co-authored-by: Shivalika Singh <73357305+shivalikasingh95@users.noreply.github.com> Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
@@ -422,6 +422,8 @@
|
||||
title: ImageGPT
|
||||
- local: model_doc/levit
|
||||
title: LeViT
|
||||
- local: model_doc/mask2former
|
||||
title: Mask2Former
|
||||
- local: model_doc/maskformer
|
||||
title: MaskFormer
|
||||
- local: model_doc/mobilenet_v1
|
||||
|
||||
@@ -133,6 +133,7 @@ The documentation is organized into five sections:
|
||||
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
|
||||
1. **[MarkupLM](model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https://arxiv.org/abs/2110.08518) by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.
|
||||
1. **[Mask2Former](model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.
|
||||
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
|
||||
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
|
||||
@@ -306,6 +307,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
|
||||
| MarkupLM | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| Mask2Former | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| MaskFormerSwin | ❌ | ❌ | ❌ | ❌ | ❌ |
|
||||
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
|
||||
48
docs/source/en/model_doc/mask2former.mdx
Normal file
48
docs/source/en/model_doc/mask2former.mdx
Normal file
@@ -0,0 +1,48 @@
|
||||
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Mask2Former
|
||||
|
||||
## Overview
|
||||
|
||||
The Mask2Former model was proposed in [Masked-attention Mask Transformer for Universal Image Segmentation](https://arxiv.org/abs/2112.01527) by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar. Mask2Former is a unified framework for panoptic, instance and semantic segmentation and features significant performance and efficiency improvements over [MaskFormer](maskformer).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice
|
||||
of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).*
|
||||
|
||||
Tips:
|
||||
- Mask2Former uses the same preprocessing and postprocessing steps as [MaskFormer](maskformer). Use [`MaskFormerImageProcessor`] or [`AutoImageProcessor`] to prepare images and optional targets for the model.
|
||||
- To get the final segmentation, depending on the task, you can call [`~MaskFormerImageProcessor.post_process_semantic_segmentation`] or [`~MaskFormerImageProcessor.post_process_instance_segmentation`] or [`~MaskFormerImageProcessor.post_process_panoptic_segmentation`]. All three tasks can be solved using [`Mask2FormerForUniversalSegmentation`] output, panoptic segmentation accepts an optional `label_ids_to_fuse` argument to fuse instances of the target object/s (e.g. sky) together.
|
||||
|
||||
This model was contributed by [Shivalika Singh](https://huggingface.co/shivi) and [Alara Dirik](https://huggingface.co/adirik). The original code can be found [here](https://github.com/facebookresearch/Mask2Former).
|
||||
|
||||
## MaskFormer specific outputs
|
||||
|
||||
[[autodoc]] models.mask2former.modeling_mask2former.Mask2FormerModelOutput
|
||||
|
||||
[[autodoc]] models.mask2former.modeling_mask2former.Mask2FormerForUniversalSegmentationOutput
|
||||
|
||||
## Mask2FormerConfig
|
||||
|
||||
[[autodoc]] Mask2FormerConfig
|
||||
|
||||
## Mask2FormerModel
|
||||
|
||||
[[autodoc]] Mask2FormerModel
|
||||
- forward
|
||||
|
||||
## Mask2FormerForUniversalSegmentation
|
||||
|
||||
[[autodoc]] Mask2FormerForUniversalSegmentation
|
||||
- forward
|
||||
@@ -83,4 +83,4 @@ This model was contributed by [francesco](https://huggingface.co/francesco). The
|
||||
## MaskFormerForInstanceSegmentation
|
||||
|
||||
[[autodoc]] MaskFormerForInstanceSegmentation
|
||||
- forward
|
||||
- forward
|
||||
Reference in New Issue
Block a user