[SegFormer] TensorFlow port (#17910)

* add: segformer utils and img. classification.

* add: segmentation layer.

* feat: working implementation of segformer.

* chore: remove unused variable.

* add test, remaining modifications.

* remove: unnecessary files.

* add: rest of the files.

Co-authored-by: matt <rocketknight1@gmail.com>

* chore: remove ModuleList comment.

* chore: apply make style.

* chore: apply make fixup-copies.

* add  to check_repo.py

* add decode head to IGNORE_NON_TESTED

* chore: run make style.

* chore: PR comments.

* chore: minor changes to model doc.

* tests: reduction across samples.

* add a note on the space.

* sort importats.

* fix: reduction in loss computation.

* chore: align loss function with that of NER.

* chore: correct utils/documentation_tests.txt

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* chore: simplify the interpolation of logits in loss computation.

* chore: return transposed logits when return_dict=False.

* chore: add link to the tf fine-tuning repo.

* address pr comments.

* address niels's comments.

* remove from_pt=True since tf weights are in.

* remove comment from pt model.

* address niels's comments.

Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
This commit is contained in:
Sayak Paul
2022-07-21 22:52:37 +05:30
committed by GitHub
parent 2c5747edfe
commit 561b9a8c00
12 changed files with 1554 additions and 10 deletions

View File

@@ -278,7 +278,7 @@ Flax), PyTorch, and/or TensorFlow.
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | | ❌ |
| SegFormer | ❌ | ❌ | ✅ | | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |

View File

@@ -36,13 +36,14 @@ The figure below illustrates the architecture of SegFormer. Taken from the [orig
<img width="600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/segformer_architecture.png"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/NVlabs/SegFormer).
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version
of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code can be found [here](https://github.com/NVlabs/SegFormer).
Tips:
- SegFormer consists of a hierarchical Transformer encoder, and a lightweight all-MLP decode head.
- SegFormer consists of a hierarchical Transformer encoder, and a lightweight all-MLP decoder head.
[`SegformerModel`] is the hierarchical Transformer encoder (which in the paper is also referred to
as Mix Transformer or MiT). [`SegformerForSemanticSegmentation`] adds the all-MLP decode head on
as Mix Transformer or MiT). [`SegformerForSemanticSegmentation`] adds the all-MLP decoder head on
top to perform semantic segmentation of images. In addition, there's
[`SegformerForImageClassification`] which can be used to - you guessed it - classify images. The
authors of SegFormer first pre-trained the Transformer encoder on ImageNet-1k to classify images. Next, they throw
@@ -51,6 +52,9 @@ Tips:
found on the [hub](https://huggingface.co/models?other=segformer).
- The quickest way to get started with SegFormer is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SegFormer) (which showcase both inference and
fine-tuning on custom data). One can also check out the [blog post](https://huggingface.co/blog/fine-tune-segformer) introducing SegFormer and illustrating how it can be fine-tuned on custom data.
- TensorFlow users should refer to [this repository](https://github.com/deep-diver/segformer-tf-transformers) that shows off-the-shelf inference and fine-tuning.
- One can also check out [this interactive demo on Hugging Face Spaces](https://huggingface.co/spaces/chansung/segformer-tf-transformers)
to try out a SegFormer model on custom images.
- SegFormer works on any input size, as it pads the input to be divisible by `config.patch_sizes`.
- One can use [`SegformerFeatureExtractor`] to prepare images and corresponding segmentation maps
for the model. Note that this feature extractor is fairly basic and does not include all data augmentations used in
@@ -65,7 +69,8 @@ Tips:
used by [`SegformerForSemanticSegmentation`]). However, other datasets use the 0 index as
background class and include this class as part of all labels. In that case, `reduce_labels` should be set to
`False`, as loss should also be computed for the background class.
- As most models, SegFormer comes in different sizes, the details of which can be found in the table below.
- As most models, SegFormer comes in different sizes, the details of which can be found in the table below
(taken from Table 7 of the [original paper](https://arxiv.org/abs/2105.15203)).
| **Model variant** | **Depths** | **Hidden sizes** | **Decoder hidden size** | **Params (M)** | **ImageNet-1k Top 1** |
| :---------------: | ------------- | ------------------- | :---------------------: | :------------: | :-------------------: |
@@ -76,6 +81,10 @@ Tips:
| MiT-b4 | [3, 8, 27, 3] | [64, 128, 320, 512] | 768 | 62.6 | 83.6 |
| MiT-b5 | [3, 6, 40, 3] | [64, 128, 320, 512] | 768 | 82.0 | 83.8 |
Note that MiT in the above table refers to the Mix Transformer encoder backbone introduced in SegFormer. For
SegFormer's results on the segmentation datasets like ADE20k, refer to the [paper](https://arxiv.org/abs/2105.15203).
## SegformerConfig
[[autodoc]] SegformerConfig
@@ -104,3 +113,23 @@ Tips:
[[autodoc]] SegformerForSemanticSegmentation
- forward
## TFSegformerDecodeHead
[[autodoc]] TFSegformerDecodeHead
- call
## TFSegformerModel
[[autodoc]] TFSegformerModel
- call
## TFSegformerForImageClassification
[[autodoc]] TFSegformerForImageClassification
- call
## TFSegformerForSemanticSegmentation
[[autodoc]] TFSegformerForSemanticSegmentation
- call