AutoImageProcessor (#20111)
* AutoImageProcessor skeleton
* Update references
* Add mapping in init
* Add model image processors to __init__ for importing
* Add AutoImageProcessor tests
* Fix up
* Image Processor documentation
* Remove pdb
* Update docs/source/en/model_doc/mobilevit.mdx
* Update docs
* Don't add whitespace on json files
* Remove fixtures
* Move checking model config down
* Fix up
* Add check for image processor
* Remove FeatureExtractorMixin in docstrings
* Rename model_tmpfile to config_tmpfile
* Don't make None if not in image processor map
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

## Overview

The MobileViT model was proposed in [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

The abstract from the paper is the following:

@@ -25,10 +25,10 @@ Tips:
- MobileViT is more like a CNN than a Transformer model. It does not work on sequence data but on batches of images. Unlike ViT, there are no embeddings. The backbone model outputs a feature map. You can follow [this tutorial](https://keras.io/examples/vision/mobilevit) for a lightweight introduction.
- One can use [`MobileViTFeatureExtractor`] to prepare images for the model. Note that if you do your own preprocessing, the pretrained checkpoints expect images to be in BGR pixel order (not RGB).
- The available image classification checkpoints are pre-trained on [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k) (also referred to as ILSVRC 2012, a collection of 1.3 million images and 1,000 classes).
- The segmentation model uses a [DeepLabV3](https://arxiv.org/abs/1706.05587) head. The available semantic segmentation checkpoints are pre-trained on [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/).
- As the name suggests MobileViT was designed to be performant and efficient on mobile phones. The TensorFlow versions of the MobileViT models are fully compatible with [TensorFlow Lite](https://www.tensorflow.org/lite).
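
The BGR note in the tips above matters mainly when you skip [`MobileViTFeatureExtractor`] and preprocess images yourself. As a minimal sketch, assuming the image is held as nested Python lists in height x width x channel layout (the helper name `rgb_to_bgr` is hypothetical and not part of the library), reversing the channel order could look like this:

```python
def rgb_to_bgr(image):
    """Return a copy of an H x W x 3 image with its channel order reversed.

    `image` is assumed to be a nested list: rows -> pixels -> [R, G, B].
    """
    return [[list(pixel[::-1]) for pixel in row] for row in image]


# A 1 x 2 toy image: one pure-red pixel and one pure-blue pixel (RGB order).
rgb_image = [[[255, 0, 0], [0, 0, 255]]]
bgr_image = rgb_to_bgr(rgb_image)
print(bgr_image)  # [[[0, 0, 255], [255, 0, 0]]]
```

For channel-last NumPy arrays, the same flip is usually written as `image[..., ::-1]`.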

You can use the following code to convert a MobileViT checkpoint (be it image classification or semantic segmentation) to generate a
TensorFlow Lite model:

```py
@@ -52,7 +52,7 @@ with open(tflite_filename, "wb") as f:
```

The resulting model will be just **about an MB** making it a good fit for mobile applications where resources and network
bandwidth can be constrained.

This model was contributed by [matthijs](https://huggingface.co/Matthijs). The TensorFlow version of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code and weights can be found [here](https://github.com/apple/ml-cvnets).
@@ -68,6 +68,12 @@ This model was contributed by [matthijs](https://huggingface.co/Matthijs). The T
- __call__
- post_process_semantic_segmentation

## MobileViTImageProcessor

[[autodoc]] MobileViTImageProcessor
- preprocess
- post_process_semantic_segmentation

## MobileViTModel

[[autodoc]] MobileViTModel
@@ -86,14 +92,14 @@ This model was contributed by [matthijs](https://huggingface.co/Matthijs). The T
## TFMobileViTModel

[[autodoc]] TFMobileViTModel
- call

## TFMobileViTForImageClassification

[[autodoc]] TFMobileViTForImageClassification
- call

## TFMobileViTForSemanticSegmentation

[[autodoc]] TFMobileViTForSemanticSegmentation
- call