AutoImageProcessor (#20111)
* AutoImageProcessor skeleton * Update references * Add mapping in init * Add model image processors to __init__ for importing * Add AutoImageProcessor tests * Fix up * Image Processor documentation * Remove pdb * Update docs/source/en/model_doc/mobilevit.mdx * Update docs * Don't add whitespace on json files * Remove fixtures * Move checking model config down * Fix up * Add check for image processor * Remove FeatureExtractorMixin in docstrings * Rename model_tmpfile to config_tmpfile * Don't make None if not in image processor map
This commit is contained in:
@@ -22,7 +22,7 @@ The abstract from the paper is the following:
|
||||
*We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into image-like representations at various resolutions and progressively combine them into full-resolution predictions using a convolutional decoder. The transformer backbone processes representations at a constant and relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to provide finer-grained and more globally coherent predictions when compared to fully-convolutional networks. Our experiments show that this architecture yields substantial improvements on dense prediction tasks, especially when a large amount of training data is available. For monocular depth estimation, we observe an improvement of up to 28% in relative performance when compared to a state-of-the-art fully-convolutional network. When applied to semantic segmentation, dense vision transformers set a new state of the art on ADE20K with 49.02% mIoU. We further show that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context where it also sets the new state of the art.*
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg"
|
||||
alt="drawing" width="600"/>
|
||||
alt="drawing" width="600"/>
|
||||
|
||||
<small> DPT architecture. Taken from the <a href="https://arxiv.org/abs/2103.13413" target="_blank">original paper</a>. </small>
|
||||
|
||||
@@ -40,6 +40,13 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
|
||||
- post_process_semantic_segmentation
|
||||
|
||||
|
||||
## DPTImageProcessor
|
||||
|
||||
[[autodoc]] DPTImageProcessor
|
||||
- preprocess
|
||||
- post_process_semantic_segmentation
|
||||
|
||||
|
||||
## DPTModel
|
||||
|
||||
[[autodoc]] DPTModel
|
||||
@@ -55,4 +62,4 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
|
||||
## DPTForSemanticSegmentation
|
||||
|
||||
[[autodoc]] DPTForSemanticSegmentation
|
||||
- forward
|
||||
- forward
|
||||
|
||||
Reference in New Issue
Block a user