Add batch of resources (#20647)

* Add resources

* Add more resources

* Add more resources

* Add TAPAS

* Fix pipeline tag

* Fix pipeline tags

* Remove pipeline tag

* Remove depth-estimation tag

* Update docs/source/en/model_doc/segformer.mdx

Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Apply suggestion

* Fix segformer

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
This commit is contained in:
NielsRogge
2023-01-17 17:18:56 +01:00
committed by GitHub
parent bb300ac686
commit cf028d0c3d
34 changed files with 322 additions and 50 deletions

View File

@@ -91,14 +91,21 @@ In Computer Vision:
- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224) - [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50) - [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
- [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512) - [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512)
- [Panoptic Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic) - [Panoptic Segmentation with MaskFormer](https://huggingface.co/facebook/maskformer-swin-small-coco)
- [Depth Estimation with DPT](https://huggingface.co/docs/transformers/model_doc/dpt)
- [Video Classification with VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)
In Audio: In Audio:
- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) - [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks) - [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
- [Audio Classification with Audio Spectrogram Transformer](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)
In Multimodal tasks: In Multimodal tasks:
- [Table Question Answering with TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq)
- [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa) - [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
- [Zero-shot Image Classification with CLIP](https://huggingface.co/openai/clip-vit-large-patch14)
- [Document Question Answering with LayoutLM](https://huggingface.co/impira/layoutlm-document-qa)
- [Zero-shot Video Classification with X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities. **[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities.

View File

@@ -67,6 +67,15 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The JAX/FLAX version of this model was This model was contributed by [nielsr](https://huggingface.co/nielsr). The JAX/FLAX version of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/beit). contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/beit).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BEiT.
<PipelineTag pipeline="image-classification"/>
- [`BeitForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## BEiT specific outputs ## BEiT specific outputs

View File

@@ -30,7 +30,6 @@ impact on transfer learning.
This model was contributed by [nielsr](https://huggingface.co/nielsr). This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/google-research/big_transfer). The original code can be found [here](https://github.com/google-research/big_transfer).
## Resources ## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT. A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT.
@@ -45,19 +44,16 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] BitConfig [[autodoc]] BitConfig
## BitImageProcessor ## BitImageProcessor
[[autodoc]] BitImageProcessor [[autodoc]] BitImageProcessor
- preprocess - preprocess
## BitModel ## BitModel
[[autodoc]] BitModel [[autodoc]] BitModel
- forward - forward
## BitForImageClassification ## BitForImageClassification
[[autodoc]] BitForImageClassification [[autodoc]] BitForImageClassification

View File

@@ -77,23 +77,14 @@ This model was contributed by [valhalla](https://huggingface.co/valhalla). The o
## Resources ## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP. If you're A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
- A blog post on [How to fine-tune CLIP on 10,000 image-text pairs](https://huggingface.co/blog/fine-tune-clip-rsicd).
- CLIP is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
The resource should ideally demonstrate something new instead of duplicating an existing resource. The resource should ideally demonstrate something new instead of duplicating an existing resource.
<PipelineTag pipeline="text-to-image"/>
- A blog post on [How to use CLIP to retrieve images from text](https://huggingface.co/blog/fine-tune-clip-rsicd).
- A blog bost on [How to use CLIP for Japanese text to image generation](https://huggingface.co/blog/japanese-stable-diffusion).
<PipelineTag pipeline="image-to-text"/>
- A notebook showing [Video to text matching with CLIP for videos](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/X-CLIP/Video_text_matching_with_X_CLIP.ipynb).
<PipelineTag pipeline="zero-shot-classification"/>
- A notebook showing [Zero shot video classification using CLIP for video](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/X-CLIP/Zero_shot_classify_a_YouTube_video_with_X_CLIP.ipynb).
## CLIPConfig ## CLIPConfig
[[autodoc]] CLIPConfig [[autodoc]] CLIPConfig

View File

@@ -40,16 +40,24 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlow version of the model was contributed by [ariG23498](https://github.com/ariG23498), This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlow version of the model was contributed by [ariG23498](https://github.com/ariG23498),
[gante](https://github.com/gante), and [sayakpaul](https://github.com/sayakpaul) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/ConvNeXt). [gante](https://github.com/gante), and [sayakpaul](https://github.com/sayakpaul) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/ConvNeXt).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ConvNeXT.
<PipelineTag pipeline="image-classification"/>
- [`ConvNextForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ConvNextConfig ## ConvNextConfig
[[autodoc]] ConvNextConfig [[autodoc]] ConvNextConfig
## ConvNextFeatureExtractor ## ConvNextFeatureExtractor
[[autodoc]] ConvNextFeatureExtractor [[autodoc]] ConvNextFeatureExtractor
## ConvNextImageProcessor ## ConvNextImageProcessor
[[autodoc]] ConvNextImageProcessor [[autodoc]] ConvNextImageProcessor
@@ -60,7 +68,6 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlo
[[autodoc]] ConvNextModel [[autodoc]] ConvNextModel
- forward - forward
## ConvNextForImageClassification ## ConvNextForImageClassification
[[autodoc]] ConvNextForImageClassification [[autodoc]] ConvNextForImageClassification

View File

@@ -38,6 +38,16 @@ Tips:
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/microsoft/CvT). This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/microsoft/CvT).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CvT.
<PipelineTag pipeline="image-classification"/>
- [`CvtForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## CvtConfig ## CvtConfig
[[autodoc]] CvtConfig [[autodoc]] CvtConfig

View File

@@ -37,9 +37,6 @@ Tips:
- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction - For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction
- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization. - For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction. - For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
- To know how a pre-trained Data2Vec vision model can be fine-tuned on the task of image classification, you can check out
[this notebook](https://colab.research.google.com/github/sayakpaul/TF-2.0-Hacks/blob/master/data2vec_vision_image_classification.ipynb).
This model was contributed by [edugp](https://huggingface.co/edugp) and [patrickvonplaten](https://huggingface.co/patrickvonplaten). This model was contributed by [edugp](https://huggingface.co/edugp) and [patrickvonplaten](https://huggingface.co/patrickvonplaten).
[sayakpaul](https://github.com/sayakpaul) and [Rocketknight1](https://github.com/Rocketknight1) contributed Data2Vec for vision in TensorFlow. [sayakpaul](https://github.com/sayakpaul) and [Rocketknight1](https://github.com/Rocketknight1) contributed Data2Vec for vision in TensorFlow.
@@ -48,6 +45,17 @@ The original code (for NLP and Speech) can be found [here](https://github.com/py
The original code for vision can be found [here](https://github.com/facebookresearch/data2vec_vision/tree/main/beit). The original code for vision can be found [here](https://github.com/facebookresearch/data2vec_vision/tree/main/beit).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Data2Vec.
<PipelineTag pipeline="image-classification"/>
- [`Data2VecVisionForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
- To fine-tune [`TFData2VecVisionForImageClassification`] on a custom dataset, see [this notebook](https://colab.research.google.com/github/sayakpaul/TF-2.0-Hacks/blob/master/data2vec_vision_image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## Data2VecTextConfig ## Data2VecTextConfig
[[autodoc]] Data2VecTextConfig [[autodoc]] Data2VecTextConfig

View File

@@ -24,7 +24,7 @@ The abstract from the paper is the following:
Tips: Tips:
- One can use [`DeformableDetrImageProcessor`] to prepare images (and optional targets) for the model. - One can use [`DeformableDetrImageProcessor`] to prepare images (and optional targets) for the model.
- Training Deformable DETR is equivalent to training the original [DETR](detr) model. Demo notebooks can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR). - Training Deformable DETR is equivalent to training the original [DETR](detr) model. See the [resources](#resources) section below for demo notebooks.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/deformable_detr_architecture.png" <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/deformable_detr_architecture.png"
alt="drawing" width="600"/> alt="drawing" width="600"/>
@@ -33,6 +33,16 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/fundamentalvision/Deformable-DETR). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/fundamentalvision/Deformable-DETR).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Deformable DETR.
<PipelineTag pipeline="object-detection"/>
- Demo notebooks regarding inference + fine-tuning on a custom dataset for [`DeformableDetrForObjectDetection`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Deformable-DETR).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DeformableDetrImageProcessor ## DeformableDetrImageProcessor
[[autodoc]] DeformableDetrImageProcessor [[autodoc]] DeformableDetrImageProcessor
@@ -47,18 +57,15 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
- pad_and_create_pixel_mask - pad_and_create_pixel_mask
- post_process_object_detection - post_process_object_detection
## DeformableDetrConfig ## DeformableDetrConfig
[[autodoc]] DeformableDetrConfig [[autodoc]] DeformableDetrConfig
## DeformableDetrModel ## DeformableDetrModel
[[autodoc]] DeformableDetrModel [[autodoc]] DeformableDetrModel
- forward - forward
## DeformableDetrForObjectDetection ## DeformableDetrForObjectDetection
[[autodoc]] DeformableDetrForObjectDetection [[autodoc]] DeformableDetrForObjectDetection

View File

@@ -71,6 +71,19 @@ Tips:
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts). This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeiT.
<PipelineTag pipeline="image-classification"/>
- [`DeiTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Besides that:
- [`DeiTForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DeiTConfig ## DeiTConfig

View File

@@ -37,9 +37,6 @@ baselines.*
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/facebookresearch/detr). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/facebookresearch/detr).
The quickest way to get started with DETR is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR) (which showcase both inference and
fine-tuning on custom data).
Here's a TLDR explaining how [`~transformers.DetrForObjectDetection`] works: Here's a TLDR explaining how [`~transformers.DetrForObjectDetection`] works:
First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use
@@ -153,6 +150,15 @@ outputs of the model using one of the postprocessing methods of [`~transformers.
be be provided to either `CocoEvaluator` or `PanopticEvaluator`, which allow you to calculate metrics like be be provided to either `CocoEvaluator` or `PanopticEvaluator`, which allow you to calculate metrics like
mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are implemented in the [original repository](https://github.com/facebookresearch/detr). See the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR) for more info regarding evaluation. mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are implemented in the [original repository](https://github.com/facebookresearch/detr). See the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR) for more info regarding evaluation.
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DETR.
<PipelineTag pipeline="object-detection"/>
- All example notebooks illustrating fine-tuning [`DetrForObjectDetection`] and [`DetrForSegmentation`] on a custom dataset an be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DETR specific outputs ## DETR specific outputs

View File

@@ -61,12 +61,20 @@ Taken from the <a href="https://arxiv.org/abs/2209.15001">original paper</a>.</s
This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr). This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr).
The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer). The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DiNAT.
<PipelineTag pipeline="image-classification"/>
- [`DinatForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DinatConfig ## DinatConfig
[[autodoc]] DinatConfig [[autodoc]] DinatConfig
## DinatModel ## DinatModel
[[autodoc]] DinatModel [[autodoc]] DinatModel

View File

@@ -65,3 +65,13 @@ A notebook that illustrates inference for document image classification can be f
As DiT's architecture is equivalent to that of BEiT, one can refer to [BEiT's documentation page](beit) for all tips, code examples and notebooks. As DiT's architecture is equivalent to that of BEiT, one can refer to [BEiT's documentation page](beit) for all tips, code examples and notebooks.
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/dit). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/dit).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DiT.
<PipelineTag pipeline="image-classification"/>
- [`BeitForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

View File

@@ -28,37 +28,40 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/isl-org/DPT). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/isl-org/DPT).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DPT.
- Demo notebooks for [`DPTForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DPT).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DPTConfig ## DPTConfig
[[autodoc]] DPTConfig [[autodoc]] DPTConfig
## DPTFeatureExtractor ## DPTFeatureExtractor
[[autodoc]] DPTFeatureExtractor [[autodoc]] DPTFeatureExtractor
- __call__ - __call__
- post_process_semantic_segmentation - post_process_semantic_segmentation
## DPTImageProcessor ## DPTImageProcessor
[[autodoc]] DPTImageProcessor [[autodoc]] DPTImageProcessor
- preprocess - preprocess
- post_process_semantic_segmentation - post_process_semantic_segmentation
## DPTModel ## DPTModel
[[autodoc]] DPTModel [[autodoc]] DPTModel
- forward - forward
## DPTForDepthEstimation ## DPTForDepthEstimation
[[autodoc]] DPTForDepthEstimation [[autodoc]] DPTForDepthEstimation
- forward - forward
## DPTForSemanticSegmentation ## DPTForSemanticSegmentation
[[autodoc]] DPTForSemanticSegmentation [[autodoc]] DPTForSemanticSegmentation

View File

@@ -31,7 +31,6 @@ The abstract from the paper is the following:
Tips: Tips:
- A notebook illustrating inference with [`GLPNForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/GLPN/GLPN_inference_(depth_estimation).ipynb).
- One can use [`GLPNImageProcessor`] to prepare images for the model. - One can use [`GLPNImageProcessor`] to prepare images for the model.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/glpn_architecture.jpg" <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/glpn_architecture.jpg"
@@ -41,6 +40,12 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/vinvino02/GLPDepth). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/vinvino02/GLPDepth).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GLPN.
- Demo notebooks for [`GLPNForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/GLPN).
## GLPNConfig ## GLPNConfig
[[autodoc]] GLPNConfig [[autodoc]] GLPNConfig

View File

@@ -24,11 +24,16 @@ The abstract from the paper is the following:
Tips: Tips:
- You may specify `output_segmentation=True` in the forward of `GroupViTModel` to get the segmentation logits of input texts. - You may specify `output_segmentation=True` in the forward of `GroupViTModel` to get the segmentation logits of input texts.
- The quickest way to get started with GroupViT is by checking the [example notebooks](https://github.com/xvjiarui/GroupViT/blob/main/demo/GroupViT_hf_inference_notebook.ipynb) (which showcase zero-shot segmentation inference). One can also check out the [HuggingFace Spaces demo](https://huggingface.co/spaces/xvjiarui/GroupViT) to play with GroupViT.
This model was contributed by [xvjiarui](https://huggingface.co/xvjiarui). The TensorFlow version was contributed by [ariG23498](https://huggingface.co/ariG23498) with the help of [Yih-Dar SHIEH](https://huggingface.co/ydshieh), [Amy Roberts](https://huggingface.co/amyeroberts), and [Joao Gante](https://huggingface.co/joaogante). This model was contributed by [xvjiarui](https://huggingface.co/xvjiarui). The TensorFlow version was contributed by [ariG23498](https://huggingface.co/ariG23498) with the help of [Yih-Dar SHIEH](https://huggingface.co/ydshieh), [Amy Roberts](https://huggingface.co/amyeroberts), and [Joao Gante](https://huggingface.co/joaogante).
The original code can be found [here](https://github.com/NVlabs/GroupViT). The original code can be found [here](https://github.com/NVlabs/GroupViT).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with GroupViT.
- The quickest way to get started with GroupViT is by checking the [example notebooks](https://github.com/xvjiarui/GroupViT/blob/main/demo/GroupViT_hf_inference_notebook.ipynb) (which showcase zero-shot segmentation inference).
- One can also check out the [HuggingFace Spaces demo](https://huggingface.co/spaces/xvjiarui/GroupViT) to play with GroupViT.
## GroupViTConfig ## GroupViTConfig

View File

@@ -38,8 +38,6 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr), based on
Tips: Tips:
- Demo notebooks for ImageGPT can be found
[here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ImageGPT).
- ImageGPT is almost exactly the same as [GPT-2](gpt2), with the exception that a different activation - ImageGPT is almost exactly the same as [GPT-2](gpt2), with the exception that a different activation
function is used (namely "quick gelu"), and the layer normalization layers don't mean center the inputs. ImageGPT function is used (namely "quick gelu"), and the layer normalization layers don't mean center the inputs. ImageGPT
also doesn't have tied input- and output embeddings. also doesn't have tied input- and output embeddings.
@@ -71,6 +69,17 @@ Tips:
| MiT-b4 | [3, 8, 27, 3] | [64, 128, 320, 512] | 768 | 62.6 | 83.6 | | MiT-b4 | [3, 8, 27, 3] | [64, 128, 320, 512] | 768 | 62.6 | 83.6 |
| MiT-b5 | [3, 6, 40, 3] | [64, 128, 320, 512] | 768 | 82.0 | 83.8 | | MiT-b5 | [3, 6, 40, 3] | [64, 128, 320, 512] | 768 | 82.0 | 83.8 |
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ImageGPT.
<PipelineTag pipeline="image-classification"/>
- Demo notebooks for ImageGPT can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ImageGPT).
- [`ImageGPTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ImageGPTConfig ## ImageGPTConfig
[[autodoc]] ImageGPTConfig [[autodoc]] ImageGPTConfig

View File

@@ -61,6 +61,15 @@ Tips:
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/facebookresearch/LeViT). This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/facebookresearch/LeViT).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LeViT.
<PipelineTag pipeline="image-classification"/>
- [`LevitForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## LevitConfig ## LevitConfig

View File

@@ -37,7 +37,6 @@ model.push_to_hub("name_of_repo_on_the_hub")
- When preparing data for the model, make sure to use the token vocabulary that corresponds to the RoBERTa checkpoint you combined with the Layout Transformer. - When preparing data for the model, make sure to use the token vocabulary that corresponds to the RoBERTa checkpoint you combined with the Layout Transformer.
- As [lilt-roberta-en-base](https://huggingface.co/SCUT-DLVCLab/lilt-roberta-en-base) uses the same vocabulary as [LayoutLMv3](layoutlmv3), one can use [`LayoutLMv3TokenizerFast`] to prepare data for the model. - As [lilt-roberta-en-base](https://huggingface.co/SCUT-DLVCLab/lilt-roberta-en-base) uses the same vocabulary as [LayoutLMv3](layoutlmv3), one can use [`LayoutLMv3TokenizerFast`] to prepare data for the model.
The same is true for [lilt-roberta-en-base](https://huggingface.co/SCUT-DLVCLab/lilt-infoxlm-base): one can use [`LayoutXLMTokenizerFast`] for that model. The same is true for [lilt-roberta-en-base](https://huggingface.co/SCUT-DLVCLab/lilt-infoxlm-base): one can use [`LayoutXLMTokenizerFast`] for that model.
- Demo notebooks for LiLT can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LiLT).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/lilt_architecture.jpg" <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/lilt_architecture.jpg"
alt="drawing" width="600"/> alt="drawing" width="600"/>
@@ -47,6 +46,13 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/jpwang/lilt). The original code can be found [here](https://github.com/jpwang/lilt).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LiLT.
- Demo notebooks for LiLT can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LiLT).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## LiltConfig ## LiltConfig

View File

@@ -44,6 +44,16 @@ Unsupported features:
This model was contributed by [matthijs](https://huggingface.co/Matthijs). The original code and weights can be found [here](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md). This model was contributed by [matthijs](https://huggingface.co/Matthijs). The original code and weights can be found [here](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with MobileNetV1.
<PipelineTag pipeline="image-classification"/>
- [`MobileNetV1ForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## MobileNetV1Config ## MobileNetV1Config
[[autodoc]] MobileNetV1Config [[autodoc]] MobileNetV1Config

View File

@@ -48,6 +48,16 @@ Unsupported features:
This model was contributed by [matthijs](https://huggingface.co/Matthijs). The original code and weights can be found [here for the main model](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet) and [here for DeepLabV3+](https://github.com/tensorflow/models/tree/master/research/deeplab). This model was contributed by [matthijs](https://huggingface.co/Matthijs). The original code and weights can be found [here for the main model](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet) and [here for DeepLabV3+](https://github.com/tensorflow/models/tree/master/research/deeplab).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with MobileNetV2.
<PipelineTag pipeline="image-classification"/>
- [`MobileNetV2ForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## MobileNetV2Config ## MobileNetV2Config
[[autodoc]] MobileNetV2Config [[autodoc]] MobileNetV2Config

View File

@@ -57,6 +57,15 @@ with open(tflite_filename, "wb") as f:
This model was contributed by [matthijs](https://huggingface.co/Matthijs). The TensorFlow version of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code and weights can be found [here](https://github.com/apple/ml-cvnets). This model was contributed by [matthijs](https://huggingface.co/Matthijs). The TensorFlow version of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code and weights can be found [here](https://github.com/apple/ml-cvnets).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with MobileViT.
<PipelineTag pipeline="image-classification"/>
- [`MobileViTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## MobileViTConfig ## MobileViTConfig

View File

@@ -56,6 +56,15 @@ Taken from the <a href="https://arxiv.org/abs/2204.07143">original paper</a>.</s
This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr). This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr).
The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer). The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with NAT.
<PipelineTag pipeline="image-classification"/>
- [`NatForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## NatConfig ## NatConfig

View File

@@ -41,6 +41,16 @@ Tips:
This model was contributed by [heytanay](https://huggingface.co/heytanay). The original code can be found [here](https://github.com/sail-sg/poolformer). This model was contributed by [heytanay](https://huggingface.co/heytanay). The original code can be found [here](https://github.com/sail-sg/poolformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with PoolFormer.
<PipelineTag pipeline="image-classification"/>
- [`PoolFormerForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## PoolFormerConfig ## PoolFormerConfig
[[autodoc]] PoolFormerConfig [[autodoc]] PoolFormerConfig

View File

@@ -31,6 +31,15 @@ This model was contributed by [Francesco](https://huggingface.co/Francesco). The
was contributed by [sayakpaul](https://huggingface.com/sayakpaul) and [ariG23498](https://huggingface.com/ariG23498). was contributed by [sayakpaul](https://huggingface.com/sayakpaul) and [ariG23498](https://huggingface.com/ariG23498).
The original code can be found [here](https://github.com/facebookresearch/pycls). The original code can be found [here](https://github.com/facebookresearch/pycls).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RegNet.
<PipelineTag pipeline="image-classification"/>
- [`RegNetForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## RegNetConfig ## RegNetConfig

View File

@@ -33,6 +33,16 @@ The figure below illustrates the architecture of ResNet. Taken from the [origina
This model was contributed by [Francesco](https://huggingface.co/Francesco). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/KaimingHe/deep-residual-networks). This model was contributed by [Francesco](https://huggingface.co/Francesco). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/KaimingHe/deep-residual-networks).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ResNet.
<PipelineTag pipeline="image-classification"/>
- [`ResNetForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ResNetConfig ## ResNetConfig
[[autodoc]] ResNetConfig [[autodoc]] ResNetConfig

View File

@@ -84,6 +84,22 @@ Tips:
Note that MiT in the above table refers to the Mix Transformer encoder backbone introduced in SegFormer. For Note that MiT in the above table refers to the Mix Transformer encoder backbone introduced in SegFormer. For
SegFormer's results on the segmentation datasets like ADE20k, refer to the [paper](https://arxiv.org/abs/2105.15203). SegFormer's results on the segmentation datasets like ADE20k, refer to the [paper](https://arxiv.org/abs/2105.15203).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SegFormer.
<PipelineTag pipeline="image-classification"/>
- [`SegformerForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Semantic segmentation:
- [`SegformerForSemanticSegmentation`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/semantic-segmentation).
- A blog on fine-tuning SegFormer on a custom dataset can be found [here](https://huggingface.co/blog/fine-tune-segformer).
- More demo notebooks on SegFormer (both inference + fine-tuning on a custom dataset) can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SegFormer).
- [`TFSegformerForSemanticSegmentation`] is supported by this [example notebook](https://github.com/huggingface/notebooks/blob/main/examples/semantic_segmentation-tf.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## SegformerConfig ## SegformerConfig

View File

@@ -45,6 +45,20 @@ alt="drawing" width="600"/>
This model was contributed by [novice03](https://huggingface.co/novice03). The Tensorflow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/microsoft/Swin-Transformer). This model was contributed by [novice03](https://huggingface.co/novice03). The Tensorflow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/microsoft/Swin-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Swin Transformer.
<PipelineTag pipeline="image-classification"/>
- [`SwinForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Besides that:
- [`SwinForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## SwinConfig ## SwinConfig
[[autodoc]] SwinConfig [[autodoc]] SwinConfig

View File

@@ -26,6 +26,19 @@ Tips:
This model was contributed by [nandwalritik](https://huggingface.co/nandwalritik). This model was contributed by [nandwalritik](https://huggingface.co/nandwalritik).
The original code can be found [here](https://github.com/microsoft/Swin-Transformer). The original code can be found [here](https://github.com/microsoft/Swin-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Swin Transformer v2.
<PipelineTag pipeline="image-classification"/>
- [`Swinv2ForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Besides that:
- [`Swinv2ForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## Swinv2Config ## Swinv2Config

View File

@@ -32,6 +32,15 @@ The figure below illustrates the architecture of a Visual Aattention Layer. Take
This model was contributed by [Francesco](https://huggingface.co/Francesco). The original code can be found [here](https://github.com/Visual-Attention-Network/VAN-Classification). This model was contributed by [Francesco](https://huggingface.co/Francesco). The original code can be found [here](https://github.com/Visual-Attention-Network/VAN-Classification).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with VAN.
<PipelineTag pipeline="image-classification"/>
- [`VanForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## VanConfig ## VanConfig

View File

@@ -86,6 +86,21 @@ found [here](https://github.com/google-research/vision_transformer).
Note that we converted the weights from Ross Wightman's [timm library](https://github.com/rwightman/pytorch-image-models), who already converted the weights from JAX to PyTorch. Credits Note that we converted the weights from Ross Wightman's [timm library](https://github.com/rwightman/pytorch-image-models), who already converted the weights from JAX to PyTorch. Credits
go to him! go to him!
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViT.
<PipelineTag pipeline="image-classification"/>
- [`ViTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
- A blog on fine-tuning [`ViTForImageClassification`] on a custom dataset can be found [here](https://huggingface.co/blog/fine-tune-vit).
- More demo notebooks to fine-tune [`ViTForImageClassification`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer).
Besides that:
- [`ViTForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## Resources ## Resources

View File

@@ -32,9 +32,6 @@ Tips:
- MAE (masked auto encoding) is a method for self-supervised pre-training of Vision Transformers (ViTs). The pre-training objective is relatively simple: - MAE (masked auto encoding) is a method for self-supervised pre-training of Vision Transformers (ViTs). The pre-training objective is relatively simple:
by masking a large portion (75%) of the image patches, the model must reconstruct raw pixel values. One can use [`ViTMAEForPreTraining`] for this purpose. by masking a large portion (75%) of the image patches, the model must reconstruct raw pixel values. One can use [`ViTMAEForPreTraining`] for this purpose.
- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
One can easily tweak it for their own use case.
- A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
- After pre-training, one "throws away" the decoder used to reconstruct pixels, and one uses the encoder for fine-tuning/linear probing. This means that after - After pre-training, one "throws away" the decoder used to reconstruct pixels, and one uses the encoder for fine-tuning/linear probing. This means that after
fine-tuning, one can directly plug in the weights into a [`ViTForImageClassification`]. fine-tuning, one can directly plug in the weights into a [`ViTForImageClassification`].
- One can use [`ViTImageProcessor`] to prepare images for the model. See the code examples for more info. - One can use [`ViTImageProcessor`] to prepare images for the model. See the code examples for more info.
@@ -51,6 +48,14 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlow version of the model was contributed by [sayakpaul](https://github.com/sayakpaul) and This model was contributed by [nielsr](https://huggingface.co/nielsr). TensorFlow version of the model was contributed by [sayakpaul](https://github.com/sayakpaul) and
[ariG23498](https://github.com/ariG23498) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/mae). [ariG23498](https://github.com/ariG23498) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/mae).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViTMAE.
- [`ViTMAEForPreTraining`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining), allowing you to pre-train the model from scratch/further pre-train the model on custom data.
- A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ViTMAEConfig ## ViTMAEConfig

View File

@@ -46,6 +46,15 @@ labels when fine-tuned.
This model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code can be found [here](https://github.com/facebookresearch/msn). This model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code can be found [here](https://github.com/facebookresearch/msn).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViT MSN.
<PipelineTag pipeline="image-classification"/>
- [`ViTMSNForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ViTMSNConfig ## ViTMSNConfig

View File

@@ -24,7 +24,6 @@ The abstract from the paper is the following:
Tips: Tips:
- Usage of X-CLIP is identical to [CLIP](clip). - Usage of X-CLIP is identical to [CLIP](clip).
- Demo notebooks for X-CLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/X-CLIP).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png" <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png"
alt="drawing" width="600"/> alt="drawing" width="600"/>
@@ -34,6 +33,13 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/microsoft/VideoX/tree/master/X-CLIP). The original code can be found [here](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with X-CLIP.
- Demo notebooks for X-CLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/X-CLIP).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## XCLIPProcessor ## XCLIPProcessor

View File

@@ -24,7 +24,6 @@ The abstract from the paper is the following:
Tips: Tips:
- One can use [`YolosImageProcessor`] for preparing images (and optional targets) for the model. Contrary to [DETR](detr), YOLOS doesn't require a `pixel_mask` to be created. - One can use [`YolosImageProcessor`] for preparing images (and optional targets) for the model. Contrary to [DETR](detr), YOLOS doesn't require a `pixel_mask` to be created.
- Demo notebooks (regarding inference and fine-tuning on custom data) can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/YOLOS).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/yolos_architecture.png" <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/yolos_architecture.png"
alt="drawing" width="600"/> alt="drawing" width="600"/>
@@ -33,6 +32,16 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/hustvl/YOLOS). This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/hustvl/YOLOS).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with YOLOS.
<PipelineTag pipeline="object-detection"/>
- All example notebooks illustrating inference + fine-tuning [`YolosForObjectDetection`] on a custom dataset can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/YOLOS).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## YolosConfig ## YolosConfig
[[autodoc]] YolosConfig [[autodoc]] YolosConfig