Use HF papers (#38184)

* Use hf papers

* Hugging Face papers

* doi to hf papers

* style
This commit is contained in:
Quentin Gallouédec
2025-06-13 13:07:09 +02:00
committed by GitHub
parent 1031ed5166
commit de24fb63ed
811 changed files with 2622 additions and 2617 deletions

View File

@@ -31,7 +31,7 @@ You can do so by running the following command: `pip install -U transformers==4.
## Overview
The VAN model was proposed in [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
The VAN model was proposed in [Visual Attention Network](https://huggingface.co/papers/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
This paper introduces a new attention layer based on convolution operations able to capture both local and distant relationships. This is done by combining normal and large kernel convolution layers. The latter uses a dilated convolution to capture distant correlations.
@@ -43,7 +43,7 @@ Tips:
- VAN does not have an embedding layer, thus the `hidden_states` will have a length equal to the number of stages.
The figure below illustrates the architecture of a Visual Attention Layer. Taken from the [original paper](https://arxiv.org/abs/2202.09741).
The figure below illustrates the architecture of a Visual Attention Layer. Taken from the [original paper](https://huggingface.co/papers/2202.09741).
<img width="600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/van_architecture.png"/>