Add dates to the model docs (#39320)

* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor
This commit is contained in:
MAHIR DAIYAN
2025-08-14 23:08:46 +06:00
committed by GitHub
parent 8a658ac119
commit b02f2d8b6a
390 changed files with 835 additions and 87 deletions

View File

@@ -52,6 +52,7 @@ repo-consistency:
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_docstrings.py
python utils/add_dates.py
# this target runs checks on all files

View File

@@ -13,12 +13,13 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-11-21 and added to Hugging Face Transformers on 2025-07-08.*
# AIMv2
## Overview
The AIMv2 model was proposed in [Multimodal Autoregressive Pre-training of Large Vision Encoders](https://arxiv.org/abs/2411.14402) by Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby.
The AIMv2 model was proposed in [Multimodal Autoregressive Pre-training of Large Vision Encoders](https://huggingface.co/papers/2411.14402) by Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby.
The abstract from the paper is the following:

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-09-26 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-02-11 and added to Hugging Face Transformers on 2023-03-01.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-11-12 and added to Hugging Face Transformers on 2023-01-04.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-06-18 and added to Hugging Face Transformers on 2025-06-24.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -24,7 +25,7 @@ rendered properly in your Markdown viewer.
# Arcee
Arcee is a decoder-only transformer model based on the Llama architecture with a key modification: it uses ReLU² (ReLU-squared) activation in the MLP blocks instead of SiLU, following recent research showing improved training efficiency with squared activations. This architecture is designed for efficient training and inference while maintaining the proven stability of the Llama design.
[Arcee](https://www.arcee.ai/blog/deep-dive-afm-4-5b-the-first-arcee-foundational-model) is a decoder-only transformer model based on the Llama architecture with a key modification: it uses ReLU² (ReLU-squared) activation in the MLP blocks instead of SiLU, following recent research showing improved training efficiency with squared activations. This architecture is designed for efficient training and inference while maintaining the proven stability of the Llama design.
The Arcee model is architecturally similar to Llama but uses `x * relu(x)` in MLP layers for improved gradient flow and is optimized for efficiency in both training and inference scenarios.

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-10-08 and added to Hugging Face Transformers on 2024-12-06.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-04-05 and added to Hugging Face Transformers on 2022-11-21.*
# Audio Spectrogram Transformer

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-06-24 and added to Hugging Face Transformers on 2023-05-30.*
# Autoformer

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-05-13 and added to Hugging Face Transformers on 2025-03-04.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-12-18 and added to Hugging Face Transformers on 2024-12-19.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -9,6 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
*This model was released on 2023-04-09 and added to Hugging Face Transformers on 2023-07-17.*
# Bark
@@ -19,7 +20,7 @@ specific language governing permissions and limitations under the License.
## Overview
Bark is a transformer-based text-to-speech model proposed by Suno AI in [suno-ai/bark](https://github.com/suno-ai/bark).
[Bark](https://huggingface.co/suno/bark) is a transformer-based text-to-speech model proposed by Suno AI in [suno-ai/bark](https://github.com/suno-ai/bark).
Bark is made of 4 main models:

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-10-29 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-10-23 and added to Hugging Face Transformers on 2020-11-27.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-09-20 and added to Hugging Face Transformers on 2021-10-18.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-06-15 and added to Hugging Face Transformers on 2021-08-04.*
# BEiT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-07-29 and added to Hugging Face Transformers on 2020-11-16.*
# BertGeneration

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-03-24 and added to Hugging Face Transformers on 2020-11-16.*
# BertJapanese

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2018-10-11 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-05-20 and added to Hugging Face Transformers on 2020-11-16.*
# BERTweet

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-07-28 and added to Hugging Face Transformers on 2021-03-30.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-07-28 and added to Hugging Face Transformers on 2021-05-07.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-10-19 and added to Hugging Face Transformers on 2022-12-05.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-12-24 and added to Hugging Face Transformers on 2022-12-07.*
# Big Transfer (BiT)

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-04-16 and added to Hugging Face Transformers on 2025-04-28.*
# BitNet

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-04-28 and added to Hugging Face Transformers on 2021-01-05.*
# Blenderbot Small

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-04-28 and added to Hugging Face Transformers on 2020-11-16.*
# Blenderbot

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2023-01-30 and added to Hugging Face Transformers on 2023-02-09.*
# BLIP-2

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-01-28 and added to Hugging Face Transformers on 2022-12-21.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-11-09 and added to Hugging Face Transformers on 2022-06-09.*
# BLOOM
@@ -24,7 +25,7 @@ rendered properly in your Markdown viewer.
## Overview
The BLOOM model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open science initiatives where researchers have pooled their time and resources to collectively achieve a higher impact.
The [BLOOM](https://huggingface.co/papers/2211.05100) model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open science initiatives where researchers have pooled their time and resources to collectively achieve a higher impact.
The architecture of BLOOM is essentially similar to GPT3 (auto-regressive model for next token prediction), but has been trained on 46 different languages and 13 programming languages.
Several smaller versions of the models have been trained on the same dataset. BLOOM is available in the following versions:

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-10-20 and added to Hugging Face Transformers on 2023-06-20.*
# BORT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-06-17 and added to Hugging Face Transformers on 2023-01-25.*
# BridgeTower

View File

@@ -9,6 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
*This model was released on 2021-08-10 and added to Hugging Face Transformers on 2023-09-15.*
# BROS

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-05-28 and added to Hugging Face Transformers on 2021-06-01.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-11-10 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-03-11 and added to Hugging Face Transformers on 2021-06-30.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-05-16 and added to Hugging Face Transformers on 2024-07-17.*
# Chameleon

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-11-02 and added to Hugging Face Transformers on 2022-12-01.*
# Chinese-CLIP

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-11-12 and added to Hugging Face Transformers on 2023-02-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-02-26 and added to Hugging Face Transformers on 2021-05-12.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-12-18 and added to Hugging Face Transformers on 2022-11-08.*
# CLIPSeg

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2023-05-12 and added to Hugging Face Transformers on 2023-11-10.*
# CLVP

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2023-08-24 and added to Hugging Face Transformers on 2023-08-25.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-03-25 and added to Hugging Face Transformers on 2022-06-24.*
# CodeGen

View File

@@ -1,3 +1,18 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-03-12 and added to Hugging Face Transformers on 2024-03-15.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
@@ -10,7 +25,7 @@
# Cohere
Cohere Command-R is a 35B parameter multilingual large language model designed for long context tasks like retrieval-augmented generation (RAG) and calling external APIs and tools. The model is specifically trained for grounded generation and supports both single-step and multi-step tool use. It supports a context length of 128K tokens.
Cohere [Command-R](https://cohere.com/blog/command-r) is a 35B parameter multilingual large language model designed for long context tasks like retrieval-augmented generation (RAG) and calling external APIs and tools. The model is specifically trained for grounded generation and supports both single-step and multi-step tool use. It supports a context length of 128K tokens.
You can find all the original Command-R checkpoints under the [Command Models](https://huggingface.co/collections/CohereForAI/command-models-67652b401665205e17b192ad) collection.

View File

@@ -1,3 +1,18 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-12-13 and added to Hugging Face Transformers on 2024-12-13.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
@@ -8,7 +23,7 @@
</div>
# Cohere2
# Cohere 2
[Cohere Command R7B](https://cohere.com/blog/command-r7b) is an open weights research release of a 7B billion parameter model. It is a multilingual model trained on 23 languages and has a context window of 128k. The model features three layers with sliding window attention and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.

View File

@@ -1,3 +1,20 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-07-31 and added to Hugging Face Transformers on 2025-07-31.*
# Command A Vision
<div class="flex flex-wrap space-x-1">
@@ -9,7 +26,7 @@
## Overview
Command A Vision is a state-of-the-art multimodal model designed to seamlessly integrate visual and textual information for a wide range of applications. By combining advanced computer vision techniques with natural language processing capabilities, Command A Vision enables users to analyze, understand, and generate insights from both visual and textual data.
Command A Vision ([blog post](https://cohere.com/blog/command-a-vision)) is a state-of-the-art multimodal model designed to seamlessly integrate visual and textual information for a wide range of applications. By combining advanced computer vision techniques with natural language processing capabilities, Command A Vision enables users to analyze, understand, and generate insights from both visual and textual data.
The model excels at tasks including image captioning, visual question answering, document understanding, and chart understanding. This makes it a versatile tool for AI practitioners. Its ability to process complex visual and textual inputs makes it useful in settings where text-only representations are imprecise or unavailable, like real-world image understanding and graphics-heavy document processing.

View File

@@ -11,6 +11,7 @@ specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-27 and added to Hugging Face Transformers on 2024-12-17.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-27 and added to Hugging Face Transformers on 2025-06-02.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-08-13 and added to Hugging Face Transformers on 2022-09-22.*
# Conditional DETR

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-08-06 and added to Hugging Face Transformers on 2021-01-27.*
# ConvBERT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-01-10 and added to Hugging Face Transformers on 2022-02-07.*
# ConvNeXT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2023-01-02 and added to Hugging Face Transformers on 2023-03-14.*
# ConvNeXt V2

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-12-01 and added to Hugging Face Transformers on 2021-04-10.*
# CPM

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-09-16 and added to Hugging Face Transformers on 2023-04-12.*
# CPMAnt

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-02-27 and added to Hugging Face Transformers on 2025-05-07.*
# Csm

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-09-11 and added to Hugging Face Transformers on 2020-11-16.*
# CTRL

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-03-29 and added to Hugging Face Transformers on 2022-05-18.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -23,7 +24,7 @@ rendered properly in your Markdown viewer.
# Convolutional Vision Transformer (CvT)
Convolutional Vision Transformer (CvT) is a model that combines the strengths of convolutional neural networks (CNNs) and Vision transformers for the computer vision tasks. It introduces convolutional layers into the vision transformer architecture, allowing it to capture local patterns in images while maintaining the global context provided by self-attention mechanisms.
[Convolutional Vision Transformer (CvT)](https://huggingface.co/papers/2103.15808) is a model that combines the strengths of convolutional neural networks (CNNs) and Vision transformers for the computer vision tasks. It introduces convolutional layers into the vision transformer architecture, allowing it to capture local patterns in images while maintaining the global context provided by self-attention mechanisms.
You can find all the CvT checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=cvt) organization.

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-10-17 and added to Hugging Face Transformers on 2025-04-29.*
# D-FINE

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-01-28 and added to Hugging Face Transformers on 2025-02-04.*
# DAB-DETR

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2023-06-11 and added to Hugging Face Transformers on 2024-08-19.*
# DAC

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-02-07 and added to Hugging Face Transformers on 2022-03-01.*
# Data2Vec

View File

@@ -9,6 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
*This model was released on 2024-03-27 and added to Hugging Face Transformers on 2024-04-18.*
# DBRX

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-06-05 and added to Hugging Face Transformers on 2021-02-19.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-06-05 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-06-02 and added to Hugging Face Transformers on 2022-03-23.*
# Decision Transformer

View File

@@ -13,12 +13,13 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-05-07 and added to Hugging Face Transformers on 2025-07-09.*
# DeepSeek-V2
## Overview
The DeepSeek-V2 model was proposed in [DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model](https://arxiv.org/abs/2405.04434) by DeepSeek-AI Team.
The DeepSeek-V2 model was proposed in [DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model](https://huggingface.co/papers/2405.04434) by DeepSeek-AI Team.
The abstract from the paper is the following:
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-12-27 and added to Hugging Face Transformers on 2025-03-28.*
# DeepSeek-V3

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-03-08 and added to Hugging Face Transformers on 2025-07-25.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -24,7 +25,7 @@ rendered properly in your Markdown viewer.
# DeepseekVL
[Deepseek-VL](https://arxiv.org/abs/2403.05525) was introduced by the DeepSeek AI team. It is a vision-language model (VLM) designed to process both text and images for generating contextually relevant responses. The model leverages [LLaMA](./llama) as its text encoder, while [SigLip](./siglip) is used for encoding images.
[Deepseek-VL](https://huggingface.co/papers/2403.05525) was introduced by the DeepSeek AI team. It is a vision-language model (VLM) designed to process both text and images for generating contextually relevant responses. The model leverages [LLaMA](./llama) as its text encoder, while [SigLip](./siglip) is used for encoding images.
You can find all the original Deepseek-VL checkpoints under the [DeepSeek-community](https://huggingface.co/deepseek-community) organization.

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-03-08 and added to Hugging Face Transformers on 2025-07-25.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -23,7 +24,7 @@ rendered properly in your Markdown viewer.
# DeepseekVLHybrid
[Deepseek-VL-Hybrid](https://arxiv.org/abs/2403.05525) was introduced by the DeepSeek AI team. It is a vision-language model (VLM) designed to process both text and images for generating contextually relevant responses. The model leverages [LLaMA](./llama) as its text encoder, while [SigLip](./siglip) is used for encoding low-resolution images and [SAM (Segment Anything Model)](./sam) is incorporated to handle high-resolution image encoding, enhancing the models ability to process fine-grained visual details. Deepseek-VL-Hybrid is a variant of Deepseek-VL that uses [SAM (Segment Anything Model)](./sam) to handle high-resolution image encoding.
[Deepseek-VL-Hybrid](https://huggingface.co/papers/2403.05525) was introduced by the DeepSeek AI team. It is a vision-language model (VLM) designed to process both text and images for generating contextually relevant responses. The model leverages [LLaMA](./llama) as its text encoder, while [SigLip](./siglip) is used for encoding low-resolution images and [SAM (Segment Anything Model)](./sam) is incorporated to handle high-resolution image encoding, enhancing the models ability to process fine-grained visual details. Deepseek-VL-Hybrid is a variant of Deepseek-VL that uses [SAM (Segment Anything Model)](./sam) to handle high-resolution image encoding.
You can find all the original Deepseek-VL-Hybrid checkpoints under the [DeepSeek-community](https://huggingface.co/deepseek-community) organization.

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-10-08 and added to Hugging Face Transformers on 2022-09-14.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-12-23 and added to Hugging Face Transformers on 2021-04-13.*
# DeiT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-12-20 and added to Hugging Face Transformers on 2023-06-20.*
# DePlot

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-01-19 and added to Hugging Face Transformers on 2024-01-25.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-13 and added to Hugging Face Transformers on 2024-07-05.*
# Depth Anything V2

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-10-02 and added to Hugging Face Transformers on 2025-02-10.*
# DepthPro

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-12-12 and added to Hugging Face Transformers on 2023-06-20.*
# DETA

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-05-26 and added to Hugging Face Transformers on 2021-06-09.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-04-21 and added to Hugging Face Transformers on 2025-06-26.*
# Dia
@@ -26,7 +27,7 @@ rendered properly in your Markdown viewer.
## Overview
Dia is an open-source text-to-speech (TTS) model (1.6B parameters) developed by [Nari Labs](https://huggingface.co/nari-labs).
[Dia](https://github.com/nari-labs/dia) is an open-source text-to-speech (TTS) model (1.6B parameters) developed by [Nari Labs](https://huggingface.co/nari-labs).
It can generate highly realistic dialogue from transcript including non-verbal communications such as laughter and coughing.
Furthermore, emotion and tone control is also possible via audio conditioning (voice cloning).

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-11-01 and added to Hugging Face Transformers on 2020-11-16.*
# DialoGPT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-10-07 and added to Hugging Face Transformers on 2025-01-07.*
# DiffLlama

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-09-29 and added to Hugging Face Transformers on 2022-11-18.*
# Dilated Neighborhood Attention Transformer

View File

@@ -9,6 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
*This model was released on 2023-04-14 and added to Hugging Face Transformers on 2023-07-18.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -6,6 +6,7 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
*This model was released on 2023-09-28 and added to Hugging Face Transformers on 2024-12-24.*
# DINOv2 with Registers

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-10-02 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-03-04 and added to Hugging Face Transformers on 2022-03-10.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-12-27 and added to Hugging Face Transformers on 2025-07-08.*
# Doge

View File

@@ -12,6 +12,7 @@ Unless required by applicable law or agreed to in writing, software distributed
rendered properly in your Markdown viewer.
specific language governing permissions and limitations under the License. -->
*This model was released on 2021-11-30 and added to Hugging Face Transformers on 2022-08-12.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -21,7 +22,7 @@ specific language governing permissions and limitations under the License. -->
# Donut
[Donut (Document Understanding Transformer)](https://huggingface.co/papers2111.15664) is a visual document understanding model that doesn't require an Optical Character Recognition (OCR) engine. Unlike traditional approaches that extract text using OCR before processing, Donut employs an end-to-end Transformer-based architecture to directly analyze document images. This eliminates OCR-related inefficiencies making it more accurate and adaptable to diverse languages and formats.
[Donut (Document Understanding Transformer)](https://huggingface.co/papers/2111.15664) is a visual document understanding model that doesn't require an Optical Character Recognition (OCR) engine. Unlike traditional approaches that extract text using OCR before processing, Donut employs an end-to-end Transformer-based architecture to directly analyze document images. This eliminates OCR-related inefficiencies making it more accurate and adaptable to diverse languages and formats.
Donut features vision encoder ([Swin](./swin)) and a text decoder ([BART](./bart)). Swin converts document images into embeddings and BART processes them into meaningful text sequences.

View File

@@ -13,12 +13,13 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-06-06 and added to Hugging Face Transformers on 2025-06-25.*
# dots.llm1
## Overview
The `dots.llm1` model was proposed in [dots.llm1 technical report](https://www.arxiv.org/pdf/2506.05767) by rednote-hilab team.
The `dots.llm1` model was proposed in [dots.llm1 technical report](https://huggingface.co/papers/2506.05767) by rednote-hilab team.
The abstract from the report is the following:

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-04-10 and added to Hugging Face Transformers on 2020-11-16.*
# DPR

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2021-03-24 and added to Hugging Face Transformers on 2022-03-28.*
# DPT

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-06-02 and added to Hugging Face Transformers on 2023-06-20.*
# EfficientFormer

View File

@@ -11,6 +11,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-03-07 and added to Hugging Face Transformers on 2025-07-22.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-05-28 and added to Hugging Face Transformers on 2023-02-20.*
# EfficientNet

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2020-03-23 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2024-09-27 and added to Hugging Face Transformers on 2025-01-10.*
# Emu3

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2022-10-24 and added to Hugging Face Transformers on 2023-06-14.*
# EnCodec

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2017-06-12 and added to Hugging Face Transformers on 2020-11-16.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">

View File

@@ -8,6 +8,7 @@ specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on 2025-03-24 and added to Hugging Face Transformers on 2025-06-27.*
# EoMT
@@ -17,7 +18,7 @@ rendered properly in your Markdown viewer.
## Overview
The Encoder-only Mask Transformer (EoMT) model was introduced in the CVPR 2025 Highlight Paper [Your ViT is Secretly an Image Segmentation Model](https://www.tue-mps.org/eomt) by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.
[The Encoder-only Mask Transformer]((https://www.tue-mps.org/eomt)) (EoMT) model was introduced in the CVPR 2025 Highlight Paper *[Your ViT is Secretly an Image Segmentation Model](https://huggingface.co/papers/2503.19108)* by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.
EoMT reveals Vision Transformers can perform image segmentation efficiently without task-specific components.
The abstract from the paper is the following:

View File

@@ -13,6 +13,7 @@ specific language governing permissions and limitations under the License.
rendered properly in your Markdown viewer.
-->
*This model was released on 2019-04-19 and added to Hugging Face Transformers on 2022-09-09.*
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
@@ -22,8 +23,8 @@ rendered properly in your Markdown viewer.
# ERNIE
[ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428),
[ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244) are a series of powerful models proposed by baidu, especially in Chinese tasks.
[ERNIE1.0](https://huggingface.co/papers/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428),
[ERNIE3.0](https://huggingface.co/papers/2107.02137), [ERNIE-Gram](https://huggingface.co/papers/2010.12148), [ERNIE-health](https://huggingface.co/papers/2110.07244) are a series of powerful models proposed by baidu, especially in Chinese tasks.
ERNIE (Enhanced Representation through kNowledge IntEgration) is designed to learn language representation enhanced by knowledge masking strategies, which includes entity-level masking and phrase-level masking.

Some files were not shown because too many files have changed in this diff Show More