[doc] Try a few ≠ ways of linking to Papers, users, and org profiles (#22611)
* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles * Empty commit * Empty commit now that the backend is fixed --------- Co-authored-by: Lysandre <lysandre@huggingface.co>
This commit is contained in:
@@ -19,13 +19,16 @@ specific language governing permissions and limitations under the License.
|
||||
<a href="https://huggingface.co/spaces/docs-demos/distilbert-base-uncased">
|
||||
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
|
||||
</a>
|
||||
<a href="https://huggingface.co/papers/1910.01108">
|
||||
<img alt="Paper page" src="https://img.shields.io/badge/Paper%20page-1910.01108-green">
|
||||
</a>
|
||||
</div>
|
||||
|
||||
## Overview
|
||||
|
||||
The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
|
||||
distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a
|
||||
distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a
|
||||
distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/papers/1910.01108). DistilBERT is a
|
||||
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than
|
||||
*bert-base-uncased*, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language
|
||||
understanding benchmark.
|
||||
|
||||
@@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License.
|
||||
## Overview
|
||||
|
||||
OpenAI GPT-2 model was proposed in [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) by Alec
|
||||
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
|
||||
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever from [OpenAI](https://huggingface.co/openai). It's a causal (unidirectional)
|
||||
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
@@ -19,11 +19,14 @@ specific language governing permissions and limitations under the License.
|
||||
<a href="https://huggingface.co/spaces/docs-demos/roberta-base">
|
||||
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
|
||||
</a>
|
||||
<a href="https://huggingface.co/papers/1907.11692">
|
||||
<img alt="Paper page" src="https://img.shields.io/badge/Paper%20page-1907.11692-green">
|
||||
</a>
|
||||
</div>
|
||||
|
||||
## Overview
|
||||
|
||||
The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
|
||||
The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, [Myle Ott](https://huggingface.co/myleott), Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
|
||||
Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
|
||||
|
||||
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
|
||||
|
||||
@@ -19,12 +19,15 @@ specific language governing permissions and limitations under the License.
|
||||
<a href="https://huggingface.co/spaces/docs-demos/t5-base">
|
||||
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
|
||||
</a>
|
||||
<a href="https://huggingface.co/papers/1910.10683">
|
||||
<img alt="Paper page" src="https://img.shields.io/badge/Paper%20page-1910.10683-green">
|
||||
</a>
|
||||
</div>
|
||||
|
||||
## Overview
|
||||
|
||||
The T5 model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
|
||||
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
|
||||
The T5 model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by [Colin Raffel](https://huggingface.co/craffel), Noam Shazeer, [Adam Roberts](https://huggingface.co/adarob), Katherine Lee, Sharan Narang,
|
||||
Michael Matena, Yanqi Zhou, Wei Li, [Peter J. Liu](https://huggingface.co/peterjliu).
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
|
||||
Reference in New Issue
Block a user