From ca7eb27ed590dd583bc028ba1a0f78eb00dbb243 Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Wed, 3 May 2023 18:23:09 +0200
Subject: [PATCH] =?UTF-8?q?[doc]=20Try=20a=20few=20=E2=89=A0=20ways=20of?=
 =?UTF-8?q?=20linking=20to=20Papers,=20users,=20and=20org=20profiles=20(#2?=
 =?UTF-8?q?2611)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles

* Empty commit

* Empty commit now that the backend is fixed

---------

Co-authored-by: Lysandre
---
 docs/source/en/model_doc/distilbert.mdx | 5 ++++-
 docs/source/en/model_doc/gpt2.mdx       | 2 +-
 docs/source/en/model_doc/roberta.mdx    | 5 ++++-
 docs/source/en/model_doc/t5.mdx         | 7 +++++--
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/docs/source/en/model_doc/distilbert.mdx b/docs/source/en/model_doc/distilbert.mdx
index cc1e037151..837f0319ec 100644
--- a/docs/source/en/model_doc/distilbert.mdx
+++ b/docs/source/en/model_doc/distilbert.mdx
@@ -19,13 +19,16 @@ specific language governing permissions and limitations under the License.
 Spaces
+
+Paper page
+
 ## Overview
 The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a
-distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a
+distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/papers/1910.01108). DistilBERT is a
 small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than *bert-base-uncased*, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.
diff --git a/docs/source/en/model_doc/gpt2.mdx b/docs/source/en/model_doc/gpt2.mdx
index ee80eb2f8b..6288e46eb5 100644
--- a/docs/source/en/model_doc/gpt2.mdx
+++ b/docs/source/en/model_doc/gpt2.mdx
@@ -24,7 +24,7 @@ specific language governing permissions and limitations under the License.
 ## Overview
 OpenAI GPT-2 model was proposed in [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) by Alec
-Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
+Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever from [OpenAI](https://huggingface.co/openai). It's a causal (unidirectional)
 transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
 The abstract from the paper is the following:
diff --git a/docs/source/en/model_doc/roberta.mdx b/docs/source/en/model_doc/roberta.mdx
index 7c0818a014..49007409ad 100644
--- a/docs/source/en/model_doc/roberta.mdx
+++ b/docs/source/en/model_doc/roberta.mdx
@@ -19,11 +19,14 @@ specific language governing permissions and limitations under the License.
 Spaces
+
+Paper page
+
 ## Overview
-The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
+The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, [Myle Ott](https://huggingface.co/myleott), Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer
 Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on Google's BERT model released in 2018.
 It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining objective and training with
diff --git a/docs/source/en/model_doc/t5.mdx b/docs/source/en/model_doc/t5.mdx
index f7665c11ae..58074f1403 100644
--- a/docs/source/en/model_doc/t5.mdx
+++ b/docs/source/en/model_doc/t5.mdx
@@ -19,12 +19,15 @@ specific language governing permissions and limitations under the License.
 Spaces
+
+Paper page
+
 ## Overview
-The T5 model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
-Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
+The T5 model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by [Colin Raffel](https://huggingface.co/craffel), Noam Shazeer, [Adam Roberts](https://huggingface.co/adarob), Katherine Lee, Sharan Narang,
+Michael Matena, Yanqi Zhou, Wei Li, [Peter J. Liu](https://huggingface.co/peterjliu).
 The abstract from the paper is the following: