Update all references to canonical models (#29001)

* Script & Manual edition

* Update
Lysandre Debut
2024-02-16 08:16:58 +01:00
committed by GitHub
parent 1e402b957d
commit f497f564bb
561 changed files with 2682 additions and 2687 deletions
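
The commit message mentions a script, but the script itself is not part of this diff. As a rough illustration only, the sketch below shows one way such a rewrite could be done. The mapping is taken from the hunks shown here; the function name `canonicalize` and the regular expression are assumptions, not the code actually used for this commit.

```python
import re

# Legacy checkpoint ID -> canonical namespaced ID, as seen in the hunks of this diff.
CANONICAL = {
    "distilgpt2": "distilbert/distilgpt2",
    "distilroberta-base": "distilbert/distilroberta-base",
    "bert-base-uncased": "google-bert/bert-base-uncased",
    "gpt2": "openai-community/gpt2",
    "distilbert-base-uncased": "distilbert/distilbert-base-uncased",
    "t5-small": "google-t5/t5-small",
}

# Try longer names first so the alternation never stops at a shorter prefix.
_NAMES = "|".join(re.escape(k) for k in sorted(CANONICAL, key=len, reverse=True))

# A legacy ID is rewritten only when it stands alone: either directly after
# "huggingface.co/" (a model URL) or not preceded by a word character, "/",
# "." or "-" (so already-namespaced IDs and longer names are left untouched).
_PATTERN = re.compile(
    r"(?:(?<=huggingface\.co/)|(?<![\w/.-]))(" + _NAMES + r")(?![\w/-])"
)


def canonicalize(text: str) -> str:
    """Hypothetical helper: replace legacy model IDs with their canonical form."""
    return _PATTERN.sub(lambda m: CANONICAL[m.group(1)], text)
```

Because already-namespaced IDs are skipped, running the helper twice on the same text gives the same result as running it once, which matters when a script is applied repeatedly across hundreds of files.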


@@ -29,7 +29,7 @@ the left. This means the model cannot see future tokens. GPT-2 is an example of
This guide will show you how to:
-1. Finetune [DistilGPT2](https://huggingface.co/distilgpt2) on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
+1. Finetune [DistilGPT2](https://huggingface.co/distilbert/distilgpt2) on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
2. Use your finetuned model for inference.
<Tip>
@@ -110,7 +110,7 @@ The next step is to load a DistilGPT2 tokenizer to process the `text` subfield:
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
```
You'll notice from the example above, the `text` field is actually nested inside `answers`. This means you'll need to
@@ -236,7 +236,7 @@ You're ready to start training your model now! Load DistilGPT2 with [`AutoModelF
```py
>>> from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
->>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
At this point, only three steps remain:
@@ -300,7 +300,7 @@ Then you can load DistilGPT2 with [`TFAutoModelForCausalLM`]:
```py
>>> from transformers import TFAutoModelForCausalLM
->>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
+>>> model = TFAutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:


@@ -26,7 +26,7 @@ require a good contextual understanding of an entire sequence. BERT is an exampl
This guide will show you how to:
-1. Finetune [DistilRoBERTa](https://huggingface.co/distilroberta-base) on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
+1. Finetune [DistilRoBERTa](https://huggingface.co/distilbert/distilroberta-base) on the [r/askscience](https://www.reddit.com/r/askscience/) subset of the [ELI5](https://huggingface.co/datasets/eli5) dataset.
2. Use your finetuned model for inference.
<Tip>
@@ -105,7 +105,7 @@ For masked language modeling, the next step is to load a DistilRoBERTa tokenizer
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilroberta-base")
```
You'll notice from the example above, the `text` field is actually nested inside `answers`. This means you'll need to extract the `text` subfield from its nested structure with the [`flatten`](https://huggingface.co/docs/datasets/process#flatten) method:
@@ -226,7 +226,7 @@ You're ready to start training your model now! Load DistilRoBERTa with [`AutoMod
```py
>>> from transformers import AutoModelForMaskedLM
->>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = AutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
At this point, only three steps remain:
@@ -291,7 +291,7 @@ Then you can load DistilRoBERTa with [`TFAutoModelForMaskedLM`]:
```py
>>> from transformers import TFAutoModelForMaskedLM
->>> model = TFAutoModelForMaskedLM.from_pretrained("distilroberta-base")
+>>> model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:


@@ -22,7 +22,7 @@ A multiple choice task is similar to question answering, except several candidat
This guide will show you how to:
-1. Finetune [BERT](https://huggingface.co/bert-base-uncased) on the `regular` configuration of the [SWAG](https://huggingface.co/datasets/swag) dataset to select the best answer given multiple options and some context.
+1. Finetune [BERT](https://huggingface.co/google-bert/bert-base-uncased) on the `regular` configuration of the [SWAG](https://huggingface.co/datasets/swag) dataset to select the best answer given multiple options and some context.
2. Use your finetuned model for inference.
<Tip>
@@ -90,7 +90,7 @@ The next step is to load a BERT tokenizer to process the sentence starts and the
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
```
The preprocessing function you want to create needs to:
@@ -253,7 +253,7 @@ You're ready to start training your model now! Load BERT with [`AutoModelForMult
```py
>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer
->>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = AutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
At this point, only three steps remain:
@@ -317,7 +317,7 @@ Then you can load BERT with [`TFAutoModelForMultipleChoice`]:
```py
>>> from transformers import TFAutoModelForMultipleChoice
->>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
+>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:


@@ -76,7 +76,7 @@ Run inference with decoder-only models with the `text-generation` pipeline:
>>> torch.manual_seed(0) # doctest: +IGNORE_RESULT
->>> generator = pipeline('text-generation', model = 'gpt2')
+>>> generator = pipeline('text-generation', model = 'openai-community/gpt2')
>>> prompt = "Hello, I'm a language model"
>>> generator(prompt, max_length = 30)


@@ -27,7 +27,7 @@ Question answering tasks return an answer given a question. If you've ever asked
This guide will show you how to:
-1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [SQuAD](https://huggingface.co/datasets/squad) dataset for extractive question answering.
+1. Finetune [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) on the [SQuAD](https://huggingface.co/datasets/squad) dataset for extractive question answering.
2. Use your finetuned model for inference.
<Tip>
@@ -100,7 +100,7 @@ The next step is to load a DistilBERT tokenizer to process the `question` and `c
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
There are a few preprocessing steps particular to question answering tasks you should be aware of:
@@ -206,7 +206,7 @@ You're ready to start training your model now! Load DistilBERT with [`AutoModelF
```py
>>> from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
->>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
At this point, only three steps remain:
@@ -271,7 +271,7 @@ Then you can load DistilBERT with [`TFAutoModelForQuestionAnswering`]:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
->>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
+>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:


@@ -24,7 +24,7 @@ Text classification is a common NLP task that assigns a label or class to text.
This guide will show you how to:
-1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
+1. Finetune [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
2. Use your finetuned model for inference.
<Tip>
@@ -87,7 +87,7 @@ The next step is to load a DistilBERT tokenizer to preprocess the `text` field:
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
Create a preprocessing function to tokenize `text` and truncate sequences to be no longer than DistilBERT's maximum input length:
@@ -169,7 +169,7 @@ You're ready to start training your model now! Load DistilBERT with [`AutoModelF
>>> from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
>>> model = AutoModelForSequenceClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```
@@ -243,7 +243,7 @@ Then you can load DistilBERT with [`TFAutoModelForSequenceClassification`] along
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```


@@ -27,7 +27,7 @@ Summarization creates a shorter version of a document or an article that capture
This guide will show you how to:
-1. Finetune [T5](https://huggingface.co/t5-small) on the California state bill subset of the [BillSum](https://huggingface.co/datasets/billsum) dataset for abstractive summarization.
+1. Finetune [T5](https://huggingface.co/google-t5/t5-small) on the California state bill subset of the [BillSum](https://huggingface.co/datasets/billsum) dataset for abstractive summarization.
2. Use your finetuned model for inference.
<Tip>
@@ -92,7 +92,7 @@ The next step is to load a T5 tokenizer to process `text` and `summary`:
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```


@@ -24,7 +24,7 @@ Token classification assigns a label to individual tokens in a sentence. One of
This guide will show you how to:
-1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
+1. Finetune [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
2. Use your finetuned model for inference.
<Tip>
@@ -110,7 +110,7 @@ The next step is to load a DistilBERT tokenizer to preprocess the `tokens` field
```py
>>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
+>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
```
As you saw in the example `tokens` field above, it looks like the input has already been tokenized. But the input actually hasn't been tokenized yet and you'll need to set `is_split_into_words=True` to tokenize the words into subwords. For example:
@@ -272,7 +272,7 @@ You're ready to start training your model now! Load DistilBERT with [`AutoModelF
>>> from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
>>> model = AutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
@@ -343,7 +343,7 @@ Then you can load DistilBERT with [`TFAutoModelForTokenClassification`] along wi
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained(
-... "distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
+... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```


@@ -24,7 +24,7 @@ Translation converts a sequence of text from one language to another. It is one
This guide will show you how to:
-1. Finetune [T5](https://huggingface.co/t5-small) on the English-French subset of the [OPUS Books](https://huggingface.co/datasets/opus_books) dataset to translate English text to French.
+1. Finetune [T5](https://huggingface.co/google-t5/t5-small) on the English-French subset of the [OPUS Books](https://huggingface.co/datasets/opus_books) dataset to translate English text to French.
2. Use your finetuned model for inference.
<Tip>
@@ -88,7 +88,7 @@ The next step is to load a T5 tokenizer to process the English-French language p
```py
>>> from transformers import AutoTokenizer
->>> checkpoint = "t5-small"
+>>> checkpoint = "google-t5/t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```