Fix en documentation typos (#21799)
* fix wrong url * typos in english documentation
This commit is contained in:
@@ -168,7 +168,7 @@ Apply the `group_texts` function over the entire dataset:
|
||||
>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
|
||||
```
|
||||
|
||||
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
@@ -119,7 +119,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
|
||||
tokenized_swag = swag.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then unflattens the results:
|
||||
|
||||
|
||||
@@ -54,7 +54,7 @@ We encourage you to login to your Hugging Face account so you can upload and sha
|
||||
|
||||
## Load SQuAD dataset
|
||||
|
||||
Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
|
||||
Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
|
||||
|
||||
```py
|
||||
>>> from datasets import load_dataset
|
||||
|
||||
@@ -50,7 +50,7 @@ We encourage you to log in to your Hugging Face account so you can upload and sh
|
||||
|
||||
## Load SceneParse150 dataset
|
||||
|
||||
Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
|
||||
Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
|
||||
|
||||
```py
|
||||
>>> from datasets import load_dataset
|
||||
|
||||
@@ -97,7 +97,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
|
||||
tokenized_imdb = imdb.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
@@ -118,7 +118,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
|
||||
>>> tokenized_billsum = billsum.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
@@ -156,7 +156,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
|
||||
>>> tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
|
||||
```
|
||||
|
||||
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
@@ -113,7 +113,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
|
||||
>>> tokenized_books = books.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
|
||||
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
|
||||
|
||||
<frameworkcontent>
|
||||
<pt>
|
||||
|
||||
Reference in New Issue
Block a user