Fix en documentation typos (#21799)

* fix wrong url

* typos in english documentation
This commit is contained in:
Thomas Paviot
2023-02-27 08:36:36 +01:00
committed by GitHub
parent a36983653e
commit ba2a5f13f7
17 changed files with 19 additions and 19 deletions

View File

@@ -168,7 +168,7 @@ Apply the `group_texts` function over the entire dataset:
>>> lm_dataset = tokenized_eli5.map(group_texts, batched=True, num_proc=4)
```
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>

View File

@@ -119,7 +119,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
tokenized_swag = swag.map(preprocess_function, batched=True)
```
🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then unflattens the results:

View File

@@ -54,7 +54,7 @@ We encourage you to login to your Hugging Face account so you can upload and sha
## Load SQuAD dataset
Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
Start by loading a smaller subset of the SQuAD dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset

View File

@@ -50,7 +50,7 @@ We encourage you to log in to your Hugging Face account so you can upload and sh
## Load SceneParse150 dataset
Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everythings works before spending more time training on the full dataset.
Start by loading a smaller subset of the SceneParse150 dataset from the 🤗 Datasets library. This'll give you a chance to experiment and make sure everything works before spending more time training on the full dataset.
```py
>>> from datasets import load_dataset

View File

@@ -97,7 +97,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
tokenized_imdb = imdb.map(preprocess_function, batched=True)
```
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>

View File

@@ -118,7 +118,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
>>> tokenized_billsum = billsum.map(preprocess_function, batched=True)
```
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>

View File

@@ -156,7 +156,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
>>> tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
```
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
Now create a batch of examples using [`DataCollatorWithPadding`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>

View File

@@ -113,7 +113,7 @@ To apply the preprocessing function over the entire dataset, use 🤗 Datasets [
>>> tokenized_books = books.map(preprocess_function, batched=True)
```
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximium length.
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>