Documentation code sample fixes (#21302)
* Fixed the following: pipe -> pipeline; out in pipe(data()) is a list of dict, not a dict
* Fixed the TypeError: __init__() missing 1 required positional argument: 'key'
* Added a tip: code sample requires additional libraries to run
* Fixed custom config's name
* added seqeval to the required libraries
* fixed a missing dependency, fixed metric naming, added checkpoint to fix the datacollator
* added checkpoint to fix the datacollator, added missing dependency
@@ -95,7 +95,7 @@ Once you are satisfied with your model configuration, you can save it with [`~Pr
 To reuse the configuration file, load it with [`~PretrainedConfig.from_pretrained`]:

 ```py
->>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
+>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
 ```

 <Tip>
@@ -115,7 +115,7 @@ Load your custom configuration attributes into the model:
 ```py
 >>> from transformers import DistilBertModel

->>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
+>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
 >>> model = DistilBertModel(my_config)
 ```

@@ -156,10 +156,10 @@ def data():
         yield f"My example {i}"


-pipe = pipe(model="gpt2", device=0)
+pipe = pipeline(model="gpt2", device=0)
 generated_characters = 0
 for out in pipe(data()):
-    generated_characters += len(out["generated_text"])
+    generated_characters += len(out[0]["generated_text"])
 ```

 The iterator `data()` yields each result, and the pipeline automatically
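The `out[0]["generated_text"]` change in the hunk above follows from the commit message: each item produced by `pipe(data())` is a list of dicts, not a single dict. The sketch below imitates that shape with a hypothetical stand-in, `fake_pipe`, so it runs without downloading a model. The stand-in and its generated text are assumptions for illustration only, not the real pipeline.

```python
def data():
    # Same generator shape as in the tutorial, shortened to three items.
    for i in range(3):
        yield f"My example {i}"


def fake_pipe(inputs):
    # Hypothetical stand-in for a text-generation pipeline: for each input it
    # yields a list holding one dict, mirroring the list-of-dict output
    # described in the commit message.
    for text in inputs:
        yield [{"generated_text": text + " ..."}]


generated_characters = 0
for out in fake_pipe(data()):
    # out is a list, so index into it before reading the key.
    generated_characters += len(out[0]["generated_text"])

print(generated_characters)
```

With this shape, the old `out["generated_text"]` fails because a list cannot be indexed with a string.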
@@ -175,11 +175,12 @@ The simplest way to iterate over a dataset is to just load one from 🤗 [Datase
 ```py
 # KeyDataset is a util that will just output the item we're interested in.
 from transformers.pipelines.pt_utils import KeyDataset
+from datasets import load_dataset

 pipe = pipeline(model="hf-internal-testing/tiny-random-wav2vec2", device=0)
 dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:10]")

-for out in pipe(KeyDataset(dataset["audio"])):
+for out in pipe(KeyDataset(dataset, "audio")):
     print(out)
 ```

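The `KeyDataset(dataset, "audio")` change above addresses the `TypeError` quoted in the commit message: the wrapper takes the dataset and the key as two separate arguments. As a rough sketch of why, the hypothetical class `KeyDatasetSketch` below mimics that two-argument constructor on a plain list of dicts. It is a simplified stand-in written for this note, not the library's implementation, and the sample rows are made up.

```python
class KeyDatasetSketch:
    # Hypothetical, simplified stand-in for KeyDataset: it wraps a dataset of
    # dict-like rows and returns only the value stored under `key`.
    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        return self.dataset[i][self.key]


# Made-up rows standing in for a loaded dataset.
rows = [{"audio": "clip-0", "text": "a"}, {"audio": "clip-1", "text": "b"}]

wrapped = KeyDatasetSketch(rows, "audio")
items = [wrapped[i] for i in range(len(wrapped))]
print(items)

try:
    KeyDatasetSketch(rows)  # key omitted, as in the old code sample
except TypeError as err:
    print("TypeError:", err)
```

Passing only one argument, as the old sample did with `dataset["audio"]`, leaves `key` unset and fails at construction time.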
@@ -246,3 +247,14 @@ For example, if you use this [invoice image](https://huggingface.co/spaces/impir
 ... )
 [{'score': 0.42514941096305847, 'answer': 'us-001', 'start': 16, 'end': 16}]
 ```
+
+<Tip>
+
+To run the example above you need to have [`pytesseract`](https://pypi.org/project/pytesseract/) installed in addition to 🤗 Transformers:
+
+```bash
+sudo apt install -y tesseract-ocr
+pip install pytesseract
+```
+
+</Tip>
@@ -33,7 +33,7 @@ See the summarization [task page](https://huggingface.co/tasks/summarization) fo
 Before you begin, make sure you have all the necessary libraries installed:

 ```bash
-pip install transformers datasets evaluate
+pip install transformers datasets evaluate rouge_score
 ```

 We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
@@ -81,7 +81,8 @@ The next step is to load a T5 tokenizer to process `text` and `summary`:
 ```py
 >>> from transformers import AutoTokenizer

->>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
+>>> checkpoint = "t5-small"
+>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
 ```

 The preprocessing function you want to create needs to:
@@ -117,14 +118,14 @@ Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more effic
 ```py
 >>> from transformers import DataCollatorForSeq2Seq

->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
 ```
 </pt>
 <tf>
 ```py
 >>> from transformers import DataCollatorForSeq2Seq

->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="tf")
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
 ```
 </tf>
 </frameworkcontent>
@@ -175,7 +176,7 @@ You're ready to start training your model now! Load T5 with [`AutoModelForSeq2Se
 ```py
 >>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

->>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
 ```

 At this point, only three steps remain:
@@ -237,7 +238,7 @@ Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
 ```py
 >>> from transformers import TFAutoModelForSeq2SeqLM

->>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
 ```

 Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
@@ -32,7 +32,7 @@ See the token classification [task page](https://huggingface.co/tasks/token-clas
 Before you begin, make sure you have all the necessary libraries installed:

 ```bash
-pip install transformers datasets evaluate
+pip install transformers datasets evaluate seqeval
 ```

 We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
@@ -30,7 +30,7 @@ See the translation [task page](https://huggingface.co/tasks/translation) for mo
 Before you begin, make sure you have all the necessary libraries installed:

 ```bash
-pip install transformers datasets evaluate
+pip install transformers datasets evaluate sacrebleu
 ```

 We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
@@ -77,7 +77,8 @@ The next step is to load a T5 tokenizer to process the English-French language p
 ```py
 >>> from transformers import AutoTokenizer

->>> tokenizer = AutoTokenizer.from_pretrained("t5-small")
+>>> checkpoint = "t5-small"
+>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
 ```

 The preprocessing function you want to create needs to:
@@ -112,7 +113,7 @@ Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more effic
 ```py
 >>> from transformers import DataCollatorForSeq2Seq

->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
 ```
 </pt>
 <tf>
@@ -120,7 +121,7 @@ Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more effic
 ```py
 >>> from transformers import DataCollatorForSeq2Seq

->>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors="tf")
+>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
 ```
 </tf>
 </frameworkcontent>
@@ -132,7 +133,7 @@ Including a metric during training is often helpful for evaluating your model's
 ```py
 >>> import evaluate

->>> sacrebleu = evaluate.load("sacrebleu")
+>>> metric = evaluate.load("sacrebleu")
 ```

 Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the SacreBLEU score:
@@ -184,7 +185,7 @@ You're ready to start training your model now! Load T5 with [`AutoModelForSeq2Se
 ```py
 >>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

->>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
 ```

 At this point, only three steps remain:
@@ -246,7 +247,7 @@ Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
 ```py
 >>> from transformers import TFAutoModelForSeq2SeqLM

->>> model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
+>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
 ```

 Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
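Three of the hunks above add a metric dependency to the `pip install` line (`rouge_score`, `seqeval`, `sacrebleu`). One way to check for them before running the guides is sketched below. The helper name `missing_packages` is hypothetical, and the sketch assumes each package's import name matches the name used in the install command.

```python
import importlib.util


def missing_packages(import_names):
    # Return the names that cannot be located in the current environment.
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Import names assumed to match the pip names added in this commit.
extras = ["rouge_score", "seqeval", "sacrebleu"]
missing = missing_packages(extras)
if missing:
    print("Install before running the guides: pip install " + " ".join(missing))
else:
    print("All metric dependencies are available.")
```

The check only looks the packages up and does not import them, so it stays cheap even when the libraries are large.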