Doc styler examples (#14953)

* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
@@ -51,12 +51,14 @@ ByT5 works on raw UTF-8 bytes, so it can be used without a tokenizer:
 from transformers import T5ForConditionalGeneration
 import torch
 
-model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
+model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
 
 input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3 # add 3 for special tokens
-labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3 # add 3 for special tokens
+labels = (
+    torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3
+)  # add 3 for special tokens
 
-loss = model(input_ids, labels=labels).loss # forward pass
+loss = model(input_ids, labels=labels).loss  # forward pass
 ```
 
 For batched inference and training it is however recommended to make use of the tokenizer:
@@ -64,13 +66,17 @@ For batched inference and training it is however recommended to make use of the
 ```python
 from transformers import T5ForConditionalGeneration, AutoTokenizer
 
-model = T5ForConditionalGeneration.from_pretrained('google/byt5-small')
-tokenizer = AutoTokenizer.from_pretrained('google/byt5-small')
+model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
+tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
 
-model_inputs = tokenizer(["Life is like a box of chocolates.", "Today is Monday."], padding="longest", return_tensors="pt")
-labels = tokenizer(["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."], padding="longest", return_tensors="pt").input_ids
+model_inputs = tokenizer(
+    ["Life is like a box of chocolates.", "Today is Monday."], padding="longest", return_tensors="pt"
+)
+labels = tokenizer(
+    ["La vie est comme une boîte de chocolat.", "Aujourd'hui c'est lundi."], padding="longest", return_tensors="pt"
+).input_ids
 
-loss = model(**model_inputs, labels=labels).loss # forward pass
+loss = model(**model_inputs, labels=labels).loss  # forward pass
 ```
 
 ## ByT5Tokenizer
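
The tokenizer-free example in the first hunk builds its ids by shifting each UTF-8 byte by 3. As a rough illustration of that mapping only, here is a minimal pure-Python sketch. The helper names `text_to_ids` and `ids_to_text` and the constant `SPECIAL_TOKEN_OFFSET` are hypothetical and not part of the `transformers` API; the offset value is taken from the "add 3 for special tokens" comment in the diff.

```python
# Offset taken from the "add 3 for special tokens" comment in the diff;
# the constant and helper names below are illustrative, not library API.
SPECIAL_TOKEN_OFFSET = 3


def text_to_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and shift every byte value by the offset."""
    return [byte + SPECIAL_TOKEN_OFFSET for byte in text.encode("utf-8")]


def ids_to_text(ids: list[int]) -> str:
    """Invert text_to_ids, skipping ids below the offset (reserved ids)."""
    raw = bytes(i - SPECIAL_TOKEN_OFFSET for i in ids if i >= SPECIAL_TOKEN_OFFSET)
    return raw.decode("utf-8")


ids = text_to_ids("boîte")
print(ids)  # "î" occupies two bytes, so five characters yield six ids
print(ids_to_text(ids))
```

Because non-ASCII characters expand to several bytes, sequence lengths under this scheme count bytes rather than characters, which is worth keeping in mind when padding batches.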