Doc styler examples (#14953)

* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
2021-12-27 19:07:46 -05:00
parent e13f72fbff
commit b5e2b183af
211 changed files with 2738 additions and 1711 deletions
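
Note on "Don't add spaces to empty lines": when a formatted code example is re-indented to sit inside a docstring, blank lines should stay completely empty rather than pick up the indent as trailing whitespace. The snippet below is a minimal sketch of that idea only. It is not the actual `style_doc` code, and the helper name `indent_code` is hypothetical.

```python
def indent_code(code: str, indent: str) -> str:
    """Prefix every non-blank line of `code` with `indent`.

    Hypothetical helper for illustration; not taken from style_doc.
    Blank (or whitespace-only) lines are emitted as empty strings so
    that no trailing spaces are introduced.
    """
    lines = code.splitlines()
    return "\n".join(indent + line if line.strip() else "" for line in lines)


example = "def f():\n    return 1\n\n\nx = f()"
indented = indent_code(example, "    ")
```

With this approach the two blank lines that a formatter places after a function body survive re-indentation without gaining whitespace.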
--- a/docs/source/training.mdx
+++ b/docs/source/training.mdx
@@ -74,6 +74,7 @@ However, we can instead apply these preprocessing steps to all the splits of our
 def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

+
 tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
 ```

@@ -82,8 +83,8 @@ You can learn more about the map method or the other ways to preprocess the data
 Next we will generate a small subset of the training and validation set, to enable faster training:

 ```python
-small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000)) 
-small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000)) 
+small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
+small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
 full_train_dataset = tokenized_datasets["train"]
 full_eval_dataset = tokenized_datasets["test"]
 ```
@@ -130,9 +131,7 @@ Then we can instantiate a [`Trainer`] like this:
 ```python
 from transformers import Trainer

-trainer = Trainer(
-    model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset
-)
+trainer = Trainer(model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset)
 ```

 To fine-tune our model, we just need to call
@@ -160,6 +159,7 @@ from datasets import load_metric

 metric = load_metric("accuracy")

+
 def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
@@ -322,12 +322,7 @@ from transformers import get_scheduler

 num_epochs = 3
 num_training_steps = num_epochs * len(train_dataloader)
-lr_scheduler = get_scheduler(
-    "linear",
-    optimizer=optimizer,
-    num_warmup_steps=0,
-    num_training_steps=num_training_steps
-)
+lr_scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)
 ```

 One last thing, we will want to use the GPU if we have access to one (otherwise training might take several hours
@@ -372,7 +367,7 @@ use a metric from the datasets library. Here we accumulate the predictions at ea
 result when the loop is finished.

 ```python
-metric= load_metric("accuracy")
+metric = load_metric("accuracy")
 model.eval()
 for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}