Doc styler examples (#14953)

* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
2021-12-27 19:07:46 -05:00
parent e13f72fbff
commit b5e2b183af
211 changed files with 2738 additions and 1711 deletions
--- a/docs/source/custom_datasets.mdx
+++ b/docs/source/custom_datasets.mdx
@@ -54,6 +54,7 @@ The 🤗 Datasets library makes it simple to load a dataset:

 ```python
 from datasets import load_dataset
+
 imdb = load_dataset("imdb")
 ```

@@ -61,8 +62,9 @@ This loads a `DatasetDict` object which you can index into to view an example:

 ```python
 imdb["train"][0]
-{'label': 1,
- 'text': 'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High\'s satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\'t!'
+{
+    "label": 1,
+    "text": "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!",
 }
 ```

@@ -74,6 +76,7 @@ model was trained with to ensure appropriately tokenized words. Load the DistilB

 ```python
 from transformers import AutoTokenizer
+
 tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
 ```

@@ -99,6 +102,7 @@ batch. This is known as **dynamic padding**. You can do this with the `DataColla

 ```python
 from transformers import DataCollatorWithPadding
+
 data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
 ```

@@ -108,6 +112,7 @@ Now load your model with the [`AutoModelForSequenceClassification`] class along

 ```python
 from transformers import AutoModelForSequenceClassification
+
 model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
 ```

@@ -121,7 +126,7 @@ At this point, only three steps remain:
 from transformers import TrainingArguments, Trainer

 training_args = TrainingArguments(
-    output_dir='./results',
+    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
@@ -150,6 +155,7 @@ Make sure you set `return_tensors="tf"` to return `tf.Tensor` outputs instead of

 ```python
 from transformers import DataCollatorWithPadding
+
 data_collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")
 ```

@@ -158,14 +164,14 @@ Next, convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`

 ```python
 tf_train_dataset = tokenized_imdb["train"].to_tf_dataset(
-    columns=['attention_mask', 'input_ids', 'label'],
+    columns=["attention_mask", "input_ids", "label"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
 )

 tf_validation_dataset = tokenized_imdb["train"].to_tf_dataset(
-    columns=['attention_mask', 'input_ids', 'label'],
+    columns=["attention_mask", "input_ids", "label"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator,
@@ -182,17 +188,14 @@ batch_size = 16
 num_epochs = 5
 batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
 total_train_steps = int(batches_per_epoch * num_epochs)
-optimizer, schedule = create_optimizer(
-    init_lr=2e-5, 
-    num_warmup_steps=0, 
-    num_train_steps=total_train_steps
-)
+optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
 ```

 Load your model with the [`TFAutoModelForSequenceClassification`] class along with the number of expected labels:

 ```python
 from transformers import TFAutoModelForSequenceClassification
+
 model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
 ```

@@ -200,6 +203,7 @@ Compile the model:

 ```python
 import tensorflow as tf
+
 model.compile(optimizer=optimizer)
 ```

@@ -234,14 +238,15 @@ or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/no
 Load the WNUT 17 dataset from the 🤗 Datasets library:

 ```python
-from datasets import load_dataset
-wnut = load_dataset("wnut_17")
+>>> from datasets import load_dataset
+
+>>> wnut = load_dataset("wnut_17")
 ```

 A quick look at the dataset shows the labels associated with each word in the sentence:

 ```python
-wnut["train"][0]
+>>> wnut["train"][0]
 {'id': '0',
 'ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0],
 'tokens': ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.']
@@ -251,21 +256,22 @@ wnut["train"][0]
 View the specific NER tags by:

 ```python
-label_list = wnut["train"].features[f"ner_tags"].feature.names
-label_list
-['O',
- 'B-corporation',
- 'I-corporation',
- 'B-creative-work',
- 'I-creative-work',
- 'B-group',
- 'I-group',
- 'B-location',
- 'I-location',
- 'B-person',
- 'I-person',
- 'B-product',
- 'I-product'
+>>> label_list = wnut["train"].features[f"ner_tags"].feature.names
+>>> label_list
+[
+    "O",
+    "B-corporation",
+    "I-corporation",
+    "B-creative-work",
+    "I-creative-work",
+    "B-group",
+    "I-group",
+    "B-location",
+    "I-location",
+    "B-person",
+    "I-person",
+    "B-product",
+    "I-product",
 ]
 ```

@@ -282,6 +288,7 @@ Now you need to tokenize the text. Load the DistilBERT tokenizer with an [`AutoT

 ```python
 from transformers import AutoTokenizer
+
 tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
 ```

@@ -289,9 +296,9 @@ Since the input has already been split into words, set `is_split_into_words=True
 subwords:

 ```python
-tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
-tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
-tokens
+>>> tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
+>>> tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
+>>> tokens
 ['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
 ```

@@ -314,10 +321,10 @@ def tokenize_and_align_labels(examples):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
-        for word_idx in word_ids:                            # Set the special tokens to -100.
+        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
-            elif word_idx != previous_word_idx:              # Only label the first token of a given word.
+            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])

        labels.append(label_ids)
@@ -336,6 +343,7 @@ Finally, pad your text and labels, so they are a uniform length:

 ```python
 from transformers import DataCollatorForTokenClassification
+
 data_collator = DataCollatorForTokenClassification(tokenizer)
 ```

@@ -345,6 +353,7 @@ Load your model with the [`AutoModelForTokenClassification`] class along with th

 ```python
 from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
+
 model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
 ```

@@ -352,7 +361,7 @@ Gather your training arguments in [`TrainingArguments`]:

 ```python
 training_args = TrainingArguments(
-    output_dir='./results',
+    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
@@ -387,6 +396,7 @@ Batch your examples together and pad your text and labels, so they are a uniform

 ```python
 from transformers import DataCollatorForTokenClassification
+
 data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="tf")
 ```

@@ -412,6 +422,7 @@ Load the model with the [`TFAutoModelForTokenClassification`] class along with t

 ```python
 from transformers import TFAutoModelForTokenClassification
+
 model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
 ```

@@ -435,6 +446,7 @@ Compile the model:

 ```python
 import tensorflow as tf
+
 model.compile(optimizer=optimizer)
 ```

@@ -469,13 +481,14 @@ Load the SQuAD dataset from the 🤗 Datasets library:

 ```python
 from datasets import load_dataset
+
 squad = load_dataset("squad")
 ```

 Take a look at an example from the dataset:

 ```python
-squad["train"][0]
+>>> squad["train"][0]
 {'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'id': '5733be284776f41900661182',
@@ -490,6 +503,7 @@ Load the DistilBERT tokenizer with an [`AutoTokenizer`]:

 ```python
 from transformers import AutoTokenizer
+
 tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
 ```

@@ -567,6 +581,7 @@ Batch the processed examples together:

 ```python
 from transformers import default_data_collator
+
 data_collator = default_data_collator
 ```

@@ -576,6 +591,7 @@ Load your model with the [`AutoModelForQuestionAnswering`] class:

 ```python
 from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
+
 model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
 ```

@@ -583,7 +599,7 @@ Gather your training arguments in [`TrainingArguments`]:

 ```python
 training_args = TrainingArguments(
-    output_dir='./results',
+    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
@@ -618,6 +634,7 @@ Batch the processed examples together with a TensorFlow default data collator:

 ```python
 from transformers.data.data_collator import tf_default_collator
+
 data_collator = tf_default_collator
 ```

@@ -650,8 +667,8 @@ batch_size = 16
 num_epochs = 2
 total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
 optimizer, schedule = create_optimizer(
-    init_lr=2e-5, 
-    num_warmup_steps=0, 
+    init_lr=2e-5,
+    num_warmup_steps=0,
    num_train_steps=total_train_steps,
 )
 ```
@@ -660,6 +677,7 @@ Load your model with the [`TFAutoModelForQuestionAnswering`] class:

 ```python
 from transformers import TFAutoModelForQuestionAnswering
+
 model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")
 ```

@@ -667,6 +685,7 @@ Compile the model:

 ```python
 import tensorflow as tf
+
 model.compile(optimizer=optimizer)
 ```