From 77321481247787c97568c3b9f64b19e22351bab8 Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Tue, 22 Mar 2022 14:14:58 -0700
Subject: [PATCH] Adopt framework-specific blocks for content (#16342)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* ✨ refactor code samples with framework-specific blocks

* ✨ update training.mdx

* 🖍 apply feedback
---
 docs/source/_toctree.yml                      |  2 +-
 docs/source/model_sharing.mdx                 | 45 +++++++++-------
 docs/source/tasks/asr.mdx                     |  6 ++-
 docs/source/tasks/audio_classification.mdx    |  6 ++-
 docs/source/tasks/image_classification.mdx    |  6 ++-
 docs/source/tasks/language_modeling.mdx       | 54 ++++++++++---------
 docs/source/tasks/multiple_choice.mdx         | 29 +++++-----
 docs/source/tasks/question_answering.mdx      | 27 +++++-----
 docs/source/tasks/sequence_classification.mdx | 27 +++++-----
 docs/source/tasks/summarization.mdx           | 27 +++++-----
 docs/source/tasks/token_classification.mdx    | 27 +++++-----
 docs/source/tasks/translation.mdx             | 27 +++++-----
 docs/source/training.mdx                      | 19 ++++---
 13 files changed, 169 insertions(+), 133 deletions(-)
diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 60bb5030d9..22896b5297 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -22,7 +22,7 @@
   - local: model_summary
     title: Summary of the models
   - local: training
-    title: Fine-tuning a pretrained model
+    title: Fine-tune a pretrained model
   - local: accelerate
     title: Distributed training with 🤗 Accelerate
   - local: model_sharing
diff --git a/docs/source/model_sharing.mdx b/docs/source/model_sharing.mdx
index 048dd8639f..fb8a30a9cc 100644
--- a/docs/source/model_sharing.mdx
+++ b/docs/source/model_sharing.mdx
@@ -75,25 +75,29 @@ To ensure your model can be used by someone working with a different framework,
 
 Converting a checkpoint for another framework is easy. Make sure you have PyTorch and TensorFlow installed (see [here](installation) for installation instructions), and then find the specific model for your task in the other framework. 
 
-For example, suppose you trained DistilBert for sequence classification in PyTorch and want to convert it to it's TensorFlow equivalent. Load the TensorFlow equivalent of your model for your task, and specify `from_pt=True` so 🤗 Transformers will convert the PyTorch checkpoint to a TensorFlow checkpoint:
-
-```py
->>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
-```
-
-Then save your new TensorFlow model with it's new checkpoint:
-
-```py
->>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
-```
-
-Similarly, specify `from_tf=True` to convert a checkpoint from TensorFlow to PyTorch:
+<frameworkcontent>
+<pt>
+Specify `from_tf=True` to convert a checkpoint from TensorFlow to PyTorch:
 
 ```py
 >>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
 >>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
 ```
+</pt>
+<tf>
+Specify `from_pt=True` to convert a checkpoint from PyTorch to TensorFlow:
 
+```py
+>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
+```
+
+Then you can save your new TensorFlow model with its new checkpoint:
+
+```py
+>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
+```
+</tf>
+<jax>
 If a model is available in Flax, you can also convert a checkpoint from PyTorch to Flax:
 
 ```py
@@ -101,9 +105,13 @@ If a model is available in Flax, you can also convert a checkpoint from PyTorch
 ...     "path/to/awesome-name-you-picked", from_pt=True
 ... )
 ```
+</jax>
+</frameworkcontent>
 
-## Push a model with `Trainer`
+## Push a model during training
 
+<frameworkcontent>
+<pt>
 <Youtube id="Z1-XMy-GNLQ"/>
 
 Sharing a model to the Hub is as simple as adding an extra parameter or callback. Remember from the [fine-tuning tutorial](training), the [`TrainingArguments`] class is where you specify hyperparameters and additional training options. One of these training options includes the ability to push a model directly to the Hub. Set `push_to_hub=True` in your [`TrainingArguments`]:
@@ -129,10 +137,9 @@ After you fine-tune your model, call [`~transformers.Trainer.push_to_hub`] on [`
 ```py
 >>> trainer.push_to_hub()
 ```
-
-## Push a model with `PushToHubCallback`
-
-TensorFlow users can enable the same functionality with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:
+</pt>
+<tf>
+Share a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] callback, add:
 
 - An output directory for your model.
 - A tokenizer.
@@ -151,6 +158,8 @@ Add the callback to [`fit`](https://keras.io/api/models/model_training_apis/), a
 ```py
 >>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
 ```
+</tf>
+</frameworkcontent>
 
 ## Use the `push_to_hub` function
 
diff --git a/docs/source/tasks/asr.mdx b/docs/source/tasks/asr.mdx
index ce9db3c9dd..6fe90e5cd7 100644
--- a/docs/source/tasks/asr.mdx
+++ b/docs/source/tasks/asr.mdx
@@ -155,8 +155,10 @@ Create a batch of examples and dynamically pad them with `DataCollatorForCTCWith
 >>> data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)
 ```
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load Wav2Vec2 with [`AutoModelForCTC`]. For `ctc_loss_reduction`, it is often better to use the average instead of the default summation:
 
 ```py
@@ -206,6 +208,8 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
+</pt>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/audio_classification.mdx b/docs/source/tasks/audio_classification.mdx
index 63c3c7bd6b..183bfe4c1d 100644
--- a/docs/source/tasks/audio_classification.mdx
+++ b/docs/source/tasks/audio_classification.mdx
@@ -91,8 +91,10 @@ Use 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/package_reference
 >>> encoded_ks = ks.map(preprocess_function, remove_columns=["audio", "file"], batched=True)
 ```
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load Wav2Vec2 with [`AutoModelForAudioClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:
 
 ```py
@@ -135,6 +137,8 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
+</pt>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/image_classification.mdx b/docs/source/tasks/image_classification.mdx
index ae85493c01..7646feb55c 100644
--- a/docs/source/tasks/image_classification.mdx
+++ b/docs/source/tasks/image_classification.mdx
@@ -109,8 +109,10 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col
 >>> data_collator = DefaultDataCollator()
 ```
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load ViT with [`AutoModelForImageClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:
 
 ```py
@@ -162,6 +164,8 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
+</pt>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/language_modeling.mdx b/docs/source/tasks/language_modeling.mdx
index 458b4cb3d3..d79be859ef 100644
--- a/docs/source/tasks/language_modeling.mdx
+++ b/docs/source/tasks/language_modeling.mdx
@@ -200,8 +200,10 @@ For masked language modeling, use the same [`DataCollatorForLanguageModeling`] e
 
 Causal language modeling is frequently used for text generation. This section shows you how to fine-tune [DistilGPT2](https://huggingface.co/distilgpt2) to generate new text.
 
-### Fine-tune with Trainer
+### Train
 
+<frameworkcontent>
+<pt>
 Load DistilGPT2 with [`AutoModelForCausalLM`]:
 
 ```py
@@ -240,18 +242,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-### Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = lm_dataset["train"].to_tf_dataset(
@@ -271,6 +264,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate, and some training hyperparameters:
 
 ```py
@@ -300,13 +299,17 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 ## Masked language modeling
 
 Masked language modeling is also known as a fill-mask task because it predicts a masked token in a sequence. Models for masked language modeling require a good contextual understanding of an entire sequence instead of only the left context. This section shows you how to fine-tune [DistilRoBERTa](https://huggingface.co/distilroberta-base) to predict a masked word.
 
-### Fine-tune with Trainer
+### Train
 
+<frameworkcontent>
+<pt>
 Load DistilRoBERTa with [`AutoModelForMaskedlM`]:
 
 ```py
@@ -346,18 +349,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-### Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = lm_dataset["train"].to_tf_dataset(
@@ -377,6 +371,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate, and some training hyperparameters:
 
 ```py
@@ -406,6 +406,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/multiple_choice.mdx b/docs/source/tasks/multiple_choice.mdx
index 6b2d08be53..2ec7019a15 100644
--- a/docs/source/tasks/multiple_choice.mdx
+++ b/docs/source/tasks/multiple_choice.mdx
@@ -176,8 +176,10 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load BERT with [`AutoModelForMultipleChoice`]:
 
 ```py
@@ -220,18 +222,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs in `columns`, targets in `label_cols`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs in `columns`, targets in `label_cols`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
@@ -252,6 +245,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -284,4 +283,6 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2)
-```
\ No newline at end of file
+```
+</tf>
+</frameworkcontent>
\ No newline at end of file
diff --git a/docs/source/tasks/question_answering.mdx b/docs/source/tasks/question_answering.mdx
index 1c2160db0e..61f81cb3e1 100644
--- a/docs/source/tasks/question_answering.mdx
+++ b/docs/source/tasks/question_answering.mdx
@@ -151,8 +151,10 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load DistilBERT with [`AutoModelForQuestionAnswering`]:
 
 ```py
@@ -195,18 +197,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and the start and end positions of an answer in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and the start and end positions of an answer in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_squad["train"].to_tf_dataset(
@@ -226,6 +219,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -262,6 +261,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/sequence_classification.mdx b/docs/source/tasks/sequence_classification.mdx
index 63db0d7f61..0908848b9a 100644
--- a/docs/source/tasks/sequence_classification.mdx
+++ b/docs/source/tasks/sequence_classification.mdx
@@ -91,8 +91,10 @@ Use [`DataCollatorWithPadding`] to create a batch of examples. It will also *dyn
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load DistilBERT with [`AutoModelForSequenceClassification`] along with the number of expected labels:
 
 ```py
@@ -140,18 +142,9 @@ At this point, only three steps remain:
 [`Trainer`] will apply dynamic padding by default when you pass `tokenizer` to it. In this case, you don't need to specify a data collator explicitly.
 
 </Tip>
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_imdb["train"].to_tf_dataset(
@@ -169,6 +162,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -203,6 +202,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/summarization.mdx b/docs/source/tasks/summarization.mdx
index a5e1bc4e0a..7083cdce4d 100644
--- a/docs/source/tasks/summarization.mdx
+++ b/docs/source/tasks/summarization.mdx
@@ -110,8 +110,10 @@ Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dyna
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load T5 with [`AutoModelForSeq2SeqLM`]:
 
 ```py
@@ -156,18 +158,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_billsum["train"].to_tf_dataset(
@@ -185,6 +178,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -212,6 +211,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/token_classification.mdx b/docs/source/tasks/token_classification.mdx
index 37b316e652..ff26b3af94 100644
--- a/docs/source/tasks/token_classification.mdx
+++ b/docs/source/tasks/token_classification.mdx
@@ -151,8 +151,10 @@ Use [`DataCollatorForTokenClassification`] to create a batch of examples. It wil
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load DistilBERT with [`AutoModelForTokenClassification`] along with the number of expected labels:
 
 ```py
@@ -195,18 +197,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_wnut["train"].to_tf_dataset(
@@ -224,6 +217,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -261,6 +260,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/tasks/translation.mdx b/docs/source/tasks/translation.mdx
index d4a2eae424..26723241a1 100644
--- a/docs/source/tasks/translation.mdx
+++ b/docs/source/tasks/translation.mdx
@@ -112,8 +112,10 @@ Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dyna
 </tf>
 </frameworkcontent>
 
-## Fine-tune with Trainer
+## Train
 
+<frameworkcontent>
+<pt>
 Load T5 with [`AutoModelForSeq2SeqLM`]:
 
 ```py
@@ -158,18 +160,9 @@ At this point, only three steps remain:
 
 >>> trainer.train()
 ```
-
-## Fine-tune with TensorFlow
-
-To fine-tune a model in TensorFlow is just as easy, with only a few differences.
-
-<Tip>
-
-If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#finetune-with-keras)!
-
-</Tip>
-
-Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+</pt>
+<tf>
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_books["train"].to_tf_dataset(
@@ -187,6 +180,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
 ... )
 ```
 
+<Tip>
+
+If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
+
+</Tip>
+
 Set up an optimizer function, learning rate schedule, and some training hyperparameters:
 
 ```py
@@ -214,6 +213,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin
 ```py
 >>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <Tip>
 
diff --git a/docs/source/training.mdx b/docs/source/training.mdx
index 976fa8d999..c1c4f1e049 100644
--- a/docs/source/training.mdx
+++ b/docs/source/training.mdx
@@ -63,8 +63,10 @@ If you like, you can create a smaller subset of the full dataset to fine-tune on
 
 <a id='trainer'></a>
 
-## Fine-tune with `Trainer`
+## Train
 
+<frameworkcontent>
+<pt>
 <Youtube id="nvBXf7s7vTI"/>
 
 🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] API supports a wide range of training options and features such as logging, gradient accumulation, and mixed precision.
@@ -143,14 +145,13 @@ Then fine-tune your model by calling [`~transformers.Trainer.train`]:
 ```py
 >>> trainer.train()
 ```
-
+</pt>
+<tf>
 <a id='keras'></a>
 
-## Fine-tune with Keras
-
 <Youtube id="rnTGBy2ax1c"/>
 
-🤗 Transformers models also supports training in TensorFlow with the Keras API. You only need to make a few changes before you can fine-tune.
+🤗 Transformers models also support training in TensorFlow with the Keras API.
 
 ### Convert dataset to TensorFlow format
 
@@ -210,11 +211,15 @@ Then compile and fine-tune your model with [`fit`](https://keras.io/api/models/m
 
 >>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
 ```
+</tf>
+</frameworkcontent>
 
 <a id='pytorch_native'></a>
 
-## Fine-tune in native PyTorch
+## Train in native PyTorch
 
+<frameworkcontent>
+<pt>
 <Youtube id="Dh9CL8fyG80"/>
 
 [`Trainer`] takes care of the training loop and allows you to fine-tune a model in a single line of code. For users who prefer to write their own training loop, you can also fine-tune a 🤗 Transformers model in native PyTorch.
@@ -354,6 +359,8 @@ Just like how you need to add an evaluation function to [`Trainer`], you need to
 
 >>> metric.compute()
 ```
+</pt>
+</frameworkcontent>
 
 <a id='additional-resources'></a>