Doc to dataset (#18037)
* Link to the Datasets doc
* Remove unwanted file
@@ -85,7 +85,7 @@ The preprocessing function needs to:
 ...     return model_inputs
 ```
 
-Use 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.map) function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
+Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function over the entire dataset. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:
 
 ```py
 >>> tokenized_billsum = billsum.map(preprocess_function, batched=True)
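The hunk above documents `batched=True` for `Dataset.map`. As a rough illustration of what that flag changes, the pure-Python toy below calls the mapping function once per row by default. With `batched=True`, it instead passes a columnar batch (a dict mapping each column name to a list of values), so per-call overhead such as a tokenizer invocation is paid once per batch rather than once per row. `toy_map` and `count_words` are hypothetical stand-ins, not the 🤗 Datasets implementation. The sketch assumes the function returns exactly one output value per input row, whereas the real `map` is more general.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Batch = Dict[str, List[object]]


def toy_map(
    rows: List[Row],
    fn: Callable,
    batched: bool = False,
    batch_size: int = 1000,
) -> List[Row]:
    """Hypothetical, simplified imitation of Dataset.map on a list of dicts."""
    out: List[Row] = []
    if not batched:
        # One call per row: fn receives a single example and returns new columns.
        for row in rows:
            out.append({**row, **fn(row)})
        return out
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Columnar view of the chunk: one list of values per column name.
        batch: Batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = fn(batch)
        # Assumes fn returns one value per input row for every new column.
        for i, row in enumerate(chunk):
            out.append({**row, **{name: values[i] for name, values in new_columns.items()}})
    return out


calls = []


def count_words(batch: Batch) -> Batch:
    # Hypothetical stand-in for a tokenizing preprocess function.
    calls.append(len(batch["text"]))
    return {"n_words": [len(text.split()) for text in batch["text"]]}


rows = [{"text": "a b c"}, {"text": "d e"}, {"text": "f"}]
result = toy_map(rows, count_words, batched=True, batch_size=2)
print(result)
print(calls)  # → [2, 1]: two calls instead of three
```

With three rows and `batch_size=2`, the function runs twice instead of three times. The saving grows with dataset size.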
@@ -160,7 +160,7 @@ At this point, only three steps remain:
 ```
 </pt>
 <tf>
-To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
+To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`~datasets.Dataset.to_tf_dataset`]. Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
 
 ```py
 >>> tf_train_set = tokenized_billsum["train"].to_tf_dataset(
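Both hunks make the same mechanical change: an explicit Markdown link into the Datasets reference page is replaced by the shorthand [`~datasets.Dataset.<name>`] form. The sketch below reproduces that rewrite with a regular expression. The URL pattern is copied from the removed lines. `to_shorthand` is a hypothetical helper, not a script from this commit. It also assumes, as with Sphinx-style references, that the leading `~` displays only the final name, so a link whose label differs from that name is left untouched.

```python
import re

# Explicit-link form used before this commit; URL prefix copied from the removed lines.
EXPLICIT_DATASETS_LINK = re.compile(
    r"\[`(?P<label>[\w.]+)`\]"
    r"\(https://huggingface\.co/docs/datasets/package_reference/main_classes\.html"
    r"#(?P<target>datasets\.[\w.]+)\)"
)


def to_shorthand(line: str) -> str:
    """Rewrite explicit Datasets links as [`~target`] references (hypothetical helper)."""

    def replace(match: "re.Match[str]") -> str:
        label, target = match.group("label"), match.group("target")
        # `~` is assumed to display only the last component, so only rewrite
        # links whose visible label already equals that component.
        if target.rsplit(".", 1)[-1] != label:
            return match.group(0)
        return f"[`~{target}`]"

    return EXPLICIT_DATASETS_LINK.sub(replace, line)


old = (
    "Use 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/package_reference/"
    "main_classes.html#datasets.Dataset.map) function to apply the preprocessing function"
)
print(to_shorthand(old))
# prints "Use 🤗 Datasets [`~datasets.Dataset.map`] function to apply the preprocessing function"
```

Checking the label against the last component of the anchor keeps the rewrite conservative. A link whose visible text was deliberately different would change meaning if shortened, so the helper leaves it as-is.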