Add: TensorFlow example for semantic segmentation task guide (#21223)
* wip: adding tf example for semantic segmentation guide
* completed the working example in tf
* make style
* Update docs/source/en/tasks/semantic_segmentation.mdx
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tasks/semantic_segmentation.mdx
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fixed a callback doc links
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@@ -35,7 +35,7 @@ Before you begin, make sure you have all the necessary libraries installed:
|
|||||||
pip install -q datasets transformers evaluate
|
pip install -q datasets transformers evaluate
|
||||||
```
|
```
|
||||||
|
|
||||||
We encourage you to login to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to login:
|
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
>>> from huggingface_hub import notebook_login
|
>>> from huggingface_hub import notebook_login
|
||||||
@@ -95,9 +95,13 @@ The next step is to load a SegFormer image processor to prepare the images and a
|
|||||||
```py
|
```py
|
||||||
>>> from transformers import AutoImageProcessor
|
>>> from transformers import AutoImageProcessor
|
||||||
|
|
||||||
>>> feature_extractor = AutoImageProcessor.from_pretrained("nvidia/mit-b0", reduce_labels=True)
|
>>> checkpoint = "nvidia/mit-b0"
|
||||||
|
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint, reduce_labels=True)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<pt>
|
||||||
|
|
||||||
It is common to apply some data augmentations to an image dataset to make a model more robust against overfitting. In this guide, you'll use the [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) function from [torchvision](https://pytorch.org/vision/stable/index.html) to randomly change the color properties of an image, but you can also use any image library you like.
|
It is common to apply some data augmentations to an image dataset to make a model more robust against overfitting. In this guide, you'll use the [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ColorJitter.html) function from [torchvision](https://pytorch.org/vision/stable/index.html) to randomly change the color properties of an image, but you can also use any image library you like.
|
||||||
|
|
||||||
```py
|
```py
|
||||||
@@ -112,14 +116,14 @@ Now create two preprocessing functions to prepare the images and annotations for
|
|||||||
>>> def train_transforms(example_batch):
|
>>> def train_transforms(example_batch):
|
||||||
... images = [jitter(x) for x in example_batch["image"]]
|
... images = [jitter(x) for x in example_batch["image"]]
|
||||||
... labels = [x for x in example_batch["annotation"]]
|
... labels = [x for x in example_batch["annotation"]]
|
||||||
... inputs = feature_extractor(images, labels)
|
... inputs = image_processor(images, labels)
|
||||||
... return inputs
|
... return inputs
|
||||||
|
|
||||||
|
|
||||||
>>> def val_transforms(example_batch):
|
>>> def val_transforms(example_batch):
|
||||||
... images = [x for x in example_batch["image"]]
|
... images = [x for x in example_batch["image"]]
|
||||||
... labels = [x for x in example_batch["annotation"]]
|
... labels = [x for x in example_batch["annotation"]]
|
||||||
... inputs = feature_extractor(images, labels)
|
... inputs = image_processor(images, labels)
|
||||||
... return inputs
|
... return inputs
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -130,6 +134,67 @@ To apply the `jitter` over the entire dataset, use the 🤗 Datasets [`~datasets
|
|||||||
>>> test_ds.set_transform(val_transforms)
|
>>> test_ds.set_transform(val_transforms)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
</pt>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<tf>
|
||||||
|
It is common to apply some data augmentations to an image dataset to make a model more robust against overfitting.
|
||||||
|
In this guide, you'll use [`tf.image`](https://www.tensorflow.org/api_docs/python/tf/image) to randomly change the color properties of an image, but you can also use any image
|
||||||
|
library you like.
|
||||||
|
Define two separate transformation functions:
|
||||||
|
- training data transformations that include image augmentation
|
||||||
|
- validation data transformations that only transpose the images, since computer vision models in 🤗 Transformers expect channels-first layout
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> import tensorflow as tf
|
||||||
|
|
||||||
|
|
||||||
|
>>> def aug_transforms(image):
|
||||||
|
... image = tf.keras.utils.img_to_array(image)
|
||||||
|
... image = tf.image.random_brightness(image, 0.25)
|
||||||
|
... image = tf.image.random_contrast(image, 0.5, 2.0)
|
||||||
|
... image = tf.image.random_saturation(image, 0.75, 1.25)
|
||||||
|
... image = tf.image.random_hue(image, 0.1)
|
||||||
|
... image = tf.transpose(image, (2, 0, 1))
|
||||||
|
... return image
|
||||||
|
|
||||||
|
|
||||||
|
>>> def transforms(image):
|
||||||
|
... image = tf.keras.utils.img_to_array(image)
|
||||||
|
... image = tf.transpose(image, (2, 0, 1))
|
||||||
|
... return image
|
||||||
|
```
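
As a minimal sketch of what the transpose step does, the snippet below uses NumPy instead of TensorFlow on a hypothetical 2x3 RGB array (the array and its values are made up for illustration). It moves the channel axis to the front, which is the same axis permutation as `tf.transpose(image, (2, 0, 1))`:

```python
import numpy as np

# Hypothetical 2x3 RGB image in channels-last layout: (height, width, channels).
image_hwc = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)

# Permute the axes to channels-first layout: (channels, height, width).
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(image_hwc.shape)
print(image_chw.shape)
```

The pixel at row `r`, column `c`, channel `k` in the original array ends up at index `[k, r, c]` after the transpose.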
|
||||||
|
|
||||||
|
Next, create two preprocessing functions to prepare batches of images and annotations for the model. These functions apply
|
||||||
|
the image transformations and use the earlier loaded `image_processor` to convert the images into `pixel_values` and
|
||||||
|
annotations to `labels`. `ImageProcessor` also takes care of resizing and normalizing the images.
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> def train_transforms(example_batch):
|
||||||
|
... images = [aug_transforms(x.convert("RGB")) for x in example_batch["image"]]
|
||||||
|
... labels = [x for x in example_batch["annotation"]]
|
||||||
|
... inputs = image_processor(images, labels)
|
||||||
|
... return inputs
|
||||||
|
|
||||||
|
|
||||||
|
>>> def val_transforms(example_batch):
|
||||||
|
... images = [transforms(x.convert("RGB")) for x in example_batch["image"]]
|
||||||
|
... labels = [x for x in example_batch["annotation"]]
|
||||||
|
... inputs = image_processor(images, labels)
|
||||||
|
... return inputs
|
||||||
|
```
|
||||||
|
|
||||||
|
To apply the preprocessing transformations over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.set_transform`] function.
|
||||||
|
The transform is applied on the fly, which is faster and consumes less disk space:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> train_ds.set_transform(train_transforms)
|
||||||
|
>>> test_ds.set_transform(val_transforms)
|
||||||
|
```
|
||||||
|
</tf>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
## Evaluate
|
## Evaluate
|
||||||
|
|
||||||
Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [mean Intersection over Union](https://huggingface.co/spaces/evaluate-metric/accuracy) (IoU) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
|
Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load the [mean Intersection over Union](https://huggingface.co/spaces/evaluate-metric/accuracy) (IoU) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
|
||||||
@@ -140,7 +205,11 @@ Including a metric during training is often helpful for evaluating your model's
|
|||||||
>>> metric = evaluate.load("mean_iou")
|
>>> metric = evaluate.load("mean_iou")
|
||||||
```
|
```
|
||||||
|
|
||||||
Then create a function to [`~evaluate.EvaluationModule.compute`] the metrics. Your predictions need to be converted to logits first, and then reshaped to match the size of the labels before you can call [`~evaluate.EvaluationModule.compute`]:
|
Then create a function to [`~evaluate.EvaluationModule.compute`] the metrics. Your predictions need to be converted to
|
||||||
|
logits first, and then reshaped to match the size of the labels before you can call [`~evaluate.EvaluationModule.compute`]:
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<pt>
|
||||||
|
|
||||||
```py
|
```py
|
||||||
>>> def compute_metrics(eval_pred):
|
>>> def compute_metrics(eval_pred):
|
||||||
@@ -168,10 +237,48 @@ Then create a function to [`~evaluate.EvaluationModule.compute`] the metrics. Yo
|
|||||||
... return metrics
|
... return metrics
|
||||||
```
|
```
|
||||||
|
|
||||||
|
</pt>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<tf>
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> def compute_metrics(eval_pred):
|
||||||
|
... logits, labels = eval_pred
|
||||||
|
... logits = tf.transpose(logits, perm=[0, 2, 3, 1])
|
||||||
|
... logits_resized = tf.image.resize(
|
||||||
|
... logits,
|
||||||
|
... size=tf.shape(labels)[1:],
|
||||||
|
... method="bilinear",
|
||||||
|
... )
|
||||||
|
|
||||||
|
... pred_labels = tf.argmax(logits_resized, axis=-1)
|
||||||
|
... metrics = metric.compute(
|
||||||
|
... predictions=pred_labels,
|
||||||
|
... references=labels,
|
||||||
|
... num_labels=num_labels,
|
||||||
|
... ignore_index=-1,
|
||||||
|
... reduce_labels=image_processor.do_reduce_labels,
|
||||||
|
... )
|
||||||
|
|
||||||
|
... per_category_accuracy = metrics.pop("per_category_accuracy").tolist()
|
||||||
|
... per_category_iou = metrics.pop("per_category_iou").tolist()
|
||||||
|
|
||||||
|
... metrics.update({f"accuracy_{id2label[i]}": v for i, v in enumerate(per_category_accuracy)})
|
||||||
|
... metrics.update({f"iou_{id2label[i]}": v for i, v in enumerate(per_category_iou)})
|
||||||
|
... return {"val_" + k: v for k, v in metrics.items()}
|
||||||
|
```
|
||||||
|
|
||||||
|
</tf>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
|
Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
|
||||||
|
|
||||||
## Train
|
## Train
|
||||||
|
<frameworkcontent>
|
||||||
|
<pt>
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
|
If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#finetune-with-trainer)!
|
||||||
@@ -183,10 +290,7 @@ You're ready to start training your model now! Load SegFormer with [`AutoModelFo
|
|||||||
```py
|
```py
|
||||||
>>> from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer
|
>>> from transformers import AutoModelForSemanticSegmentation, TrainingArguments, Trainer
|
||||||
|
|
||||||
>>> pretrained_model_name = "nvidia/mit-b0"
|
>>> model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint, id2label=id2label, label2id=label2id)
|
||||||
>>> model = AutoModelForSemanticSegmentation.from_pretrained(
|
|
||||||
... pretrained_model_name, id2label=id2label, label2id=label2id
|
|
||||||
... )
|
|
||||||
```
|
```
|
||||||
|
|
||||||
At this point, only three steps remain:
|
At this point, only three steps remain:
|
||||||
@@ -229,6 +333,112 @@ Once training is completed, share your model to the Hub with the [`~transformers
|
|||||||
```py
|
```py
|
||||||
>>> trainer.push_to_hub()
|
>>> trainer.push_to_hub()
|
||||||
```
|
```
|
||||||
|
</pt>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<tf>
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
If you are unfamiliar with fine-tuning a model with Keras, check out the [basic tutorial](./training#train-a-tensorflow-model-with-keras) first!
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
To fine-tune a model in TensorFlow, follow these steps:
|
||||||
|
1. Define the training hyperparameters, and set up an optimizer and a learning rate schedule.
|
||||||
|
2. Instantiate a pretrained model.
|
||||||
|
3. Convert a 🤗 Dataset to a `tf.data.Dataset`.
|
||||||
|
4. Compile your model.
|
||||||
|
5. Add callbacks to calculate metrics and upload your model to the 🤗 Hub.
|
||||||
|
6. Use the `fit()` method to run the training.
|
||||||
|
|
||||||
|
Start by defining the hyperparameters, optimizer and learning rate schedule:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers import create_optimizer
|
||||||
|
|
||||||
|
>>> batch_size = 2
|
||||||
|
>>> num_epochs = 50
|
||||||
|
>>> num_train_steps = len(train_ds) * num_epochs
|
||||||
|
>>> learning_rate = 6e-5
|
||||||
|
>>> weight_decay_rate = 0.01
|
||||||
|
|
||||||
|
>>> optimizer, lr_schedule = create_optimizer(
|
||||||
|
... init_lr=learning_rate,
|
||||||
|
... num_train_steps=num_train_steps,
|
||||||
|
... weight_decay_rate=weight_decay_rate,
|
||||||
|
... num_warmup_steps=0,
|
||||||
|
... )
|
||||||
|
```
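
The sketch below illustrates the kind of schedule these hyperparameters describe. It assumes that, with `num_warmup_steps=0`, the learning rate decays linearly from `init_lr` to zero over `num_train_steps`. That behavior is an assumption about `create_optimizer`, so check the library documentation before relying on it. The helper `linear_decay_lr` and the step count of 1000 are hypothetical:

```python
def linear_decay_lr(step, init_lr, num_train_steps):
    """Learning rate after `step` updates, assuming linear decay to zero."""
    remaining = max(num_train_steps - step, 0)
    return init_lr * remaining / num_train_steps


init_lr = 6e-5
num_train_steps = 1000  # hypothetical; the guide uses len(train_ds) * num_epochs

# Print the learning rate at the start, middle, and end of training.
for step in (0, 500, 1000):
    print(step, linear_decay_lr(step, init_lr, num_train_steps))
```

Under this assumption, the learning rate is halved at the midpoint of training and reaches zero at the final step.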
|
||||||
|
|
||||||
|
Then, load SegFormer with [`TFAutoModelForSemanticSegmentation`] along with the label mappings, and compile it with the
|
||||||
|
optimizer:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers import TFAutoModelForSemanticSegmentation
|
||||||
|
|
||||||
|
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained(
|
||||||
|
... checkpoint,
|
||||||
|
... id2label=id2label,
|
||||||
|
... label2id=label2id,
|
||||||
|
... )
|
||||||
|
>>> model.compile(optimizer=optimizer)
|
||||||
|
```
|
||||||
|
|
||||||
|
Convert your datasets to the `tf.data.Dataset` format using the [`~datasets.Dataset.to_tf_dataset`] and the [`DefaultDataCollator`]:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers import DefaultDataCollator
|
||||||
|
|
||||||
|
>>> data_collator = DefaultDataCollator(return_tensors="tf")
|
||||||
|
|
||||||
|
>>> tf_train_dataset = train_ds.to_tf_dataset(
|
||||||
|
... columns=["pixel_values", "label"],
|
||||||
|
... shuffle=True,
|
||||||
|
... batch_size=batch_size,
|
||||||
|
... collate_fn=data_collator,
|
||||||
|
... )
|
||||||
|
|
||||||
|
>>> tf_eval_dataset = test_ds.to_tf_dataset(
|
||||||
|
... columns=["pixel_values", "label"],
|
||||||
|
... shuffle=True,
|
||||||
|
... batch_size=batch_size,
|
||||||
|
... collate_fn=data_collator,
|
||||||
|
... )
|
||||||
|
```
|
||||||
|
|
||||||
|
To compute the accuracy from the predictions and push your model to the 🤗 Hub, use [Keras callbacks](./main_classes/keras_callbacks).
|
||||||
|
Pass your `compute_metrics` function to [`KerasMetricCallback`],
|
||||||
|
and use the [`PushToHubCallback`] to upload the model:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback
|
||||||
|
|
||||||
|
>>> metric_callback = KerasMetricCallback(
|
||||||
|
... metric_fn=compute_metrics, eval_dataset=tf_eval_dataset, batch_size=batch_size, label_cols=["labels"]
|
||||||
|
... )
|
||||||
|
|
||||||
|
>>> push_to_hub_callback = PushToHubCallback(output_dir="scene_segmentation", tokenizer=image_processor)
|
||||||
|
|
||||||
|
>>> callbacks = [metric_callback, push_to_hub_callback]
|
||||||
|
```
|
||||||
|
|
||||||
|
Finally, you are ready to train your model! Call `fit()` with your training and validation datasets, the number of epochs,
|
||||||
|
and your callbacks to fine-tune the model:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> model.fit(
|
||||||
|
... tf_train_dataset,
|
||||||
|
... validation_data=tf_eval_dataset,
|
||||||
|
... callbacks=callbacks,
|
||||||
|
... epochs=num_epochs,
|
||||||
|
... )
|
||||||
|
```
|
||||||
|
|
||||||
|
Congratulations! You have fine-tuned your model and shared it on the 🤗 Hub. You can now use it for inference!
|
||||||
|
</tf>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
|
||||||
## Inference
|
## Inference
|
||||||
|
|
||||||
@@ -245,6 +455,8 @@ Load an image for inference:
|
|||||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/semantic-seg-image.png" alt="Image of bedroom"/>
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/semantic-seg-image.png" alt="Image of bedroom"/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<pt>
|
||||||
The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for image segmentation with your model, and pass your image to it:
|
The simplest way to try out your finetuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for image segmentation with your model, and pass your image to it:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
@@ -285,7 +497,7 @@ You can also manually replicate the results of the `pipeline` if you'd like. Pro
|
|||||||
|
|
||||||
```py
|
```py
|
||||||
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available, otherwise use a CPU
|
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # use GPU if available, otherwise use a CPU
|
||||||
>>> encoding = feature_extractor(image, return_tensors="pt")
|
>>> encoding = image_processor(image, return_tensors="pt")
|
||||||
>>> pixel_values = encoding.pixel_values.to(device)
|
>>> pixel_values = encoding.pixel_values.to(device)
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -309,10 +521,50 @@ Next, rescale the logits to the original image size:
|
|||||||
>>> pred_seg = upsampled_logits.argmax(dim=1)[0]
|
>>> pred_seg = upsampled_logits.argmax(dim=1)[0]
|
||||||
```
|
```
|
||||||
|
|
||||||
To visualize the results, load the [dataset color palette](https://github.com/tensorflow/models/blob/3f1ca33afe3c1631b733ea7e40c294273b9e406d/research/deeplab/utils/get_dataset_colormap.py#L51) that maps each class to their RGB values. Then you can combine and plot your image and the predicted segmentation map:
|
</pt>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
<frameworkcontent>
|
||||||
|
<tf>
|
||||||
|
Load an image processor to preprocess the image and return the input as TensorFlow tensors:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers import AutoImageProcessor
|
||||||
|
|
||||||
|
>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/scene_segmentation")
|
||||||
|
>>> inputs = image_processor(image, return_tensors="tf")
|
||||||
|
```
|
||||||
|
|
||||||
|
Pass your input to the model and return the `logits`:
|
||||||
|
|
||||||
|
```py
|
||||||
|
>>> from transformers import TFAutoModelForSemanticSegmentation
|
||||||
|
|
||||||
|
>>> model = TFAutoModelForSemanticSegmentation.from_pretrained("MariaK/scene_segmentation")
|
||||||
|
>>> logits = model(**inputs).logits
|
||||||
|
```
|
||||||
|
|
||||||
|
Next, rescale the logits to the original image size and apply argmax on the class dimension:
|
||||||
|
```py
|
||||||
|
>>> logits = tf.transpose(logits, [0, 2, 3, 1])
|
||||||
|
|
||||||
|
>>> upsampled_logits = tf.image.resize(
|
||||||
|
... logits,
|
||||||
|
... # We reverse the shape of `image` because `image.size` returns width and height.
|
||||||
|
... image.size[::-1],
|
||||||
|
... )
|
||||||
|
|
||||||
|
>>> pred_seg = tf.math.argmax(upsampled_logits, axis=-1)[0]
|
||||||
|
```
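
To see what the final argmax produces, here is a small NumPy sketch on made-up logits for one image, three classes, and a 2x2 grid. The values are hypothetical, and the resize step is left out to keep the example short. It shows only how channels-first logits become a per-pixel class map:

```python
import numpy as np

# Hypothetical logits with shape (batch, num_classes, height, width).
logits = np.array(
    [
        [
            [[2.0, 0.1], [0.3, 0.2]],  # scores for class 0
            [[0.5, 1.5], [0.1, 0.4]],  # scores for class 1
            [[0.1, 0.2], [0.9, 0.8]],  # scores for class 2
        ]
    ]
)

# Move the class axis last, then pick the highest-scoring class per pixel.
logits_nhwc = np.transpose(logits, (0, 2, 3, 1))
pred_seg = np.argmax(logits_nhwc, axis=-1)[0]

print(pred_seg)
```

Each entry of `pred_seg` is the index of the class with the largest logit at that pixel, which is what gets mapped to a color in the visualization step.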
|
||||||
|
|
||||||
|
</tf>
|
||||||
|
</frameworkcontent>
|
||||||
|
|
||||||
|
To visualize the results, load the [dataset color palette](https://github.com/tensorflow/models/blob/3f1ca33afe3c1631b733ea7e40c294273b9e406d/research/deeplab/utils/get_dataset_colormap.py#L51) as `ade_palette()`, which maps each class to its RGB values. Then you can combine and plot your image and the predicted segmentation map:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
>>> import matplotlib.pyplot as plt
|
>>> import matplotlib.pyplot as plt
|
||||||
|
>>> import numpy as np
|
||||||
|
|
||||||
>>> color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
|
>>> color_seg = np.zeros((pred_seg.shape[0], pred_seg.shape[1], 3), dtype=np.uint8)
|
||||||
>>> palette = np.array(ade_palette())
|
>>> palette = np.array(ade_palette())
|
||||||
|
|||||||