Fix MaskformerFeatureExtractor (#20100)
* Fix bug
* Add another fix
* Add print statement
* Apply fix
* Fix feature extractor
* Fix feature extractor
* Add print statements
* Add print statements
* Remove print statements
* Add instance segmentation integration test
* Add integration test for semantic segmentation
* Add draft for panoptic segmentation integration test
* Fix integration test for panoptic segmentation
* Remove slow annotator

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
@@ -225,7 +225,8 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
             ImageNet std.
         ignore_index (`int`, *optional*):
             Label to be assigned to background pixels in segmentation maps. If provided, segmentation map pixels
-            denoted with 0 (background) will be replaced with `ignore_index`.
+            denoted with 0 (background) will be replaced with `ignore_index`. The ignore index of the loss function of
+            the model should then correspond to this ignore index.
         reduce_labels (`bool`, *optional*, defaults to `False`):
             Whether or not to decrement all label values of segmentation maps by 1. Usually used for datasets where 0
             is used for background, and background itself is not included in all classes of a dataset (e.g. ADE20k).
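A minimal NumPy sketch of how `reduce_labels` and `ignore_index` combine, mirroring the reduction step this commit adds to the feature extractor. `reduce_segmentation_labels` is a hypothetical standalone helper, not part of the library API, and `ignore_index=255` is simply the value the integration tests in this commit use.

```python
from typing import Optional

import numpy as np


def reduce_segmentation_labels(segmentation_map: np.ndarray, ignore_index: Optional[int]) -> np.ndarray:
    """Hypothetical helper: send background (0) to ignore_index and shift every other label down by 1."""
    if ignore_index is None:
        raise ValueError("`ignore_index` must be set when `reduce_labels` is `True`.")
    reduced = segmentation_map.copy()  # this sketch avoids modifying the caller's array in place
    reduced[reduced == 0] = ignore_index
    reduced -= 1
    # background pixels were decremented as well, so put them back to ignore_index
    reduced[reduced == ignore_index - 1] = ignore_index
    return reduced


segmentation_map = np.array([[0, 1, 2], [3, 0, 1]], dtype=np.int64)
print(reduce_segmentation_labels(segmentation_map, ignore_index=255).tolist())
# [[255, 0, 1], [2, 255, 0]]
```

The `.copy()` is a choice of this sketch; the code in the diff assigns into `segmentation_map` directly.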
@@ -327,12 +328,24 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
         padded up to the largest image in a batch, and a pixel mask is created that indicates which pixels are
         real/which are padding.
 
-        MaskFormer addresses semantic segmentation with a mask classification paradigm, thus input segmentation maps
-        will be converted to lists of binary masks and their respective labels. Let's see an example, assuming
-        `segmentation_maps = [[2,6,7,9]]`, the output will contain `mask_labels =
+        Segmentation maps can be instance, semantic or panoptic segmentation maps. In case of instance and panoptic
+        segmentation, one needs to provide `instance_id_to_semantic_id`, which is a mapping from instance/segment ids
+        to semantic category ids.
+
+        MaskFormer addresses all 3 forms of segmentation (instance, semantic and panoptic) in the same way, namely by
+        converting the segmentation maps to a set of binary masks with corresponding classes.
+
+        In case of instance segmentation, the segmentation maps contain the instance ids, and
+        `instance_id_to_semantic_id` maps instance IDs to their corresponding semantic category.
+
+        In case of semantic segmentation, the segmentation maps contain the semantic category ids. Let's see an
+        example, assuming `segmentation_maps = [[2,6,7,9]]`, the output will contain `mask_labels =
         [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]` (four binary masks) and `class_labels = [2,6,7,9]`, the labels for
         each mask.
 
+        In case of panoptic segmentation, the segmentation maps contain the segment ids, and
+        `instance_id_to_semantic_id` maps segment IDs to their corresponding semantic category.
+
         <Tip warning={true}>
 
         NumPy arrays and PyTorch tensors are converted to PIL images when resizing, so the most efficient is to pass
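To make the semantic-segmentation example concrete, here is a minimal NumPy sketch of the conversion described in the docstring, ignoring resizing, `ignore_index` and `reduce_labels`. `semantic_map_to_masks` is an illustrative name, not a library function. The masks keep the spatial shape of the map, so the docstring's `[[1,0,0,0],...]` corresponds to four masks of shape `(1, 4)`.

```python
import numpy as np


def semantic_map_to_masks(segmentation_map: np.ndarray):
    """Sketch: one binary mask per unique semantic id, with that id as the mask's class label."""
    class_labels = np.unique(segmentation_map)  # sorted unique semantic ids
    # a boolean comparison yields a 0/1 mask per id; stack to (num_labels, height, width)
    mask_labels = np.stack([segmentation_map == label for label in class_labels], axis=0)
    return mask_labels.astype(np.float32), class_labels.astype(np.int64)


# the docstring example: a single row containing four different semantic ids
mask_labels, class_labels = semantic_map_to_masks(np.array([[2, 6, 7, 9]]))
print(mask_labels.shape)                        # (4, 1, 4)
print(mask_labels[:, 0].astype(int).tolist())  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(class_labels.tolist())                    # [2, 6, 7, 9]
```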
@@ -347,9 +360,9 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
                 number of channels, H and W are image height and width.
 
             segmentation_maps (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`, *optional*):
-                The corresponding semantic segmentation maps with the pixel-wise class id annotations or instance
-                segmentation maps with pixel-wise instance id annotations. Assumed to be semantic segmentation maps if
-                no `instance_id_to_semantic_id map` is provided.
+                The corresponding segmentation maps with the pixel-wise instance id, semantic id or segment id
+                annotations. Assumed to be semantic segmentation maps if no `instance_id_to_semantic_id map` is
+                provided.
 
             pad_and_return_pixel_mask (`bool`, *optional*, defaults to `True`):
                 Whether or not to pad images up to the largest image in a batch and create a pixel mask.
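When `instance_id_to_semantic_id` is passed, the unique ids in a map are read as instance (or segment) ids and each mask's class comes from the mapping; without it, the map is read as semantic and the ids themselves become the class labels. A minimal sketch with made-up ids, leaving out `ignore_index` and `reduce_labels`:

```python
import numpy as np

# hypothetical 2x3 instance map: ids 1, 2 and 3 are three separate objects
instance_map = np.array([[1, 1, 2], [3, 3, 2]])
# hypothetical mapping: instances 1 and 2 share semantic class 5, instance 3 has class 7
instance_id_to_semantic_id = {1: 5, 2: 5, 3: 7}

instance_ids = np.unique(instance_map)
mask_labels = np.stack([instance_map == i for i in instance_ids], axis=0).astype(np.float32)

# with the mapping, each mask gets the semantic class of its instance
class_labels = [instance_id_to_semantic_id[int(i)] for i in instance_ids]
# without the mapping, the raw ids would be used as class labels instead
semantic_class_labels = instance_ids.tolist()

print(mask_labels.shape)      # (3, 2, 3)
print(class_labels)           # [5, 5, 7]
print(semantic_class_labels)  # [1, 2, 3]
```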
@@ -360,10 +373,11 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
                 - 0 for pixels that are padding (i.e. **masked**).
 
             instance_id_to_semantic_id (`List[Dict[int, int]]` or `Dict[int, int]`, *optional*):
-                A mapping between object instance ids and class ids. If passed, `segmentation_maps` is treated as an
-                instance segmentation map where each pixel represents an instance id. Can be provided as a single
-                dictionary with a global / dataset-level mapping or as a list of dictionaries (one per image), to map
-                instance ids in each image separately.
+                A mapping between instance/segment ids and semantic category ids. If passed, `segmentation_maps` is
+                treated as an instance or panoptic segmentation map where each pixel represents an instance or segment
+                id. Can be provided as a single dictionary with a global / dataset-level mapping or as a list of
+                dictionaries (one per image), to map instance ids in each image separately. Note that this assumes a
+                mapping before reduction of labels.
 
             return_tensors (`str` or [`~file_utils.TensorType`], *optional*):
                 If set, will return tensors instead of NumPy arrays. If set to `'pt'`, return PyTorch `torch.Tensor`
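"A mapping before reduction of labels" means the dictionary keeps the original instance ids and original class ids, even when `reduce_labels=True` has already shifted the map. The lookup in this commit therefore adds 1 to each reduced id and subtracts 1 from the looked-up class. A minimal sketch on a made-up, already-reduced map; the value 255 is an assumption taken from the integration tests.

```python
import numpy as np

IGNORE_INDEX = 255  # assumption: same value as in the integration tests of this commit

# hypothetical instance map AFTER label reduction: original instance ids 1 and 2
# became 0 and 1, and the original background (0) became IGNORE_INDEX
reduced_map = np.array([[255, 0, 0], [255, 1, 1]])

# the mapping is still expressed in ORIGINAL (un-reduced) instance ids and class ids
instance_id_to_semantic_id = {1: 8, 2: 3}

instance_ids = np.unique(reduced_map)
instance_ids = instance_ids[instance_ids != IGNORE_INDEX]  # drop the ignored label

# undo the reduction for the lookup (+1), then reduce the looked-up class id (-1)
class_labels = [instance_id_to_semantic_id[int(i) + 1] - 1 for i in instance_ids]
print(instance_ids.tolist(), class_labels)  # [0, 1] [7, 2]
```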
@@ -478,57 +492,81 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
         segmentation_map: "np.ndarray",
         instance_id_to_semantic_id: Optional[Dict[int, int]] = None,
     ):
-        # Get unique ids (class or instance ids based on input)
+        # Reduce labels, if requested
+        if self.reduce_labels:
+            if self.ignore_index is None:
+                raise ValueError("`ignore_index` must be set when `reduce_labels` is `True`.")
+            segmentation_map[segmentation_map == 0] = self.ignore_index
+            segmentation_map -= 1
+            segmentation_map[segmentation_map == self.ignore_index - 1] = self.ignore_index
+
+        # Get unique ids (instance, class ids or segment ids based on input)
         all_labels = np.unique(segmentation_map)
 
-        # Drop background label if applicable
-        if self.reduce_labels:
-            all_labels = all_labels[all_labels != 0]
+        # Remove ignored label
+        if self.ignore_index is not None:
+            all_labels = all_labels[all_labels != self.ignore_index]
 
         # Generate a binary mask for each object instance
-        binary_masks = [np.ma.masked_where(segmentation_map == i, segmentation_map) for i in all_labels]
+        binary_masks = [(segmentation_map == i) for i in all_labels]
         binary_masks = np.stack(binary_masks, axis=0) # (num_labels, height, width)
 
-        # Convert instance ids to class ids
+        # Convert instance/segment ids to class ids
         if instance_id_to_semantic_id is not None:
             labels = np.zeros(all_labels.shape[0])
 
             for label in all_labels:
-                class_id = instance_id_to_semantic_id[label]
-                labels[all_labels == label] = class_id
+                class_id = instance_id_to_semantic_id[label + 1 if self.reduce_labels else label]
+                labels[all_labels == label] = class_id - 1 if self.reduce_labels else class_id
         else:
             labels = all_labels
 
-        # Decrement labels by 1
-        if self.reduce_labels:
-            labels -= 1
-
         return binary_masks.astype(np.float32), labels.astype(np.int64)
 
     def encode_inputs(
         self,
-        pixel_values_list: List["np.ndarray"],
+        pixel_values_list: Union[List["np.ndarray"], List["torch.Tensor"]],
         segmentation_maps: ImageInput = None,
         pad_and_return_pixel_mask: bool = True,
         instance_id_to_semantic_id: Optional[Union[List[Dict[int, int]], Dict[int, int]]] = None,
         return_tensors: Optional[Union[str, TensorType]] = None,
     ):
         """
-        Pad images up to the largest image in a batch and create a corresponding `pixel_mask`.
+        Encode a list of pixel values and an optional list of corresponding segmentation maps.
 
-        MaskFormer addresses semantic segmentation with a mask classification paradigm, thus input segmentation maps
-        will be converted to lists of binary masks and their respective labels. Let's see an example, assuming
-        `segmentation_maps = [[2,6,7,9]]`, the output will contain `mask_labels =
+        This method is useful if you have resized and normalized your images and segmentation maps yourself, using a
+        library like [torchvision](https://pytorch.org/vision/stable/transforms.html) or
+        [albumentations](https://albumentations.ai/).
+
+        Images are padded up to the largest image in a batch, and a corresponding `pixel_mask` is created.
+
+        Segmentation maps can be instance, semantic or panoptic segmentation maps. In case of instance and panoptic
+        segmentation, one needs to provide `instance_id_to_semantic_id`, which is a mapping from instance/segment ids
+        to semantic category ids.
+
+        MaskFormer addresses all 3 forms of segmentation (instance, semantic and panoptic) in the same way, namely by
+        converting the segmentation maps to a set of binary masks with corresponding classes.
+
+        In case of instance segmentation, the segmentation maps contain the instance ids, and
+        `instance_id_to_semantic_id` maps instance IDs to their corresponding semantic category.
+
+        In case of semantic segmentation, the segmentation maps contain the semantic category ids. Let's see an
+        example, assuming `segmentation_maps = [[2,6,7,9]]`, the output will contain `mask_labels =
         [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]` (four binary masks) and `class_labels = [2,6,7,9]`, the labels for
         each mask.
 
+        In case of panoptic segmentation, the segmentation maps contain the segment ids, and
+        `instance_id_to_semantic_id` maps segment IDs to their corresponding semantic category.
+
         Args:
-            pixel_values_list (`List[torch.Tensor]`):
+            pixel_values_list (`List[np.ndarray]` or `List[torch.Tensor]`):
                 List of images (pixel values) to be padded. Each image should be a tensor of shape `(channels, height,
                 width)`.
 
             segmentation_maps (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `List[PIL.Image.Image]`, `List[np.ndarray]`, `List[torch.Tensor]`, *optional*):
-                The corresponding semantic segmentation maps with the pixel-wise annotations.
+                The corresponding segmentation maps with the pixel-wise instance id, semantic id or segment id
+                annotations. Assumed to be semantic segmentation maps if no `instance_id_to_semantic_id map` is
+                provided.
 
             pad_and_return_pixel_mask (`bool`, *optional*, defaults to `True`):
                 Whether or not to pad images up to the largest image in a batch and create a pixel mask.
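The central fix in this hunk is the binary-mask generation. `np.ma.masked_where(condition, a)` returns `a` itself as a masked array, masked where `condition` holds, so the old list comprehension produced masked copies of the segmentation map carrying the raw ids rather than 0/1 masks. The snippet below contrasts the two expressions on a toy map; the values are chosen purely for illustration.

```python
import numpy as np

segmentation_map = np.array([[1, 1, 2], [3, 3, 2]])
label = 2

# previous expression: a masked copy of the segmentation map, masked where the pixel equals `label`
old_mask = np.ma.masked_where(segmentation_map == label, segmentation_map)
print(old_mask.data.tolist())  # [[1, 1, 2], [3, 3, 2]]  (raw ids, not 0/1 values)
print(old_mask.mask.tolist())  # [[False, False, True], [False, False, True]]

# expression after this commit: a boolean mask that is True exactly where the pixel equals `label`
new_mask = (segmentation_map == label).astype(np.float32)
print(new_mask.tolist())  # [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
```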
@@ -539,10 +577,11 @@ class MaskFormerFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionM
                 - 0 for pixels that are padding (i.e. **masked**).
 
             instance_id_to_semantic_id (`List[Dict[int, int]]` or `Dict[int, int]`, *optional*):
-                A mapping between object instance ids and class ids. If passed, `segmentation_maps` is treated as an
-                instance segmentation map where each pixel represents an instance id. Can be provided as a single
-                dictionary with a global/dataset-level mapping or as a list of dictionaries (one per image), to map
-                instance ids in each image separately.
+                A mapping between instance/segment ids and semantic category ids. If passed, `segmentation_maps` is
+                treated as an instance or panoptic segmentation map where each pixel represents an instance or segment
+                id. Can be provided as a single dictionary with a global / dataset-level mapping or as a list of
+                dictionaries (one per image), to map instance ids in each image separately. Note that this assumes a
+                mapping before reduction of labels.
 
             return_tensors (`str` or [`~file_utils.TensorType`], *optional*):
                 If set, will return tensors instead of NumPy arrays. If set to `'pt'`, return PyTorch `torch.Tensor`
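The padding behaviour documented for `encode_inputs` (pad to the largest image in the batch, `pixel_mask` with 1 for real pixels and 0 for padding) can be sketched as follows. This is not the library's implementation: `pad_and_create_pixel_mask` is a hypothetical name, and the sketch assumes zero padding added at the bottom and right of each `(channels, height, width)` image.

```python
from typing import List, Tuple

import numpy as np


def pad_and_create_pixel_mask(images: List[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Sketch: zero-pad (C, H, W) images to the largest H and W in the batch.

    Assumes padding goes at the bottom/right; returns the padded batch and a
    (batch, H, W) pixel mask with 1 for real pixels and 0 for padding.
    """
    channels = images[0].shape[0]
    max_height = max(image.shape[1] for image in images)
    max_width = max(image.shape[2] for image in images)

    pixel_values = np.zeros((len(images), channels, max_height, max_width), dtype=np.float32)
    pixel_mask = np.zeros((len(images), max_height, max_width), dtype=np.int64)
    for idx, image in enumerate(images):
        _, height, width = image.shape
        pixel_values[idx, :, :height, :width] = image
        pixel_mask[idx, :height, :width] = 1  # mark the real (non-padded) pixels
    return pixel_values, pixel_mask


# two hypothetical images of different sizes
batch = [np.ones((3, 2, 4)), np.ones((3, 3, 2))]
pixel_values, pixel_mask = pad_and_create_pixel_mask(batch)
print(pixel_values.shape)      # (2, 3, 3, 4)
print(pixel_mask[1].tolist())  # [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
```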
@@ -17,7 +17,9 @@
 import unittest
 
 import numpy as np
+from datasets import load_dataset
 
+from huggingface_hub import hf_hub_download
 from transformers.testing_utils import require_torch, require_vision
 from transformers.utils import is_torch_available, is_vision_available
 
@@ -345,6 +347,173 @@ class MaskFormerFeatureExtractionTest(FeatureExtractionSavingTestMixin, unittest
         common(is_instance_map=False, segmentation_type="pil")
         common(is_instance_map=True, segmentation_type="pil")
 
+    def test_integration_instance_segmentation(self):
+        # load 2 images and corresponding annotations from the hub
+        repo_id = "nielsr/image-segmentation-toy-data"
+        image1 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="instance_segmentation_image_1.png", repo_type="dataset")
+        )
+        image2 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="instance_segmentation_image_2.png", repo_type="dataset")
+        )
+        annotation1 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="instance_segmentation_annotation_1.png", repo_type="dataset")
+        )
+        annotation2 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="instance_segmentation_annotation_2.png", repo_type="dataset")
+        )
+
+        # get instance segmentations and instance-to-segmentation mappings
+        def get_instance_segmentation_and_mapping(annotation):
+            instance_seg = np.array(annotation)[:, :, 1]
+            class_id_map = np.array(annotation)[:, :, 0]
+            class_labels = np.unique(class_id_map)
+
+            # create mapping between instance IDs and semantic category IDs
+            inst2class = {}
+            for label in class_labels:
+                instance_ids = np.unique(instance_seg[class_id_map == label])
+                inst2class.update({i: label for i in instance_ids})
+
+            return instance_seg, inst2class
+
+        instance_seg1, inst2class1 = get_instance_segmentation_and_mapping(annotation1)
+        instance_seg2, inst2class2 = get_instance_segmentation_and_mapping(annotation2)
+
+        # create a feature extractor
+        feature_extractor = MaskFormerFeatureExtractor(reduce_labels=True, ignore_index=255, size=(512, 512))
+
+        # prepare the images and annotations
+        inputs = feature_extractor(
+            [image1, image2],
+            [instance_seg1, instance_seg2],
+            instance_id_to_semantic_id=[inst2class1, inst2class2],
+            return_tensors="pt",
+        )
+
+        # verify the pixel values and pixel mask
+        self.assertEqual(inputs["pixel_values"].shape, (2, 3, 512, 512))
+        self.assertEqual(inputs["pixel_mask"].shape, (2, 512, 512))
+
+        # verify the class labels
+        self.assertEqual(len(inputs["class_labels"]), 2)
+        self.assertTrue(torch.allclose(inputs["class_labels"][0], torch.tensor([30, 55])))
+        self.assertTrue(torch.allclose(inputs["class_labels"][1], torch.tensor([4, 4, 23, 55])))
+
+        # verify the mask labels
+        self.assertEqual(len(inputs["mask_labels"]), 2)
+        self.assertEqual(inputs["mask_labels"][0].shape, (2, 512, 512))
+        self.assertEqual(inputs["mask_labels"][1].shape, (4, 512, 512))
+        self.assertEquals(inputs["mask_labels"][0].sum().item(), 41527.0)
+        self.assertEquals(inputs["mask_labels"][1].sum().item(), 26259.0)
+
+    def test_integration_semantic_segmentation(self):
+        # load 2 images and corresponding semantic annotations from the hub
+        repo_id = "nielsr/image-segmentation-toy-data"
+        image1 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="semantic_segmentation_image_1.png", repo_type="dataset")
+        )
+        image2 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="semantic_segmentation_image_2.png", repo_type="dataset")
+        )
+        annotation1 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="semantic_segmentation_annotation_1.png", repo_type="dataset")
+        )
+        annotation2 = Image.open(
+            hf_hub_download(repo_id=repo_id, filename="semantic_segmentation_annotation_2.png", repo_type="dataset")
+        )
+
+        # create a feature extractor
+        feature_extractor = MaskFormerFeatureExtractor(reduce_labels=True, ignore_index=255, size=(512, 512))
+
+        # prepare the images and annotations
+        inputs = feature_extractor(
+            [image1, image2],
+            [annotation1, annotation2],
+            return_tensors="pt",
+        )
+
+        # verify the pixel values and pixel mask
+        self.assertEqual(inputs["pixel_values"].shape, (2, 3, 512, 512))
+        self.assertEqual(inputs["pixel_mask"].shape, (2, 512, 512))
+
+        # verify the class labels
+        self.assertEqual(len(inputs["class_labels"]), 2)
+        self.assertTrue(torch.allclose(inputs["class_labels"][0], torch.tensor([2, 4, 60])))
+        self.assertTrue(torch.allclose(inputs["class_labels"][1], torch.tensor([0, 3, 7, 8, 15, 28, 30, 143])))
+
+        # verify the mask labels
+        self.assertEqual(len(inputs["mask_labels"]), 2)
+        self.assertEqual(inputs["mask_labels"][0].shape, (3, 512, 512))
+        self.assertEqual(inputs["mask_labels"][1].shape, (8, 512, 512))
+        self.assertEquals(inputs["mask_labels"][0].sum().item(), 170200.0)
+        self.assertEquals(inputs["mask_labels"][1].sum().item(), 257036.0)
+
+    def test_integration_panoptic_segmentation(self):
+        # load 2 images and corresponding panoptic annotations from the hub
+        dataset = load_dataset("nielsr/ade20k-panoptic-demo")
+        image1 = dataset["train"][0]["image"]
+        image2 = dataset["train"][1]["image"]
+        segments_info1 = dataset["train"][0]["segments_info"]
+        segments_info2 = dataset["train"][1]["segments_info"]
+        annotation1 = dataset["train"][0]["label"]
+        annotation2 = dataset["train"][1]["label"]
+
+        def rgb_to_id(color):
+            if isinstance(color, np.ndarray) and len(color.shape) == 3:
+                if color.dtype == np.uint8:
+                    color = color.astype(np.int32)
+                return color[:, :, 0] + 256 * color[:, :, 1] + 256 * 256 * color[:, :, 2]
+            return int(color[0] + 256 * color[1] + 256 * 256 * color[2])
+
+        def create_panoptic_map(annotation, segments_info):
+            annotation = np.array(annotation)
+            # convert RGB to segment IDs per pixel
+            # 0 is the "ignore" label, for which we don't need to make binary masks
+            panoptic_map = rgb_to_id(annotation)
+
+            # create mapping between segment IDs and semantic classes
+            inst2class = {segment["id"]: segment["category_id"] for segment in segments_info}
+
+            return panoptic_map, inst2class
+
+        panoptic_map1, inst2class1 = create_panoptic_map(annotation1, segments_info1)
+        panoptic_map2, inst2class2 = create_panoptic_map(annotation2, segments_info2)
+
+        # create a feature extractor
+        feature_extractor = MaskFormerFeatureExtractor(ignore_index=0, do_resize=False)
+
+        # prepare the images and annotations
+        pixel_values_list = [np.moveaxis(np.array(image1), -1, 0), np.moveaxis(np.array(image2), -1, 0)]
+        inputs = feature_extractor.encode_inputs(
+            pixel_values_list,
+            [panoptic_map1, panoptic_map2],
+            instance_id_to_semantic_id=[inst2class1, inst2class2],
+            return_tensors="pt",
+        )
+
+        # verify the pixel values and pixel mask
+        self.assertEqual(inputs["pixel_values"].shape, (2, 3, 512, 711))
+        self.assertEqual(inputs["pixel_mask"].shape, (2, 512, 711))
+
+        # verify the class labels
+        self.assertEqual(len(inputs["class_labels"]), 2)
+        # fmt: off
+        expected_class_labels = torch.tensor([4, 17, 32, 42, 42, 42, 42, 42, 42, 42, 32, 12, 12, 12, 12, 12, 42, 42, 12, 12, 12, 42, 12, 12, 12, 12, 12, 3, 12, 12, 12, 12, 42, 42, 42, 12, 42, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 5, 12, 12, 12, 12, 12, 12, 12, 0, 43, 43, 43, 96, 43, 104, 43, 31, 125, 31, 125, 138, 87, 125, 149, 138, 125, 87, 87]) # noqa: E231
+        # fmt: on
+        self.assertTrue(torch.allclose(inputs["class_labels"][0], torch.tensor(expected_class_labels)))
+        # fmt: off
+        expected_class_labels = torch.tensor([19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 67, 82, 19, 19, 17, 19, 19, 19, 19, 19, 19, 19, 19, 19, 12, 12, 42, 12, 12, 12, 12, 3, 14, 12, 12, 12, 12, 12, 12, 12, 12, 14, 5, 12, 12, 0, 115, 43, 43, 115, 43, 43, 43, 8, 8, 8, 138, 138, 125, 143]) # noqa: E231
+        # fmt: on
+        self.assertTrue(torch.allclose(inputs["class_labels"][1], expected_class_labels))
+
+        # verify the mask labels
+        self.assertEqual(len(inputs["mask_labels"]), 2)
+        self.assertEqual(inputs["mask_labels"][0].shape, (79, 512, 711))
+        self.assertEqual(inputs["mask_labels"][1].shape, (61, 512, 711))
+        self.assertEquals(inputs["mask_labels"][0].sum().item(), 315193.0)
+        self.assertEquals(inputs["mask_labels"][1].sum().item(), 350747.0)
+
     def test_binary_mask_to_rle(self):
         fake_binary_mask = np.zeros((20, 50))
         fake_binary_mask[0, 20:] = 1
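The panoptic test decodes COCO-style panoptic PNGs, where each segment id is packed into the RGB channels as `R + 256 * G + 256**2 * B` (what `rgb_to_id` computes), and builds the segment-to-category mapping from `segments_info`. A trimmed-down sketch on a made-up 1x3 annotation, treating 0 as the ignored label the way the test does with `ignore_index=0`:

```python
import numpy as np


def rgb_to_id(color: np.ndarray) -> np.ndarray:
    """Decode a (H, W, 3) panoptic annotation into per-pixel segment ids."""
    color = color.astype(np.int32)  # avoid uint8 overflow in the arithmetic
    return color[:, :, 0] + 256 * color[:, :, 1] + 256 * 256 * color[:, :, 2]


# hypothetical 1x3 annotation: black (id 0, ignored) plus two segments with ids 10 and 260
annotation = np.array([[[0, 0, 0], [10, 0, 0], [4, 1, 0]]], dtype=np.uint8)
# hypothetical COCO-style segments_info: segment id -> semantic category id
segments_info = [{"id": 10, "category_id": 3}, {"id": 260, "category_id": 12}]

panoptic_map = rgb_to_id(annotation)
segment_id_to_semantic_id = {segment["id"]: segment["category_id"] for segment in segments_info}

segment_ids = np.unique(panoptic_map)
segment_ids = segment_ids[segment_ids != 0]  # drop the ignored label 0
class_labels = [segment_id_to_semantic_id[int(s)] for s in segment_ids]

print(panoptic_map.tolist())  # [[0, 10, 260]]
print(class_labels)           # [3, 12]
```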