Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550)

* add colorize_depth and matplotlib availability check

* add post_process_depth_estimation for zoedepth + tests

* add post_process_depth_estimation for DPT + tests

* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth

* run `make fixup`

* fix import related error on tests

* fix more import related errors on test

* forgot some `torch` calls in declarations

* remove `torch` call in zoedepth tests that caused error

* updated docs for depth estimation

* small fix for `colorize` input/output types

* remove `colorize_depth`, fix various names, remove matplotlib dependency

* fix formatting

* run fixup

* different images for test

* update examples in `forward` functions

* fixed broken links

* fix output types for docs

* possible format fix inside `<Tip>`

* Readability related updates

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Readability related update

* cleanup after merge

* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`

* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Alexandros Benetatos
2024-10-22 16:50:54 +03:00
committed by GitHub
parent 104599d7a8
commit c31a6ff474
13 changed files with 437 additions and 203 deletions


@@ -384,3 +384,29 @@ class DPTModelIntegrationTest(unittest.TestCase):
        segmentation = image_processor.post_process_semantic_segmentation(outputs=outputs)
        expected_shape = torch.Size((480, 480))
        self.assertEqual(segmentation[0].shape, expected_shape)

    def test_post_processing_depth_estimation(self):
        image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
        model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

        image = prepare_img()
        inputs = image_processor(images=image, return_tensors="pt")

        # forward pass
        with torch.no_grad():
            outputs = model(**inputs)

        predicted_depth = image_processor.post_process_depth_estimation(outputs=outputs)[0]["predicted_depth"]
        expected_shape = torch.Size((384, 384))
        self.assertTrue(predicted_depth.shape == expected_shape)

        predicted_depth_l = image_processor.post_process_depth_estimation(outputs=outputs, target_sizes=[(500, 500)])
        predicted_depth_l = predicted_depth_l[0]["predicted_depth"]
        expected_shape = torch.Size((500, 500))
        self.assertTrue(predicted_depth_l.shape == expected_shape)

        output_enlarged = torch.nn.functional.interpolate(
            predicted_depth.unsqueeze(0).unsqueeze(1), size=(500, 500), mode="bicubic", align_corners=False
        ).squeeze()
        self.assertTrue(output_enlarged.shape == expected_shape)
        self.assertTrue(torch.allclose(predicted_depth_l, output_enlarged, rtol=1e-3))
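
The last assertions in the test above compare the processor's resized output against a manual bicubic resize of the unresized prediction. A minimal sketch of that manual resize in isolation, assuming PyTorch is installed and using a random tensor in place of a real model output; `resize_depth` is a hypothetical helper written for illustration, not part of the library:

```python
import torch
import torch.nn.functional as F


def resize_depth(depth: torch.Tensor, size: tuple) -> torch.Tensor:
    """Resize a single (H, W) depth map to `size` with bicubic interpolation.

    Hypothetical helper: it mirrors the manual resize done in the test,
    not the image processor's internal implementation.
    """
    # F.interpolate expects (batch, channels, H, W), so add two leading dims.
    batched = depth.unsqueeze(0).unsqueeze(1)
    resized = F.interpolate(batched, size=size, mode="bicubic", align_corners=False)
    # Drop the batch and channel dims to get back to (H, W).
    return resized.squeeze(0).squeeze(0)


torch.manual_seed(0)
depth = torch.rand(384, 384)  # random stand-in for a predicted depth map
enlarged = resize_depth(depth, (500, 500))
print(enlarged.shape)
```

The helper squeezes the two leading dimensions explicitly rather than calling a bare `.squeeze()`, so a target size with a dimension of 1 would not be collapsed by accident.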