Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550)
* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declarations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Committed by GitHub
Parent: 104599d7a8
Commit: c31a6ff474
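The last item in the commit message mentions rewriting dict merging so that it works on Python 3.8. The page does not show that code, so the following is only a sketch of the general technique, assuming the rewrite replaced the dict union operator `|` (added in Python 3.9 by PEP 584) with `**` unpacking. The function name `merge_dicts` and the `"extra"` key are hypothetical.

```python
def merge_dicts(base: dict, extra: dict) -> dict:
    """Return a new dict whose keys from `extra` override those in `base`.

    Runs on Python 3.5+; on 3.9+ it is equivalent to `base | extra`.
    Neither input dict is modified.
    """
    return {**base, **extra}


# Hypothetical example: attach an extra field to a per-image result dict.
result = {"predicted_depth": [[0.5, 1.0], [1.5, 2.0]]}
merged = merge_dicts(result, {"extra": 1})
print(sorted(merged))  # ['extra', 'predicted_depth']
```

With `**` unpacking, later mappings win on key collisions, which matches the semantics of `|`.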
@@ -84,27 +84,24 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:

 >>> with torch.no_grad():
 ...     outputs = model(**inputs)
-...     predicted_depth = outputs.predicted_depth

->>> # interpolate to original size
->>> prediction = torch.nn.functional.interpolate(
-...     predicted_depth.unsqueeze(1),
-...     size=image.size[::-1],
-...     mode="bicubic",
-...     align_corners=False,
+>>> # interpolate to original size and visualize the prediction
+>>> post_processed_output = image_processor.post_process_depth_estimation(
+...     outputs,
+...     target_sizes=[(image.height, image.width)],
 ... )

->>> # visualize the prediction
->>> output = prediction.squeeze().cpu().numpy()
->>> formatted = (output * 255 / np.max(output)).astype("uint8")
->>> depth = Image.fromarray(formatted)
+>>> predicted_depth = post_processed_output[0]["predicted_depth"]
+>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
+>>> depth = depth.detach().cpu().numpy() * 255
+>>> depth = Image.fromarray(depth.astype("uint8"))
 ```

 ## Resources

 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.

-- [Monocular depth estimation task guide](../tasks/depth_estimation)
+- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation)
 - A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎

 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
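The new visualization lines for Depth Anything min-max normalize the post-processed depth map to [0, 1], scale it to [0, 255], and convert it to an 8-bit image. Below is a NumPy-only sketch of that step. The function name `depth_to_uint8` is hypothetical, and the guard for a constant depth map (where max equals min, so the documented formula would divide by zero) is an addition that does not appear in the docs.

```python
import numpy as np


def depth_to_uint8(predicted_depth: np.ndarray) -> np.ndarray:
    """Min-max normalize a 2D depth map and scale it to uint8 values in [0, 255]."""
    depth = predicted_depth.astype(np.float64)
    d_min, d_max = depth.min(), depth.max()
    if d_max == d_min:
        # Constant map: avoid division by zero and return an all-black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    normalized = (depth - d_min) / (d_max - d_min)  # values now in [0, 1]
    return (normalized * 255).astype(np.uint8)       # truncates, like .astype("uint8")


depth = np.array([[1.0, 2.0], [3.0, 5.0]])
image_array = depth_to_uint8(depth)  # -> values [[0, 63], [127, 255]]
```

The resulting array has the dtype `PIL.Image.fromarray` expects for a grayscale image. Note that the cast truncates rather than rounds, so 63.75 becomes 63.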
@@ -78,27 +78,24 @@ If you want to do the pre- and post-processing yourself, here's how to do that:

 >>> with torch.no_grad():
 ...     outputs = model(**inputs)
-...     predicted_depth = outputs.predicted_depth

->>> # interpolate to original size
->>> prediction = torch.nn.functional.interpolate(
-...     predicted_depth.unsqueeze(1),
-...     size=image.size[::-1],
-...     mode="bicubic",
-...     align_corners=False,
+>>> # interpolate to original size and visualize the prediction
+>>> post_processed_output = image_processor.post_process_depth_estimation(
+...     outputs,
+...     target_sizes=[(image.height, image.width)],
 ... )

->>> # visualize the prediction
->>> output = prediction.squeeze().cpu().numpy()
->>> formatted = (output * 255 / np.max(output)).astype("uint8")
->>> depth = Image.fromarray(formatted)
+>>> predicted_depth = post_processed_output[0]["predicted_depth"]
+>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
+>>> depth = depth.detach().cpu().numpy() * 255
+>>> depth = Image.fromarray(depth.astype("uint8"))
 ```

 ## Resources

 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.

-- [Monocular depth estimation task guide](../tasks/depth_estimation)
+- [Monocular depth estimation task guide](../tasks/monocular_depth_estimation)
 - [Depth Anything V2 demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-V2).
 - A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
 - [Core ML conversion of the `small` variant for use on Apple Silicon](https://huggingface.co/apple/coreml-depth-anything-v2-small).
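With this change, `post_process_depth_estimation` returns a list with one dict per image, and the docs read the map via `post_processed_output[0]["predicted_depth"]`. The NumPy sketch below mimics only that output shape for illustration and is not the library implementation. The names `post_process_sketch` and `resize_nearest` are hypothetical, and the sketch uses nearest-neighbor resizing for simplicity, whereas the removed doc code interpolated with `mode="bicubic"`. As in the docs, each target size is given as `(height, width)`.

```python
import numpy as np


def resize_nearest(depth: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbor resize of a 2D array to (height, width)."""
    in_h, in_w = depth.shape
    rows = np.arange(height) * in_h // height  # source row for each output row
    cols = np.arange(width) * in_w // width    # source column for each output column
    return depth[rows[:, None], cols[None, :]]


def post_process_sketch(predicted_depth: np.ndarray, target_sizes):
    """Hypothetical stand-in: (batch, H, W) array -> list of per-image dicts."""
    results = []
    for depth, (height, width) in zip(predicted_depth, target_sizes):
        results.append({"predicted_depth": resize_nearest(depth, height, width)})
    return results


batch = np.arange(8, dtype=np.float32).reshape(2, 2, 2)  # two 2x2 depth maps
out = post_process_sketch(batch, target_sizes=[(4, 4), (2, 3)])
print(len(out), out[0]["predicted_depth"].shape, out[1]["predicted_depth"].shape)
# 2 (4, 4) (2, 3)
```

Returning a list of dicts rather than a bare tensor allows images in a batch to have different target sizes.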
@@ -39,54 +39,66 @@ The original code can be found [here](https://github.com/isl-org/ZoeDepth).
 The easiest way to perform inference with ZoeDepth is by leveraging the [pipeline API](../main_classes/pipelines.md):

 ```python
-from transformers import pipeline
-from PIL import Image
-import requests
+>>> from transformers import pipeline
+>>> from PIL import Image
+>>> import requests

-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
+>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+>>> image = Image.open(requests.get(url, stream=True).raw)

-pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
-result = pipe(image)
-depth = result["depth"]
+>>> pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti")
+>>> result = pipe(image)
+>>> depth = result["depth"]
 ```

 Alternatively, one can perform inference using the classes:

 ```python
-from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
-import torch
-import numpy as np
-from PIL import Image
-import requests
+>>> from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation
+>>> import torch
+>>> import numpy as np
+>>> from PIL import Image
+>>> import requests

-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
+>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+>>> image = Image.open(requests.get(url, stream=True).raw)

-image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
-model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")
+>>> image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
+>>> model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

-# prepare image for the model
-inputs = image_processor(images=image, return_tensors="pt")
+>>> # prepare image for the model
+>>> inputs = image_processor(images=image, return_tensors="pt")

-with torch.no_grad():
-    outputs = model(**inputs)
-    predicted_depth = outputs.predicted_depth
+>>> with torch.no_grad():
+...     outputs = model(**inputs)

-# interpolate to original size
-prediction = torch.nn.functional.interpolate(
-    predicted_depth.unsqueeze(1),
-    size=image.size[::-1],
-    mode="bicubic",
-    align_corners=False,
-)
+>>> # interpolate to original size and visualize the prediction
+>>> ## ZoeDepth dynamically pads the input image. Thus we pass the original image size as an argument
+>>> ## to `post_process_depth_estimation` to remove the padding and resize to the original dimensions.
+>>> post_processed_output = image_processor.post_process_depth_estimation(
+...     outputs,
+...     source_sizes=[(image.height, image.width)],
+... )

-# visualize the prediction
-output = prediction.squeeze().cpu().numpy()
-formatted = (output * 255 / np.max(output)).astype("uint8")
-depth = Image.fromarray(formatted)
+>>> predicted_depth = post_processed_output[0]["predicted_depth"]
+>>> depth = (predicted_depth - predicted_depth.min()) / (predicted_depth.max() - predicted_depth.min())
+>>> depth = depth.detach().cpu().numpy() * 255
+>>> depth = Image.fromarray(depth.astype("uint8"))
 ```

+<Tip>
+<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a>, the ZoeDepth model performs inference on both the original and the flipped images and averages the results. The <code>post_process_depth_estimation</code> function can handle this for us when we pass the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
+<pre><code class="language-Python">>>> with torch.no_grad():
+...     outputs = model(**inputs)
+...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
+>>> post_processed_output = image_processor.post_process_depth_estimation(
+...     outputs,
+...     source_sizes=[(image.height, image.width)],
+...     outputs_flipped=outputs_flipped,
+... )
+</code></pre>
+</Tip>
+
 ## Resources

 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ZoeDepth.
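The ZoeDepth Tip describes test-time flip averaging: the model runs on the image and on its horizontally flipped copy, and the two predictions are averaged. Because the second prediction is in mirrored coordinates, it must be flipped back before averaging. Below is a NumPy sketch of just that step. `average_with_flip` and `toy_model` are hypothetical names, and the code assumes the flip is along the last (width) axis, as with `torch.flip(inputs.pixel_values, dims=[3])` on an NCHW tensor. It does not reproduce the library's padding removal or resizing.

```python
import numpy as np


def average_with_flip(depth: np.ndarray, depth_from_flipped: np.ndarray) -> np.ndarray:
    """Average a depth map with the prediction made on the horizontally flipped input.

    `depth_from_flipped` is in mirrored coordinates, so it is flipped back along
    the width axis (last axis) before averaging.
    """
    return (depth + np.flip(depth_from_flipped, axis=-1)) / 2


def toy_model(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a depth model: pixel value plus column index."""
    return image + np.arange(image.shape[-1])


image = np.array([[1.0, 2.0, 3.0]])
depth = toy_model(image)                                  # [[1. 3. 5.]]
depth_from_flipped = toy_model(np.flip(image, axis=-1))   # [[3. 3. 3.]]
print(average_with_flip(depth, depth_from_flipped))       # [[2. 3. 4.]]
```

The toy model has a left-to-right bias. Averaging with the flipped-back prediction halves that position-dependent term, which is the motivation for doing this at inference time.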