Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

07/20/2023
by Wei Yin, et al.

Reconstructing accurate 3D scenes from images is a long-standing vision task. Because single-image reconstruction is ill-posed, most well-established methods are built upon multi-view geometry. State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and cannot perform mixed-data training because of the metric ambiguity. Meanwhile, SOTA monocular methods trained on large mixed datasets achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metric scale. In this work, we show that the key to a zero-shot single-view metric depth model lies in combining large-scale data training with resolving the metric ambiguity that arises from various camera models. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. Equipped with our module, monocular models can be stably trained on over 8 million images covering thousands of camera models, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Experiments demonstrate SOTA performance of our method on 7 zero-shot benchmarks. Notably, our method won the championship in the 2nd Monocular Depth Estimation Challenge. Our method enables the accurate recovery of metric 3D structures from randomly collected internet images, paving the way for plausible single-image metrology. The potential benefits extend to downstream tasks, which can be significantly improved by simply plugging in our model. For example, our model alleviates the scale drift issues of monocular SLAM (Fig. 1), leading to high-quality metric-scale dense mapping. The code is available at https://github.com/YvanYin/Metric3D.
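
The abstract does not spell out how the canonical camera space transformation works. One plausible reading, offered here as an assumption rather than as the authors' implementation, is that depth labels are rescaled by the ratio of a fixed canonical focal length to the focal length of the capturing camera during training, and that the inverse scaling is applied to predictions at inference. The Python sketch below illustrates that idea only; `CANONICAL_FOCAL_PX`, `to_canonical`, and `from_canonical` are hypothetical names, and the value 1000.0 is arbitrary.

```python
# Illustrative sketch only: names and the canonical focal length are assumptions,
# not taken from the Metric3D code base.
CANONICAL_FOCAL_PX = 1000.0  # hypothetical canonical focal length, in pixels


def to_canonical(depth_m, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Map a metric depth (meters) into the assumed canonical camera space."""
    return depth_m * (canonical_focal_px / focal_px)


def from_canonical(depth_canonical, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Invert to_canonical: recover metric depth for the real camera."""
    return depth_canonical * (focal_px / canonical_focal_px)


# The same 2 m ground-truth depth seen by two cameras with different focal lengths.
label_wide = to_canonical(2.0, focal_px=500.0)   # short focal length
label_tele = to_canonical(2.0, focal_px=2000.0)  # long focal length

# At inference, a canonical-space prediction is mapped back with the camera's focal length.
restored_wide = from_canonical(label_wide, focal_px=500.0)
restored_tele = from_canonical(label_tele, focal_px=2000.0)
```

Under this assumption, the network only ever sees labels expressed for one virtual camera, so images from cameras with different focal lengths no longer give conflicting metric supervision. Recovering real-world depth then requires knowing the focal length of the input image.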

research
12/17/2020

Learning to Recover 3D Scene Shape from a Single Image

Despite significant progress in monocular depth estimation in the wild, ...
research
08/10/2023

FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models

3D scene reconstruction is a long-standing vision task. Existing approac...
research
07/20/2023

Kick Back Relax: Learning to Reconstruct the World by Watching SlowTV

Self-supervised monocular depth estimation (SS-MDE) has the potential to...
research
02/03/2020

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

We present a method for depth estimation with monocular images, which ca...
research
06/29/2023

Towards Zero-Shot Scale-Aware Monocular Depth Estimation

Monocular depth estimation is scale-ambiguous, and thus requires scale s...
research
07/03/2022

Can Language Understand Depth?

Besides image classification, Contrastive Language-Image Pre-training (C...
research
02/23/2023

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

This paper tackles the problem of depth estimation from a single image. ...
