MegaDepth: Learning Single-View Depth Prediction from Internet Photos

04/02/2018
by Zhengqi Li, et al.

Single-view depth prediction is a fundamental problem in computer vision. Recently, deep learning methods have led to significant progress, but such methods are limited by the available training data. Current datasets based on 3D sensors have key limitations, including indoor-only images (NYU), small numbers of training examples (Make3D), and sparse sampling (KITTI). We propose to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion (SfM) and multi-view stereo (MVS) methods, and we present a large depth dataset called MegaDepth based on this idea. Data derived from MVS comes with its own challenges, including noise and unreconstructable objects. We address these challenges with new data cleaning methods and by automatically augmenting our data with ordinal depth relations generated using semantic segmentation. We validate the use of large amounts of Internet data by showing that models trained on MegaDepth exhibit strong generalization, not only to novel scenes but also to other diverse datasets, including Make3D, KITTI, and DIW, even when no images from those datasets are seen during training.
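An ordinal depth relation records only which of two image points is farther, not by how much. A standard way to train on such labels is a pairwise logistic ranking loss. The sketch below shows that generic loss applied to predicted log-depth. It is an illustrative assumption, not necessarily the exact formulation MegaDepth uses. The function names and the `(i, j, relation)` pair encoding are hypothetical, as is the example of a sky pixel labeled farther than a foreground pixel.

```python
import math


def ordinal_pair_loss(log_depth_i: float, log_depth_j: float, relation: int) -> float:
    """Generic logistic ranking loss for one ordinal pair (illustrative sketch).

    relation = +1 means point i is labeled farther than point j;
    relation = -1 means point i is labeled closer than point j.
    """
    if relation not in (1, -1):
        raise ValueError("relation must be +1 or -1")
    margin = relation * (log_depth_i - log_depth_j)
    # Compute log(1 + exp(-margin)) without overflow for large |margin|.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ordinal_loss(log_depth, pairs) -> float:
    """Average the pairwise loss over hypothetical (i, j, relation) triples."""
    if not pairs:
        raise ValueError("need at least one ordinal pair")
    total = sum(ordinal_pair_loss(log_depth[i], log_depth[j], r) for i, j, r in pairs)
    return total / len(pairs)


# Hypothetical example: point 0 is a sky pixel, point 1 a foreground pixel,
# with the ordinal label "point 0 is farther than point 1".
pairs = [(0, 1, 1)]
print(ordinal_loss([3.0, 1.0], pairs))  # consistent ordering: small loss
print(ordinal_loss([1.0, 3.0], pairs))  # reversed ordering: larger loss
```

The loss falls toward zero as the predicted ordering agrees with the label by a wider margin. It grows roughly linearly when the ordering is reversed, so the labels constrain only relative depth, never absolute scale.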
