The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

06/02/2023
by   Saurabh Saxena, et al.

Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel at estimating optical flow and monocular depth, surprisingly without the task-specific architectures and loss functions that predominate for these tasks. Whereas conventional regression-based methods produce point estimates, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. State-of-the-art diffusion models for depth and optical flow estimation can be trained with four ingredients: self-supervised pre-training; the combined use of synthetic and real data for supervised training; technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy, incomplete training data; and a simple form of coarse-to-fine refinement. Extensive experiments cover quantitative performance on benchmarks, ablations, and the model's ability to capture uncertainty and multimodality and to impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method. For an overview see https://diffusion-vision.github.io.
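
The two headline numbers in the abstract are a relative depth error and an Fl-all outlier rate. The sketch below shows how these metrics are commonly computed, assuming the usual definitions: mean absolute relative error over valid pixels for depth, and, for Fl-all, the percentage of valid pixels whose end-point error exceeds both 3 pixels and 5% of the ground-truth flow magnitude. The function names, array layouts, and toy inputs are illustrative assumptions, not the authors' evaluation code, and the paper's exact protocol (masking, depth caps) is not given in the abstract.

```python
import numpy as np


def abs_rel_error(pred_depth, gt_depth, valid_mask=None):
    """Mean of |pred - gt| / gt over valid pixels (hypothetical helper)."""
    pred = np.asarray(pred_depth, dtype=np.float64)
    gt = np.asarray(gt_depth, dtype=np.float64)
    # Assumption: non-positive ground-truth depth marks a missing pixel.
    mask = gt > 0
    if valid_mask is not None:
        mask &= np.asarray(valid_mask, dtype=bool)
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))


def fl_all_outlier_rate(pred_flow, gt_flow, valid_mask=None):
    """Percent of valid pixels that are flow outliers (hypothetical helper).

    Flow arrays are assumed to have shape (H, W, 2). A pixel counts as an
    outlier when its end-point error is > 3 px AND > 5% of the ground-truth
    flow magnitude.
    """
    pred = np.asarray(pred_flow, dtype=np.float64)
    gt = np.asarray(gt_flow, dtype=np.float64)
    epe = np.linalg.norm(pred - gt, axis=-1)   # per-pixel end-point error
    gt_mag = np.linalg.norm(gt, axis=-1)       # per-pixel ground-truth magnitude
    outlier = (epe > 3.0) & (epe > 0.05 * gt_mag)
    if valid_mask is None:
        valid = np.ones(epe.shape, dtype=bool)
    else:
        valid = np.asarray(valid_mask, dtype=bool)
    return float(100.0 * outlier[valid].mean())


# Toy inputs (made up for illustration only).
gt_depth = np.array([[2.0, 4.0], [0.0, 5.0]])    # 0.0 = missing ground truth
pred_depth = np.array([[2.2, 3.6], [1.0, 5.0]])

gt_flow = np.zeros((2, 2, 2))
pred_flow = np.zeros((2, 2, 2))
pred_flow[0, 0] = [4.0, 0.0]                     # 4 px error on zero motion
gt_flow[0, 1] = [100.0, 0.0]
pred_flow[0, 1] = [104.0, 0.0]                   # 4 px error, but only 4% of magnitude
gt_flow[1, 0] = [1.0, 0.0]
pred_flow[1, 0] = [2.0, 0.0]                     # 1 px error

print("abs rel:", abs_rel_error(pred_depth, gt_depth))
print("Fl-all (%):", fl_all_outlier_rate(pred_flow, gt_flow))
```

Under these definitions a lower value is better for both metrics, which is the sense in which the abstract reports 0.074 and 3.26%.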

research
02/28/2023

Monocular Depth Estimation using Diffusion Models

We formulate monocular depth estimation using denoising diffusion models...
research
06/27/2019

DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints

This paper presents a self-supervised deep learning network for monocul...
research
03/28/2022

Learning Optical Flow, Depth, and Scene Flow without Real-World Labels

Self-supervised monocular depth estimation enables robots to learn 3D pe...
research
07/16/2019

Speed estimation evaluation on the KITTI benchmark based on motion and monocular depth information

In this technical report we investigate speed estimation of the ego-vehi...
research
06/08/2021

Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation

We present DistillFlow, a knowledge distillation approach to learning op...
research
08/25/2022

A Compacted Structure for Cross-domain learning on Monocular Depth and Flow Estimation

Accurate motion and depth recovery is important for many robot vision ta...
research
10/19/2022

CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

Masked Image Modeling (MIM) has recently been established as a potent pr...
