Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes

04/18/2023
by   Rui Li, et al.

Multi-frame depth estimation generally achieves high accuracy by relying on multi-view geometric consistency. When applied in dynamic scenes, e.g., autonomous driving, this consistency is usually violated in dynamic areas, leading to corrupted estimations. Many multi-frame methods handle dynamic areas by identifying them with explicit masks and compensating the multi-view cues with monocular cues, represented as local monocular depth or features. The resulting improvements are limited, due both to the uncontrolled quality of the masks and to the underutilized benefits of fusing the two types of cues. In this paper, we propose a novel method that learns to fuse the multi-view and monocular cues encoded as volumes, without needing heuristically crafted masks. As our analyses reveal, multi-view cues capture more accurate geometric information in static areas, while monocular cues capture more useful contexts in dynamic areas. To propagate the geometric perception learned from multi-view cues in static areas to the monocular representation in dynamic areas, and to let monocular cues enhance the representation of the multi-view cost volume, we propose a cross-cue fusion (CCF) module. Its core is a cross-cue attention (CCA) that encodes the spatially non-local relative intra-relations of each cue to enhance the representation of the other. Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
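
To make the fusion idea more concrete, below is a minimal PyTorch-style sketch of a bidirectional cross-cue attention. The module names (CrossCueAttention, CrossCueFusion), the use of 2D feature maps, and the channel sizes are assumptions for illustration only; the paper's actual CCF/CCA design, which operates on monocular and multi-view volumes, may differ.

```python
import torch
import torch.nn as nn


class CrossCueAttention(nn.Module):
    """Use the intra-relations of one cue (source) to enhance the other cue (target).

    Hypothetical sketch: query/key come from the source cue, so the attention map
    encodes spatially non-local relations within that cue; values come from the
    target cue, whose representation is re-weighted and enhanced residually.
    """

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, source, target):
        b, c, h, w = target.shape
        q = self.query(source).flatten(2)                     # (B, C/2, HW)
        k = self.key(source).flatten(2)                       # (B, C/2, HW)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (B, HW, HW), relations within the source cue
        v = self.value(target).flatten(2)                     # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)  # re-weight the target cue
        return out + target                                   # residual enhancement


class CrossCueFusion(nn.Module):
    """Bidirectional fusion: monocular relations enhance the multi-view cue and vice versa."""

    def __init__(self, channels):
        super().__init__()
        self.mono_to_multi = CrossCueAttention(channels)
        self.multi_to_mono = CrossCueAttention(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, mono_feat, multi_feat):
        enhanced_multi = self.mono_to_multi(mono_feat, multi_feat)
        enhanced_mono = self.multi_to_mono(multi_feat, mono_feat)
        return self.fuse(torch.cat([enhanced_multi, enhanced_mono], dim=1))


# Example: fuse 64-channel monocular and multi-view cue features at 32x32 resolution.
ccf = CrossCueFusion(channels=64)
mono = torch.randn(1, 64, 32, 32)
multi = torch.randn(1, 64, 32, 32)
fused = ccf(mono, multi)  # (1, 64, 32, 32)
```

In this sketch, no explicit dynamic-area mask is used: the attention learned from each cue decides where the other cue's representation should be enhanced, which mirrors the mask-free fusion described in the abstract.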

research 08/19/2022
Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning
Self-supervised monocular methods can efficiently learn depth informatio...

research 10/24/2022
Monocular Dynamic View Synthesis: A Reality Check
We study the recent progress on dynamic view synthesis (DVS) from monocu...

research 12/13/2022
SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance
Real-time monocular 3D reconstruction is a challenging problem that rema...

research 11/22/2016
Single-View and Multi-View Depth Fusion
Dense and accurate 3D mapping from a monocular sequence is a key technol...

research 05/12/2023
Learning Monocular Depth in Dynamic Environment via Context-aware Temporal Attention
The monocular depth estimation task has recently revealed encouraging pr...

research 06/22/2022
Monocular Spherical Depth Estimation with Explicitly Connected Weak Layout Cues
Spherical cameras capture scenes in a holistic manner and have been used...

research 01/21/2022
Multi-view Monocular Depth and Uncertainty Prediction with Deep SfM in Dynamic Environments
3D reconstruction of depth and motion from monocular video in dynamic en...
