
Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space

by Xiaochuan Yin et al.
Tongji University

For ego-motion estimation, the feature representation of the scene is crucial. Previous work indicates that both low-level and semantic feature-based methods can achieve promising results, so a hierarchical feature representation may combine the benefits of both. From this perspective, we propose a novel direct feature odometry framework, named DFO, for depth estimation and hierarchical feature representation learning from monocular videos. By exploiting a metric distance, our framework learns the hierarchical feature representation without supervision. The pose is estimated coarse-to-fine, from high-level to low-level features, in enlarged feature maps. A pixel-level attention mask is self-learned to provide prior information. In contrast to previous methods, which regress ego-motion with a pose network, our method computes camera motion with a direct method. This constrains the scale factor of the translation to remain consistent, and it also makes the method compatible with the traditional SLAM pipeline. Experiments on the KITTI dataset demonstrate the effectiveness of our method.
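The abstract does not give DFO's equations, so the following is only a minimal sketch of the general idea behind direct, coarse-to-fine alignment with an attention-weighted feature-metric error, under strong simplifying assumptions. It replaces the SE(3) camera pose and depth-based warping with a 2D image-plane translation `t`, replaces the learned high-to-low-level features with an average-pooled pyramid of a single feature map, and uses a fixed mask in place of the self-learned attention mask. The function names (`bilinear_sample`, `downsample`, `align_level`, `coarse_to_fine`) are hypothetical. Each level runs Gauss-Newton on the residual r(p) = F_tgt(p + t) - F_ref(p), and the estimate from a coarser level, doubled, initializes the next finer level.

```python
import numpy as np


def bilinear_sample(f, xs, ys):
    """Sample a (C, H, W) map at float pixel coords (xs, ys); coords are clamped to the border."""
    _, h, w = f.shape
    xs = np.clip(xs, 0.0, w - 1.001)
    ys = np.clip(ys, 0.0, h - 1.001)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return (f[:, y0, x0] * (1 - ax) * (1 - ay) + f[:, y0, x0 + 1] * ax * (1 - ay)
            + f[:, y0 + 1, x0] * (1 - ax) * ay + f[:, y0 + 1, x0 + 1] * ax * ay)


def downsample(f):
    """2x2 average pooling of a (C, H, W) map (hypothetical stand-in for a coarser feature level)."""
    c, h, w = f.shape
    h2, w2 = h // 2, w // 2
    return f[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))


def align_level(f_ref, f_tgt, mask, t, iters=20):
    """Gauss-Newton on the mask-weighted residual r(p) = f_tgt(p + t) - f_ref(p).

    Simplification: t is a 2D translation, not a 6-DoF camera pose.
    """
    _, h, w = f_ref.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    gx = np.gradient(f_tgt, axis=2)  # d f_tgt / dx
    gy = np.gradient(f_tgt, axis=1)  # d f_tgt / dy
    for _ in range(iters):
        wx, wy = xs + t[0], ys + t[1]
        r = (bilinear_sample(f_tgt, wx, wy) - f_ref).ravel()
        # Jacobian of r w.r.t. t is the target gradient at the warped location.
        J = np.stack([bilinear_sample(gx, wx, wy).ravel(),
                      bilinear_sample(gy, wx, wy).ravel()], axis=1)
        # Per-pixel weight; a fixed mask here, a learned attention mask in the paper.
        wts = np.broadcast_to(mask, f_ref.shape).ravel()
        hess = J.T @ (J * wts[:, None])
        grad = J.T @ (wts * r)
        dt = -np.linalg.solve(hess + 1e-9 * np.eye(2), grad)
        t = t + dt
        if np.linalg.norm(dt) < 1e-8:
            break
    return t


def coarse_to_fine(f_ref, f_tgt, mask, levels=3):
    """Estimate the 2D shift from the coarsest level to the finest, reusing each estimate."""
    pyramid = [(f_ref, f_tgt, mask[None].astype(float))]
    for _ in range(levels - 1):
        pyramid.append(tuple(downsample(x) for x in pyramid[-1]))
    t = np.zeros(2)
    for lvl in range(levels - 1, -1, -1):
        t = align_level(*pyramid[lvl], t)
        if lvl > 0:
            t = 2.0 * t  # a shift of s pixels at this level is 2s pixels one level finer
    return t
```

For example, if `tgt` is a copy of `ref` shifted so that `F_tgt(p + (3, -2)) = F_ref(p)`, then `coarse_to_fine(ref, tgt, mask)` should return approximately `[3, -2]`. The coarse levels supply an initialization that places the fine-level Gauss-Newton iterations inside their convergence basin; this is the usual motivation for coarse-to-fine direct alignment, not a statement about DFO's specific settings.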


D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

We propose D3VO as a novel framework for monocular visual odometry that ...

DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning

In the current monocular depth research, the dominant approach is to emp...

Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes

Self-supervised methods have shown promising results on depth estimatio...

Feature-metric Loss for Self-supervised Learning of Depth and Egomotion

Photometric loss is widely used for self-supervised depth and egomotion ...

Unsupervised Learning-based Depth Estimation aided Visual SLAM Approach

The RGB-D camera maintains a limited range for working and is hard to ac...

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

Due to the inherent ill-posed nature of 2D-3D projection, monocular 3D o...

Motion Basis Learning for Unsupervised Deep Homography Estimation with Subspace Projection

In this paper, we introduce a new framework for unsupervised deep homogr...