
Attentional Separation-and-Aggregation Network for Self-supervised Depth-Pose Learning in Dynamic Scenes

by Feng Gao, et al.

Learning depth and ego-motion from unlabeled videos via self-supervision from epipolar projection can improve the robustness and accuracy of 3D perception and localization for vision-based robots. However, the rigid projection computed from ego-motion cannot represent all scene points, such as points on moving objects, which leads to false guidance in these regions. To address this problem, we propose an Attentional Separation-and-Aggregation Network (ASANet), which learns to distinguish and extract the scene's static and dynamic characteristics via the attention mechanism. We further propose a novel MotionNet with an ASANet as the encoder, followed by two separate decoders, to estimate the camera's ego-motion and the scene's dynamic motion field. We then introduce an auto-selecting approach that automatically detects moving objects for dynamic-aware learning. Empirical experiments demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark.
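To make the limitation of rigid projection concrete, here is a minimal NumPy sketch of the standard pinhole reprojection that this kind of self-supervision relies on. It is not the paper's implementation: the function name `rigid_project`, the intrinsics `K`, the ego-motion `(R, t)`, and the per-point `motion` vector are all illustrative assumptions. It only shows that a point on a moving object lands at a different pixel than the ego-motion-only projection predicts.

```python
import numpy as np

def rigid_project(pixel, depth, K, R, t, motion=None):
    """Reproject a pixel into the next frame under a pinhole camera model.

    `motion` is a hypothetical per-point 3D displacement (in camera
    coordinates) standing in for a dynamic motion field; None means the
    point is static and only ego-motion (R, t) applies.
    """
    if motion is None:
        motion = np.zeros(3)
    p_h = np.array([pixel[0], pixel[1], 1.0])       # homogeneous pixel
    point_3d = depth * (np.linalg.inv(K) @ p_h)     # back-project to 3D
    point_next = R @ point_3d + t + motion          # ego-motion plus object motion
    proj = K @ point_next                           # project to the image plane
    return proj[:2] / proj[2]

# Assumed toy values, not taken from the paper or from KITTI.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])   # camera moves 1 m forward, so points get 1 m closer

pixel, depth = (420.0, 240.0), 10.0

static_px = rigid_project(pixel, depth, K, R, t)
dynamic_px = rigid_project(pixel, depth, K, R, t,
                           motion=np.array([0.5, 0.0, 0.0]))

# Pixel error incurred if the moving point is treated as static.
residual = dynamic_px - static_px
print(static_px, dynamic_px, residual)
```

With these assumed values, the moving point ends up roughly 28 pixels away from where the rigid projection places it, which is the kind of mismatch that produces false guidance in a photometric loss.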



