Moving SLAM: Fully Unsupervised Deep Learning in Non-Rigid Scenes

05/05/2021
by Dan Xu, et al.

We propose a method to train deep networks to decompose videos into 3D geometry (camera and depth), moving objects, and their motions, with no supervision. We build on the idea of view synthesis, which uses classical camera geometry to re-render a source image from a different point-of-view, specified by a predicted relative pose and depth map. By minimizing the error between the synthetic image and the corresponding real image in a video, the deep network that predicts pose and depth can be trained completely unsupervised. However, the view synthesis equations rely on a strong assumption: that objects do not move. This rigid-world assumption limits the predictive power, and rules out learning about objects automatically. We propose a simple solution: minimize the error on small regions of the image instead. While the scene as a whole may be non-rigid, it is always possible to find small regions that are approximately rigid, such as inside a moving object. Our network can then predict different poses for each region, in a sliding window. This represents a significantly richer model, including 6D object motions, with little additional complexity. We establish new state-of-the-art results on unsupervised odometry and depth prediction on KITTI. We also demonstrate new capabilities on EPIC-Kitchens, a challenging dataset of indoor videos, where there is no ground truth information for depth, odometry, object segmentation or motion. Yet all are recovered automatically by our method.
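
The view-synthesis idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`reproject`, `synthesize`, `photometric_error`), the pinhole intrinsics matrix `K`, and the nearest-neighbour sampling are assumptions made for illustration. Training code would typically use differentiable bilinear sampling instead. The sketch shows the rigid-world case only, where one pose `(R, t)` applies to the whole image.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map each target pixel to its location in the source view.

    depth: (H, W) positive depth of the target view; K: (3, 3) intrinsics;
    R (3, 3), t (3,): rigid transform from target to source camera frame.
    Returns an (H, W, 2) array of (x, y) source pixel coordinates.
    """
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T      # back-project pixels to camera rays
    points = rays * depth[..., None]     # 3D points in the target frame
    moved = points @ R.T + t             # apply the relative pose
    proj = moved @ K.T                   # pinhole projection into the source
    return proj[..., :2] / proj[..., 2:3]

def synthesize(source, coords):
    """Sample source at coords (nearest neighbour); also return a validity mask."""
    H, W = source.shape[:2]
    x = np.rint(coords[..., 0]).astype(int)
    y = np.rint(coords[..., 1]).astype(int)
    valid = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    out = np.zeros_like(source)
    out[valid] = source[y[valid], x[valid]]
    return out, valid

def photometric_error(target, source, depth, K, R, t):
    """Mean absolute difference between the synthetic and the real target image."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    synth, valid = synthesize(source, reproject(depth, K, R, t))
    return np.abs(synth - target)[valid].mean()
```

Under these assumptions, the region-based variant proposed in the abstract would evaluate the same error on small image windows, each with its own predicted `(R, t)`, rather than once over the full frame.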

research
06/27/2018

Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding

Learning to estimate 3D geometry in a single image by watching unlabeled...
research
02/26/2019

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

While learning based depth estimation from images/videos has achieved su...
research
12/11/2019

Training Deep SLAM on Single Frames

Learning-based visual odometry and SLAM methods demonstrate a steady imp...
research
06/04/2020

Unsupervised Depth Learning in Challenging Indoor Video: Weak Rectification to Rescue

Single-view depth estimation using CNNs trained from unlabelled videos h...
research
01/11/2021

Learning to Segment Rigid Motions from Two Frames

Appearance-based detectors achieve remarkable performance on common scen...
research
11/15/2018

Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Learning to predict scene depth from RGB inputs is a challenging task bo...
research
07/11/2020

Learning Object Depth from Camera Motion and Video Object Segmentation

Video object segmentation, i.e., the separation of a target object from ...
