Log In Sign Up

VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry

by   Noha Radwan, et al.

Visual localization is one of the fundamental enablers of robot autonomy which has been mostly tackled using local feature-based pipelines that efficiently encode knowledge about the environment and the underlying geometrical constraints. Although deep learning based approaches have shown considerable robustness in the context of significant perceptual changes, repeating structures and textureless regions, their performance has been subpar in comparison to local feature-based pipelines. In this paper, we propose the novel VLocNet++ architecture that attempts to overcome this limitation by simultaneously embedding geometric and semantic knowledge of the world into the pose regression network. We adopt a multitask learning approach that exploits the inter-task relationship between learning semantics, regressing 6-DoF global pose and odometry, for the mutual benefit of each of these tasks. VLocNet++ incorporates the Geometric Consistency Loss function that utilizes the predicted motion from the odometry stream to enforce global consistency during pose regression. Furthermore, we propose a self-supervised warping technique that uses the relative motion to warp intermediate network representations in the segmentation stream for learning consistent semantics. In addition, we propose a novel adaptive weighted fusion layer to leverage inter and intra task dependencies based on region activations. Finally, we introduce a first-of-a-kind urban outdoor localization dataset with pixel-level semantic labels and multiple loops for training deep networks. Extensive experiments on the challenging indoor Microsoft 7-Scenes benchmark and our outdoor DeepLoc dataset demonstrate that our approach exceeds the state-of-the-art, outperforming local feature-based methods while exhibiting substantial robustness in challenging scenarios.


page 1

page 8

page 9

page 10

page 11

page 12

page 13


Deep Auxiliary Learning for Visual Localization and Odometry

Localization is an indispensable component of a robot's autonomy stack t...

Deep auxiliary learning for visual localization using colorization task

Visual localization is one of the most important components for robotics...

Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry

For the autonomous navigation of mobile robots, robust and fast visual l...

Two Stream Networks for Self-Supervised Ego-Motion Estimation

Learning depth and camera ego-motion from raw unlabeled RGB video stream...

MapNet: Geometry-Aware Learning of Maps for Camera Localization

Maps are a key component in image-based camera localization and visual S...

Local Supports Global: Deep Camera Relocalization with Sequence Enhancement

We propose to leverage the local information in image sequences to suppo...

Deep Visual Odometry with Adaptive Memory

We propose a novel deep visual odometry (VO) method that considers globa...