Deep auxiliary learning for visual localization using colorization task

by   Mi Tian, et al.

Visual localization is one of the most important components for robotics and autonomous driving. Recently, inspiring results have been shown with CNN-based methods which provide a direct formulation to end-to-end regress 6-DoF absolute pose. Additional information like geometric or semantic constraints is generally introduced to improve performance. Especially, the latter can aggregate high-level semantic information into localization task, but it usually requires enormous manual annotations. To this end, we propose a novel auxiliary learning strategy for camera localization by introducing scene-specific high-level semantics from self-supervised representation learning task. Viewed as a powerful proxy task, image colorization task is chosen as complementary task that outputs pixel-wise color version of grayscale photograph without extra annotations. In our work, feature representations from colorization network are embedded into localization network by design to produce discriminative features for pose regression. Meanwhile an attention mechanism is introduced for the benefit of localization performance. Extensive experiments show that our model significantly improve localization accuracy over state-of-the-arts on both indoor and outdoor datasets.


page 2

page 4

page 6

page 7


VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry

Visual localization is one of the fundamental enablers of robot autonomy...

Understanding the Limitations of CNN-based Absolute Camera Pose Regression

Visual localization is the task of accurate camera pose estimation in a ...

3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning

Camera localization is a fundamental and key component of autonomous dri...

Improving the generalization of network based relative pose regression: dimension reduction as a regularizer

Visual localization occupies an important position in many areas such as...

Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization

Long-term visual localization is the problem of estimating the camera po...

Image-based localization using LSTMs for structured feature correlation

In this work we propose a new CNN+LSTM architecture for camera pose regr...

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization

Cross-view geo-localization is to spot images of the same geographic tar...