MVLoc: Multimodal Variational Geometry-Aware Learning for Visual Localization

03/12/2020
by   Rui Zhou, et al.

Recent learning-based research has achieved impressive results in the field of single-shot camera relocalization. However, how best to fuse multiple modalities (for example, image and depth), and how to deal with degraded or missing input, are less well studied. In particular, we note that previous approaches to deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is because of naive approaches to feature-space fusion through summation or concatenation, which do not take into account the different strengths of each modality, specifically appearance for images and structure for depth. To address this, we propose an end-to-end framework that fuses different sensor inputs through a variational Product-of-Experts (PoE) joint encoder followed by attention-based fusion. Unlike prior work, which draws a single sample from the joint encoder, we show how accuracy can be increased through importance sampling and reparameterization of the latent space. Our model is extensively evaluated on RGB-D datasets, outperforming existing baselines by a large margin.
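The core fusion step described above can be illustrated with a short sketch. For Gaussian experts, a Product-of-Experts posterior has a closed form: precisions add (including a standard-normal prior), and the fused mean is the precision-weighted average of the expert means; reparameterization then lets multiple latent samples be drawn differentiably. The function names and the numpy-based setup below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def poe_fuse(mus, logvars):
    """Fuse per-modality Gaussian experts (e.g., RGB and depth encoders)
    into one joint Gaussian via a Product of Experts with a N(0, I) prior.

    Precision of the product = prior precision + sum of expert precisions;
    mean = precision-weighted average of expert means.
    """
    precisions = [np.exp(-lv) for lv in logvars]   # 1 / sigma^2 per expert
    total_prec = 1.0 + sum(precisions)             # +1.0 for the N(0, I) prior
    mu = sum(p * m for p, m in zip(precisions, mus)) / total_prec
    var = 1.0 / total_prec
    return mu, np.log(var)

def reparam_samples(mu, logvar, k, rng):
    """Draw k reparameterized samples z = mu + sigma * eps, eps ~ N(0, I)."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal((k,) + mu.shape)
    return mu + std * eps

# Example: two unit-variance experts disagreeing on a 1-D latent.
mu, lv = poe_fuse([np.array([0.0]), np.array([2.0])],
                  [np.array([0.0]), np.array([0.0])])
zs = reparam_samples(mu, lv, 8, np.random.default_rng(0))
```

A useful property of this formulation is that a missing modality is handled by simply dropping its expert from the product, while a noisy modality (high predicted variance) contributes little to the fused mean, which is one motivation for PoE fusion over plain summation or concatenation.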


