A Multi-modal Approach to Single-modal Visual Place Classification

05/10/2023
by Tomoya Iwasaki, et al.

Visual place classification from a first-person-view monocular RGB image is a fundamental problem in long-term robot navigation. A key difficulty is that RGB image classifiers are often vulnerable to spatial and appearance changes, and their performance degrades under domain shifts such as seasonal, weather, and lighting differences. To address this issue, multi-sensor fusion approaches that combine RGB with depth (D) measurements (e.g., from LIDAR, radar, or stereo) have gained popularity in recent years. Inspired by these efforts in multimodal RGB-D fusion, we explore the use of pseudo-depth measurements, obtained from recently developed "domain-invariant" monocular depth estimation techniques, as an additional modality. This reformulates the single-modal RGB image classification task as a pseudo multi-modal RGB-D classification problem. Specifically, we describe a practical, fully self-supervised framework for training, appropriately processing, fusing, and classifying the two modalities, RGB and pseudo-D. Experiments on challenging cross-domain scenarios using public NCLT datasets validate the effectiveness of the proposed framework.
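
The abstract does not say how the RGB and pseudo-D modalities are fused, so the following is only a minimal sketch of one common option, late fusion of per-modality class scores. It is not the authors' method. The function names (`softmax`, `fuse_late`, `classify_place`), the `rgb_weight` parameter, and the example logits are all hypothetical, and the two input score vectors are assumed to come from separately trained RGB and pseudo-depth classifiers over the same set of place classes.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_late(
    rgb_logits: Sequence[float],
    depth_logits: Sequence[float],
    rgb_weight: float = 0.5,
) -> List[float]:
    """Weighted average of the two modalities' class probabilities.

    Assumption: both vectors score the same place classes in the same order.
    rgb_weight is a hypothetical mixing parameter in [0, 1].
    """
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both modalities must score the same place classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [
        rgb_weight * a + (1.0 - rgb_weight) * b
        for a, b in zip(p_rgb, p_depth)
    ]


def classify_place(
    rgb_logits: Sequence[float],
    depth_logits: Sequence[float],
    rgb_weight: float = 0.5,
) -> int:
    """Return the index of the place class with the highest fused probability."""
    fused = fuse_late(rgb_logits, depth_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three place classes: the RGB classifier is nearly
# undecided between classes 0 and 1, while the pseudo-depth classifier
# clearly favours class 1.
rgb_scores = [2.0, 1.9, 0.1]
depth_scores = [0.0, 3.0, 0.2]
predicted = classify_place(rgb_scores, depth_scores)
```

In this illustrative case the RGB scores alone would select class 0, while the fused scores select class 1, which shows how a second modality can break a near-tie. Whether fusion helps under real domain shift depends on the depth estimator and the classifiers, which this sketch does not model.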


