Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

09/09/2021
by Tianyi Zhang, et al.

Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g., cameras or IMUs, are prone to drift on long-range missions as errors accumulate. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains: underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; and c) can serve directly in concert with state estimation pipelines.

research
02/16/2022

Cross-view and Cross-domain Underwater Localization based on Optical Aerial and Acoustic Underwater Images

Cross-view image matches have been widely explored on terrestrial image ...
research
02/19/2022

Multi-Modal Recurrent Fusion for Indoor Localization

This paper considers indoor localization using multi-modal wireless sign...
research
08/11/2023

Image-based Geolocalization by Ground-to-2.5D Map Matching

We study the image-based geolocalization problem that aims to locate gro...
research
06/24/2021

Planetary UAV localization based on Multi-modal Registration with Pre-existing Digital Terrain Model

The autonomous real-time optical navigation of planetary UAV is of the k...
research
10/11/2022

AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization

An audio-visual event (AVE) is denoted by the correspondence of the visu...
research
07/23/2020

METEOR: Learning Memory and Time Efficient Representations from Multi-modal Data Streams

Many learning tasks involve multi-modal data streams, where continuous d...
research
06/06/2023

Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

We present a novel framework using Energy-Based Models (EBMs) for locali...
