RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

We study an important yet largely unexplored problem: large-scale cross-modal visual localization by matching ground RGB images to a geo-referenced aerial LIDAR 3D point cloud (rendered as depth images). Prior work was demonstrated on small datasets and did not scale to large-scale applications. To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs of RGB and aerial LIDAR depth images, covering a 143 km^2 area. We propose a novel joint-embedding-based method that combines appearance and semantic cues from both modalities to handle drastic cross-modal variations. Experiments on the proposed dataset show that our model achieves a strong median rank of 5 when matching across a large test set of 50K location pairs collected from a 14 km^2 area. This represents a significant advance over prior work in both performance and scale. We conclude with qualitative results that highlight the challenging nature of this task and the benefits of the proposed model. Our work provides a foundation for further research in cross-modal visual localization.
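The median-rank figure quoted above can be made concrete with a small sketch. It assumes that each RGB query and each LIDAR depth image has already been mapped into a shared embedding space, that query i's true match is gallery item i, and that cosine similarity is the matching score. The function names and the toy vectors are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
from statistics import median

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def median_rank(query_embs, gallery_embs):
    """Median rank of the true match, assuming query i pairs with gallery i."""
    ranks = []
    for i, q in enumerate(query_embs):
        sims = [cosine(q, g) for g in gallery_embs]
        # Rank 1 means no gallery item scores strictly higher than the true match.
        ranks.append(1 + sum(s > sims[i] for s in sims))
    return median(ranks)

# Toy embeddings (hypothetical): three RGB queries, three depth-image gallery items.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
print(median_rank(rgb, depth))
```

Under this reading, a median rank of 5 over 50K pairs would mean that for half of the queries the correct depth image falls within the top 5 retrieved candidates.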


