TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

04/09/2022
by Chen Chen, et al.

The dominant CNN-based methods for cross-view image geo-localization rely on the polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. TransGeo takes full advantage of the transformer's strengths in global information modeling and explicit position information encoding. We further leverage the flexibility of the transformer input and propose an attention-guided non-uniform cropping method, which removes uninformative image patches with a negligible drop in performance and thereby reduces computation cost. The saved computation can be reallocated to increase resolution only for informative patches, improving performance at no additional computation cost. This "attend and zoom-in" strategy is highly similar to human behavior when observing images. Remarkably, TransGeo achieves state-of-the-art results on both urban and rural datasets, with significantly less computation cost than CNN-based methods. It does not rely on the polar transform and infers faster than CNN-based methods. Code is available at https://github.com/Jeff-Zilence/TransGeo2022
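
The attention-guided cropping idea can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): it assumes each image patch already has a scalar attention score, and the names `select_informative_patches`, `crop_tokens`, and `keep_ratio` are hypothetical. The abstract does not state how the scores are obtained or what fraction of patches is kept, so both are left as inputs here.

```python
from typing import List, Sequence


def select_informative_patches(attention: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-attention patches, in their original order.

    `attention` holds one score per patch (assumed given); `keep_ratio` is the
    fraction of patches to keep (an illustrative parameter, not a paper value).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    num_keep = max(1, round(len(attention) * keep_ratio))
    # Rank patch indices by attention score, highest first.
    ranked = sorted(range(len(attention)), key=lambda i: -attention[i])
    # Restore spatial order so each kept patch can still be paired with
    # the position embedding of its original location.
    return sorted(ranked[:num_keep])


def crop_tokens(tokens: Sequence, kept: Sequence[int]) -> list:
    """Drop every token whose patch index was not selected."""
    return [tokens[i] for i in kept]


# Toy example: a 2x4 grid of patches, flattened row by row.
attention = [0.02, 0.30, 0.25, 0.03, 0.01, 0.20, 0.15, 0.04]
kept = select_informative_patches(attention, keep_ratio=0.5)
print(kept)  # [1, 2, 5, 6]
```

Because a transformer accepts a variable-length token sequence, the shortened list can be fed to the encoder directly, which is what makes the cropping non-uniform: the kept patches need not form a rectangle. The tokens saved this way are the budget that the "attend and zoom-in" step could spend on a higher-resolution version of the same regions.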


