PIGEON: Predicting Image Geolocations

07/11/2023
by Lukas Haas, et al.

We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance on both external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, pretrains a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are threefold. First, we design a semantic geocell creation and splitting algorithm based on open-source data that can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtoNets to the task. Finally, we make our pretrained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains, with applications to fighting climate change and to urban and rural scene understanding.
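To make the idea of label smoothing over geocells concrete, here is a minimal sketch. It is not taken from the abstract, which does not specify the smoothing rule. It assumes each geocell is summarized by a centroid, that soft labels decay exponentially with great-circle distance from the true location, and that they are normalized to sum to one. The function names, the temperature `tau_km`, and its default value are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def smoothed_geocell_labels(true_lat, true_lon, cell_centroids, tau_km=75.0):
    """Soft target distribution over geocells (hypothetical smoothing rule).

    Assumption: the weight of a cell decays exponentially with how much
    farther its centroid is from the true location than the nearest centroid.
    """
    dists = [haversine_km(true_lat, true_lon, lat, lon) for lat, lon in cell_centroids]
    d_min = min(dists)
    weights = [math.exp(-(d - d_min) / tau_km) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical geocell centroids near Paris, London, and New York.
centroids = [(48.85, 2.35), (51.51, -0.13), (40.71, -74.01)]
labels = smoothed_geocell_labels(48.86, 2.34, centroids)
```

Under these assumptions the geocell containing the true location receives the largest weight, nearby cells receive a small share, and distant cells receive almost none, so a classifier trained on such targets is penalized less for near misses than for predictions on another continent.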


