Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images

07/29/2023
by Aayush Dhakal, et al.

We propose a novel weakly supervised approach for creating maps using free-form textual descriptions (or captions). We refer to this new task of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict over a fixed set of attributes using overhead imagery. However, these models are highly restrictive, as they can only solve the specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset of paired overhead and ground-level images. For a given location, our model predicts the expected CLIP embedding of the ground-level scenery. Sat2Cap is also conditioned on temporal information, enabling it to learn dynamic concepts that vary over time. Our experimental results demonstrate that our model successfully captures fine-grained concepts and effectively adapts to temporal variations. Our approach does not require any text-labeled data, making the training easily scalable. The code, dataset, and models will be made publicly available.
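The sketch below illustrates the core training idea described in the abstract: an overhead-image encoder, conditioned on capture-time metadata, is trained with a CLIP-style contrastive (InfoNCE) objective to predict the frozen CLIP embedding of the co-located ground-level image. The backbone, temporal encoding, embedding size, and all names here are illustrative assumptions for a minimal PyTorch sketch, not the authors' exact configuration.

# Minimal sketch (not the authors' code) of the Sat2Cap training idea.
# An overhead-image encoder, fused with a temporal conditioning vector,
# is trained to match the frozen CLIP embedding of the paired
# ground-level image via a symmetric InfoNCE loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 512  # assumed CLIP embedding size

class Sat2CapEncoder(nn.Module):
    def __init__(self, temporal_dim=8):
        super().__init__()
        # Toy CNN backbone standing in for the real overhead-image encoder.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse image features with an encoded capture time (e.g., month, hour).
        self.head = nn.Linear(64 + temporal_dim, EMB_DIM)

    def forward(self, overhead, temporal):
        feats = self.backbone(overhead)
        return self.head(torch.cat([feats, temporal], dim=-1))

def clip_style_loss(pred, target, temperature=0.07):
    # Symmetric InfoNCE between predicted and ground-truth CLIP embeddings.
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    logits = pred @ target.t() / temperature
    labels = torch.arange(len(pred), device=pred.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Usage with random stand-in data; in practice `clip_emb` would come from
# a frozen CLIP image encoder applied to the paired ground-level photo.
model = Sat2CapEncoder()
overhead = torch.randn(16, 3, 128, 128)   # overhead image batch
temporal = torch.randn(16, 8)             # encoded capture-time metadata
clip_emb = torch.randn(16, EMB_DIM)       # frozen CLIP embeddings (targets)
loss = clip_style_loss(model(overhead, temporal), clip_emb)
loss.backward()

Because the targets live in CLIP's joint image-text space, a map can then be queried at inference time with an arbitrary caption embedded by CLIP's text encoder, which is what enables the zero-shot mapping described above.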


