ViTs for SITS: Vision Transformers for Satellite Image Time Series

01/12/2023
by Michail Tarasiou, et al.

In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and we present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms: acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training and evaluation code is made publicly available to facilitate further research.
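The patching and temporal-then-spatial ordering described above can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names `tokenize_sits` and `temporal_then_spatial`, the (T, H, W, C) input layout, the random linear projection, and the placeholder encoders are all assumptions made for illustration. The class tokens and acquisition-time positional encodings are omitted.

```python
import numpy as np

def tokenize_sits(x, t, h, w, dim, seed=0):
    """Split a SITS record x of shape (T, H, W, C) into non-overlapping
    (t, h, w) patches and linearly project each patch to a `dim`-vector.

    The projection matrix is random here (hypothetical stand-in for a
    learned embedding layer).
    """
    T, H, W, C = x.shape
    assert T % t == 0 and H % h == 0 and W % w == 0, "patch size must divide the input"
    nt, nh, nw = T // t, H // h, W // w
    # (nt, t, nh, h, nw, w, C) -> (nt, nh, nw, t, h, w, C)
    patches = x.reshape(nt, t, nh, h, nw, w, C).transpose(0, 2, 4, 1, 3, 5, 6)
    # One flat vector per patch; spatial patch positions are merged into one axis.
    flat = patches.reshape(nt, nh * nw, t * h * w * C)
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((t * h * w * C, dim)) / np.sqrt(t * h * w * C)
    return flat @ proj  # shape: (nt, nh * nw, dim)

def temporal_then_spatial(tokens, temporal_encoder, spatial_encoder):
    """Apply a temporal encoder first, then a spatial encoder.

    tokens: (nt, ns, dim). Each encoder is any callable that maps a
    (batch, sequence, dim) array to an array of the same layout.
    """
    per_location = tokens.transpose(1, 0, 2)    # (ns, nt, dim): one time sequence per location
    encoded_t = temporal_encoder(per_location)  # attention would run along the time axis
    per_step = encoded_t.transpose(1, 0, 2)     # (nt, ns, dim): one spatial sequence per step
    return spatial_encoder(per_step)            # attention would run along the space axis

# Usage with identity placeholders in place of real Transformer encoders.
x = np.arange(4 * 8 * 8 * 3, dtype=float).reshape(4, 8, 8, 3)
tokens = tokenize_sits(x, t=2, h=4, w=4, dim=16)
out = temporal_then_spatial(tokens, lambda z: z, lambda z: z)
```

Swapping the two transposes would give a spatial-then-temporal factorization, the ordering commonly used for natural video, which is the alternative the abstract argues against for SITS.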

