Towards DeepSentinel: An extensible corpus of labelled Sentinel-1 and -2 imagery and a general-purpose sensor-fusion semantic embedding model
Earth observation offers new insight into anthropogenic changes to nature, and how these changes are effecting (and are effected by) the built environment and the real economy. With the global availability of medium-resolution (10-30m) synthetic aperture radar (SAR) Sentinel-1 and multispectral Sentinel-2 imagery, machine learning can be employed to offer these insights at scale, unbiased to the reporting of companies and countries. In this paper, I introduce DeepSentinel, a data pipeline and experimentation framework for producing general-purpose semantic embeddings of paired Sentinel-1 and Sentinel-2 imagery. I document the development of an extensible corpus of labelled and unlabelled imagery for the purposes of sensor fusion research. With this new dataset I develop a set of experiments applying popular self-supervision methods and encoder architectures to a land cover classification problem. Tile2vec spatial encoding with a self-attention enabled ResNet model outperforms deeper ResNet variants as well as pretraining with variational autoencoding and contrastive loss. All supporting and derived data and code are made publicly available.
READ FULL TEXT