A Billion-scale Foundation Model for Remote Sensing Images

04/11/2023
by Keumgang Cha, et al.

As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improved with an increase in parameters. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate general performance in downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, both the performance and the data efficiency of the foundation models improved as the number of parameters increased. Moreover, our models achieve state-of-the-art performance on several datasets, including DIOR-R, Potsdam, and LoveDA.
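The abstract does not give the architectural details behind each parameter budget, so the depth/width settings in the sketch below are assumptions (following common plain-ViT scaling conventions) chosen to land near the reported totals; only the 86M / 605.26M / 1.3B / 2.4B figures come from the paper. A minimal PyTorch sketch of such a scale ladder:

```python
# Illustrative sketch, NOT the paper's exact hyperparameters: scaling a plain
# Vision Transformer by widening (embed_dim) and deepening (depth) it, then
# counting parameters. Only the target budgets come from the abstract.
import torch
import torch.nn as nn


class ViTBackbone(nn.Module):
    """Minimal ViT encoder: patch embedding + standard transformer blocks."""

    def __init__(self, image_size=224, patch_size=16, embed_dim=768,
                 depth=12, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=int(embed_dim * mlp_ratio),
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        # (B, 3, H, W) -> (B, N, D) patch tokens, then transformer encoding.
        x = self.patch_embed(x).flatten(2).transpose(1, 2)
        return self.encoder(x + self.pos_embed)


# Hypothetical scale ladder; depth/width values are assumptions chosen to
# approximate the parameter budgets named in the abstract.
SCALES = {
    "86M":  dict(embed_dim=768,  depth=12, num_heads=12),  # ViT-Base-like
    "605M": dict(embed_dim=1280, depth=32, num_heads=16),  # ViT-Huge-like
    "1.3B": dict(embed_dim=1536, depth=48, num_heads=16),
    "2.4B": dict(embed_dim=2048, depth=48, num_heads=16),
}

# Build on the meta device (PyTorch >= 2.0) so the billion-scale variants
# can be inspected without allocating gigabytes of real memory.
with torch.device("meta"):
    for name, cfg in SCALES.items():
        model = ViTBackbone(**cfg)
        n_params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {n_params / 1e6:.1f}M parameters")
```

The sketch illustrates why scaling studies like this one are usually described by just two knobs: in a plain ViT, the parameter count is driven almost entirely by the embedding width and the number of transformer blocks.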


