Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling

by Yang Long, et al.

Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content, e.g., by assigning a semantic label to every pixel of the image. With the popularization of data-driven methods, the past decades have witnessed promising progress on ASP for high-resolution aerial images, with the problem approached through either tile-level scene classification or segmentation-based image analysis. However, the former scheme often produces results with tile-wise boundaries, while the latter must handle the complex modeling process from pixels to semantics, which often requires large-scale, well-annotated image samples with pixel-wise semantic labels. In this paper, we address these issues in ASP, with perspectives ranging from tile-level scene classification to pixel-wise semantic labeling. Specifically, we first revisit aerial image interpretation through a literature review. We then present Million-AID, a large-scale scene classification dataset that contains one million aerial images. With this dataset, we also report benchmarking experiments using classical convolutional neural networks (CNNs). Finally, we perform ASP by unifying tile-level scene classification and object-based image analysis to achieve pixel-wise semantic labeling. Extensive experiments show that Million-AID is a challenging yet useful dataset, which can serve as a benchmark for evaluating newly developed algorithms. When transferring knowledge from Million-AID, fine-tuned CNN models pretrained on Million-AID perform consistently better than those pretrained on ImageNet for aerial scene classification. Moreover, our hierarchical multi-task learning method achieves state-of-the-art pixel-wise classification on the challenging GID, bridging tile-level scene classification and pixel-wise semantic labeling for aerial image interpretation.
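
To illustrate why purely tile-level classification yields tile-wise boundaries, the following is a minimal sketch, not the paper's method. It assumes a hypothetical grid of per-tile class predictions and a hypothetical helper, tiles_to_pixel_map, that expands each tile label over its pixels; the class ids and tile size are made up for illustration.

```python
import numpy as np

def tiles_to_pixel_map(tile_labels: np.ndarray, tile_size: int) -> np.ndarray:
    """Expand a grid of per-tile class ids into a per-pixel label map.

    Hypothetical helper for illustration: every pixel inherits the label
    of the tile that contains it.
    """
    # Repeat each tile label over a tile_size x tile_size block of pixels.
    rows = np.repeat(tile_labels, tile_size, axis=0)
    return np.repeat(rows, tile_size, axis=1)

# Hypothetical 2x3 grid of tile predictions (class ids are illustrative only).
tile_labels = np.array([[0, 0, 1],
                        [2, 1, 1]])
pixel_map = tiles_to_pixel_map(tile_labels, tile_size=4)

# Columns where the label changes between horizontally adjacent pixels.
change_cols = np.nonzero(np.diff(pixel_map, axis=1))[1] + 1
```

In this sketch every label change falls on a multiple of the tile size, so class boundaries in the resulting map follow the tile grid rather than the true object outlines.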



AID++: An Updated Version of AID on Scene Classification

Aerial image scene classification is a fundamental problem for understan...

CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis

Better understanding and modelling of building interiors and the emergen...

Object Boundary Detection and Classification with Image-level Labels

Semantic boundary and edge detection aims at simultaneously detecting ob...

Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos

Aerial pixel-wise scene perception of the surrounding environment is an ...

High-Resolution Semantic Labeling with Convolutional Neural Networks

Convolutional neural networks (CNNs) have received increasing attention ...

Object-Based Image Coding: A Learning-Driven Revisit

The Object-Based Image Coding (OBIC) that was extensively studied about ...

ImageSpirit: Verbal Guided Image Parsing

Humans describe images in terms of nouns and adjectives while algorithms...