Bounding Boxes Are All We Need: Street View Image Classification via Context Encoding of Detected Buildings

by Kun Zhao, et al.

Street view images are increasingly used in tasks such as urban land use classification and urban functional zone portrayal. Street view image classification is difficult because class labels such as "commercial area" are concepts at a higher level of abstraction than those in general visual tasks. As a result, classification models that use only visual features often fail to achieve satisfactory performance. We believe that an efficient representation of significant objects and their context relations in street view images is the key to solving this problem. In this paper, we propose a novel approach based on a detector-encoder-classifier framework. Unlike common image-level end-to-end models, our approach does not use visual features of the whole image directly. The proposed framework obtains the bounding boxes of buildings in street view images from a detector. Their contextual information, such as building classes and positions, is then encoded into metadata and finally classified by a recurrent neural network (RNN). To verify our approach, we built a dataset of 19,070 street view images and 38,857 buildings based on the BIC_GSV dataset through a combination of automatic label acquisition and expert annotation. The dataset can be used not only for street view image classification aimed at urban land use analysis, but also for multi-class building detection. Experiments show that the proposed approach achieves a 12.65% [...] the models based on end-to-end convolutional neural networks (CNNs). Our code and dataset are available at
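
To make the detector-encoder-classifier idea concrete, the following is a minimal sketch of the encoder and classifier stages only, and not the authors' implementation. It assumes a detector has already produced labeled bounding boxes. The label sets, the `Box` type, the left-to-right ordering, the feature layout (one-hot class plus normalized center and size), and the untrained random weights are all hypothetical choices made for illustration. A plain Elman RNN stands in for whatever recurrent model the paper actually uses.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical label sets, chosen only for illustration.
BUILDING_CLASSES = ["house", "apartment", "retail", "office"]
SCENE_CLASSES = ["residential", "commercial"]


@dataclass
class Box:
    """One detected building (hypothetical detector output format)."""
    cls: str
    x1: float
    y1: float
    x2: float
    y2: float


def encode_boxes(boxes, img_w, img_h):
    """Encode detections as a left-to-right sequence of feature vectors."""
    ordered = sorted(boxes, key=lambda b: (b.x1 + b.x2) / 2)
    seq = []
    for b in ordered:
        one_hot = [1.0 if b.cls == c else 0.0 for c in BUILDING_CLASSES]
        cx = (b.x1 + b.x2) / 2 / img_w   # normalized center x
        cy = (b.y1 + b.y2) / 2 / img_h   # normalized center y
        w = (b.x2 - b.x1) / img_w        # normalized width
        h = (b.y2 - b.y1) / img_h        # normalized height
        seq.append(one_hot + [cx, cy, w, h])
    return seq


def random_params(in_dim, hidden, out_dim, seed=0):
    """Untrained weights; a real model would learn these from labeled data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return mat(hidden, in_dim), mat(hidden, hidden), mat(out_dim, hidden)


def rnn_classify(seq, params):
    """Run a plain Elman RNN over the sequence, then softmax the final state."""
    w_xh, w_hh, w_hy = params
    h = [0.0] * len(w_hh)
    for x in seq:
        h = [
            math.tanh(
                sum(a * b for a, b in zip(w_xh[j], x))
                + sum(a * b for a, b in zip(w_hh[j], h))
            )
            for j in range(len(h))
        ]
    logits = [sum(a * b for a, b in zip(row, h)) for row in w_hy]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: two detections in a 640x480 image, classified with untrained weights.
boxes = [Box("retail", 300, 80, 520, 400), Box("house", 20, 120, 200, 380)]
seq = encode_boxes(boxes, img_w=640, img_h=480)
probs = rnn_classify(seq, random_params(len(seq[0]), 8, len(SCENE_CLASSES)))
```

Because the weights here are random, `probs` is not a meaningful prediction; the sketch only shows how box metadata, rather than pixels, can form the classifier's input sequence.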





Building Instance Classification Using Street View Images

Land-use classification based on spaceborne or aerial remote sensing ima...

Building Facade Parsing R-CNN

Building facade parsing, which predicts pixel-level labels for building ...

Holistic Multi-View Building Analysis in the Wild with Projection Pooling

We address six different classification tasks related to fine-grained bu...

Fast and Regularized Reconstruction of Building Façades from Street-View Images using Binary Integer Programming

Regularized arrangement of primitives on building façades to aligned loc...

Take a Look Around: Using Street View and Satellite Images to Estimate House Prices

When an individual purchases a home, they simultaneously purchase its st...

Functional Map of the World

We present a new dataset, Functional Map of the World (fMoW), which aims...

Large Scale Business Discovery from Street Level Imagery

Search with local intent is becoming increasingly useful due to the popu...