Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

06/18/2014
by Kaiming He, et al.

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce recognition accuracy for images or sub-images of arbitrary size or scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate this requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size or scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. SPP-net is also effective in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. When processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvements made for this competition.
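
As a rough illustration of the fixed-length property described in the abstract, the sketch below max-pools a convolutional feature map over a pyramid of grids and concatenates the results. It is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the floor/ceil bin edges, and the feature-map shapes are all assumptions chosen for the example.

```python
import math

import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n * n).

    `levels` is a hypothetical pyramid: each n splits the map into an n x n
    grid of bins. The output length depends only on C and `levels`, not on
    H or W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Assumed bin edges: floor for the start, ceil for the end, so the
            # bins cover the whole map and each bin holds at least one row.
            r0 = (i * height) // n
            r1 = max(math.ceil((i + 1) * height / n), r0 + 1)
            for j in range(n):
                c0 = (j * width) // n
                c1 = max(math.ceil((j + 1) * width / n), c0 + 1)
                # One max per channel for this bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


rng = np.random.default_rng(0)

# Two feature maps of different spatial size give vectors of the same length.
vec_a = spatial_pyramid_pool(rng.standard_normal((256, 13, 13)))
vec_b = spatial_pyramid_pool(rng.standard_normal((256, 10, 17)))

# Detection-style use: compute one full-image map, then pool an arbitrary
# window of it (random values stand in for real convolutional features).
full_map = rng.standard_normal((256, 20, 30))
vec_region = spatial_pyramid_pool(full_map[:, 4:15, 6:25])

print(vec_a.shape, vec_b.shape, vec_region.shape)
```

With these assumed levels, every output has 256 x (1 + 4 + 16) = 5376 entries, so a fixed-size fully connected layer or classifier could follow regardless of the input window. Pooling a slice of `full_map` mirrors the idea of computing the feature maps once per image and reusing them for each candidate region.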


research
03/04/2015

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

Encouraged by the success of Convolutional Neural Networks (CNNs) in ima...
research
07/30/2018

Efficient feature learning and multi-size image steganalysis based on CNN

For steganalysis, many studies showed that convolutional neural network ...
research
12/03/2020

D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization

Recently, many detection methods based on convolutional neural networks ...
research
04/15/2016

Latent Model Ensemble with Auto-localization

Deep Convolutional Neural Networks (CNN) have exhibited superior perform...
research
05/29/2020

Fixed-size Objects Encoding for Visual Relationship Detection

In this paper, we propose a fixed-size object encoding method (FOE-VRD) ...
research
10/04/2022

How deep convolutional neural networks lose spatial information with training

A central question of machine learning is how deep nets manage to learn ...
research
12/16/2012

Visual Objects Classification with Sliding Spatial Pyramid Matching

We present a method for visual object classification using only a single...
