LIP: Local Importance-based Pooling

08/12/2019
by   Ziteng Gao, et al.

Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and lower memory consumption. However, for discriminative tasks, these layers may lose discriminative details because of improper pooling strategies, which could hinder the learning process and eventually result in suboptimal models. In this paper, we present a unified framework over the existing downsampling layers (e.g., average pooling, max pooling, and strided convolution) from a local importance perspective. Within this framework, we analyze the problems of these widely used pooling layers and identify the criteria for designing an effective downsampling layer. Based on this analysis, we propose a conceptually simple, general, and effective pooling layer based on local importance modeling, termed Local Importance-based Pooling (LIP). LIP can automatically enhance discriminative features during the downsampling procedure by learning adaptive importance weights based on inputs in an end-to-end manner. Experimental results show that LIP consistently yields notable gains across different depths and architectures on ImageNet classification. On the challenging MS COCO dataset, detectors with our LIP-ResNets as backbones obtain a consistent improvement (> 1.4%) over plain ResNets, and in particular achieve state-of-the-art performance in detecting small objects.
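
The local importance view described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `importance_pool`, the single-channel 2-D input, and the hand-supplied `logits` are assumptions made for illustration, whereas in LIP the importance weights are learned from the input end to end. The sketch assumes each output is an importance-weighted average of its window, with importance taken as the exponential of a per-position logit, so that uniform importance behaves like average pooling and sharply peaked importance approaches max pooling.

```python
import numpy as np

def importance_pool(x, logits, window=2, stride=2):
    """Pool a 2-D feature map, weighting each position by exp(logit).

    Hypothetical helper for illustration only. `logits` stands in for the
    output of a learned importance function; here it is supplied by hand.
    """
    h, w = x.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            rows = slice(i * stride, i * stride + window)
            cols = slice(j * stride, j * stride + window)
            patch = x[rows, cols]
            patch_logits = logits[rows, cols]
            # Subtract the window maximum before exponentiating so that
            # large logits do not overflow; the normalized weights are unchanged.
            weights = np.exp(patch_logits - patch_logits.max())
            out[i, j] = (patch * weights).sum() / weights.sum()
    return out

x = np.array([[1., 2., 0., 1.],
              [3., 4., 1., 0.],
              [0., 1., 5., 6.],
              [1., 0., 7., 8.]])

# Uniform importance: every position in a window counts equally (average pooling).
avg_like = importance_pool(x, np.zeros_like(x))

# Importance that grows steeply with the activation: the largest value in each
# window dominates, which approximates max pooling.
max_like = importance_pool(x, 50.0 * x)
```

Replacing the fixed logits with the output of a small trainable sub-network is what would make the weights adaptive to the input; that part is deliberately left out of this sketch.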

research
02/12/2022

Fuzzy Pooling

Convolutional Neural Networks (CNNs) are artificial learning systems typ...
research
04/08/2018

Ordinal Pooling Networks: For Preserving Information over Shrinking Feature Maps

In the framework of convolutional neural networks that lie at the heart ...
research
10/07/2018

Hartley Spectral Pooling for Deep Learning

In most convolution neural networks (CNNs), downsampling hidden layers i...
research
01/21/2019

Learning Graph Pooling and Hybrid Convolutional Operations for Text Representations

With the development of graph convolutional networks (GCN), deep learnin...
research
11/23/2015

Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation

Deep neural networks with alternating convolutional, max-pooling and dec...
research
12/08/2022

Group Generalized Mean Pooling for Vision Transformer

Vision Transformer (ViT) extracts the final representation from either c...
research
03/11/2017

Viraliency: Pooling Local Virality

In our overly-connected world, the automatic recognition of virality - t...
