Self-Attentive Pooling for Efficient Deep Learning

09/16/2022
by   Fang Chen, et al.

Efficient custom pooling techniques that can aggressively trim the dimensions of a feature map, and thereby reduce inference compute and memory footprint for resource-constrained computer vision applications, have recently gained significant traction. However, prior pooling works extract only the local context of the activation maps, limiting their effectiveness. In contrast, we propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement for standard pooling layers, such as max/average pooling or strided convolution. The proposed self-attention module uses patch embedding, multi-head self-attention, and spatial-channel restoration, followed by sigmoid activation and exponential softmax. This self-attention mechanism efficiently aggregates dependencies between non-local activation patches during down-sampling. Extensive experiments on standard object classification and detection tasks with various convolutional neural network (CNN) architectures demonstrate the superiority of our proposed mechanism over state-of-the-art (SOTA) pooling techniques. In particular, we surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%. With aggressive down-sampling of the activation maps in the initial layers (providing up to 22x reduction in memory consumption), our approach achieves 1.43% higher test accuracy than SOTA techniques with iso-memory footprints. This enables the deployment of our models in memory-constrained devices, such as micro-controllers, without losing significant accuracy, because the initial activation maps consume a significant amount of on-chip memory for high-resolution images required for complex vision tasks. Our proposed pooling method also leverages the idea of channel pruning to further reduce memory footprints.
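To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of a self-attentive pooling layer that follows the stated sequence (patch embedding, multi-head self-attention, spatial-channel restoration, sigmoid activation, exponential softmax weighting of the pooling windows). This is not the authors' reference implementation: the patch size, embedding width, head count, and the way the restored attention map re-weights each pooling window are assumptions made for illustration.

```python
# Illustrative sketch only; hyperparameters and the window re-weighting scheme
# are assumptions, not the paper's official implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentivePool2d(nn.Module):
    def __init__(self, channels, patch_size=4, embed_dim=64, num_heads=4, stride=2):
        super().__init__()
        self.stride = stride
        self.patch_size = patch_size
        # Patch embedding: a strided conv turns each patch into one token.
        self.patch_embed = nn.Conv2d(channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # Multi-head self-attention over all patch tokens (non-local context).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial-channel restoration: project tokens back to per-pixel,
        # per-channel scores so the patching can be undone.
        self.restore = nn.Linear(embed_dim, channels * patch_size * patch_size)

    def forward(self, x):
        b, c, h, w = x.shape
        # 1) Patch embedding -> (B, N, D) token sequence.
        tokens = self.patch_embed(x)                      # (B, D, H/p, W/p)
        gh, gw = tokens.shape[-2:]
        tokens = tokens.flatten(2).transpose(1, 2)        # (B, N, D)
        # 2) Non-local multi-head self-attention across patches.
        tokens, _ = self.attn(tokens, tokens, tokens)
        # 3) Spatial-channel restoration back to a (B, C, H, W) map.
        scores = self.restore(tokens)                     # (B, N, C*p*p)
        scores = scores.transpose(1, 2).reshape(b, c * self.patch_size ** 2, gh, gw)
        scores = F.pixel_shuffle(scores, self.patch_size) # (B, C, H, W)
        # 4) Sigmoid activation -> bounded attention map.
        scores = torch.sigmoid(scores)
        # 5) Exponential softmax pooling: within each stride x stride window,
        #    activations are averaged with weights proportional to exp(score).
        weights = torch.exp(scores)
        num = F.avg_pool2d(x * weights, self.stride)
        den = F.avg_pool2d(weights, self.stride)
        return num / (den + 1e-6)


if __name__ == "__main__":
    pool = SelfAttentivePool2d(channels=32, patch_size=4, stride=2)
    feat = torch.randn(2, 32, 64, 64)
    print(pool(feat).shape)  # torch.Size([2, 32, 32, 32])
```

Used this way, the layer is a drop-in replacement for a stride-2 max/average pooling or strided convolution: the output spatial resolution is halved, but each pooled value is a softmax-weighted combination of its window, with weights informed by attention over the whole feature map rather than the local window alone.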

Related research

08/22/2022 - Mix-Pooling Strategy for Attention Mechanism
Recently many effective self-attention modules are proposed to boost the ...

12/08/2022 - Group Generalized Mean Pooling for Vision Transformer
Vision Transformer (ViT) extracts the final representation from either c...

12/27/2021 - Augmenting Convolutional networks with attention-based aggregation
We show how to augment any convolutional network with an attention-based...

03/26/2022 - Exploring Self-Attention for Visual Intersection Classification
In robot vision, self-attention has recently emerged as a technique for ...

11/01/2021 - AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling
Pooling layers are essential building blocks of Convolutional Neural Net...

07/13/2019 - Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge
In this report, the Brno University of Technology (BUT) team submissions...

02/27/2020 - RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
Pooling operators are key components in most Convolutional Neural Networ...