Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

11/03/2021
by   Sia Huat Tan, et al.

Most feedforward convolutional neural networks spend roughly the same effort on every pixel. Human visual recognition, by contrast, is an interaction between eye movements and spatial attention, in which we take several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet), which aims to tackle the challenges of high computation and lack of robustness using a recurrent downsampled attention mechanism. Specifically, MGNet sequentially selects task-relevant regions of an image to focus on and then adaptively combines all collected information for the final prediction. MGNet shows strong resistance to adversarial attacks and common corruptions while requiring less computation. MGNet is also inherently more interpretable, as it explicitly shows where it focuses during each iteration. Our experiments on ImageNet100 demonstrate the potential of recurrent downsampled attention mechanisms to improve on a single feedforward pass. For example, MGNet improves accuracy by 4.76% under common corruptions at a lower computational cost. Moreover, under adversarial attack, while the baseline's accuracy drops to 7.6%, MGNet manages to maintain 44.2% with a ResNet-50 backbone. Our code is available at https://github.com/siahuat0727/MGNet.
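The glimpse loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline in plain Python, not the authors' implementation: the function names (`crop_and_downsample`, `multi_glimpse_predict`), the callables `encode`, `locate`, `classify`, and `score`, the nearest-neighbour sampling, the choice of the whole image as the first glimpse, and the softmax weighting of per-glimpse predictions are all assumptions made for the sake of the example.

```python
import math


def crop_and_downsample(image, cx, cy, scale, out=4):
    """Sample an out x out glimpse from a square region of a 2D image.

    cx, cy and scale are relative coordinates in [0, 1] (hypothetical
    convention). Nearest-neighbour sampling keeps the sketch simple.
    """
    h, w = len(image), len(image[0])
    glimpse = []
    for i in range(out):
        row = []
        for j in range(out):
            # Offset of this sample inside the region, in [-0.5, 0.5].
            u = (j + 0.5) / out - 0.5
            v = (i + 0.5) / out - 0.5
            x = min(max(int((cx + u * scale) * w), 0), w - 1)
            y = min(max(int((cy + v * scale) * h), 0), h - 1)
            row.append(image[y][x])
        glimpse.append(row)
    return glimpse


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_glimpse_predict(image, encode, locate, classify, score, n_glimpses=3):
    """Run a recurrent glimpse loop and combine the per-glimpse predictions.

    encode, locate, classify and score are hypothetical stand-ins for
    learned modules; they are passed in so the loop itself stays generic.
    """
    # Assumption: the first glimpse covers the whole image, downsampled.
    cx, cy, scale = 0.5, 0.5, 1.0
    state = None
    logits_per_step, scores = [], []
    for _ in range(n_glimpses):
        glimpse = crop_and_downsample(image, cx, cy, scale)
        state = encode(glimpse, state)           # update recurrent state
        logits_per_step.append(classify(state))  # prediction at this step
        scores.append(score(state))              # confidence at this step
        cx, cy, scale = locate(state)            # where to look next
    # Assumption: combine predictions with softmax-normalised scores.
    weights = softmax(scores)
    n_classes = len(logits_per_step[0])
    return [
        sum(w * logits[k] for w, logits in zip(weights, logits_per_step))
        for k in range(n_classes)
    ]
```

Because every glimpse is resampled to the same small resolution, the per-step cost in this sketch is fixed regardless of the input size, which is one plausible reading of how a downsampled attention mechanism can reduce computation.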


research
02/13/2020

Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks

Convolutional neural networks are vulnerable to small ℓ^p adversarial at...
research
08/07/2021

Information Bottleneck Approach to Spatial Attention Learning

The selective visual attention mechanism in the human visual system (HVS...
research
02/08/2023

Cross-Layer Retrospective Retrieving via Layer Attention

More and more evidence has shown that strengthening layer interactions c...
research
11/13/2021

Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning

The idea of using the recurrent neural network for visual attention has ...
research
11/15/2021

A Probabilistic Hard Attention Model For Sequentially Observed Scenes

A visual hard attention model actively selects and observes a sequence o...
research
08/23/2019

Assessing Knee OA Severity with CNN attention-based end-to-end architectures

This work proposes a novel end-to-end convolutional neural network (CNN)...
research
07/01/2022

TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Finding correspondences across images is an important task in many visua...
