Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning

11/13/2021
by Gang Chen, et al.

The idea of using recurrent neural networks for visual attention has gained popularity in the computer vision community. Although the recurrent attention model (RAM) can enlarge its scope by taking glimpses with larger patch sizes, doing so may result in high variance and instability. For example, exploring objects of interest in a large image requires a Gaussian policy with high variance, which can lead to randomized search and unstable learning. In this paper, we propose to unify top-down and bottom-up attention for recurrent visual attention. Our model exploits image pyramids and Q-learning to select regions of interest in the top-down attention mechanism, which in turn guides the policy search in the bottom-up approach. In addition, we add two further constraints on the bottom-up recurrent neural network for better exploration. We train our model in an end-to-end reinforcement learning framework and evaluate it on visual classification tasks. The experimental results show that our method outperforms the convolutional neural network (CNN) baseline and bottom-up recurrent attention models on visual classification tasks.
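For readers who want a concrete picture of the bottom-up component the abstract refers to, the sketch below shows a minimal RAM-style recurrent glimpse agent with a Gaussian location policy, trained with REINFORCE plus a classification loss. It is an illustrative assumption written in PyTorch, not the paper's implementation: the top-down image-pyramid/Q-learning module and the two extra constraints are omitted, and names such as GlimpseAgent, extract_glimpse, glimpse_size, and loc_std are hypothetical.

```python
# Minimal sketch (not the paper's exact architecture): a RAM-style bottom-up
# recurrent attention loop with a Gaussian location policy and REINFORCE.
# Assumes single-channel images (e.g. MNIST); all names here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def extract_glimpse(images, locs, size):
    """Crop a (size x size) patch centered at each location in [-1, 1] coords."""
    B, C, H, W = images.shape
    theta = torch.zeros(B, 2, 3, device=images.device)
    theta[:, 0, 0] = size / W            # zoom factor along x
    theta[:, 1, 1] = size / H            # zoom factor along y
    theta[:, :, 2] = locs                # (x, y) offsets in normalized coordinates
    grid = F.affine_grid(theta, (B, C, size, size), align_corners=False)
    return F.grid_sample(images, grid, align_corners=False)


class GlimpseAgent(nn.Module):
    def __init__(self, glimpse_size=8, hidden=256, n_classes=10, loc_std=0.15):
        super().__init__()
        self.glimpse_size, self.loc_std = glimpse_size, loc_std
        self.glimpse_net = nn.Sequential(
            nn.Flatten(), nn.Linear(glimpse_size * glimpse_size, 128), nn.ReLU())
        self.core = nn.GRUCell(128 + 2, hidden)   # glimpse features + last location
        self.loc_head = nn.Linear(hidden, 2)      # mean of the Gaussian location policy
        self.cls_head = nn.Linear(hidden, n_classes)

    def forward(self, images, n_glimpses=6):
        B = images.size(0)
        h = images.new_zeros(B, self.core.hidden_size)
        loc = images.new_zeros(B, 2)              # start at the image center
        log_probs = []
        for _ in range(n_glimpses):
            patch = extract_glimpse(images, loc, self.glimpse_size)
            feat = self.glimpse_net(patch)
            h = self.core(torch.cat([feat, loc], dim=1), h)
            mean = torch.tanh(self.loc_head(h))
            dist = torch.distributions.Normal(mean, self.loc_std)
            raw = dist.sample()                   # stochastic next fixation
            log_probs.append(dist.log_prob(raw).sum(dim=1))
            loc = raw.clamp(-1, 1)
        return self.cls_head(h), torch.stack(log_probs, dim=1)


def loss_fn(logits, log_probs, labels):
    """Hybrid loss: cross-entropy on the prediction plus REINFORCE on the
    location policy, with classification correctness as the reward."""
    ce = F.cross_entropy(logits, labels)
    reward = (logits.argmax(dim=1) == labels).float().unsqueeze(1)
    advantage = reward - reward.mean()            # simple mean baseline
    reinforce = -(log_probs * advantage.detach()).sum(dim=1).mean()
    return ce + reinforce
```

The standard deviation loc_std is exactly the exploration knob the abstract flags: covering a large image with this policy alone requires high variance, which is the instability the proposed top-down guidance is meant to reduce.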


Related research

06/24/2014  Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationall...

04/28/2018  CRAM: Clued Recurrent Attention Model
To overcome the poor scalability of convolutional neural network, recurr...

01/24/2017  Learning an attention model in an artificial visual system
The Human visual perception of the world is of a large fixed image that ...

12/15/2017  Pre-training Attention Mechanisms
Recurrent neural networks with differentiable attention mechanisms have ...

10/10/2019  NEURO-DRAM: a 3D recurrent visual attention model for interpretable neuroimaging classification
Deep learning is attracting significant interest in the neuroimaging com...

11/03/2021  Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
Most feedforward convolutional neural networks spend roughly the same ef...

11/14/2017  Reinforcement Learning in a large scale photonic Recurrent Neural Network
Photonic Neural Network implementations have been gaining considerable a...
