Multiple Object Recognition with Visual Attention

12/24/2014
by   Jimmy Ba, et al.
0

We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show that the model learns to both localize and recognize multiple objects despite being given only class labels during training. We evaluate the model on the challenging task of transcribing house number sequences from Google Street View images and show that it is both more accurate than the state-of-the-art convolutional networks and uses fewer parameters and less computation.

READ FULL TEXT
research
06/12/2017

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an...
research
06/24/2014

Recurrent Models of Visual Attention

Applying convolutional neural networks to large images is computationall...
research
02/15/2018

Teaching Machines to Code: Neural Markup Generation with Visual Attention

We present a deep recurrent neural network model with soft visual attent...
research
10/14/2016

Recurrent 3D Attentional Networks for End-to-End Active Object Recognition in Cluttered Scenes

Active vision is inherently attention-driven: The agent selects views of...
research
07/30/2019

Towards Pure End-to-End Learning for Recognizing Multiple Text Sequences from an Image

Here we address a challenging problem: recognizing multiple text sequenc...
research
02/26/2017

Seeing What Is Not There: Learning Context to Determine Where Objects Are Missing

Most of computer vision focuses on what is in an image. We propose to tr...
research
03/26/2019

Learning Where to See: A Novel Attention Model for Automated Immunohistochemical Scoring

Estimating over-amplification of human epidermal growth factor receptor ...

Please sign up or login with your details

Forgot password? Click here to reset