Unsupervised Foveal Vision Neural Networks with Top-Down Attention

by Ryan Burt, et al.

Deep learning architectures are an extremely powerful tool for recognizing and classifying images. However, they require supervised learning, normally operate on vectors the size of the full image in pixels, and produce their best results only when trained on millions of object images. To help mitigate these issues, we propose fusing bottom-up saliency and top-down attention using only unsupervised learning techniques. This fusion helps the object recognition module focus on relevant data and learn important features that can later be fine-tuned for a specific task. In addition, because only the relevant portions of the data are used, training speed can be greatly improved. We test the performance of the proposed Gamma saliency technique on the Toronto and CAT2000 databases, and the foveated vision on the Street View House Numbers (SVHN) database. The results in foveated vision show that Gamma saliency is comparable to the best methods and computationally faster. The results on SVHN show that our unsupervised cognitive architecture is comparable to fully supervised methods and that Gamma saliency also improves CNN performance if desired. We also develop a top-down attention mechanism based on Gamma saliency, applied to the top layer of CNNs, to improve scene understanding in multi-object images or images with strong background clutter. When we compare the results with human observers on an image dataset of animals occluded in natural scenes, we show that top-down attention is capable of disambiguating object from background and improves system performance beyond the level of human observers.
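To make the idea of saliency-driven foveation concrete, the following is a minimal sketch, not the authors' Gamma saliency. It assumes a generic center-surround saliency map (the absolute difference of two Gaussian blurs) and then crops a fixed-size "foveal" patch around the most salient pixel, so that a downstream recognizer would see only that patch. All function names, the Gaussian kernels, and the default parameters are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D normalized Gaussian, truncated at roughly 3 sigma.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur with zero padding.
    # Assumes each image side is at least as long as the kernel.
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def center_surround_saliency(img, center_sigma=1.0, surround_sigma=4.0):
    # Stand-in saliency (assumption): local response minus neighborhood response.
    return np.abs(blur(img, center_sigma) - blur(img, surround_sigma))

def foveal_patch(img, saliency, size=8):
    # Crop a size x size window centered on the saliency peak,
    # shifted as needed so the window stays inside the image.
    r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
    half = size // 2
    top = int(np.clip(r - half, 0, img.shape[0] - size))
    left = int(np.clip(c - half, 0, img.shape[1] - size))
    return img[top:top + size, left:left + size], (top, left)
```

In this sketch, a recognizer would be trained on the output of `foveal_patch` rather than on the full image, which is one way to read the claim that using only relevant portions of the data can speed up training.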

