Learning an attention model in an artificial visual system

by Alon Hazan, et al.

Humans perceive the world as a large, fixed image that is highly detailed and sharp. However, receptor density in the retina is not uniform: a small central region called the fovea is densely packed with receptors and provides high resolution, whereas the peripheral region around it has much lower spatial resolution. Thus, contrary to our perception, we can observe only a very small region around the line of sight at high resolution. The perception of a complete and stable view is aided by an attention mechanism that directs the eyes to the numerous points of interest within the scene. The eyes move between these targets in quick, unconscious movements known as "saccades". Once a target is centered on the fovea, the eyes fixate for a fraction of a second while the visual system extracts the necessary information. An artificial visual system was built on a fully recurrent neural network set within a reinforcement learning protocol, and it learned to attend to regions of interest while solving a classification task. The model is consistent with several experimentally observed phenomena and suggests novel predictions.
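
To make the fovea/periphery distinction concrete, the following is a minimal NumPy sketch of a "foveated glimpse": a small full-resolution patch at the fixation point plus a coarse, average-pooled view of a wider window around it. This is an illustration only, not the authors' model; the function name `foveated_glimpse`, the parameters `fovea` and `periphery`, and the use of average pooling are all assumptions introduced here.

```python
import numpy as np


def foveated_glimpse(image, center, fovea=8, periphery=32):
    """Hypothetical helper: return (sharp, coarse) views around a fixation point.

    sharp  -- fovea x fovea patch at full resolution
    coarse -- periphery x periphery window average-pooled to fovea x fovea
    """
    if periphery % fovea != 0:
        raise ValueError("periphery must be a multiple of fovea")
    row, col = center
    half = periphery // 2

    # Zero-pad so that windows near the image border stay in bounds.
    padded = np.pad(image, half, mode="constant")

    # After padding, the window centered on (row, col) starts at (row, col).
    wide = padded[row:row + periphery, col:col + periphery]

    # Low-resolution periphery: average over k x k blocks.
    k = periphery // fovea
    coarse = wide.reshape(fovea, k, fovea, k).mean(axis=(1, 3))

    # High-resolution fovea: the central pixels of the wide window.
    off = (periphery - fovea) // 2
    sharp = wide[off:off + fovea, off:off + fovea]
    return sharp, coarse


# Usage: fixate on the middle of a synthetic 64 x 64 image.
image = np.arange(64 * 64, dtype=float).reshape(64, 64)
sharp, coarse = foveated_glimpse(image, center=(32, 32))
```

Both views have the same number of pixels, but `coarse` covers sixteen times the area at a quarter of the linear resolution. In a setup like the one the abstract describes, a learned policy would choose `center` at each step; that part is not sketched here.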


