Focus of Attention Improves Information Transfer in Visual Features

06/16/2020
by   Matteo Tiezzi, et al.
10

Unsupervised learning from continuous visual streams is a challenging problem that cannot be naturally and efficiently managed in the classic batch-mode setting of computation. The information stream must be carefully processed accordingly to an appropriate spatio-temporal distribution of the visual data, while most approaches of learning commonly assume uniform probability density. In this paper we focus on unsupervised learning for transferring visual information in a truly online setting by using a computational model that is inspired to the principle of least action in physics. The maximization of the mutual information is carried out by a temporal process which yields online estimation of the entropy terms. The model, which is based on second-order differential equations, maximizes the information transfer from the input to a discrete space of symbols related to the visual features of the input, whose computation is supported by hidden neurons. In order to better structure the input probability distribution, we use a human-like focus of attention model that, coherently with the information maximization model, is also based on second-order differential equations. We provide experimental results to support the theory by showing that the spatio-temporal filtering induced by the focus of attention allows the system to globally transfer more information from the input stream over the focused areas and, in some contexts, over the whole frames with respect to the unfiltered case that yields uniform probability distributions.

READ FULL TEXT
research
04/12/2019

Spatio-Temporal Deep Graph Infomax

Spatio-temporal graphs such as traffic networks or gene regulatory syste...
research
08/09/2023

Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding

This PhD. Thesis concerns the study and development of hierarchical repr...
research
11/08/2020

Right on Time: Multi-Temporal Convolutions for Human Action Recognition in Videos

The variations in the temporal performance of human actions observed in ...
research
10/16/2021

Visual-aware Attention Dual-stream Decoder for Video Captioning

Video captioning is a challenging task that captures different visual pa...
research
08/06/2018

Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning

Deep neural networks are efficient learning machines which leverage upon...

Please sign up or login with your details

Forgot password? Click here to reset