Digging Deeper into Egocentric Gaze Prediction

by Hamed R. Tavakoli et al.

This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose blindly, we propose to inspect the factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial prior baselines. Task-specific cues such as the vanishing point, the manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) in contrast to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points, and, in particular, the manipulation point yields the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets with more diverse stimuli and more subjects.
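To make the recurrent model concrete, here is a minimal sketch of the pipeline the abstract describes: per-frame deep features are integrated over time by a gated recurrent unit (GRU), and the final hidden state is mapped to the next fixation. This is an illustrative assumption, not the authors' implementation; the feature dimension, hidden size, and the 2-D (x, y) readout are all placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUGazePredictor:
    """Illustrative GRU that maps a sequence of per-frame deep features
    to a predicted next-fixation point in normalized image coordinates.
    Weights are random here; in practice they would be learned."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # Gate parameters: update (z), reset (r), candidate state (h~)
        self.Wz = rng.uniform(-s, s, (hidden_dim, feat_dim + hidden_dim))
        self.Wr = rng.uniform(-s, s, (hidden_dim, feat_dim + hidden_dim))
        self.Wh = rng.uniform(-s, s, (hidden_dim, feat_dim + hidden_dim))
        # Readout from hidden state to a 2-D fixation point (x, y)
        self.Wo = rng.uniform(-s, s, (2, hidden_dim))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                 # update gate
        r = sigmoid(self.Wr @ xh)                 # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde          # new hidden state

    def predict(self, frame_features):
        """frame_features: (T, feat_dim) array of per-frame deep features."""
        h = np.zeros(self.Wo.shape[1])
        for x in frame_features:                  # integrate over time
            h = self.step(x, h)
        return sigmoid(self.Wo @ h)               # (x, y) in [0, 1]

model = GRUGazePredictor(feat_dim=64, hidden_dim=32)
feats = np.random.default_rng(1).normal(size=(10, 64))  # 10 dummy frames
fixation = model.predict(feats)
print(fixation.shape)  # (2,)
```

The integrated model in the paper additionally fuses top-down and bottom-up cue maps (saliency, vanishing point, manipulation point) with the recurrent prediction; the sketch above covers only the recurrent branch.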



