Supersaliency: Predicting Smooth Pursuit-Based Attention with Slicing CNNs Improves Fixation Prediction for Naturalistic Videos

01/26/2018
by Mikhail Startsev, et al.

Predicting attention is a popular topic at the intersection of human and computer vision, but video saliency prediction has only recently begun to benefit from deep learning-based approaches. Even though most of the available video-based saliency data sets and models claim to target human observers' fixations, they fail to differentiate them from smooth pursuit (SP), a major eye movement type that is unique to the perception of dynamic scenes. In this work, we aim to make this distinction explicit, to which end we (i) use both algorithmic and manual annotations of SP traces and other eye movements for two well-established video saliency data sets, (ii) train Slicing Convolutional Neural Networks (S-CNN) for saliency prediction on either fixation- or SP-salient locations, and (iii) evaluate our models together with over 20 popular published saliency models on the two annotated data sets for predicting both SP and fixations, as well as on another data set of human fixations. Our proposed model, trained on an independent set of videos, outperforms state-of-the-art saliency models in the task of SP prediction on all considered data sets. Moreover, this model also demonstrates superior performance in the prediction of "classical" fixation-based saliency. Our results emphasize the importance of a selective approach to training set construction for attention modelling.
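The abstract's key architectural ingredient is the Slicing CNN, which convolves a video feature volume not only over spatial (x-y) slices but also over temporal (x-t and y-t) slices, so that motion patterns such as pursued targets become visible to 2D filters. The sketch below illustrates this slicing idea in PyTorch; the `SlicingBlock` module, its shapes, and the additive fusion of the three branches are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of the "slicing" idea behind an S-CNN (illustrative only,
# not the paper's exact model): 2D convolutions are applied over xy, xt, and
# yt slices of a video feature volume, exposing temporal structure to 2D filters.
import torch
import torch.nn as nn


class SlicingBlock(nn.Module):
    """Applies 2D convolutions over xy, xt, and yt slices of a clip."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv_xy = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.conv_xt = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.conv_yt = nn.Conv2d(in_channels, out_channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W) feature volume for a short video clip
        n, c, t, h, w = x.shape

        # xy slices: fold time into the batch dimension
        xy = self.conv_xy(x.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w))
        xy = xy.reshape(n, t, -1, h, w).permute(0, 2, 1, 3, 4)

        # xt slices: fold the y axis into the batch dimension
        xt = self.conv_xt(x.permute(0, 3, 1, 2, 4).reshape(n * h, c, t, w))
        xt = xt.reshape(n, h, -1, t, w).permute(0, 2, 3, 1, 4)

        # yt slices: fold the x axis into the batch dimension
        yt = self.conv_yt(x.permute(0, 4, 1, 2, 3).reshape(n * w, c, t, h))
        yt = yt.reshape(n, w, -1, t, h).permute(0, 2, 3, 4, 1)

        # Fuse the three branches (the paper's fusion strategy may differ).
        return xy + xt + yt


if __name__ == "__main__":
    block = SlicingBlock(in_channels=3, out_channels=16)
    clip = torch.randn(2, 3, 8, 64, 64)  # (N, C, T, H, W)
    print(block(clip).shape)  # torch.Size([2, 16, 8, 64, 64])
```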
