Are all the frames equally important?

05/20/2019
by Oleksii Sidorov, et al.

In this work, we address the problem of measuring and predicting temporal video saliency, a measure of how important a video frame is for human attention. Unlike conventional spatial saliency, which locates the salient regions within a frame (as is done for still images), temporal saliency considers the importance of a frame as a whole and may not exist apart from its context. We propose an interface based on an interactive cursor-based algorithm for collecting experimental data about temporal saliency. We collect the first human responses and analyze them. Qualitatively, the resulting scores clearly reflect the semantic changes in a frame; quantitatively, they are highly correlated across all observers. We also show that the proposed tool can simultaneously collect fixations similar to those produced by an eye tracker, in a more affordable way. Furthermore, this approach may be used to create the first temporal saliency datasets, which would allow the training of computational predictive algorithms. The proposed interface does not rely on any special equipment, so it can be run remotely and reach a wide audience.
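The abstract reports that the scores are highly correlated across observers but does not say here how that agreement was measured. As one possible illustration only, the sketch below computes the mean pairwise Pearson correlation between per-frame score sequences. The function names, the choice of Pearson correlation, and the sample scores are all assumptions made for this example; they are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(scores_by_observer):
    """Average Pearson correlation over all pairs of observers."""
    pairs = list(combinations(scores_by_observer.values(), 2))
    if not pairs:
        raise ValueError("need at least two observers")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-frame saliency scores (made-up data, one list per observer).
scores = {
    "observer_a": [0.1, 0.2, 0.9, 0.8, 0.2],
    "observer_b": [0.0, 0.3, 1.0, 0.7, 0.1],
    "observer_c": [0.2, 0.2, 0.8, 0.9, 0.3],
}
agreement = mean_pairwise_correlation(scores)
```

A value of `agreement` close to 1 would indicate that observers tend to mark the same frames as important; other agreement measures (for example rank correlation) could be substituted in `pearson` without changing the rest of the sketch.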

research
03/31/2022

Rethinking Video Salient Object Ranking

Salient Object Ranking (SOR) involves ranking the degree of saliency of ...
research
08/15/2019

TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

TASED-Net is a 3D fully-convolutional network architecture for video sal...
research
09/19/2017

Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

Over the past few years, deep neural networks (DNNs) have exhibited grea...
research
03/11/2016

Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

In this paper we introduce a novel Depth-Aware Video Saliency approach t...
research
01/26/2018

Supersaliency: Predicting Smooth Pursuit-Based Attention with Slicing CNNs Improves Fixation Prediction for Naturalistic Videos

Predicting attention is a popular topic at the intersection of human and...
research
03/31/2017

Semantic-driven Generation of Hyperlapse from 360° Video

We present a system for converting a fully panoramic (360°) video into ...
