Revisiting Deep Architectures for Head Motion Prediction in 360° Videos

Head motion prediction is an important problem for 360° videos, in particular to inform streaming decisions. Various methods tackling this problem with deep neural networks have been proposed recently. In this article, we first show the startling result that all such existing methods, which attempt to benefit both from the history of past positions and from knowledge of the video content, perform worse than a simple no-motion baseline. We then propose an LSTM-based architecture that processes positional information only. It establishes state-of-the-art performance, and we consider it our position-only baseline. Through a thorough root-cause analysis, we first show that the content can indeed inform the head position prediction for horizons longer than 2 to 3 s, the trajectory inertia being predominant earlier. We also identify that a sequence-to-sequence auto-regressive framework is crucial to improve the prediction accuracy over longer prediction windows, and that a dedicated recurrent network handling the time series of positions is necessary to reach the performance of the position-only baseline in the early prediction steps. This allows us to make the most of the positional information and ground-truth saliency. Finally, we show how the level of noise in the estimated saliency impacts the architecture's performance, and we propose a new architecture establishing state-of-the-art performance with estimated saliency, supporting its strengths with an ablation study.
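To make the no-motion baseline mentioned above concrete, the following is a minimal sketch and not the authors' code. It assumes head positions are stored as unit vectors on the sphere and that prediction error is measured as the great-circle angle between predicted and actual positions; the abstract does not state the metric, so that choice is an assumption. The function names `no_motion_baseline` and `great_circle_distance` are hypothetical, as is the toy trajectory.

```python
import numpy as np


def no_motion_baseline(past_positions, horizon):
    """Predict that the head stays at its last observed position.

    past_positions: array of shape (T, 3), unit vectors (assumed representation).
    horizon: number of future steps to predict.
    Returns an array of shape (horizon, 3).
    """
    past_positions = np.asarray(past_positions, dtype=float)
    last = past_positions[-1]
    # Repeat the last known position for every future time step.
    return np.tile(last, (horizon, 1))


def great_circle_distance(a, b):
    """Angle in radians between unit vectors a and b (along the last axis)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Clip guards against rounding errors pushing the cosine outside [-1, 1].
    cos = np.clip(np.sum(a * b, axis=-1), -1.0, 1.0)
    return np.arccos(cos)


# Toy trajectory (illustrative only): the head turns steadily around the vertical axis.
angles = np.linspace(0.0, np.pi / 2, 6)
trajectory = np.stack(
    [np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1
)

past, future = trajectory[:3], trajectory[3:]
prediction = no_motion_baseline(past, horizon=len(future))
errors = great_circle_distance(prediction, future)

# The error grows with the prediction step because the head keeps moving.
print(errors)
```

On this toy trajectory the baseline's error increases with each prediction step, which is the behaviour any learned predictor would need to improve on at longer horizons.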

