Panoramic Vision Transformer for Saliency Detection in 360° Videos

09/19/2022
by Heeseung Yun, et al.

360° video saliency detection is a challenging benchmark for 360° video understanding for two reasons: non-negligible distortion and discontinuity occur in every projection format of 360° videos, and the capture-worthy viewpoint on the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which lets us plug pretrained models from normal videos into our architecture without additional modules or finetuning and perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
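To make the projection distortion mentioned above concrete, here is a minimal sketch for one common format, the equirectangular projection. It is not part of PAVER; the function name `equirect_row_stretch` and the 8-row frame are hypothetical and chosen only for illustration. It relies on the standard geometric fact that a pixel row at latitude φ is stretched horizontally by a factor of 1/cos(φ) relative to the sphere.

```python
import math


def equirect_row_stretch(height: int) -> list[float]:
    """Horizontal stretch factor for each pixel row of an equirectangular frame.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    factors = []
    for row in range(height):
        # Latitude of the row centre, running from near +pi/2 (top row)
        # to near -pi/2 (bottom row). Row centres never hit the poles exactly.
        lat = math.pi / 2 - (row + 0.5) * math.pi / height
        # A circle of latitude has circumference proportional to cos(lat),
        # yet every row is drawn at the same pixel width.
        factors.append(1.0 / math.cos(lat))
    return factors


stretch = equirect_row_stretch(8)
for row, factor in enumerate(stretch):
    print(f"row {row}: stretch x{factor:.2f}")
```

Rows near the equator are almost undistorted, while rows near the poles are stretched several times over, which is why a filter of fixed size covers very different solid angles at different rows of the frame.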


research
09/15/2023

UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection

Video saliency prediction and detection are thriving research domains th...
research
03/13/2023

MRET: Multi-resolution Transformer for Video Quality Assessment

No-reference video quality assessment (NR-VQA) for user generated conten...
research
06/06/2022

Subtitle-based Viewport Prediction for 360-degree Virtual Tourism Video

360-degree streaming videos can provide a rich immersive experiences to ...
research
08/22/2021

StarVQA: Space-Time Attention for Video Quality Assessment

The attention mechanism is blooming in computer vision nowadays. However...
research
06/04/2018

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

Automatic saliency prediction in 360 videos is critical for viewpoint gu...
research
07/06/2022

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Current deep video quality assessment (VQA) methods are usually with hig...
research
11/16/2018

Saliency Supervision: An Intuitive and Effective Approach for Pain Intensity Regression

Getting pain intensity from face images is an important problem in auton...