Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network

01/02/2020
by Jin Chen, et al.

Because motion varies widely across frames, learning an effective spatiotemporal representation for accurate video saliency prediction (VSP) is highly challenging. To address this issue, we develop a spatiotemporal feature alignment network tailored to VSP, built from two key sub-networks: a multi-scale deformable convolutional alignment network (MDAN) and a bidirectional convolutional Long Short-Term Memory (Bi-ConvLSTM) network. The MDAN learns to align the features of neighboring frames to those of the reference frame in a coarse-to-fine manner, which allows it to handle a variety of motions. Specifically, the MDAN has a pyramidal feature hierarchy: it first uses deformable convolution (Dconv) to align the lower-resolution features across frames, and then aggregates the aligned features to align the higher-resolution features, progressively enhancing the features from top to bottom. The output of the MDAN is then fed into the Bi-ConvLSTM for further enhancement; the Bi-ConvLSTM captures useful long-term temporal information along both the forward and backward temporal directions, which guides the prediction of attention shifts under complex scene changes. Finally, the enhanced features are decoded to generate the predicted saliency map. The proposed model is trained end-to-end without any intricate post-processing. Extensive evaluations on four VSP benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art methods. The source code and all results will be released.
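The coarse-to-fine alignment and the bidirectional temporal enhancement described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation and rests on loud simplifications: the learned deformable-convolution offsets are replaced by a brute-force search for one integer shift per frame (with wrap-around borders), and the Bi-ConvLSTM is replaced by a plain exponential moving average run in both time directions. All function names and parameters (`downsample`, `best_shift`, `align_coarse_to_fine`, `bidirectional_fuse`, `levels`, `radius`, `decay`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def downsample(x):
    """Halve the resolution with 2x2 average pooling (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def best_shift(ref, nbr, center, radius):
    """Search integer shifts within `radius` of `center` and return the one
    that best matches `nbr` to `ref`.

    ASSUMPTION: this global circular shift is a stand-in for the per-pixel
    offsets that a deformable convolution would learn.
    """
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(nbr, (dy, dx), axis=(0, 1))
            err = np.mean((shifted - ref) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def align_coarse_to_fine(ref, nbr, levels=3, radius=1):
    """Estimate the shift at the coarsest pyramid level, then double it and
    refine it at each finer level. Returns (aligned neighbor, shift)."""
    ref_pyr, nbr_pyr = [ref], [nbr]
    for _ in range(levels - 1):
        ref_pyr.append(downsample(ref_pyr[-1]))
        nbr_pyr.append(downsample(nbr_pyr[-1]))

    shift = (0, 0)
    # Walk the pyramid from the lowest resolution up to the full resolution.
    for r, n in zip(reversed(ref_pyr), reversed(nbr_pyr)):
        center = (2 * shift[0], 2 * shift[1])
        shift = best_shift(r, n, center, radius)
    return np.roll(nbr, shift, axis=(0, 1)), shift


def bidirectional_fuse(frames, decay=0.5):
    """Run a simple recurrence forward and backward in time and average the
    two states per frame.

    ASSUMPTION: the moving average is a stand-in for a Bi-ConvLSTM; it has
    no gates and no learned weights.
    """
    def scan(seq):
        state, out = np.zeros_like(seq[0]), []
        for f in seq:
            state = decay * state + (1 - decay) * f
            out.append(state)
        return out

    fwd = scan(frames)
    bwd = scan(frames[::-1])[::-1]
    return [0.5 * (a + b) for a, b in zip(fwd, bwd)]


# Usage: a synthetic neighbor frame that is the reference shifted by (-4, 4).
rng = np.random.default_rng(0)
ref = rng.random((16, 16))
nbr = np.roll(ref, (-4, 4), axis=(0, 1))

aligned, shift = align_coarse_to_fine(ref, nbr)   # recovers the inverse shift
fused = bidirectional_fuse([aligned, ref, aligned])
```

The point of the pyramid is that a small search radius at the coarsest level covers a large displacement at full resolution, so each finer level only needs to correct the doubled estimate by a pixel or so. In this toy version the search is exact only for shifts the pyramid can represent; the network in the paper learns its offsets instead of searching for them.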

research
03/28/2022

Pyramid Feature Alignment Network for Video Deblurring

Video deblurring remains a challenging task due to various causes of blu...
research
05/07/2019

EDVR: Video Restoration with Enhanced Deformable Convolutional Networks

Video restoration tasks, including super-resolution, deblurring, etc, ar...
research
07/16/2021

Progressive Deep Video Dehazing without Explicit Alignment Estimation

To solve the issue of video dehazing, there are two main tasks to attain...
research
03/27/2016

Recurrent Mixture Density Network for Spatiotemporal Visual Attention

In many computer vision tasks, the relevant information to solve the pro...
research
01/22/2022

DCNGAN: A Deformable Convolutional-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video

In this paper, we propose a deformable convolution-based generative adve...
research
11/09/2018

Semantic and Contrast-Aware Saliency

In this paper, we proposed an integrated model of semantic-aware and con...
