VS-Net: Multiscale Spatiotemporal Features for Lightweight Video Salient Document Detection

01/11/2023
by Hemraj Singh, et al.

Video Salient Document Detection (VSDD) is a practical computer vision task that aims to highlight visually salient document regions in video frames. Previous VSDD techniques learn features without modeling the cooperation between appearance and motion cues, and therefore perform poorly in practical scenarios. Moreover, most of them demand high computational resources, which limits their use in resource-constrained settings. To address these issues, we propose VS-Net, which captures multi-scale spatiotemporal information using dilated depth-wise separable convolutions and Approximation Rank Pooling. VS-Net extracts key features locally from each frame across embedding sub-spaces and passes them between adjacent and parallel nodes, improving performance globally. The model generates saliency maps by considering background and foreground simultaneously, which helps it cope with challenging scenarios. Extensive experiments on the benchmark MIDV-500 dataset show that VS-Net outperforms state-of-the-art approaches on both runtime and robustness measures.
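The abstract does not spell out how the multi-scale spatial branch is wired, but dilated depth-wise separable convolution is a standard building block. Below is a minimal PyTorch sketch of such a block plus an illustrative multi-scale variant that concatenates several dilation rates; the channel sizes, dilation rates, and class names (DilatedDepthwiseSeparableConv, MultiScaleBlock) are assumptions for illustration, not taken from VS-Net.

```python
# Illustrative sketch (not the authors' code): a dilated depth-wise separable
# convolution block and a multi-scale variant built from it.
import torch
import torch.nn as nn

class DilatedDepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, dilation=2):
        super().__init__()
        # Depth-wise 3x3: one filter per input channel; dilation enlarges the
        # receptive field without adding parameters.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        # Point-wise 1x1: mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class MultiScaleBlock(nn.Module):
    """Runs the input through several dilation rates and fuses the results."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            DilatedDepthwiseSeparableConv(in_ch, out_ch, d) for d in dilations)
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Example: a batch of 2 RGB frames at 256x256 -> 64-channel multi-scale features.
features = MultiScaleBlock(3, 64)(torch.randn(2, 3, 256, 256))  # (2, 64, 256, 256)
```

The depth-wise/point-wise split is what keeps the block lightweight: the 3x3 filters never mix channels, so the parameter count grows linearly rather than quadratically with channel width.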

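For the temporal cue, approximate rank pooling collapses a short clip of frames into a single motion-aware map via closed-form frame weights. The sketch below uses the coefficients from Bilen et al.'s approximate rank pooling ("dynamic image") formulation; whether VS-Net's Approximation Rank Pooling uses exactly these weights is not stated in the abstract, so treat this function as an assumption for illustration.

```python
# Minimal sketch of approximate rank pooling in PyTorch, using Bilen et al.'s
# closed-form dynamic-image coefficients; VS-Net's exact pooling may differ.
import torch

def approximate_rank_pooling(frames: torch.Tensor) -> torch.Tensor:
    """Collapse a clip of shape (T, C, H, W) into a single (C, H, W) map."""
    T = frames.shape[0]
    # Harmonic numbers H_t = sum_{i=1..t} 1/i, with H_0 = 0.
    H = torch.cat([torch.zeros(1),
                   torch.cumsum(1.0 / torch.arange(1, T + 1), dim=0)])
    t = torch.arange(1, T + 1, dtype=torch.float32)
    # alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}); roughly, earlier frames
    # receive negative weights and later frames positive ones, so the weighted
    # sum captures how appearance evolves over time.
    alpha = 2 * (T - t + 1) - (T + 1) * (H[T] - H[:T])
    return (alpha.view(T, 1, 1, 1) * frames).sum(dim=0)

# Example: pool an 8-frame RGB clip into one motion-aware map.
clip = torch.randn(8, 3, 224, 224)
motion_map = approximate_rank_pooling(clip)  # shape (3, 224, 224)
```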
Related research

08/15/2019  TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
TASED-Net is a 3D fully-convolutional network architecture for video sal...

08/04/2017  Region-Based Multiscale Spatiotemporal Saliency for Video
Detecting salient objects from a video requires exploiting both spatial ...

08/04/2017  Video Salient Object Detection Using Spatiotemporal Deep Features
This paper presents a method for detecting salient objects in videos whe...

11/29/2019  CAGNet: Content-Aware Guidance for Salient Object Detection
Beneficial from Fully Convolutional Neural Networks (FCNs), saliency det...

11/20/2020  SalSum: Saliency-based Video Summarization using Generative Adversarial Networks
The huge amount of video data produced daily by camera-based systems, su...

08/04/2020  Select, Extract and Generate: Neural Keyphrase Generation with Syntactic Guidance
In recent years, deep neural sequence-to-sequence framework has demonstr...

04/19/2018  CANDID: Robust Change Dynamics and Deterministic Update Policy for Dynamic Background Subtraction
Background subtraction in video provides the preliminary information whi...
