Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification

03/06/2019
by De Xie, et al.

Two-stream architectures have shown strong performance in video classification. The key idea is to learn spatio-temporal features by fusing convolutional networks spatially and temporally. However, such architectures have several problems. First, they rely on optical flow to model temporal information, which is often expensive to compute and store. Second, they have limited ability to capture details and local context information in video data. Third, they lack explicit semantic guidance, which greatly decreases classification performance. In this paper, we propose a new two-stream-based deep framework for video classification that discovers spatial and temporal information from RGB frames only. In addition, a multi-scale pyramid attention (MPA) layer and a semantic adversarial learning (SAL) module are introduced and integrated into our framework. The MPA layer enables the network to capture global and local features and generate a comprehensive representation of the video, and the SAL module makes this representation gradually approximate the real video semantics in an adversarial manner. Experimental results on two public video benchmarks demonstrate that our proposed method achieves state-of-the-art results.
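The abstract does not spell out how the MPA layer is built, so the following is only a minimal sketch of the general idea of combining global and local features with attention, not the authors' layer. It assumes a frame is already encoded as an H x W grid of feature vectors, pools that grid at several scales, and weights the pooled regions with a softmax over dot-product scores. The names `pyramid_regions`, `pyramid_attention`, the `scales` tuple, and the `query` vector (which would be learned in a real network) are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def pyramid_regions(feature_map, scales=(1, 2, 4)):
    """Average-pool an H x W grid of feature vectors at several scales.

    Scale s splits the grid into s x s cells: s=1 yields one global
    descriptor, larger s yields progressively more local ones.
    Assumes H and W are at least max(scales) so no cell is empty.
    """
    height, width = len(feature_map), len(feature_map[0])
    regions = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                rows = range(i * height // s, (i + 1) * height // s)
                cols = range(j * width // s, (j + 1) * width // s)
                cell = [feature_map[r][c] for r in rows for c in cols]
                regions.append(mean_vector(cell))
    return regions


def pyramid_attention(feature_map, query, scales=(1, 2, 4)):
    """Return the attention-weighted sum of multi-scale region descriptors.

    `query` is a hypothetical scoring vector; here it is passed in by hand.
    """
    regions = pyramid_regions(feature_map, scales)
    scores = [sum(q * x for q, x in zip(query, region)) for region in regions]
    weights = softmax(scores)
    pooled = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy 4 x 4 grid of 2-dimensional features (illustrative values only).
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
pooled, weights = pyramid_attention(grid, query=[0.5, -0.5])
```

With scales `(1, 2, 4)` the sketch produces 1 + 4 + 16 region descriptors, so the global view and the finer local views compete for attention weight within a single softmax. A real layer would more likely learn the scoring function and operate on convolutional feature tensors.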


research
03/04/2019

Spatiotemporal Pyramid Network for Video Action Recognition

Two-stream convolutional networks have shown strong performance in video...
research
04/22/2016

Convolutional Two-Stream Network Fusion for Video Action Recognition

Recent applications of Convolutional Neural Networks (ConvNets) for huma...
research
11/10/2022

Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance

Considerable unsupervised video object segmentation algorithms based on ...
research
03/28/2018

Adversarial Spatio-Temporal Learning for Video Deblurring

Camera shake or target movement often leads to undesired blur effects in...
research
03/01/2020

STC-Flow: Spatio-temporal Context-aware Optical Flow Estimation

In this paper, we propose a spatio-temporal contextual network, STC-Flow...
research
07/27/2021

Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition

Continuous sign language recognition (cSLR) is a public significant task...
research
12/24/2020

An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time Video Enhancement

Video enhancement is a challenging problem, more than that of stills, ma...
