Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition

08/03/2020
by Jiawei Chen, et al.

Human action recognition is a cornerstone of domains such as surveillance and video understanding. Despite recent progress on end-to-end solutions for video-based action recognition, achieving state-of-the-art performance still requires auxiliary hand-crafted motion representations, e.g., optical flow, which are usually computationally demanding. In this work, we propose using residual frames (i.e., differences between adjacent RGB frames) as an alternative "lightweight" motion representation that carries salient motion information and is computationally efficient. In addition, we develop a new pseudo-3D convolution module that decouples 3D convolution into 2D and 1D convolutions. The proposed module exploits residual information in the feature space to better structure motion, and is equipped with a self-attention mechanism that helps recalibrate the appearance and motion features. Empirical results confirm the efficiency and effectiveness of residual frames as well as of the proposed pseudo-3D convolution module.
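
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `residual_frames` and `conv_weight_counts`, the `(T, H, W, C)` clip layout, the 3x3x3 kernel size, and the choice of keeping the channel count unchanged between the 2D and 1D stages are all assumptions made for illustration. The first function computes differences between adjacent RGB frames; the second compares the weight count of a full 3D convolution with one plausible 2D-plus-1D factorization.

```python
import numpy as np


def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Differences between adjacent frames of a clip.

    clip: uint8 array of shape (T, H, W, C) (layout is an assumption).
    Returns an int16 array of shape (T - 1, H, W, C).
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")
    # Cast to a signed type first: subtracting uint8 arrays would wrap around.
    signed = clip.astype(np.int16)
    return signed[1:] - signed[:-1]


def conv_weight_counts(c_in: int, c_out: int, k_t: int = 3, k_s: int = 3):
    """Weight counts (biases ignored) for a full 3D convolution versus a
    2D spatial convolution followed by a 1D temporal convolution.

    Assumes the intermediate channel count equals c_out (hypothetical choice).
    """
    full_3d = c_in * c_out * k_t * k_s * k_s
    spatial_2d = c_in * c_out * k_s * k_s   # 1 x k_s x k_s kernel
    temporal_1d = c_out * c_out * k_t       # k_t x 1 x 1 kernel
    return full_3d, spatial_2d + temporal_1d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(8, 4, 4, 3), dtype=np.uint8)
    print(residual_frames(clip).shape)
    print(conv_weight_counts(64, 64))
```

Under these assumptions, an 8-frame clip yields 7 residual frames at the cost of a single subtraction per pixel, which is where the "lightweight" contrast with optical flow comes from. The weight comparison only shows the general appeal of factorizing a 3D kernel; the module described in the paper also uses feature-space residuals and self-attention, which this sketch does not model.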

research
01/16/2020

Rethinking Motion Representation: Residual Frames with 3D ConvNets for Better Action Recognition

Recently, 3D convolutional networks yield good performance in action rec...
research
07/16/2020

Challenge report:VIPriors Action Recognition Challenge

This paper is a brief report to our submission to the VIPriors Action Re...
research
06/21/2020

Motion Representation Using Residual Frames with 3D CNN

Recently, 3D convolutional networks (3D ConvNets) yield good performance...
research
03/22/2019

On the Importance of Video Action Recognition for Visual Lipreading

We focus on the word-level visual lipreading, which requires to decode t...
research
11/23/2022

Dynamic Appearance: A Video Representation for Action Recognition with Joint Training

Static appearance of video may impede the ability of a deep neural netwo...
research
10/17/2021

TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding

Most of existing video action recognition models ingest raw RGB frames. ...
research
05/04/2021

Motion-Augmented Self-Training for Video Recognition at Smaller Scale

The goal of this paper is to self-train a 3D convolutional neural networ...
