Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

11/28/2017
by   Zhaofan Qiu, et al.
0

Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, the development of a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3×3×3 convolutions with 1×3×3 convolutional filters on spatial domain (equivalent to 2D CNN) plus 3×1×1 convolutions to construct temporal connections on adjacent feature maps in time. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net (P3D ResNet), that exploits all the variants of blocks but composes each in different placement of ResNet, following the philosophy that enhancing structural diversity with going deep could improve the power of neural networks. Our P3D ResNet achieves clear improvements on Sports-1M video classification dataset against 3D CNN and frame-based 2D CNN by 5.3 respectively. We further examine the generalization performance of video representation produced by our pre-trained P3D ResNet on five different benchmarks and three different tasks, demonstrating superior performances over several state-of-the-art techniques.

READ FULL TEXT
research
06/13/2019

Learning Spatio-Temporal Representation with Local and Global Diffusion

Convolutional Neural Networks (CNN) have been regarded as a powerful cla...
research
02/18/2020

V4D:4D Convolutional Neural Networks for Video-level Representation Learning

Most existing 3D CNNs for video representation learning are clip-based m...
research
07/25/2018

Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net

Convolutional neural networks (CNNs) have achieved great successes in ma...
research
01/03/2021

RegNet: Self-Regulated Network for Image Classification

The ResNet and its variants have achieved remarkable successes in variou...
research
06/19/2018

Spatio-Temporal Channel Correlation Networks for Action Classification

The work in this paper is driven by the question if spatio-temporal corr...
research
06/13/2017

SEP-Nets: Small and Effective Pattern Networks

While going deeper has been witnessed to improve the performance of conv...
research
10/22/2019

4-Connected Shift Residual Networks

The shift operation was recently introduced as an alternative to spatial...

Please sign up or login with your details

Forgot password? Click here to reset