Rethinking Spatiotemporal Feature Learning For Video Understanding

12/13/2017
by Saining Xie, et al.

In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the state-of-the-art I3D model, which "inflates" all the 2D filters of the Inception architecture to 3D. We first consider "deflating" the I3D model at various levels to understand the role of 3D convolutions. Interestingly, we found that 3D convolutions at the top layers of the network contribute more than 3D convolutions at the bottom layers, while also being computationally more efficient. This indicates that I3D is better at capturing high-level temporal patterns than low-level motion signals. We also consider replacing 3D convolutions with spatiotemporal-separable 3D convolutions (i.e., replacing a convolution using a k × k × k filter with a 1 × k × k filter followed by a k × 1 × 1 filter); we show that such a model, which we call S3D, is 1.5x more computationally efficient (in terms of FLOPS) than I3D, and achieves better accuracy. Finally, we explore spatiotemporal feature gating on top of S3D. The resulting model, which we call S3D-G, outperforms the state-of-the-art I3D model by 3.5% accuracy on Kinetics and reduces the FLOPS by 34%. It also achieves a new state-of-the-art performance when transferred to other action classification (UCF-101 and HMDB-51) and detection (UCF-101 and JHMDB) datasets.
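To make the separable factorization concrete, the following is a minimal sketch that counts weights and multiply-adds for a single full 3D convolution versus its spatial-then-temporal replacement. It is not the authors' code. The function names, the choice of equal intermediate and output channel counts, stride 1, "same" padding, and the example tensor sizes are all assumptions made for illustration.

```python
def conv3d_cost(c_in, c_out, kt, kh, kw, t, h, w):
    """Return (weights, multiply_adds) for one 3D convolution, ignoring bias.

    Assumes stride 1 and 'same' padding, so the output keeps t*h*w positions.
    """
    weights = c_in * c_out * kt * kh * kw
    multiply_adds = weights * t * h * w
    return weights, multiply_adds


def full_3d_cost(c_in, c_out, k, t, h, w):
    """Cost of a single k x k x k convolution."""
    return conv3d_cost(c_in, c_out, k, k, k, t, h, w)


def separable_3d_cost(c_in, c_out, k, t, h, w):
    """Cost of a 1 x k x k convolution followed by a k x 1 x 1 convolution.

    Assumption: the intermediate feature map has c_out channels.
    """
    spatial_w, spatial_m = conv3d_cost(c_in, c_out, 1, k, k, t, h, w)
    temporal_w, temporal_m = conv3d_cost(c_out, c_out, k, 1, 1, t, h, w)
    return spatial_w + temporal_w, spatial_m + temporal_m


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, k = 3, on an 8 x 28 x 28 feature map.
    full_w, full_m = full_3d_cost(64, 64, 3, 8, 28, 28)
    sep_w, sep_m = separable_3d_cost(64, 64, 3, 8, 28, 28)
    print(f"full 3D:   {full_w} weights, {full_m} multiply-adds")
    print(f"separable: {sep_w} weights, {sep_m} multiply-adds")
    print(f"per-layer ratio: {full_m / sep_m:.2f}x")
```

The per-layer ratio computed here is not the same quantity as the network-level 1.5x figure quoted in the abstract, which depends on the whole architecture, including layers that are not factorized.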

research
11/30/2017

A Closer Look at Spatiotemporal Convolutions for Action Recognition

In this paper we discuss several forms of spatiotemporal convolutions fo...
research
12/02/2014

Learning Spatiotemporal Features with 3D Convolutional Networks

We propose a simple, yet effective approach for spatiotemporal feature l...
research
01/16/2017

Towards a New Interpretation of Separable Convolutions

In recent times, the use of separable convolutions in deep convolutional...
research
09/15/2020

Comparison of Spatiotemporal Networks for Learning Video Related Tasks

Many methods for learning from video sequences involve temporally proces...
research
04/23/2021

Skip-Convolutions for Efficient Video Processing

We propose Skip-Convolutions to leverage the large amount of redundancie...
research
06/03/2019

Separable Layers Enable Structured Efficient Linear Substitutions

In response to the development of recent efficient dense layers, this pa...
research
06/05/2019

Butterfly Transform: An Efficient FFT Based Neural Architecture Design

In this paper, we introduce the Butterfly Transform (BFT), a light weigh...
