Learning Spatiotemporal Features with 3D Convolutional Networks

12/02/2014
by   Du Tran, et al.
0

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets; 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets; and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8 with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.

READ FULL TEXT

page 6

page 10

page 11

page 12

page 13

page 14

page 15

page 16

research
12/13/2017

Rethinking Spatiotemporal Feature Learning For Video Understanding

In this paper we study 3D convolutional networks for video understanding...
research
01/06/2017

Abnormal Event Detection in Videos using Spatiotemporal Autoencoder

We present an efficient method for detecting anomalies in videos. Recent...
research
02/02/2020

Interpreting video features: a comparison of 3D convolutional networks and convolutional LSTM networks

A number of techniques for interpretability have been presented for deep...
research
04/04/2019

Video Classification with Channel-Separated Convolutional Networks

Group convolution has been shown to offer great computational savings in...
research
11/26/2019

Learning Efficient Video Representation with Video Shuffle Networks

3D CNN shows its strong ability in learning spatiotemporal representatio...
research
04/17/2016

Structured Sparse Convolutional Autoencoder

This paper aims to improve the feature learning in Convolutional Network...
research
10/10/2019

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Computer vision has undergone a dramatic revolution in performance, driv...

Please sign up or login with your details

Forgot password? Click here to reset