DeepAI AI Chat
Log In Sign Up

Video Classification with Channel-Separated Convolutional Networks

by   Du Tran, et al.

Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks. This paper studies different effects of group convolution in 3D convolutional networks for video classification. We empirically demonstrate that the amount of channel interactions plays an important role in the accuracy of group convolutional networks. Our experiments suggest two main findings. First, it is a good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture -- Channel-Separated Convolutional Network (CSN) -- which is simple, efficient, yet accurate. On Kinetics and Sports1M, our CSNs significantly outperform state-of-the-art models while being 11-times more efficient.


page 1

page 2

page 3

page 5

page 6

page 7

page 8

page 9


Building Efficient Deep Neural Networks with Unitary Group Convolutions

We propose unitary group convolutions (UGConvs), a building block for CN...

Attentive Group Equivariant Convolutional Networks

Although group convolutional networks are able to learn powerful represe...

Rethinking the Inception Architecture for Computer Vision

Convolutional networks are at the core of most state-of-the-art computer...

XSepConv: Extremely Separated Convolution

Depthwise convolution has gradually become an indispensable operation fo...

Learning Spatiotemporal Features with 3D Convolutional Networks

We propose a simple, yet effective approach for spatiotemporal feature l...

What Do We Understand About Convolutional Networks?

This document will review the most prominent proposals using multilayer ...

Similarity-Based Clustering for Enhancing Image Classification Architectures

Convolutional networks are at the center of best in class computer visio...