Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing

09/30/2020
by   Okan Köpüklü, et al.
19

Convolutional Neural Networks with 3D kernels (3D CNNs) currently achieve state-of-the-art results in video recognition tasks due to their supremacy in extracting spatiotemporal features within video frames. There have been many successful 3D CNN architectures surpassing the state-of-the-art results successively. However, nearly all of them are designed to operate offline creating several serious handicaps during online operation. Firstly, conventional 3D CNNs are not dynamic since their output features represent the complete input clip instead of the most recent frame in the clip. Secondly, they are not temporal resolution-preserving due to their inherent temporal downsampling. Lastly, 3D CNNs are constrained to be used with fixed temporal input size limiting their flexibility. In order to address these drawbacks, we propose dissected 3D CNNs, where the intermediate volumes of the network are dissected and propagated over depth (time) dimension for future calculations, substantially reducing the number of computations at online operation. For action classification, the dissected version of ResNet models performs 74-90 fewer computations at online operation while achieving ∼5 classification accuracy on the Kinetics-600 dataset than conventional 3D ResNet models. Moreover, the advantages of dissected 3D CNNs are demonstrated by deploying our approach onto several vision tasks, which consistently improved the performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/26/2019

Learning Efficient Video Representation with Video Shuffle Networks

3D CNN shows its strong ability in learning spatiotemporal representatio...
research
05/31/2021

Continual 3D Convolutional Neural Networks for Real-time Processing of Videos

This paper introduces Continual 3D Convolutional Neural Networks (Co3D C...
research
01/19/2020

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recogntion

To efficiently extract spatiotemporal features of video for action recog...
research
11/17/2018

Recurrence to the Rescue: Towards Causal Spatiotemporal Representations

Recently, three dimensional (3D) convolutional neural networks (CNNs) ha...
research
07/31/2021

Greedy Network Enlarging

Recent studies on deep convolutional neural networks present a simple pa...
research
03/11/2022

TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning

Temporal Reasoning is one important functionality for vision intelligenc...
research
12/27/2016

Steerable CNNs

It has long been recognized that the invariance and equivariance propert...

Please sign up or login with your details

Forgot password? Click here to reset