Towards Good Practices for Very Deep Two-Stream ConvNets

07/08/2015
by   Limin Wang, et al.
0

Deep convolutional networks have achieved great success for object recognition in still images. However, for action recognition in videos, the improvement of deep convolutional networks is not so evident. We argue that there are two reasons that could probably explain this result. First the current network architectures (e.g. Two-stream ConvNets) are relatively shallow compared with those very deep models in image domain (e.g. VGGNet, GoogLeNet), and therefore their modeling capacity is constrained by their depth. Second, probably more importantly, the training dataset of action recognition is extremely small compared with the ImageNet dataset, and thus it will be easy to over-fit on the training dataset. To address these issues, this report presents very deep two-stream ConvNets for action recognition, by adapting recent very deep architectures into video domain. However, this extension is not easy as the size of action recognition is quite small. We design several good practices for the training of very deep two-stream ConvNets, namely (i) pre-training for both spatial and temporal nets, (ii) smaller learning rates, (iii) more data augmentation techniques, (iv) high drop out ratio. Meanwhile, we extend the Caffe toolbox into Multi-GPU implementation with high computational efficiency and low memory consumption. We verify the performance of very deep two-stream ConvNets on the dataset of UCF101 and it achieves the recognition accuracy of 91.4%.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/02/2016

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Deep convolutional networks have achieved great success for visual recog...
research
06/09/2014

Two-Stream Convolutional Networks for Action Recognition in Videos

We investigate architectures of discriminatively trained deep Convolutio...
research
05/08/2017

Temporal Segment Networks for Action Recognition in Videos

Deep convolutional networks have achieved great success for image recogn...
research
05/22/2017

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

The paucity of videos in current action classification datasets (UCF-101...
research
08/05/2016

Fusing Deep Convolutional Networks for Large Scale Visual Concept Classification

Deep learning architectures are showing great promise in various compute...
research
05/29/2023

FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition

3D Convolutional Neural Networks are gaining increasing attention from r...
research
10/12/2016

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

Deep convolutional neural networks (ConvNets) have been recently shown t...

Please sign up or login with your details

Forgot password? Click here to reset