Two-Stream Convolutional Networks for Action Recognition in Videos

06/09/2014
by   Karen Simonyan, et al.
0

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.

READ FULL TEXT

page 3

page 5

page 9

page 10

research
12/09/2016

ActionFlowNet: Learning Motion Representation for Action Recognition

Even with the recent advances in convolutional neural networks (CNN) in ...
research
04/08/2015

Evaluating Two-Stream CNN for Video Classification

Videos contain very rich semantic information. Traditional hand-crafted ...
research
07/08/2015

Towards Good Practices for Very Deep Two-Stream ConvNets

Deep convolutional networks have achieved great success for object recog...
research
09/12/2017

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

For the two-stream style methods in action recognition, fusing the two s...
research
06/07/2019

Video Modeling with Correlation Networks

Motion is a salient cue to recognize actions in video. Modern action rec...
research
11/20/2018

Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition

We discuss the robustness and generalization ability in the realm of act...
research
12/22/2016

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

This paper studies the joint learning of action recognition and temporal...

Please sign up or login with your details

Forgot password? Click here to reset