Convolutional Two-Stream Network Fusion for Video Action Recognition

04/22/2016
by Christoph Feichtenhofer, et al.

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information. We study a number of ways of fusing ConvNet towers both spatially and temporally in order to best take advantage of this spatio-temporal information. We make the following findings: (i) that rather than fusing at the softmax layer, a spatial and temporal network can be fused at a convolution layer without loss of performance, but with a substantial saving in parameters; (ii) that it is better to fuse such networks spatially at the last convolutional layer than earlier, and that additionally fusing at the class prediction layer can boost accuracy; finally (iii) that pooling of abstract convolutional features over spatiotemporal neighbourhoods further boosts performance. Based on these studies we propose a new ConvNet architecture for spatiotemporal fusion of video snippets, and evaluate its performance on standard benchmarks where this architecture achieves state-of-the-art results.
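To make findings (i) and (iii) concrete, here is a minimal NumPy sketch of fusing two feature maps at a convolutional layer and then pooling the result over a spatiotemporal neighbourhood. The abstract does not give the exact fusion operator, so everything specific here is an assumption made for illustration: fusion is modelled as channel stacking followed by a 1x1 convolution, pooling is a non-overlapping max over (time, height, width) windows, and the names `conv_fusion` and `spatiotemporal_max_pool`, the tensor shapes, and the random weights are all hypothetical rather than taken from the paper.

```python
import numpy as np


def conv_fusion(spatial, temporal, weights, bias):
    """Fuse two (C, H, W) feature maps (illustrative assumption, not the paper's exact layer).

    The maps are stacked along the channel axis and mixed by a 1x1
    convolution, i.e. the same linear map applied at every pixel.
    weights has shape (C_out, 2 * C); bias has shape (C_out,).
    """
    stacked = np.concatenate([spatial, temporal], axis=0)  # (2C, H, W)
    return np.einsum("oc,chw->ohw", weights, stacked) + bias[:, None, None]


def spatiotemporal_max_pool(features, size):
    """Non-overlapping max pooling of (T, C, H, W) features over (t, h, w) windows."""
    t, h, w = size
    T, C, H, W = features.shape
    if T % t or H % h or W % w:
        raise ValueError("pooling size must divide the feature dimensions")
    # Split each pooled axis into (blocks, window) and reduce over the windows.
    blocks = features.reshape(T // t, t, C, H // h, h, W // w, w)
    return blocks.max(axis=(1, 4, 6))  # (T // t, C, H // h, W // w)


# Hypothetical shapes: 4 time steps, 8 channels, 6x6 feature maps per stream.
rng = np.random.default_rng(0)
T, C, H, W = 4, 8, 6, 6
spatial = rng.standard_normal((T, C, H, W))   # appearance-stream features
temporal = rng.standard_normal((T, C, H, W))  # motion-stream features
weights = rng.standard_normal((C, 2 * C)) * 0.1  # random stand-in for learned filters
bias = np.zeros(C)

# Fuse the two streams at each time step, then pool across time and space.
fused = np.stack(
    [conv_fusion(s, m, weights, bias) for s, m in zip(spatial, temporal)]
)
pooled = spatiotemporal_max_pool(fused, (2, 2, 2))
print(fused.shape, pooled.shape)  # (4, 8, 6, 6) (2, 8, 3, 3)
```

In this sketch the fused map keeps the channel count of a single stream, so the layers that follow are shared rather than duplicated per stream, which is one plausible reading of the parameter saving mentioned in finding (i).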

research
03/04/2019

Spatiotemporal Pyramid Network for Video Action Recognition

Two-stream convolutional networks have shown strong performance in video...
research
02/26/2019

IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition

Effective spatiotemporal feature representation is crucial to the video-...
research
10/12/2016

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

Deep convolutional neural networks (ConvNets) have been recently shown t...
research
03/06/2019

Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification

Two-stream architecture have shown strong performance in video classific...
research
04/10/2020

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

Despite the success in still image recognition, deep neural networks for...
research
07/22/2018

Correlation Net : spatio temporal multimodal deep learning

This paper describes a network that is able to capture spatiotemporal co...
research
07/03/2017

Temporal HeartNet: Towards Human-Level Automatic Analysis of Fetal Cardiac Screening Video

We present an automatic method to describe clinically useful information...
