Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

09/12/2017
by   Jiagang Zhu, et al.
0

For the two-stream style methods in action recognition, fusing the two streams' predictions is always by the weighted averaging scheme. This fusion method with fixed weights lacks of pertinence to different action videos and always needs trial and error on the validation set. In order to enhance the adaptability of two-stream ConvNets and improve its performance, an end-to-end trainable gated fusion method, namely gating ConvNet, for the two-stream ConvNets is proposed in this paper based on the MoE (Mixture of Experts) theory. The gating ConvNet takes the combination of feature maps from the same layer of the spatial and the temporal nets as input and adopts ReLU (Rectified Linear Unit) as the gating output activation function. To reduce the over-fitting of gating ConvNet caused by the redundancy of parameters, a new multi-task learning method is designed, which jointly learns the gating fusion weights for the two streams and learns the gating ConvNet for action classification. With our gated fusion method and multi-task learning approach, a high accuracy of 94.5

READ FULL TEXT

page 1

page 8

research
05/06/2020

Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition

Two-stream networks have provided an alternate way of exploiting the spa...
research
06/09/2014

Two-Stream Convolutional Networks for Action Recognition in Videos

We investigate architectures of discriminatively trained deep Convolutio...
research
02/10/2021

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Temporal modelling is the key for efficient video action recognition. Wh...
research
11/20/2017

Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion

Action recognition is an important yet challenging task in computer visi...
research
11/20/2018

Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition

We discuss the robustness and generalization ability in the realm of act...
research
12/22/2016

Efficient Action Detection in Untrimmed Videos via Multi-Task Learning

This paper studies the joint learning of action recognition and temporal...
research
07/22/2018

Correlation Net : spatio temporal multimodal deep learning

This paper describes a network that is able to capture spatiotemporal co...

Please sign up or login with your details

Forgot password? Click here to reset