Hierarchical Deep Recurrent Architecture for Video Understanding

07/11/2017
by Luming Tang, et al.

This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework has a hierarchical deep architecture consisting of a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods, including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM), to address the problem of the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiple attention pooling (Multi-ATT), so that the model can pay more attention to the informative frames in a video and ignore the uninformative ones. In the video-level classification part, two methods are proposed to improve classification performance: Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble of 18 sub-models. In terms of the official evaluation metric, Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50 50
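To make the idea of attention pooling concrete, the following is a minimal sketch of a generic single-head attention pooling step in NumPy. It is not taken from the paper: the abstract does not give the exact formulation, so the scoring vector `w`, the function name `attention_pool`, and the feature dimensions are all assumptions chosen for illustration. Each frame receives a scalar score, the scores are normalized with a softmax over time, and the video-level vector is the resulting weighted sum of frame features.

```python
import numpy as np


def attention_pool(frames, w):
    """Pool (T, D) frame features into a single (D,) video vector.

    frames: array of shape (T, D), one feature row per frame.
    w: hypothetical learned scoring vector of shape (D,); in a real
       model it would be trained jointly with the classifier.
    Returns the pooled vector and the per-frame attention weights.
    """
    scores = frames @ w                # one scalar score per frame, shape (T,)
    scores = scores - scores.max()     # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()           # softmax over the time axis
    pooled = weights @ frames          # weighted sum of frames, shape (D,)
    return pooled, weights


# Illustrative random inputs; the shapes are assumptions, not paper values.
rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 1024))
w = rng.normal(size=1024) * 0.01
video_vec, weights = attention_pool(frames, w)
```

With an all-zero `w`, every frame gets the same weight and the function reduces to plain average pooling, which is a useful sanity check. A multi-head variant could, under the same assumptions, use a matrix of several scoring vectors and concatenate the pooled outputs.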

research
06/16/2017

The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge

This article describes the final solution of team monkeytyping, who fini...
research
06/02/2019

Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network

High accuracy video label prediction (classification) models are attribu...
research
11/06/2017

End-to-End Video Classification with Knowledge Graphs

Video understanding has attracted much research attention especially sin...
research
06/21/2017

Learnable pooling with Context Gating for video classification

Common video representations often deploy an average or maximum pooling ...
research
07/13/2017

UTS submission to Google YouTube-8M Challenge 2017

In this paper, we present our solution to Google YouTube-8M Video Classi...
research
11/15/2019

Multi-attention Networks for Temporal Localization of Video-level Labels

Temporal localization remains an important challenge in video understand...
research
06/14/2017

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks

Video classification problem has been studied many years. The success of...
