Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

07/14/2017
by Fu Li, et al.

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of raw videos, we mainly investigate temporal modeling approaches that aggregate frame-level features for multi-label video recognition. Our system contains three major components: a two-stream sequence model, a fast-forward sequence model, and temporal residual neural networks. Experimental results on the challenging YouTube-8M dataset demonstrate that our proposed approaches significantly improve on existing temporal modeling approaches in large-scale video recognition. Notably, our fast-forward LSTM with a depth of 7 layers achieves a GAP@20 of 82.75 on the Kaggle Public test set.
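
For readers unfamiliar with the GAP@20 metric quoted above, the following is a minimal sketch of Global Average Precision at k as it is commonly described for the YouTube-8M challenge: keep the top k predictions per video, pool them across all videos, rank the pool by confidence, and compute average precision over it. The function name `gap_at_k` and the dense score and label matrices are assumptions made for illustration; this is not the authors' or the organizers' evaluation code.

```python
import numpy as np


def gap_at_k(scores, labels, k=20):
    """Sketch of Global Average Precision at k (hypothetical helper).

    scores: (num_videos, num_classes) predicted confidences.
    labels: (num_videos, num_classes) binary ground truth.
    Returns a fraction in [0, 1].
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    k = min(k, scores.shape[1])

    # Indices of the k highest-scoring classes for each video.
    top = np.argsort(-scores, axis=1)[:, :k]
    rows = np.arange(scores.shape[0])[:, None]

    # Pool the kept predictions from all videos into one flat list.
    pooled_scores = scores[rows, top].ravel()
    pooled_hits = labels[rows, top].ravel()

    # Rank the pooled predictions by confidence, highest first.
    order = np.argsort(-pooled_scores, kind="stable")
    hits = pooled_hits[order]

    # Assumption: recall is normalized by all positive labels in the data,
    # including those that did not make it into any top-k list.
    total_positives = labels.sum()
    if total_positives == 0:
        return 0.0

    # Precision at each rank, accumulated only where the prediction is correct.
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision * hits).sum() / total_positives)
```

The function returns a fraction, so a figure such as 82.75 would correspond to a value reported on a percentage scale.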

research
08/12/2017

Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification

This paper describes our solution for the video recognition task of Acti...
research
09/29/2018

Non-local NetVLAD Encoding for Video Classification

This paper describes our solution for the 2nd YouTube-8M video understa...
research
06/14/2017

Deep Learning Methods for Efficient Large Scale Video Labeling

We present a solution to "Google Cloud and YouTube-8M Video Understandin...
research
06/26/2017

An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform

Large-scale datasets have played a significant role in progress of neura...
research
10/25/2019

Learning to Localize Temporal Events in Large-scale Video Data

We address temporal localization of events in large-scale video data, in...
research
06/21/2017

Learnable pooling with Context Gating for video classification

Common video representations often deploy an average or maximum pooling ...
research
07/05/2017

Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification

We report on CMU Informedia Lab's system used in Google's YouTube 8 Mill...
