Aggregating Frame-level Features for Large-Scale Video Classification

07/04/2017
by   Shaoxiang Chen, et al.

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be framed as a multi-label classification problem defined on the large-scale YouTube-8M dataset. We employ a range of techniques to aggregate the provided frame-level feature representations and generate video-level predictions, including several variants of recurrent neural networks (RNNs) and generalized VLAD. We also adopt several fusion strategies to explore the complementarity among the models. In terms of the official metric, GAP@20 (global average precision at 20), our best fusion model attains 0.84198 on the public 50% of the test data and 0.84193 on the private 50%, ranking 4th out of 650 teams worldwide in the competition.
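For readers unfamiliar with the metric, the following is a minimal sketch of how GAP@20 might be computed. It is not the challenge's official evaluation code. The function name `gap_at_k`, the list-of-lists input layout, and the choice to normalize by the total number of ground-truth positive labels are all assumptions made for illustration. The idea is to keep each video's top-k scored labels, pool them across videos, rank the pool by confidence, and accumulate precision at every rank that holds a correct label.

```python
def gap_at_k(scores, labels, k=20):
    """Illustrative global average precision at k (hypothetical helper).

    scores: one list of per-class confidences per video.
    labels: one list of 0/1 ground-truth flags per video, same shape.
    Assumption: recall is normalized by all ground-truth positives.
    """
    pooled = []
    total_positives = 0
    for video_scores, video_labels in zip(scores, labels):
        total_positives += sum(video_labels)
        # Keep only this video's k highest-scoring (score, label) pairs.
        ranked = sorted(zip(video_scores, video_labels),
                        key=lambda pair: pair[0], reverse=True)
        pooled.extend(ranked[:k])

    if total_positives == 0:
        return 0.0

    # Rank the pooled predictions from all videos by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            # Precision at this rank, counted only where recall increases.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy usage: two videos, three classes, top-2 predictions per video.
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
toy_labels = [[1, 0, 0], [0, 1, 1]]
toy_gap = gap_at_k(toy_scores, toy_labels, k=2)
```

Because the predictions are pooled before ranking, a confident wrong label on one video lowers the precision credited to correct labels on every video ranked below it, which is what makes the metric "global".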

research
11/12/2018

NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification

This paper introduces a fast and efficient network architecture, NeXtVLA...
research
07/13/2017

UTS submission to Google YouTube-8M Challenge 2017

In this paper, we present our solution to Google YouTube-8M Video Classi...
research
12/02/2019

BERT for Large-scale Video Segment Classification with Test-time Augmentation

This paper presents our approach to the third YouTube-8M video understan...
research
06/02/2019

Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network

High accuracy video label prediction (classification) models are attribu...
research
11/30/2018

Deep Multimodal Learning: An Effective Method for Video Classification

Videos have become ubiquitous on the Internet. And video analysis can pr...
research
11/06/2017

End-to-End Video Classification with Knowledge Graphs

Video understanding has attracted much research attention especially sin...
research
09/21/2018

Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

This paper presents the Axon AI's solution to the 2nd YouTube-8M Video U...
