NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification

11/12/2018
by   Rongcheng Lin, et al.

This paper introduces a fast and efficient network architecture, NeXtVLAD, to aggregate frame-level features into a compact feature vector for large-scale video classification. The basic idea is to decompose a high-dimensional feature into a group of relatively low-dimensional vectors with attention before applying NetVLAD aggregation over time. This approach turns out to be both effective and parameter-efficient in aggregating temporal information. In the 2nd YouTube-8M video understanding challenge, a single NeXtVLAD model with fewer than 80M parameters achieves a GAP score of 0.87846 on the private leaderboard. A mixture of 3 NeXtVLAD models achieves 0.88722, which ranked 3rd among 394 teams. The code is publicly available at https://github.com/linrongc/youtube-8m.
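
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of grouped, attention-weighted VLAD aggregation. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function names `init_params` and `nextvlad_sketch`, the parameter keys, the random weights, the omission of bias terms, and the simple per-cluster L2 normalization are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng, n_in, expansion, groups, clusters):
    # Hypothetical random weights; a real model would learn these.
    e = n_in * expansion          # expanded feature size
    d = e // groups               # size of each low-dimensional group vector
    return {
        "w_expand": rng.normal(scale=0.1, size=(n_in, e)),
        "w_attn": rng.normal(scale=0.1, size=(e, groups)),
        "w_assign": rng.normal(scale=0.1, size=(e, groups * clusters)),
        "centers": rng.normal(scale=0.1, size=(clusters, d)),
    }


def nextvlad_sketch(frames, params, groups, clusters):
    """Aggregate frames of shape (T, N) into one vector of length clusters * d."""
    t = frames.shape[0]
    x = frames @ params["w_expand"]                    # (T, E) expanded features
    d = x.shape[1] // groups
    grouped = x.reshape(t, groups, d)                  # (T, G, d) low-dim groups

    # Attention over groups: one weight in (0, 1) per frame and group.
    attn = sigmoid(x @ params["w_attn"])               # (T, G)

    # Soft assignment of each group vector to the clusters.
    logits = (x @ params["w_assign"]).reshape(t, groups, clusters)
    assign = softmax(logits, axis=-1)                  # (T, G, K)

    weights = attn[:, :, None] * assign                # (T, G, K)

    # Sum of weighted residuals (x - c_k) over time and groups,
    # computed as sum(w * x) - sum(w) * c_k.
    weighted_sum = np.einsum("tgk,tgd->kd", weights, grouped)   # (K, d)
    mass = weights.sum(axis=(0, 1))                              # (K,)
    vlad = weighted_sum - mass[:, None] * params["centers"]     # (K, d)

    # Assumption: plain L2 normalization per cluster, then flatten.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    return vlad.reshape(-1)
```

For example, with 30 frames of 16-dimensional features, an expansion factor of 2, 4 groups, and 8 clusters, each group vector has 8 dimensions and the descriptor has 8 × 8 = 64 entries. Because residuals are computed on the low-dimensional group vectors rather than on the full expanded feature, the aggregated descriptor (and any layer that consumes it) is smaller by a factor equal to the number of groups.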

research
07/13/2017

UTS submission to Google YouTube-8M Challenge 2017

In this paper, we present our solution to Google YouTube-8M Video Classi...
research
07/04/2017

Aggregating Frame-level Features for Large-Scale Video Classification

This paper introduces the system we developed for the Google Cloud & You...
research
11/08/2021

Triple-level Model Inferred Collaborative Network Architecture for Video Deraining

Video deraining is an important issue for outdoor vision systems and has...
research
10/01/2018

Learnable Pooling Methods for Video Classification

We introduce modifications to state-of-the-art approaches to aggregating...
research
02/07/2020

iqiyi Submission to ActivityNet Challenge 2019 Kinetics-700 challenge: Hierarchical Group-wise Attention

In this report, the method for the iqiyi submission to the task of Activ...
research
06/21/2017

Learnable pooling with Context Gating for video classification

Common video representations often deploy an average or maximum pooling ...
research
11/01/2017

Reducing Model Complexity for DNN Based Large-Scale Audio Classification

Audio classification is the task of identifying the sound categories tha...
