BERT for Large-scale Video Segment Classification with Test-time Augmentation

12/02/2019
by   Tianqi Liu, et al.

This paper presents our approach to the third YouTube-8M video understanding competition, which challenges participants to localize video-level labels at scale to the precise time in the video where the label actually occurs. Our model is an ensemble of frame-level models, such as Gated NetVLAD and NeXtVLAD, and various BERT models with test-time augmentation. We explore multiple ways to aggregate BERT outputs into a video representation and various ways to combine visual and audio information. We propose test-time augmentation that shifts the video frames one unit to the left or right, which adds variety to the predictions and empirically improves the evaluation metrics. We first pre-train the model on the 4M training video-level data, and then fine-tune it on 237K annotated video segment-level data. We achieve a MAP@100K of 0.7871 on the private testing video segment data, which ranks 9th out of 283 teams.
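The frame-shifting test-time augmentation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `shift_frames` and `predict_with_tta`, the choice to fill the vacated position by repeating the edge frame, and the averaging of predictions over the shifted copies are all assumptions, since the abstract does not specify these details. The `model` argument stands for any callable that maps a list of frame feature vectors to a list of per-class scores.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def shift_frames(frames: Sequence[Frame], offset: int) -> List[Frame]:
    """Shift a frame sequence by `offset` positions, keeping its length.

    offset > 0 shifts right (the first frame is repeated to fill the gap);
    offset < 0 shifts left (the last frame is repeated).
    Edge repetition is an assumption of this sketch.
    """
    n = len(frames)
    if n == 0 or offset == 0:
        return [list(f) for f in frames]
    k = min(abs(offset), n)
    if offset > 0:
        shifted = [frames[0]] * k + list(frames[: n - k])
    else:
        shifted = list(frames[k:]) + [frames[-1]] * k
    return [list(f) for f in shifted]


def predict_with_tta(
    frames: Sequence[Frame],
    model: Callable[[List[Frame]], List[float]],
    offsets: Sequence[int] = (-1, 0, 1),
) -> List[float]:
    """Average the model's class scores over shifted copies of the input.

    Averaging is an assumed way of combining the augmented predictions.
    """
    preds = [model(shift_frames(frames, o)) for o in offsets]
    num_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(num_classes)]


if __name__ == "__main__":
    # Hypothetical stand-in model: scores are the first and last frame values.
    def toy_model(fs: List[Frame]) -> List[float]:
        return [fs[0][0], fs[-1][0]]

    segment = [[1.0], [2.0], [3.0]]
    print(predict_with_tta(segment, toy_model))
```

Because each shifted copy presents the model with slightly different temporal alignment, the combined prediction is less sensitive to where exactly a segment boundary falls, which is one plausible reading of the "variety" the abstract mentions.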

research
11/15/2019

Multi-attention Networks for Temporal Localization of Video-level Labels

Temporal localization remains an important challenge in video understand...
research
07/04/2017

Aggregating Frame-level Features for Large-Scale Video Classification

This paper introduces the system we developed for the Google Cloud & You...
research
06/14/2017

Deep Learning Methods for Efficient Large Scale Video Labeling

We present a solution to "Google Cloud and YouTube-8M Video Understandin...
research
10/25/2019

Learning to Localize Temporal Events in Large-scale Video Data

We address temporal localization of events in large-scale video data, in...
research
09/27/2016

YouTube-8M: A Large-Scale Video Classification Benchmark

Many recent advancements in Computer Vision are attributed to large data...
research
09/21/2018

Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

This paper presents the Axon AI's solution to the 2nd YouTube-8M Video U...
research
09/12/2018

Label Denoising with Large Ensembles of Heterogeneous Neural Networks

Despite recent advances in computer vision based on various convolutiona...
