The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection

02/23/2016
by Pascal Mettes, et al.

This paper strives for video event detection using a representation learned from deep convolutional neural networks. Unlike the leading approaches, which all learn from the 1,000 classes defined in the ImageNet Large Scale Visual Recognition Challenge, we investigate how to leverage the complete ImageNet hierarchy for pre-training deep networks. To deal with the problems of over-specific classes and classes with few images, we introduce a bottom-up and top-down approach for reorganizing the ImageNet hierarchy based on all its 21,814 classes and more than 14 million images. Experiments on the TRECVID Multimedia Event Detection 2013 and 2015 datasets show that video representations derived from the layers of a deep neural network pre-trained with our reorganized hierarchy i) improve over standard pre-training, ii) are complementary among different reorganizations, iii) maintain the benefits of fusion with other modalities, and iv) lead to state-of-the-art event detection results. The reorganized hierarchies and their derived Caffe models are publicly available at http://tinyurl.com/imagenetshuffle.
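
To make the idea of folding classes with few images into broader ones concrete, here is a minimal sketch of one possible bottom-up merge over a class tree. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not specify the merge rule, so the threshold `min_images`, the function names `depth_of` and `bottom_up_merge`, and the toy hierarchy are all hypothetical.

```python
def depth_of(node, parent):
    """Number of edges between `node` and the root of the hierarchy."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth


def bottom_up_merge(parent, image_counts, min_images):
    """Fold every non-root class with fewer than `min_images` images into its parent.

    parent:       dict mapping class -> parent class (the root maps to None)
    image_counts: dict mapping every class -> number of images labeled with it
    Returns (counts, assignment): image counts of the surviving classes, and a
    map from each original class to the surviving class that absorbs it.
    NOTE: hypothetical rule for illustration; the paper's rule may differ.
    """
    counts = dict(image_counts)
    merged_into = {}
    # Visit the deepest classes first, so a parent has already received its
    # children's images by the time it is checked against the threshold.
    for node in sorted(parent, key=lambda n: depth_of(n, parent), reverse=True):
        up = parent[node]
        if up is not None and counts[node] < min_images:
            counts[up] += counts[node]
            merged_into[node] = up
            del counts[node]

    def resolve(node):
        # Follow the merge chain up to a class that survived.
        while node in merged_into:
            node = merged_into[node]
        return node

    assignment = {node: resolve(node) for node in parent}
    return counts, assignment


# Toy hierarchy (hypothetical class names and counts).
toy_parent = {
    "entity": None,
    "dog": "entity",
    "cat": "entity",
    "terrier": "dog",
    "rare_terrier": "terrier",
}
toy_counts = {"entity": 0, "dog": 300, "cat": 700, "terrier": 450, "rare_terrier": 80}

kept, assignment = bottom_up_merge(toy_parent, toy_counts, min_images=500)
print(kept)
print(assignment)
```

In this toy case the rare leaf is absorbed by its parent, which then has enough images to survive as a training class. The root is never merged, so it can end up holding images from classes that stayed too small.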


