Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

02/25/2015
by Yu-Gang Jiang, et al.

In this paper, we study the challenging problem of categorizing videos according to high-level semantics, such as the presence of a particular human action or a complex event. Despite extensive efforts in recent years, most existing works combine multiple video features using simple fusion strategies and neglect inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits feature relationships and class relationships for improved categorization performance. Specifically, both types of relationships are estimated and utilized by rigorously imposing regularizations in the learning process of a deep neural network (DNN). Such a regularized DNN (rDNN) can be realized efficiently with a GPU-based implementation at an affordable training cost. By equipping the DNN with a stronger ability to harness both feature and class relationships, the proposed rDNN is better suited to modeling video semantics. Extensive experimental evaluations show that rDNN outperforms several state-of-the-art approaches. On the well-known Hollywood2 and Columbia Consumer Video benchmarks, it achieves very competitive results of 66.9% and 73.5%, respectively, in terms of mean average precision. In addition, to thoroughly evaluate rDNN and stimulate future research on large-scale video categorization, we collect and release a new benchmark dataset, FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
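To make the idea of relationship-based regularization concrete, the listing below is a minimal, hypothetical PyTorch sketch, not the authors' exact rDNN formulation: a two-stream fusion network whose training loss adds (a) a feature-relationship term that encourages the two stream embeddings of the same video to agree and (b) a class-relationship term that pulls the classifier weights of semantically related classes together via a graph-Laplacian penalty. The similarity matrix class_sim, the layer sizes, and the weighting factors lam_feat and lam_class are illustrative assumptions.

    # Hypothetical sketch of relationship-regularized training, assuming two
    # precomputed feature streams per video; not the paper's exact method.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusionNet(nn.Module):
        def __init__(self, dim_a, dim_b, hidden, num_classes):
            super().__init__()
            self.stream_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
            self.stream_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, xa, xb):
            ha, hb = self.stream_a(xa), self.stream_b(xb)
            return self.classifier(torch.cat([ha, hb], dim=1)), ha, hb

    def regularized_loss(logits, targets, ha, hb, W, class_sim,
                         lam_feat=0.1, lam_class=0.1):
        # Standard multi-label classification term.
        cls_loss = F.binary_cross_entropy_with_logits(logits, targets)
        # Feature-relationship term: a simple agreement penalty between the
        # two stream embeddings (a stand-in for learned feature fusion).
        feat_reg = F.mse_loss(ha, hb)
        # Class-relationship term: graph-Laplacian penalty so that classes
        # with high similarity in class_sim get similar classifier weights.
        lap = torch.diag(class_sim.sum(dim=1)) - class_sim
        class_reg = torch.trace(W.t() @ lap @ W) / W.shape[0]
        return cls_loss + lam_feat * feat_reg + lam_class * class_reg

    # Toy usage: random tensors stand in for appearance and motion features.
    model = FusionNet(dim_a=4096, dim_b=1024, hidden=512, num_classes=239)
    xa, xb = torch.randn(8, 4096), torch.randn(8, 1024)
    targets = torch.randint(0, 2, (8, 239)).float()
    sim = torch.rand(239, 239)
    class_sim = (sim + sim.t()) / 2   # symmetric class-similarity matrix
    logits, ha, hb = model(xa, xb)
    loss = regularized_loss(logits, targets, ha, hb,
                            model.classifier.weight, class_sim)
    loss.backward()

In this sketch the Laplacian term sums, over all class pairs, the similarity-weighted squared distance between their weight vectors, so frequently co-occurring or semantically close categories are encouraged to share classifier structure.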


Related research

06/14/2017 · Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
Videos are inherently multimodal. This paper studies the problem of how ...

11/09/2020 · Improved Soccer Action Spotting using both Audio and Video Streams
In this paper, we propose a study on multi-modal (audio and video) actio...

06/23/2016 · VideoMCC: a New Benchmark for Video Comprehension
While there is overall agreement that future technology for organizing, ...

04/08/2015 · Evaluating Two-Stream CNN for Video Classification
Videos contain very rich semantic information. Traditional hand-crafted ...

06/02/2019 · Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network
High accuracy video label prediction (classification) models are attribu...

04/29/2020 · Video Contents Understanding using Deep Neural Networks
We propose a novel application of Transfer Learning to classify video-fr...

01/20/2020 · The benefits of synthetic data for action categorization
In this paper, we study the value of using synthetically produced videos...
