Discriminative Video Representation Learning Using Support Vector Classifiers

09/05/2019
by Jue Wang, et al.

Most popular deep models for action recognition in videos generate independent predictions for short clips, which are then pooled heuristically to assign an action label to the full video segment. Because not all frames characterize the underlying action (many are common across multiple actions), pooling schemes that give equal importance to all frames may be suboptimal. To tackle this problem, we propose discriminative pooling, based on the notion that among the deep features generated on all short clips, there is at least one that characterizes the action. To identify these useful features, we resort to a negative bag consisting of features that are known to be irrelevant: for example, features sampled from datasets that are unrelated to our actions of interest, or CNN features produced with random noise as input. With the features from the video as a positive bag and the irrelevant features as the negative bag, we cast an objective to learn a (nonlinear) hyperplane that separates the unknown useful features from the rest, using a multiple instance learning formulation within a support vector machine setup. We use the parameters of this separating hyperplane as a descriptor for the full video segment. Since these parameters are directly related to the support vectors in a max-margin framework, they can be treated as a weighted average pooling of the features from the bags, with zero weights given to non-support vectors. Our pooling scheme is end-to-end trainable within a deep learning framework. We report results from experiments on eight computer vision benchmark datasets spanning a variety of video-related tasks and demonstrate state-of-the-art performance across these tasks.
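
To make the pooling idea concrete, here is a minimal sketch of how hyperplane parameters could serve as a video descriptor. It is not the authors' implementation. It assumes a plain linear soft-margin SVM trained by subgradient descent, and it drops the multiple instance constraint by treating every clip feature as a positive. The function name `svm_pool`, the hyperparameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def svm_pool(pos_bag, neg_bag, C=1.0, lr=0.01, epochs=200):
    """Fit a linear soft-margin SVM and return (w, b).

    Simplifying assumption: every feature in pos_bag is labelled +1,
    so the multiple-instance constraint from the abstract is omitted.
    """
    X = np.vstack([pos_bag, neg_bag])
    y = np.concatenate([np.ones(len(pos_bag)), -np.ones(len(neg_bag))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # features that violate the margin
        # Subgradient of 0.5 * ||w||^2 + C * sum(hinge losses)
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-ins (assumed): 20 clip features and 20 "noise" features, 8-D.
rng = np.random.default_rng(0)
video_feats = rng.normal(2.0, 1.0, size=(20, 8))   # positive bag
noise_feats = rng.normal(-2.0, 1.0, size=(20, 8))  # negative bag

w, b = svm_pool(video_feats, noise_feats)
descriptor = w  # one fixed-length vector summarising the whole video
```

Because `w` starts at zero and each update only adds scaled copies of margin-violating features, the resulting descriptor is a weighted combination of the input features, which mirrors the weighted-average-pooling reading given in the abstract.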

research
03/26/2018

Video Representation Learning Using Discriminative Pooling

Popular deep models for action recognition in videos generate independen...
research
04/06/2017

Action Representation Using Classifier Decision Boundaries

Most popular deep learning based models for action recognition are desig...
research
11/24/2016

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

We propose a novel method for temporally pooling frames in a video for t...
research
04/07/2017

Generalized Rank Pooling for Activity Recognition

Most popular deep models for action recognition split video sequences in...
research
05/30/2017

Discriminatively Learned Hierarchical Rank Pooling Networks

In this work, we present novel temporal encoding methods for action and ...
research
01/25/2017

Deep Local Video Feature for Action Recognition

We investigate the problem of representing an entire video using CNN fea...
research
04/11/2017

UC Merced Submission to the ActivityNet Challenge 2016

This notebook paper describes our system for the untrimmed classificatio...
