Discriminatively Learned Hierarchical Rank Pooling Networks

05/30/2017
by   Basura Fernando, et al.
0

In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present "discriminative rank pooling" in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem. When the frame level features vectors are obtained from a convolutional neural network (CNN), we rank pool the network activations and jointly estimate all parameters of the model, including CNN filters and fully-connected weights, in an end-to-end manner which we coined as "end-to-end trainable rank pooled CNN". Importantly, this model can make use of any existing convolutional neural network architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. Then, we extend rank pooling to a high capacity video representation, called "hierarchical rank pooling". Hierarchical rank pooling consists of a network of rank pooling functions, which encode temporal semantics over arbitrary long video clips based on rich frame level features. By stacking non-linear feature functions and temporal sub-sequence encoders one on top of the other, we build a high capacity encoding network of the dynamic behaviour of the video. The resulting video representation is a fixed-length feature vector describing the entire video clip that can be used as input to standard machine learning classifiers. We demonstrate our approach on the task of action and activity recognition. Obtained results are comparable to state-of-the-art methods on three important activity recognition benchmarks with classification performance of 76.7 Hollywood2, 69.4

READ FULL TEXT

page 12

page 16

research
04/07/2017

Generalized Rank Pooling for Activity Recognition

Most popular deep models for action recognition split video sequences in...
research
12/06/2015

Rank Pooling for Action Recognition

We propose a function-based temporal pooling method that captures the la...
research
08/22/2018

Deep Adaptive Temporal Pooling for Activity Recognition

Deep neural networks have recently achieved competitive accuracy for hum...
research
03/04/2015

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

Encouraged by the success of Convolutional Neural Networks (CNNs) in ima...
research
11/21/2016

Deep Temporal Linear Encoding Networks

The CNN-encoding of features from entire videos for the representation o...
research
12/21/2017

Encoding CNN Activations for Writer Recognition

The encoding of local features is an essential part for writer identific...
research
09/05/2019

Discriminative Video Representation Learning Using Support Vector Classifiers

Most popular deep models for action recognition in videos generate indep...

Please sign up or login with your details

Forgot password? Click here to reset