Vectors of Locally Aggregated Centers for Compact Video Representation

09/13/2015
by Alhabib Abbas, et al.

We propose a novel vector aggregation technique for compact video representation, with application to accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations from scale-invariant feature transform (SIFT) vectors, extracted per frame, and local feature centers computed over a training set. To increase robustness to visual distortions, we propose a new approach that operates at a coarser level of the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the LFCs with respect to given centers of local feature centers (CLFCs), which are extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description, which is used for accurate similarity detection between video segments. Experiments on a video dataset comprising more than 1000 minutes of content from the Open Video Project show that VLAC obtains substantial gains in mean Average Precision (mAP) over VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
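The two-level aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not confirm: random 8-dimensional vectors stand in for 128-dimensional SIFT descriptors, a plain Lloyd's k-means is used for both clustering stages, the CLFCs are obtained by clustering LFCs computed on the training data, each LFC is assigned to its nearest CLFC, and the result is L2-normalised as is common for VLAD. The helper names (`kmeans`, `aggregate`, and so on) are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centers (assumed clustering method)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def aggregate(lfcs, clfcs):
    """Sum the differences (LFC - nearest CLFC) per CLFC and concatenate."""
    dim = len(clfcs[0])
    residuals = [[0.0] * dim for _ in clfcs]
    for c in lfcs:
        j = nearest(c, clfcs)
        for d in range(dim):
            residuals[j][d] += c[d] - clfcs[j][d]
    flat = [v for r in residuals for v in r]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2 normalisation is an assumption borrowed from common VLAD practice.
    return [v / norm for v in flat] if norm > 0 else flat

rng = random.Random(1)
DIM = 8  # toy dimension; real SIFT descriptors have 128 dimensions
train = [[rng.random() for _ in range(DIM)] for _ in range(200)]
segment = [[rng.random() for _ in range(DIM)] for _ in range(60)]

# Training stage (assumed): features -> LFCs -> CLFCs.
clfcs = kmeans(kmeans(train, 16), 4)

# Per video segment: features -> LFCs -> aggregated descriptor.
lfcs = kmeans(segment, 8)
vlac = aggregate(lfcs, clfcs)
print(len(vlac))  # number of CLFCs times DIM
```

Under these assumptions, the descriptor length depends only on the number of CLFCs and the feature dimension, not on how many SIFT features or LFCs a segment produces. Two segments could then be compared with a dot product of their descriptors.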


