Counting Grid Aggregation for Event Retrieval and Recognition

04/05/2016
by Zhanning Gao, et al.

Event retrieval and recognition in a large corpus of videos requires a holistic, fixed-size visual representation at the video-clip level that is comprehensive, compact, and yet discriminative. Such a representation should aggregate information across the relevant video frames while suppressing redundant information, yielding a compact representation that can effectively differentiate among visual events. In search of such a representation, we propose a spatially consistent counting grid model that aggregates deep features extracted from different video frames. The spatial consistency of the counting grid model is achieved by introducing a prior model estimated from a large corpus of video data. The counting grid model produces an intermediate tensor representation for each video, which automatically identifies and removes feature redundancy across frames. The tensor representation is then reduced to a fixed-size vector representation by averaging over the counting grid. Compared with existing methods on both event retrieval and event classification benchmarks, we achieve significantly better accuracy with a much more compact representation.
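
The aggregation step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `aggregate_video`, the soft-assignment input `assignments` (a stand-in for whatever the counting grid model infers about where each frame maps on the grid), and the choice to average only over occupied grid cells are all assumptions made for illustration. The sketch only shows the general idea that frames mapped to the same grid location are merged into one cell, so that redundant frames do not dominate the final fixed-size vector.

```python
import numpy as np


def aggregate_video(frame_features, assignments, eps=1e-8):
    """Illustrative grid aggregation (hypothetical helper, not the paper's code).

    frame_features: (N, D) array, one deep feature vector per frame.
    assignments:    (N, H, W) array, assumed soft assignment of each frame
                    to grid locations (each frame's weights sum to 1).
    Returns the (H, W, D) intermediate tensor and the (D,) video vector.
    """
    # Weighted sum of the features that land on each grid cell.
    weighted = np.einsum("nhw,nd->hwd", assignments, frame_features)
    # Total assignment mass received by each cell.
    mass = assignments.sum(axis=0)
    occupied = mass > eps

    # Normalize per cell so that many near-duplicate frames mapped to the
    # same location contribute a single averaged feature.
    tensor = np.zeros_like(weighted)
    tensor[occupied] = weighted[occupied] / mass[occupied][:, None]

    # Assumption: average over occupied cells only to get a fixed-size vector.
    if not occupied.any():
        return tensor, np.zeros(frame_features.shape[1])
    vector = tensor[occupied].mean(axis=0)
    return tensor, vector


if __name__ == "__main__":
    # Two redundant frames share a cell; a third, distinct frame has its own.
    feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    assign = np.zeros((3, 2, 2))
    assign[0, 0, 0] = assign[1, 0, 0] = 1.0
    assign[2, 1, 1] = 1.0
    tensor, vector = aggregate_video(feats, assign)
    print(vector)              # grid-averaged representation
    print(feats.mean(axis=0))  # plain frame average, for comparison
```

In this toy case the two duplicate frames collapse into one grid cell, so the grid-averaged vector weights the two distinct contents equally, whereas a plain average over frames is biased toward the repeated content.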

