Exploiting Structure Sparsity for Covariance-based Visual Representation

by Jianjia Zhang, et al.
University of Wollongong

The past few years have witnessed increasing research interest in covariance-based feature representation. A variety of methods have been proposed to boost its efficacy, with some recent ones resorting to nonlinear kernel techniques. Noting that the essence of this feature representation is to characterise the underlying structure of visual features, this paper argues that an equally, if not more, important approach to boosting its efficacy is to improve the quality of this characterisation. Following this idea, we propose to exploit the structure sparsity of visual features in skeletal human action recognition, and compute the sparse inverse covariance estimate (SICE) as the feature representation. We discuss the advantages of this new representation in dealing with small sample sizes and high dimensionality, and in terms of its modelling capability. Furthermore, utilising the monotonicity property of SICE, we efficiently generate a hierarchy of SICE matrices to characterise the structure of visual features at different sparsity levels, and then develop two discriminative learning algorithms to adaptively integrate them for recognition. As demonstrated by extensive experiments, the proposed representation leads to significantly improved recognition performance over comparable state-of-the-art methods. In particular, although the method relies entirely on linear techniques, its performance is comparable to, or even better than, that of methods employing nonlinear kernel techniques. This result clearly demonstrates the value of exploiting structure sparsity for covariance-based feature representation.
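To make the idea of a hierarchy of SICE matrices more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual l1-penalised Gaussian log-likelihood formulation of SICE, penalises off-diagonal entries only, and swaps in a generic ADMM solver; the solver choice, the function names (`soft_threshold`, `sice_admm`, `n_edges`), the penalty values, and the synthetic data standing in for per-frame skeletal features are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, t):
    """Element-wise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def sice_admm(S, lam, rho=1.0, n_iter=200):
    """Sparse inverse covariance estimate via ADMM (illustrative solver).

    Approximately minimises  -logdet(Theta) + tr(S Theta) + lam * ||Theta||_1
    (off-diagonal entries only), with S the sample covariance matrix.
    """
    d = S.shape[0]
    Z = np.eye(d)                      # sparse copy of the estimate
    U = np.zeros((d, d))               # scaled dual variable
    off_diag = ~np.eye(d, dtype=bool)  # mask of penalised entries
    for _ in range(n_iter):
        # Theta-update: closed form through an eigendecomposition.
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta) @ Q.T
        # Z-update: soft-threshold the off-diagonal entries only.
        A = Theta + U
        Z = A.copy()
        Z[off_diag] = soft_threshold(A[off_diag], lam / rho)
        # Dual update.
        U += Theta - Z
    return Z


def n_edges(P):
    """Number of non-zero entries above the diagonal."""
    return int(np.count_nonzero(np.triu(P, k=1)))


# Synthetic stand-in for one action sequence: 40 frames, 6 features.
# The cumulative sum makes neighbouring features correlated.
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((40, 6)), axis=1)
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise each feature
S = np.cov(X, rowvar=False)

# One SICE matrix per sparsity level (penalty values chosen arbitrarily).
lams = [0.01, 0.2, 5.0]
hierarchy = {lam: sice_admm(S, lam) for lam in lams}

for lam in lams:
    print(f"lambda={lam}: {n_edges(hierarchy[lam])} off-diagonal non-zeros")
```

In this sketch, each entry of `hierarchy` is one candidate representation of the same sequence at a different sparsity level. A small penalty keeps a denser structure, while a sufficiently large penalty drives every off-diagonal entry to zero. How the matrices are then combined for recognition is the subject of the paper's two learning algorithms and is not reproduced here.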