DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

07/15/2016
by Liping Jing, et al.

This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively. Our experiments used the YLI-MED dataset (an open TRECVID-style video corpus based on YFCC100M), which includes ten events. The results show that the proposed DCAR representation consistently outperforms state-of-the-art audio representations. DCAR's advantage over i-vector, mv-vector, and GMM representations is significant for both easier and harder discrimination tasks. We discuss how these performance differences across easy and hard cases follow from how each type of model leverages (or doesn't leverage) the intrinsic structure of the data. Furthermore, DCAR shows a particularly notable accuracy advantage on events where humans have more difficulty classifying the videos, i.e., events with lower mean annotator confidence.
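The first phase described above, modeling a single track with a multi-component GMM, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes toy one-dimensional features (real audio features would be multi-dimensional frame-level vectors), uses plain expectation-maximization, and the function name `fit_gmm_1d` and its parameters are hypothetical. The second phase, the optimization on Grassmannian manifolds, is not covered here.

```python
import math

def fit_gmm_1d(samples, n_components=2, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper, toy setting).

    Returns (weights, means, variances), one entry per component.
    """
    xs = sorted(samples)
    n = len(xs)
    # Deterministic initialisation: means at evenly spaced quantiles,
    # shared variance equal to the overall sample variance.
    means = [xs[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [max(overall_var, var_floor)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, var_floor)

    return weights, means, variances


# Toy "track": feature values clustered around two different levels.
track = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(track, n_components=2)
print(weights, means, variances)
```

Each fitted component summarizes one mode of variability within the track. In the method the abstract describes, it is these per-track components that the second phase then makes more discriminative and compact.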


