Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition

04/11/2023
by   Tushar Sandhan, et al.

Automatic audio event recognition plays a pivotal role in making human-robot interaction closer and has wide applicability in industrial automation, control, and surveillance systems. An audio event is composed of intricate phonic patterns that are harmonically entangled. Audio recognition is dominated by low- and mid-level features, which have demonstrated their recognition capability but have high computational cost and low semantic meaning. In this paper, we propose a new computationally efficient framework for audio recognition. Audio Bank, a new high-level representation of audio, comprises distinctive audio detectors representing each audio class in frequency-temporal space. The dimensionality of the resulting feature vector is reduced using non-negative matrix factorization, preserving its discriminability and rich semantic information. The high audio recognition performance obtained with several classifiers (SVM, neural network, Gaussian process classification, and k-nearest neighbors) shows the effectiveness of the proposed method.
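The reduction-then-classification step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify how the detector responses are computed, so any non-negative feature matrix `X` (one row per clip) stands in for them. The factorization uses the standard multiplicative-update rule for NMF, and the helper names `nmf`, `project`, and `knn_predict` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative X (n x d) as W (n x k) @ H (k x d).

    Uses multiplicative updates for the Frobenius-norm objective.
    W holds the reduced k-dimensional features, H the learned basis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def project(X, H, n_iter=200, eps=1e-9, seed=0):
    """Reduce unseen samples by keeping the basis H fixed and updating only W."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], H.shape[0])) + eps
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W


def knn_predict(train_feats, train_labels, query_feats, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance).

    train_labels must be a NumPy array of non-negative integers.
    """
    dists = np.linalg.norm(
        query_feats[:, None, :] - train_feats[None, :, :], axis=2
    )
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])


# Synthetic stand-in for detector-response features (assumption):
# two classes, each built from its own non-negative template plus noise.
rng = np.random.default_rng(42)
templates = rng.random((2, 40))
labels = np.repeat([0, 1], 30)
X = templates[labels] + 0.05 * rng.random((60, 40))

W_train, H = nmf(X, k=4)             # 40-D features reduced to 4-D
W_query = project(X[:5], H)          # reduce "new" clips with the same basis
predictions = knn_predict(W_train, labels, W_query, k=3)
```

Any of the other classifiers named in the abstract could replace the k-nearest-neighbors step here; it is used only because it needs no extra dependencies.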


