Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification

08/11/2021
by Yuzhong Wu, et al.

Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded. The log-mel feature and the convolutional neural network (CNN) have recently become the most popular time-frequency (TF) feature representation and classifier, respectively, in ASC. An audio signal recorded in a scene may include various sounds overlapping in time and frequency. A previous study suggests that considering long-duration and short-duration sounds separately in a CNN may improve ASC accuracy. This study addresses the generalization ability of acoustic scene classifiers. In practice, the characteristics of acoustic scene signals may be affected by various factors, such as the choice of recording device and changes in recording location. When an established ASC system predicts scene classes for audio recorded in unseen scenarios, its accuracy may drop significantly. Long-duration sounds contain not only domain-independent acoustic scene information but also channel information determined by the recording conditions, and the latter is prone to over-fitting. For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN. The RFL framework down-weights CNN learning specifically on long-duration sounds. It does so by training an auxiliary classifier with only long-duration sound information as input. The auxiliary classifier is trained with an auxiliary loss function that assigns less learning weight to poorly classified examples than the standard cross-entropy loss does. The experimental results show that the proposed RFL framework yields an acoustic scene classifier that is more robust to unseen devices and cities.
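
To make the idea of an auxiliary loss that "assigns less learning weight to poorly classified examples" more concrete, the following is a minimal sketch and not the loss used in the paper, whose exact form the abstract does not give. It assumes one simple, hypothetical choice: scale each example's cross-entropy by the predicted probability of the true class raised to a power `gamma`, so that examples with a low true-class probability contribute less. The function names and the `gamma` parameter are illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    """Standard per-example cross-entropy: -log p(true class)."""
    probs = softmax(logits)
    p_true = probs[np.arange(len(labels)), labels]
    return -np.log(p_true)


def downweighted_loss(logits, labels, gamma=1.0):
    """Hypothetical auxiliary loss (an assumption, not the paper's formula).

    Each example's cross-entropy is scaled by p_true ** gamma, so a
    poorly classified example (small p_true) gets a smaller weight than
    a well classified one. gamma = 0 recovers standard cross-entropy.
    In a training framework the weight would typically be treated as a
    constant (no gradient flowing through it).
    """
    probs = softmax(logits)
    p_true = probs[np.arange(len(labels)), labels]
    weight = p_true ** gamma
    return -weight * np.log(p_true)


# Two examples, both with true class 0: the first is classified well,
# the second poorly.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0]])
labels = np.array([0, 0])

ce = cross_entropy(logits, labels)
aux = downweighted_loss(logits, labels, gamma=1.0)
relative_weight = aux / ce  # equals p_true ** gamma for each example
```

In this sketch `relative_weight` is much smaller for the second (poorly classified) example than for the first, which is the qualitative behaviour the abstract describes. It is the opposite of losses such as focal loss, which emphasize hard examples.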

