ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

04/23/2021
by   Andrey Guzhov, et al.

Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer based on complex frequency B-spline (fbsp) wavelets. Used with a high-performance audio classification model, the proposed fbsp-layer improves accuracy over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets, ImageNet and AudioSet, for weight initialization. Our proposed model outperforms other approaches, achieving accuracies of 95.20% and 89.14%. Additionally, we assess how the proposed layer affects model robustness to additive white Gaussian noise and to a reduction of the effective sample rate, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations compared with STFT-based training. For the sake of reproducibility, our code is made available.
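As a rough illustration of two ingredients named in the abstract, the sketch below evaluates a complex frequency B-spline wavelet using the commonly cited textbook definition, psi(t) = sqrt(fb) * sinc(fb * t / m)^m * exp(2*pi*i*fc*t), and adds white Gaussian noise to a signal at a target signal-to-noise ratio. The function names, default parameter values, and the SNR-based noise scaling are assumptions made for this example; they are not taken from the authors' released code, and the trainable fbsp-layer itself is not reproduced here.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Evaluate a complex frequency B-spline wavelet at time points t.

    m  : order of the B-spline (integer >= 1)
    fb : bandwidth parameter
    fc : center frequency
    All defaults are illustrative assumptions.
    """
    t = np.asarray(t, dtype=np.float64)
    # np.sinc is the normalized sinc: sin(pi * x) / (pi * x)
    envelope = np.sinc(fb * t / m) ** m
    carrier = np.exp(2j * np.pi * fc * t)
    return np.sqrt(fb) * envelope * carrier


def add_awgn(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise at the requested SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

For instance, `fbsp_wavelet(np.linspace(-4, 4, 513), m=3, fb=2.0, fc=1.5)` samples one wavelet on a fixed grid, and `add_awgn(x, snr_db=10.0)` produces a perturbed copy of a waveform `x` of the kind one might use in a robustness check.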


