Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

06/22/2017
by M. Huzaifah et al.

Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. Visual displays of an audio signal, through various time-frequency representations such as spectrograms, offer a rich representation of the temporal and spectral structure of the original signal. In this letter, we compare several popular signal processing methods for obtaining such a representation, namely the short-time Fourier transform (STFT) with linear and Mel scales, the constant-Q transform (CQT) and the continuous wavelet transform (CWT), and assess their impact on the classification performance of CNNs on two environmental sound datasets. This study supports the hypothesis that time-frequency representations are valuable in learning useful features for sound classification. Moreover, the particular transformation used is shown to affect classification accuracy, with the Mel-scaled STFT slightly outperforming the other discussed methods and substantially outperforming baseline MFCC features. Additionally, we observe that the optimal window size during transformation depends on the characteristics of the audio signal, and that, architecturally, 2D convolution yielded better results than 1D in most cases.
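As a rough illustration of the Mel-scaled STFT that the letter finds most effective, here is a minimal NumPy sketch. The filterbank construction, window, and parameter choices (`n_fft`, hop size, number of Mel bands) are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def stft(signal, n_fft=512, hop=256):
    """Linear-frequency STFT magnitude with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # shape: (n_fft//2+1, n_frames)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters spaced evenly on the Mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):                    # rising slope
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):                    # falling slope
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fb

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)       # 1 s test tone at 440 Hz

S = stft(x)                            # linear-scale spectrogram
M = mel_filterbank(sr, 512) @ S        # Mel-scaled spectrogram
log_mel = np.log(M + 1e-10)            # log compression, typical CNN input
print(S.shape, M.shape)
```

The warping from linear to Mel frequency compresses the high-frequency bins and gives finer resolution at low frequencies, which is one plausible reason the Mel-scaled STFT works well for environmental sounds; libraries such as librosa provide equivalent (and more carefully tuned) implementations.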


