A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram

01/07/2022
by Anastasia Natsiou, et al.

The synthesis of sound via deep learning methods has recently received much attention. Some problems for deep learning approaches to sound synthesis relate to the amount of data needed to specify an audio signal and the need to preserve both the long- and short-time coherence of the synthesised signal. Visual time-frequency representations such as the log-mel-spectrogram have gained in popularity. The log-mel-spectrogram is a perceptually informed representation of audio that greatly compresses the amount of information required to describe a sound. However, because of this compression, the representation is not directly invertible. Both signal processing and machine learning techniques have previously been applied to the inversion of the log-mel-spectrogram, but both caused audible distortions in the synthesised sounds due to issues of temporal and spectral coherence. In this paper, we outline the application of a sinusoidal model to the inversion of the log-mel-spectrogram for pitched musical instrument sounds, which outperforms state-of-the-art deep learning methods. The approach could later be used as a general decoding step from the spectral to the time domain in neural applications.
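To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It shows why a mel filterbank compresses a spectrum frame (many frequency bins are pooled into far fewer bands, so the mapping cannot be undone exactly) and what a basic sinusoidal (additive) model of a pitched tone looks like. The HTK-style mel formula, the triangular filters, and every parameter value (80 bands, 1024-point FFT, 16 kHz) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel scale: one common convention, assumed here.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters as a list of n_mels rows, each with n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frame(magnitudes, bank, eps=1e-10):
    """Pool one magnitude-spectrum frame into log mel-band energies."""
    return [math.log(sum(w * a for w, a in zip(row, magnitudes)) + eps)
            for row in bank]


def harmonic_tone(f0_hz, amplitudes, sample_rate, n_samples):
    """Additive synthesis: a sum of sinusoids at integer multiples of f0_hz."""
    return [
        sum(a * math.sin(2.0 * math.pi * (h + 1) * f0_hz * n / sample_rate)
            for h, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


# Illustrative (assumed) settings: 513 frequency bins are pooled into 80 bands.
bank = mel_filterbank(n_mels=80, n_fft=1024, sample_rate=16000)
frame = log_mel_frame([1.0] * 513, bank)
tone = harmonic_tone(220.0, [1.0, 0.5, 0.25], sample_rate=16000, n_samples=64)
```

With these settings each frame goes from 513 magnitude values to 80 band energies, and the phase is discarded entirely, so many different signals share the same log-mel frame. A sinusoidal model sidesteps this by describing a pitched sound with a small set of partial frequencies and amplitudes, which `harmonic_tone` turns back into samples.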


research
04/30/2019

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this article pr...
research
01/18/2023

An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

In audio processing applications, the generation of expressive sounds ba...
research
03/30/2020

VaPar Synth – A Variational Parametric Model for Audio Synthesis

With the advent of data-driven statistical modeling and abundant computi...
research
01/07/2022

Audio representations for deep learning in sound synthesis: A review

The rise of deep learning algorithms has led many researchers to withdra...
research
01/14/2020

DDSP: Differentiable Digital Signal Processing

Most generative models of audio directly generate samples in one of two ...
research
05/05/2023

Time-weighted Frequency Domain Audio Representation with GMM Estimator for Anomalous Sound Detection

Although deep learning is the mainstream method in unsupervised anomalou...
research
05/09/2019

Sound texture synthesis using convolutional neural networks

The following article introduces a new parametric synthesis algorithm fo...
