An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

01/18/2023
by Anastasia Natsiou, et al.

In audio processing applications, there is high demand for generating expressive sounds from high-level representations. Such representations can be used to manipulate timbre and to guide the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument timbre compression. Unsupervised deep learning methods can achieve audio compression by training a network to map waveforms or spectrograms to low-dimensional latent representations. This study investigates stacked convolutional autoencoders for compressing time-frequency audio representations of a variety of instruments playing a single pitch. We further explore hyper-parameters and regularization techniques to enhance the performance of the initial design. Trained in an unsupervised manner, the network is able to reconstruct monophonic, harmonic sounds from latent representations. In addition, we introduce an evaluation metric to measure the similarity between the original and reconstructed samples, since evaluating a deep generative model for sound synthesis is a challenging task. Our metric is based on the accuracy of the generated frequencies, which is a significant factor in the perception of harmonic sounds. This work is expected to accelerate future experiments on audio compression using neural autoencoders.
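The abstract does not include code, but a rough sketch may help illustrate the kind of architecture it describes: a stacked convolutional autoencoder that compresses a log-mel-spectrogram into a low-dimensional latent code and reconstructs it. The framework (PyTorch), layer sizes, latent dimension, and input shape below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a stacked convolutional autoencoder for log-mel-spectrograms.
# Channel counts, latent size, and input shape are assumptions for illustration.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, n_mels=128, n_frames=64, latent_dim=32):
        super().__init__()
        # Encoder: stacked strided convolutions progressively halve the
        # time-frequency grid.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # -> (16, 64, 32)
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> (32, 32, 16)
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # -> (64, 16, 8)
            nn.ReLU(),
        )
        flat = 64 * (n_mels // 8) * (n_frames // 8)
        self.to_latent = nn.Linear(flat, latent_dim)
        self.from_latent = nn.Linear(latent_dim, flat)
        # Decoder: transposed convolutions mirror the encoder.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
        )
        self.n_mels, self.n_frames = n_mels, n_frames

    def forward(self, x):
        # x: (batch, 1, n_mels, n_frames) log-mel-spectrogram
        h = self.encoder(x)
        z = self.to_latent(h.flatten(1))          # low-dimensional latent code
        h = self.from_latent(z).view(-1, 64, self.n_mels // 8, self.n_frames // 8)
        return self.decoder(h), z

model = ConvAutoencoder()
x = torch.randn(8, 1, 128, 64)                    # batch of dummy spectrograms
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)           # reconstruction objective
```

Likewise, a hedged illustration of the kind of frequency-based similarity the abstract alludes to: compare the dominant spectral peaks of the original and reconstructed samples. The peak-picking procedure and relative-deviation score below are assumptions; the paper's exact metric may differ.

```python
# Hypothetical frequency-accuracy measure: mean relative deviation between the
# strongest spectral peaks of two signals (not the paper's exact metric).
import numpy as np

def frequency_accuracy(original, reconstructed, sr=16000, n_peaks=8):
    def top_freqs(signal):
        spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
        return np.sort(freqs[np.argsort(spectrum)[-n_peaks:]])
    f_orig, f_rec = top_freqs(original), top_freqs(reconstructed)
    return float(np.mean(np.abs(f_orig - f_rec) / np.maximum(f_orig, 1e-6)))
```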

