Generative timbre spaces with variational audio synthesis

05/22/2018
by Philippe Esling, et al.

Timbre spaces have been used in music perception to study the relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize, must be reconstructed for each novel example, and are not continuous, which prevents audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems provide neither an explicit control structure nor an understanding of their inner workings, and they are not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAEs) can alleviate these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to create a generative latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows novel instruments to be analyzed, and audio to be synthesized from any point of the space. We introduce a specific regularization that directly enforces given similarity ratings onto these spaces. We compare the resulting space to existing timbre spaces and show that it provides very similar distance relationships. We evaluate several spectral transforms and show that the Non-Stationary Gabor Transform (NSGT) provides the highest correlation to timbre spaces and the best synthesis quality. We show that these spaces can generalize to novel instruments and can generate any path between instruments to understand their timbre relationships. As these spaces are continuous, we study how traditional acoustic descriptors behave along the latent dimensions. We show that the descriptors have an overall non-linear topology but evolve in a locally smooth way. Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
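The abstract does not state the exact form of the perceptual regularization, so the following is only a minimal sketch of the general idea, assuming the penalty is the mean squared gap between pairwise latent distances and target dissimilarity ratings. The function names, the weights `beta` and `alpha`, and the toy instruments and ratings are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def perceptual_penalty(latents, ratings):
    """Mean squared gap between latent distances and dissimilarity ratings.

    latents: dict mapping instrument name -> latent coordinates
    ratings: dict mapping a sorted (name_a, name_b) pair -> target dissimilarity
    This form of the penalty is an assumption, not the paper's definition.
    """
    gaps = []
    for a, b in combinations(sorted(latents), 2):
        dist = math.dist(latents[a], latents[b])
        gaps.append((dist - ratings[(a, b)]) ** 2)
    return sum(gaps) / len(gaps)


def regularized_loss(recon_error, mu, log_var, latents, ratings, beta=1.0, alpha=1.0):
    """Reconstruction error + beta * KL + alpha * perceptual penalty (hypothetical weights)."""
    kl = kl_to_standard_normal(mu, log_var)
    return recon_error + beta * kl + alpha * perceptual_penalty(latents, ratings)


# Toy data: three instruments placed in a 2-D latent space (made-up values).
latents = {"flute": [0.0, 0.0], "oboe": [3.0, 4.0], "violin": [0.0, 1.0]}
ratings = {
    ("flute", "oboe"): 5.0,
    ("flute", "violin"): 1.0,
    ("oboe", "violin"): math.dist([3.0, 4.0], [0.0, 1.0]),
}
print(perceptual_penalty(latents, ratings))
```

In this toy setup the ratings are chosen to agree with the latent distances, so the penalty vanishes; moving a rating away from the corresponding distance increases the penalty, which is the pressure that would reorganize the latent space during training.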

