Vector-Quantized Timbre Representation

07/13/2020
by Adrien Bitton, et al.

Timbre is a set of perceptual attributes that identifies different types of sound sources. Although its definition is usually elusive, it can be seen from a signal processing viewpoint as all the spectral features that are perceived independently from pitch and loudness. Some works have studied high-level timbre synthesis by analyzing the feature relationships of different instruments, but acoustic properties remain entangled and generation remains bound to individual sounds. This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features. We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution. Timbre transfer can be performed by encoding any variable-length input signal into quantized latent features, which are then decoded according to the learned timbre. We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments as an intuitive modality to drive sound synthesis. Furthermore, we can map the discrete latent space to acoustic descriptors and directly perform descriptor-based synthesis.
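The two mechanisms named above, quantizing latent features onto a discrete set and selecting discrete codes from acoustic descriptors, can be illustrated with a minimal sketch. It assumes the standard vector-quantization step used in VQ-VAE-style models (each latent frame is replaced by its nearest codebook vector under squared Euclidean distance); it is not the authors' implementation, and the neural encoder and decoder are omitted. The codebook, frames, and per-code descriptor values (e.g. a spectral centroid in Hz) are hypothetical toy numbers, and the helper names are illustrative only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent frame to the index of its nearest codebook vector.

    Works on any number of frames, mirroring variable-length inputs.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in frames
    ]


def lookup(indices: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Replace code indices by their codebook vectors (the decoder's input)."""
    return [list(codebook[i]) for i in indices]


def codes_for_descriptor(targets: List[float], code_descriptors: List[float]) -> List[int]:
    """Hypothetical descriptor-based selection: for each target descriptor
    value, pick the code whose (precomputed) descriptor is closest."""
    return [
        min(range(len(code_descriptors)), key=lambda k: abs(t - code_descriptors[k]))
        for t in targets
    ]


if __name__ == "__main__":
    # Toy 2-D codebook and latent frames (hypothetical values).
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
    frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.4]]
    idx = quantize(frames, codebook)
    print(idx)                      # [0, 1, 2]
    print(lookup(idx, codebook))
    # Hypothetical per-code descriptor values, e.g. spectral centroid in Hz.
    print(codes_for_descriptor([200.0, 1500.0], [300.0, 1200.0, 4000.0]))  # [0, 1]
```

Because each frame is quantized independently, the same codebook handles inputs of any length; in a trained model the decoder would additionally receive loudness, which is the quantity the latent space is disentangled from.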
