Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

06/19/2019
by Yin-Jyun Luo, et al.

In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from the corresponding mixture components and concatenated as the input to a decoder. We demonstrate the model's efficacy through latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using the original labeled data. These classifiers achieve high accuracy when tested on our synthesized sounds, which verifies that the model performs controllable, realistic synthesis of timbre and pitch. Our model also enables timbre transfer between multiple instruments with a single autoencoder architecture, which is evaluated by measuring the shift in the posterior of instrument classification. Our in-depth evaluation confirms the model's ability to disentangle timbre and pitch.
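The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes diagonal Gaussian mixture components with made-up means and standard deviations; the latent sizes, the instrument and pitch labels, and the helper names (`make_components`, `sample_component`, `decoder_input`) are hypothetical and not taken from the paper. The decoder itself, which would be a neural network, is omitted; the sketch only shows how a timbre latent and a pitch latent might be drawn from their respective components and concatenated.

```python
import random

# Illustrative assumptions only: latent sizes and component parameters
# are invented for this sketch and do not come from the paper.
TIMBRE_DIM = 4
PITCH_DIM = 3


def make_components(names, dim, rng):
    """Build one diagonal Gaussian (mean, std) per mixture component."""
    return {
        name: ([rng.uniform(-3.0, 3.0) for _ in range(dim)], [0.5] * dim)
        for name in names
    }


def sample_component(component, rng):
    """Draw z ~ N(mean, diag(std^2)) from a single mixture component."""
    mean, std = component
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def decoder_input(instrument, pitch, timbre_mix, pitch_mix, rng):
    """Sample a timbre latent and a pitch latent, then concatenate them."""
    z_timbre = sample_component(timbre_mix[instrument], rng)
    z_pitch = sample_component(pitch_mix[pitch], rng)
    return z_timbre + z_pitch  # list concatenation: timbre dims, then pitch dims


rng = random.Random(0)
timbre_mix = make_components(["violin", "flute"], TIMBRE_DIM, rng)  # hypothetical labels
pitch_mix = make_components(["C4", "A4"], PITCH_DIM, rng)           # hypothetical labels

z = decoder_input("flute", "A4", timbre_mix, pitch_mix, rng)
print(len(z))  # 7 = TIMBRE_DIM + PITCH_DIM
```

Under the same assumptions, timbre transfer would amount to keeping the pitch latent fixed while drawing the timbre latent from a different instrument's component before decoding.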

