Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

06/16/2020
by Hao Hao Tan, et al.

We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE) that generates realistic piano performances in the audio domain. The generated audio closely follows temporal conditions on two essential style features of piano performance: articulation and dynamics. We demonstrate that the model can apply fine-grained style morphing over the course of synthesis. The conditions are latent variables that can be sampled from the prior or inferred from other pieces. One envisioned use case is to inspire creative, brand-new interpretations of existing pieces of piano music.
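
The idea of time-varying latent conditions can be illustrated with a minimal sketch. It is not the authors' implementation: the two-component prior, the 2-D latent size, the linear interpolation scheme, and the names `sample_style_latent` and `morph_conditions` are all assumptions made for illustration. It draws one style latent from each component of a toy Gaussian mixture prior and builds a per-frame trajectory that morphs from one to the other; a decoder (not shown) would consume one latent per frame.

```python
import random


def sample_style_latent(means, stds, component, rng):
    """Draw one latent vector from the chosen Gaussian mixture component."""
    return [rng.gauss(m, s) for m, s in zip(means[component], stds[component])]


def morph_conditions(z_start, z_end, n_frames):
    """Linearly interpolate between two latents, giving one vector per frame.

    Linear interpolation is an assumption here; the abstract does not say
    how the style conditions are varied over time.
    """
    if n_frames < 2:
        return [list(z_start)]
    frames = []
    for t in range(n_frames):
        alpha = t / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return frames


rng = random.Random(0)

# Hypothetical prior: two mixture components over a 2-D latent,
# standing in for e.g. "soft" and "loud" dynamics clusters.
means = [[-1.0, 0.0], [1.0, 0.5]]
stds = [[0.1, 0.1], [0.1, 0.1]]

z_soft = sample_style_latent(means, stds, 0, rng)
z_loud = sample_style_latent(means, stds, 1, rng)

# One latent condition per frame, morphing from soft to loud.
trajectory = morph_conditions(z_soft, z_loud, n_frames=5)
```

Replacing a sampled latent with one inferred by an encoder from another recording would correspond to the "inferred from other pieces" case mentioned above; that encoder is outside the scope of this sketch.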

