RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

11/09/2021
by Antoine Caillon, et al.

Deep generative models applied to audio have improved the state of the art by a large margin in many speech- and music-related tasks. However, because raw waveform modelling remains an inherently difficult task, audio generative models are either computationally intensive, rely on low sampling rates, are complicated to control, or restrict the nature of possible signals. Among those models, Variational AutoEncoders (VAE) give control over the generation by exposing latent variables, although they usually suffer from low synthesis quality. In this paper, we introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis. We introduce a novel two-stage training procedure, consisting of representation learning followed by adversarial fine-tuning. We show that a post-training analysis of the latent space allows direct control over the trade-off between reconstruction fidelity and representation compactness. By leveraging a multi-band decomposition of the raw waveform, we show that our model is the first able to generate 48 kHz audio signals while simultaneously running 20 times faster than real time on a standard laptop CPU. We evaluate synthesis quality using both quantitative and qualitative subjective experiments and show the superiority of our approach compared to existing models. Finally, we present applications of our model to timbre transfer and signal compression. All of our source code and audio examples are publicly available.
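
The trade-off between reconstruction fidelity and representation compactness can be illustrated with a small sketch. The abstract does not describe how the post-training analysis is performed, so this example assumes a PCA-style setup in which each latent dimension already has a variance attached to it, and keeps the fewest dimensions whose cumulative variance share reaches a chosen fidelity. The function name `dims_for_fidelity` and the example variances are hypothetical and do not come from the RAVE source code.

```python
def dims_for_fidelity(variances, fidelity):
    """Return the smallest number of latent dimensions whose cumulative
    share of the total variance reaches `fidelity` (a value in (0, 1]).

    Assumption: `variances` holds one non-negative value per latent
    dimension, e.g. obtained from a PCA of encoded training data.
    """
    if not 0.0 < fidelity <= 1.0:
        raise ValueError("fidelity must be in (0, 1]")
    ordered = sorted(variances, reverse=True)  # most informative first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("variances must have a positive sum")
    running = 0.0
    for count, var in enumerate(ordered, start=1):
        running += var
        if running / total >= fidelity:
            return count
    return len(ordered)


# Hypothetical per-dimension variances (illustrative values only).
example_variances = [4.0, 2.0, 1.0, 0.5, 0.5]
for f in (0.5, 0.75, 0.9, 1.0):
    print(f, dims_for_fidelity(example_variances, f))
```

A lower fidelity target yields a more compact representation (fewer dimensions), while a target of 1.0 keeps every dimension.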

research
11/16/2022

Conditional variational autoencoder to improve neural audio synthesis for polyphonic music sound

Deep generative models for audio synthesis have recently been significan...
research
09/05/2021

Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks

This research project investigates the application of deep learning to t...
research
04/05/2017

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

Generative models in vision have seen rapid progress due to algorithmic ...
research
10/26/2022

Full-band General Audio Synthesis with Score-based Diffusion

Recent works have shown the capability of deep generative models to tack...
research
06/30/2021

A Generative Model for Raw Audio Using Transformer Architectures

This paper proposes a novel way of doing audio synthesis at the waveform...
research
04/14/2022

Streamable Neural Audio Synthesis With Non-Causal Convolutions

Deep learning models are mostly used in an offline inference fashion. Ho...
research
06/16/2020

Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

We present a controllable neural audio synthesizer based on Gaussian Mix...
