Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

04/05/2017
by Jesse Engel, et al.

Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.
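
The morphing between instruments mentioned above can be pictured as blending two temporal codes before they are passed to the decoder. The following is a minimal sketch, assuming a simple linear blend between two embeddings of identical shape. The function name `interpolate_embeddings`, the (time steps x channels) list layout, and the mixing weight `alpha` are illustrative assumptions, not the authors' code, and the encoder and decoder themselves are omitted.

```python
from typing import List

# Hypothetical layout: one list of channel values per time step.
Embedding = List[List[float]]


def interpolate_embeddings(z_a: Embedding, z_b: Embedding, alpha: float) -> Embedding:
    """Linearly blend two temporal embeddings of identical shape.

    alpha = 0.0 returns z_a unchanged, alpha = 1.0 returns z_b,
    and values in between give a weighted mix of the two codes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same number of time steps")

    blended: Embedding = []
    for frame_a, frame_b in zip(z_a, z_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("embeddings must have the same number of channels")
        # Weighted average of the two codes, channel by channel.
        blended.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(frame_a, frame_b)]
        )
    return blended


if __name__ == "__main__":
    # Two toy "instrument" codes with 2 time steps and 2 channels each.
    flute = [[0.0, 2.0], [1.0, 1.0]]
    organ = [[4.0, 6.0], [3.0, 5.0]]
    print(interpolate_embeddings(flute, organ, 0.5))
```

In a full system the blended code would condition the autoregressive decoder in place of either original code. Whether a linear blend yields a perceptually smooth change in timbre depends on the learned embedding space.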

research
12/17/2021

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Musical expression requires control of both what notes are played, and h...
research
11/09/2021

RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

Deep generative models applied to audio have improved by a large margin ...
research
10/23/2018

SING: Symbol-to-Instrument Neural Generator

Recent progress in deep learning for audio synthesis opens the way to mo...
research
09/04/2021

Network Modulation Synthesis: New Algorithms for Generating Musical Audio Using Autoencoder Networks

A new framework is presented for generating musical audio using autoenco...
research
08/19/2020

HpRNet : Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer

Generative Models for Audio Synthesis have been gaining momentum in the ...
research
04/12/2019

Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders

Generative models have thrived in computer vision, enabling unprecedente...
research
06/30/2022

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

This paper introduces R-MelNet, a two-part autoregressive architecture w...
