SING: Symbol-to-Instrument Neural Generator

10/23/2018
by   Alexandre Défossez, et al.
0

Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present SING, a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2, 500 times faster for inference.

READ FULL TEXT
research
10/27/2019

Transferring neural speech waveform synthesizers to musical instrument sounds generation

Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the n...
research
06/11/2022

Multi-instrument Music Synthesis with Spectrogram Diffusion

An ideal music synthesizer should be both interactive and expressive, ge...
research
07/11/2021

Neural Waveshaping Synthesis

We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, ful...
research
04/05/2017

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

Generative models in vision have seen rapid progress due to algorithmic ...
research
08/06/2020

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice c...
research
06/28/2018

GenerationMania: Learning to Semantically Choreograph

Beatmania is a rhythm action game where players play the role of a DJ th...
research
09/16/2023

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

Guitar tablature is a form of music notation widely used among guitarist...

Please sign up or login with your details

Forgot password? Click here to reset