DDSP: Differentiable Digital Signal Processing

01/14/2020
by Jesse Engel, et al.

Most generative models of audio directly generate samples in one of two domains: time or frequency. While sufficient to express any signal, these representations are inefficient, as they do not utilize existing knowledge of how sound is generated and perceived. A third approach (vocoders/synthesizers) successfully incorporates strong domain knowledge of signal processing and perception, but has been less actively researched due to limited expressivity and difficulty integrating with modern auto-differentiation-based machine learning methods. In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods. Focusing on audio synthesis, we achieve high-fidelity generation without the need for large autoregressive models or adversarial losses, demonstrating that DDSP enables utilizing strong inductive biases without losing the expressive power of neural networks. Further, we show that combining interpretable modules permits manipulation of each separate model component, with applications such as independent control of pitch and loudness, realistic extrapolation to pitches not seen during training, blind dereverberation of room acoustics, transfer of extracted room acoustics to new environments, and transformation of timbre between disparate sources. In short, DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning. The library is publicly available at https://github.com/magenta/ddsp and we welcome further contributions from the community and domain experts.
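
To make "classic signal processing elements" concrete, here is a minimal sketch of one such element: an additive (harmonic) oscillator whose pitch and loudness are separate inputs. It is written in plain NumPy for readability and is not the DDSP library's API. The function name `harmonic_synth`, its parameters, and the 16 kHz default sample rate are assumptions made for this illustration. In an auto-differentiation framework the same arithmetic (multiply, cumulative sum, sine, weighted sum) is differentiable, which is what would let a neural network predict the controls and be trained end to end.

```python
import numpy as np


def harmonic_synth(f0_hz, amplitude, harmonic_distribution, sample_rate=16000):
    """Illustrative additive synthesizer (hypothetical helper, not the DDSP API).

    f0_hz:                 (n_samples,) fundamental frequency per sample, in Hz
    amplitude:             (n_samples,) overall loudness control per sample
    harmonic_distribution: (n_samples, n_harmonics) relative harmonic weights
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    amplitude = np.asarray(amplitude, dtype=np.float64)
    dist = np.asarray(harmonic_distribution, dtype=np.float64)

    n_harmonics = dist.shape[1]
    harmonic_numbers = np.arange(1, n_harmonics + 1)

    # Frequency of every harmonic at every sample: k * f0.
    harmonic_freqs = f0_hz[:, None] * harmonic_numbers[None, :]

    # Silence harmonics at or above the Nyquist frequency to avoid aliasing.
    dist = np.where(harmonic_freqs < sample_rate / 2.0, dist, 0.0)

    # Normalize the remaining weights so they sum to 1 at each sample;
    # overall loudness is then governed by `amplitude` alone.
    total = dist.sum(axis=1, keepdims=True)
    dist = dist / np.maximum(total, 1e-12)

    # Instantaneous phase is the running sum of angular frequency, so a
    # time-varying pitch produces a continuous waveform.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    harmonic_phases = phase[:, None] * harmonic_numbers[None, :]

    return amplitude * np.sum(dist * np.sin(harmonic_phases), axis=1)
```

Because pitch (`f0_hz`) and loudness (`amplitude`) enter as separate arguments, either can be changed without touching the other, which is the kind of independent control the abstract describes. For example, `harmonic_synth(np.full(16000, 440.0), np.full(16000, 0.5), np.ones((16000, 8)))` yields one second of a 440 Hz tone with eight equally weighted harmonics.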


research
03/12/2021

Real-time Timbre Transfer and Sound Synthesis using DDSP

Neural audio synthesis is an actively researched topic, having yielded a...
research
08/29/2023

A Review of Differentiable Digital Signal Processing for Music Speech Synthesis

The term "differentiable digital signal processing" describes a family o...
research
08/06/2020

HooliGAN: Robust, High Quality Neural Vocoding

Recent developments in generative models have shown that deep learning c...
research
01/07/2022

A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram

The synthesis of sound via deep learning methods has recently received m...
research
03/12/2021

Latent Space Explorations of Singing Voice Synthesis using DDSP

Machine learning based singing voice models require large datasets and l...
research
08/09/2022

DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

A vocoder is a conditional audio generation model that converts acoustic...
research
06/19/2023

Vocal Timbre Effects with Differentiable Digital Signal Processing

We explore two approaches to creatively altering vocal timbre using Diff...
